Labels and Roots 9781501502118, 9781501510588

This volume provides an in-depth exploration of the issues of labeling and roots, with a balance of empirical and conceptual argumentation.


English · 299 [300] pages · 2017




Leah Bauke and Andreas Blümel (Eds.) Labels and Roots

Studies in Generative Grammar

Editors: Norbert Corver, Harry van der Hulst, Roumyana Pancheva
Founding editors: Jan Koster, Henk van Riemsdijk

Volume 128

Labels and Roots

Edited by Leah Bauke and Andreas Blümel

ISBN 978-1-5015-1058-8
e-ISBN (PDF) 978-1-5015-0211-8
e-ISBN (EPUB) 978-1-5015-0213-2
ISSN 0167-4331

Library of Congress Cataloging-in-Publication Data: A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek: The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2017 Walter de Gruyter, Inc., Boston/Berlin
Typesetting: Compuscript Ltd., Shannon, Ireland
Printing and binding: CPI books GmbH, Leck
♾ Printed on acid-free paper. Printed in Germany.

www.degruyter.com

Contents

List of contributors  vii
Introduction  1
Samuel David Epstein, Hisatsugu Kitahara and T. Daniel Seely: Merge, labeling and their interactions  17
Chris Collins: Merge(X,Y) = {X,Y}  47
Aleksandra Vercauteren: Features and labeling: Label-driven movement  69
Petr Biskup: Labeling and other syntactic operations  91
Miki Obata: Is Transfer strong enough to affect labels?  117
Dennis Ott: Clausal arguments as syntactic satellites: A reappraisal  127
Michelle Sheehan: A labelling-based account of the Head-Final Filter  161
Artemis Alexiadou and Terje Lohndal: The structural configurations of root categorization  203
Leah S Bauke and Tom Roeper: How Unlabelled Nodes work: Morphological derivations and the subcomponents of UG operations  233
Andreas Blümel: Exocentric root declaratives: Evidence from V2  263
Index  291

List of contributors

Artemis Alexiadou, Humboldt Universität zu Berlin
Leah S Bauke, Bergische Universität Wuppertal
Petr Biskup, Universität Leipzig
Andreas Blümel, Georg-August-Universität Göttingen
Chris Collins, New York University
Samuel David Epstein, University of Michigan
Hisatsugu Kitahara, Keio University
Terje Lohndal, Norwegian University of Science and Technology / UiT The Arctic University of Norway
Miki Obata, Hosei University
Dennis Ott, University of Ottawa
Tom Roeper, University of Massachusetts – Amherst
T. Daniel Seely, Eastern Michigan University
Michelle Sheehan, Anglia Ruskin University
Aleksandra Vercauteren, Ghent University

Introduction

In syntactic and morphological theory, complex structures are assumed to be composed of subparts: A = [B C]. If one member of A is a lexical item and contributes to its distribution, its interactions with other objects, etc., we can refer to this member as A's head. Recent approaches (Collins 2002, Chomsky 2013, 2015) seek to derive the endocentricity of phrases (Chomsky 1970, 1986; Jackendoff 1977) from a general labeling algorithm, delving more deeply into the role that syntactic category values (v, n, C, T, etc.) play in syntax and in interpretation. A related branch of research attempts to determine the semantic and syntactic contributions lexical categories make to the combinatorial system, as in the study of argument structure. How do we account for the morphosyntactic behavior of "root" elements, and what role do labels play in their behavior? In the exoskeletal approach to syntax (Borer 2005a, 2005b et seq., De Belder 2011), lexical roots have no intrinsic categorial label: rather, the syntactic configuration in which a root is found determines its category and its specific semantic contribution. Distributed Morphology approaches (Marantz 1997 et seq.) similarly argue that categorial functional heads determine the category of the roots they merge with (cf. a.o. Arad 2003, 2005).

The workshop "Labels and Roots", which took place at the 2014 conference of the German linguistic society (Deutsche Gesellschaft für Sprachwissenschaft), aimed to address issues surrounding the above topics. It sought to bring together a variety of syntactic and morphological research focused on empirical and conceptual arguments (in any and all current theoretical frameworks) concerning fundamental questions about labeling/endocentricity and roots, both to assess the state of the art in the field and to inspire further research. The papers actually contributed were mostly minimalist, broadly construed.
Most papers in this volume stem from this inspiring workshop and address some key questions:
1. What work do syntactic labels do, within the syntax and morphology, and for semantics and phonology?
2. What is the possible content of a label? Is the content drawn from one or both parts of a complex object?
3. How generally predictable are labels?
4. Are labels components of a structure, or determined by a labeling procedure (Boeckx 2009, Chomsky 2008, 2013), or can they be outsourced completely from the syntax (Adger 2013)?

DOI 10.1515/9781501502118-001


5. Do lexical roots have morpho-syntactic features which serve as their labels (Harley 2014), or are their syntactic/morphological categories provided by formal elements with which they combine (Marantz 2008, Irwin 2012)?
6. How might labeling be involved in more diverse syntactic phenomena, such as Agreement, Case, movement, and so on?

The following considerations regarding how the categorial and/or featural information of the most prominent lexical item in a given phrase comes about trace major steps in the development of headedness in generative grammar. While most papers in this volume that address labels directly or indirectly touch on Chomsky 2013, 2015, some pursue alternative conceptions altogether.

Labelling has been a prominent concept in syntax ever since the formulation of the Standard Theory (Chomsky 1965). Labels were the key conceptual address in any phrase structure rule formulated at that time. However, the nature of labels was hardly ever scrutinized then. Labels existed by mere stipulation, and similarly the distinction between a well-formed phrase structure rule of the type VP → V NP and a phrase structure rule of the kind PP → V NP, which derives ungrammatical structures, was based on stipulation. To this day, the introduction of X'-theory in Remarks on Nominalization (Chomsky 1970) is celebrated as one of the hallmarks of generative syntax that finally remedied this unfortunate state of affairs. Phrase structure rules that neither derived all nor only the grammatical sentences of a language, and thus failed to meet the key criterion for any theory of generative grammar, were abandoned and replaced by X'-structures. The numerous phrase structure rules of old were replaced by one uniform schema: XP → X' (YP) and X' → X° (ZP).
Along the way, labelling became less arbitrary and stipulative: the assumption that heads are the endocentric core of all structures, which in turn emerge from projection, provided a clear role for labels, which are percolated along the projection line from the endocentric head. This did not solve the question of how labels are created, but it provided a mechanistic purpose for labelling that was completely syntax-internal. For the interfaces, labelling still played virtually no role.

In minimalist syntax the role of labels changed once again. This time, too, the change did not come about by questioning the status of labels per se; instead it was again a consequence of a conceptual change in how syntactic derivation proceeds that also affected the status of labels. In bare phrase structure syntax all derivations are characterized by recursive application of Merge, either external or internal, which is a set-creating operation. X'-theoretic conceptions of structure building are thus replaced by operations of set-formation of the type Merge (α, β) → {α, β} = γ. The question of labelling resurfaces here (in the form: how does γ receive a label?) and is more pressing than it has ever been before.

Basically, two answers have been provided. The first is still based on projection ideas imported from X'-theory, according to which the label of γ is determined by simple projection of the label of either α or β, thus e.g. γ = {LB(α), {α, β}}. Here, projection is not only a part of the syntactic operation itself; the label is crucially represented in the syntactic structure. Alternatively, labelling is determined by a third factor principle, Minimal Search: in any syntactic structure derived by recursive application of Merge that forms a set consisting of a lexical item and a (complex) syntactic object, the lexical item will provide the label (cf. Chomsky 2004, 2008, 2013, 2015). Hence: Merge (Head, XP) → {Head, XP} = HeadP. On this view, and in contrast to projectionist conceptions, the label is crucially not part of the syntactic representation. For further elucidation of the historical development of the theoretical stages, and for detailed description of this recent approach, we refer the reader to the contributions by Epstein, Kitahara and Seely (henceforth EKS), Collins, and other papers in this volume.

What we would like to emphasize is that this theoretical maneuver pushes further the idea that language narrowly construed (cf. Hauser, Chomsky & Fitch 2002) might consist of only the recursive operation Merge. This is what we would like to call the ontological part, which concerns the question of what universal grammar (UG) comprises at all. This question must be distinguished from the procedural question of how the operation is triggered:1 Chomsky (2004 et seq.) suggests that the application of Merge as such is unrestrained, i.e. its application is free. There are many ramifications of this view of grammar.
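The contrast between bare set formation and Minimal-Search labeling can be made concrete with a small illustration. The following sketch is ours, not part of any contribution to the volume; the names `merge`, `label` and `LexicalItem` are invented for exposition. It models Merge as set formation, Merge(X,Y) = {X,Y}, and implements the Minimal-Search answer for the {Head, XP} case: when exactly one member of the set is a lexical item, that item supplies the label; an {XP, YP} set is simply left unlabelled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LexicalItem:
    """A lexical item (head), e.g. v, n, C, T."""
    category: str

def merge(x, y):
    """Simplest Merge: Merge(X, Y) = {X, Y} -- bare set formation, no label."""
    return frozenset({x, y})

def label(so):
    """Minimal-Search labeling (cf. Chomsky 2013): if the set pairs a single
    lexical item with a complex syntactic object, that item provides the
    label. {XP, YP} (and {H, H}) configurations get no label in this toy."""
    heads = [m for m in so if isinstance(m, LexicalItem)]
    if len(heads) == 1:
        return heads[0].category
    return None

v = LexicalItem("v")
n = LexicalItem("n")
root = LexicalItem("√drive")  # a root, given a dummy tag purely for illustration
np = merge(n, merge(root, LexicalItem("√truck")))  # complex object headed by n
vp = merge(v, np)                                  # a {Head, XP} configuration

print(label(vp))             # prints: v    (the head v labels {v, NP})
print(label(merge(np, vp)))  # prints: None ({XP, YP}: label undetermined)
```

In the {XP, YP} case the toy returns no label at all; in the theory, such configurations are resolved by "symmetry-breaking" movement or by feature sharing, which a sketch this small does not model.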
For one thing, lest linguistic research become empirically vacuous, one must not throw the baby out with the bathwater: the task is to investigate and ultimately understand restrictive conditions of the grammar, or other linguistically relevant cognitive properties, in terms of properties of the interfaces and so-called Third Factor principles (Chomsky 2005). Thus, ungrammaticality and deviance of expressions are not reflexes of unlicensed syntactic operations – with Merge being free, they are licensed by definition – but must ultimately receive explanations in terms of the named properties. Our sense is that an occasional objection against this view of grammar is that neither interface properties nor Third Factor principles are a priori formally defined and sufficiently well understood to yield a workable and empirically meaty theory of UG. But we believe that the development of reasonably explicit hypotheses about both interface conditions and Third Factor principles is part of at least some strands of Minimalism:

Needless to say, these "external" conditions are only partially understood: we have to learn about the conditions that set the problem in the course of trying to solve it. The research task is interactive: to clarify the nature of the interfaces and optimal computational principles through investigation of how language partially satisfies the conditions they impose, not an unfamiliar feature of rational inquiry. (Chomsky 2005)

1 The wording "Merge comes for free" blurs this distinction, we feel.

In other words, hypotheses about the interfaces and operationalized Third Factor principles enter into specific empirical analyses, which in turn feed back into a hopefully enhanced understanding of the architecture of the grammar as a whole. The view of grammar we end up with lays the ground for an architecture that fits a plausible scenario of how UG evolved from our immediate ancestors, namely as the result of a "slight mutation…". Anything except hierarchical structure building – binary Merge – is then attributed to peripheral areas of the grammar (cf. Hauser, Chomsky & Fitch 2002's distinction between the faculty of language in the broad sense and the faculty of language in the narrow sense), many of which we arguably share with other species.

Returning to the linguistic/grammatical aspects: from this perspective, endocentricity is an epiphenomenon of two interacting principles: (a) the requirement by the Conceptual-Intentional systems that syntactic objects have a prominent element, and (b) the labelling algorithm based on the notion of Minimal Search that implements this requirement in a computationally efficient manner. This new approach to labelling in our view solves a number of deep-rooted empirical problems, such as split topicalization in languages like German (Ott 2012), EPP-phenomena (Chomsky 2013, 2015), successive-cyclic A'-movement (Blümel 2012, Chomsky 2013), successive-cyclic A-movement (EKS 2014), and potentially others like coordinated structures (Chomsky 2013), the conjunct constraint (Ott 2013) and the subject-in-situ generalization (Alexiadou & Anagnostopoulou 2001, Chomsky 2013). This paves the way to a number of new and exciting questions and provides a perspective from which putatively understood phenomena become problematic or at least puzzling, generating novel avenues of research (e.g. EKS 2014, 2015, 2016, van Gelderen 2015, Saito 2016). This volume contributes to the exploration of some of these issues.
Some of these are reiterations of old questions that have revolved around the issue of labelling ever since the earliest accounts; some, however, we can only now begin to ask. Among these are questions that center on the role of labelling within syntax (and morphology) and the role labelling plays at the interfaces, or more precisely the way in which labelling interacts with semantics and phonology.




The answer to the former question should also determine whether labels are components of a structure at all, or whether they are (still) externally determined by a separate labelling procedure, or (alternatively) whether they can be outsourced completely from the syntax. Provided that labels are an integral part of syntactic computation, the next obvious question is whether labelling might then also be involved in more diverse syntactic phenomena, such as Agreement, Case, movement, and so on. The answer to the second question should not only clarify whether labels play a role beyond narrow syntax but also whether Transfer in a phase theory has any effect on labelling later in the derivation. Furthermore, the exact content of labels needs to be addressed: it needs to be clarified to what extent the label of a complex syntactic object is determined by one or both parts (or possibly neither part) of the objects it consists of; thus, ultimately, the question of how generally predictable labels are still awaits a conclusive answer. All these issues are addressed in numerous ways by the contributions in this volume.

The second major theme of this volume, which stands in close interaction with the question of labelling, is the status of roots (i.e. open-class lexical items, cf. Marantz 1997, Borer 2005a, 2005b) in syntactic derivation. Here again, recent advances in theoretical syntax finally put us in a position to begin to ask questions that could not even be formulated in syntactic theories 'of old'. Under a lexicalist approach, which figured prominently in generative grammar right up until the advent of more minimalist accounts, roots played no role in syntax whatsoever. Lexical items were conceived of as complex syntactic objects that consist of a category label and an additional set of morphosyntactic features.
The exact content of this feature set is frequently disputed and somewhat unclear, but it includes and possibly extends beyond categorial features, additionally comprising phi-features, case features, selectional features, phonological features and semantic features. With the introduction of Distributed Morphology in the 1990s the situation changed, and syntactic approaches that base all computation on the initial status of roots moved to the center of attention. Yet here as well we are far from universally agreed-upon accounts, despite intensive research; and every new insight only allows us to formulate basic and fundamental research questions with more precision than before. Among these are the question of what the exact status of roots is in syntax (and possibly beyond), and what the featural composition of roots is, if any. Here opinions are particularly varied and range from the view that roots are equipped with phonological and semantic features, and possibly some additional features such as selectional features (cf. Harley 2014), to the view that they may consist of nothing but phonological and semantic features (and maybe not even that) (cf. Marantz 1997, Borer 2005a, 2005b, de Belder 2011), or of phonological features only (Borer 2013).


The latter approach provides what could probably be characterized as the most radical answer. In all of these approaches, however, the status of roots constitutes a major shift away from the lexicalist assumptions of old and opens up a whole new avenue of research. In a sense, roots can be regarded as the new open class, which in turn makes questions about the nature of closed-class elements such as categorizers of any sort (e.g. n, v, a, …) all the more pressing. So the 'breaking up' of what used to be considered an atomic unit, i.e. the lexical item, has had a number of implications beyond the atomic level, while at the same time research at this very level has also provided some very intriguing answers to long-standing puzzles.

For instance, the mere fact that it is now possible to consider roots as nothing but phonological indices (cf. Borer 2013), devoid of any category information, argument structure, morphological marking or substantive semantic content, has opened up a pathway for a new analysis of conversion (or zero derivation). Even in lexicalist approaches to morphology, conversion has always been tricky, because it was one of the few operations that seemed to operate bidirectionally, in the sense that it was possible, for instance, to derive nouns from verbs (to run – a run) as well as verbs from nouns (a fish – to fish) (cf. e.g. Katamba 1993). Bidirectionality in and of itself is a suspicious concept, and the problem is compounded by the fact that it is not always easy and straightforward to determine in which direction the conversion goes, plus the fact that in both directions a zero-form is involved. On top of all this, conversion constitutes a major problem when it is combined with other processes. When it comes to nominalizations, for instance, it is never clear why certain lexical items need to undergo conversion first before they can serve as the input to a second operation. In formation, for example, it is standardly assumed that form is converted from a nominal form into a verbal form in order to undergo -ation nominalization (cf. Borer 2013 for detailed discussion). This assumption is suspicious for a number of reasons (e.g. because there are obvious counterexamples where it is certainly not a verb that provides the input: fiction), and it would be less problematic if all converted forms existed in their own right and not just as the input for further operations. However, apart from the set of words of Latinate origin illustrated by fiction above, which already undermine the rule, the nominalizations of incorporated verb-object constructions constitute a major problem in this respect. A by now notorious example is truckdriver. It is well known that -er nominalizes verbal forms. In the case at hand this would mean that truckdrive, as the input to the nominalization operation, should be analyzed as a verbal form. Unfortunately, however, the verb to truckdrive does not exist.

Under the assumption that all these forms can be syntactically derived and that roots are uncategorized, conversion operations are obsolete. There is simply no zero-derivation any more. Roots are rendered category-equivalent in the structural configuration in which they are inserted (cf. Borer 2013). So under merger with -ation, form is rendered noun-equivalent, and under merger with -er, truckdrive is rendered verb-equivalent, which cannot, however, be equated with a per se nominal or verbal status of form or truckdrive respectively. With the elimination of conversion/zero derivation, the grammar is much more minimal, which is always welcome, and peculiar questions, such as why zero-derivation does not apply ten times over, converting a noun into a verb into an adjective, back into a noun and into a verb, before the result can serve as the input for overt nominalization, need not be answered any more.

Another so far inconclusively answered question concerns the ways in which roots are realized. Particularly relevant here is the question of whether they are present in the initial stages of the derivation and necessarily correspond to well-formed phonological words (Borer 2013), or whether they are inserted into the syntactic derivation only relatively late, or maybe even post-syntactically after Spell-out. Just as relevant is the question of whether roots are introduced into the derivation as complements of categorizers, as modifiers, as both, or as none of these, which is taken up in Alexiadou & Lohndal's contribution (cf. EKS 2016 for a nuanced version of the logically conceivable merge variants). It is particularly in the context of this latter set of questions that the interaction between roots and labels plays a most prominent role. And again, the papers in this volume address questions of this sort and provide interesting answers that further enhance our understanding of the role of labels and roots in syntax.

We decided to organize this volume in four parts. The chapters of the first two parts broadly pertain to the computational system of syntax: its workings, the motivation and background for its ingredients, and its procedural/derivational character.
The chapters of the latter two parts are for the most part interface-related, i.e. they touch on how structures formed in a formal syntax receive an interpretation on the sensorimotor and meaning sides. Naturally, the criteria for allotting a chapter to a category were not always clear-cut, and at times papers cross-classify different categories. Still, we believe we found four sensible rubrics for the varied sections, all of which touch on labels and a few of which deal with roots specifically:
1. A label-free Syntax: Problems of Projection (and extensions) and some Ramifications
2. Transfer and Labels
3. The sensorimotor Systems
4. Roots from Bottom to Top

The chapters in the first part elucidate and clarify various syntactic and extrasyntactic aspects and consequences of Chomsky's recent works Problems of Projection (POP, 2013) and its extensions (POP+, 2015). The first two recapitulate much of the history of generative syntax, emphasizing different aspects of its development, and show how POP/+ carries further a conception of the grammar in which Merge applies freely/optionally (envisaged in Chomsky 2004, explicitly and emphatically advertised in Boeckx 2015 as "Merge α"), while Third Factors (Chomsky 2005) and interface conditions tame the system. The final contribution in this part reflects on different points of contact, and indeed options of unification, between POP and other branches of generative grammar such as cartography and nanosyntax.

Merge, Labeling and their Interactions is Samuel Epstein, Hisatsugu Kitahara and Daniel Seely's (EKS) contribution, which retraces much of the history of the phrase structural component within generative grammar, from the Aspects era with Phrase Structure Rules up to a 'bare' system based on simplest Merge. It traces the status of "labels" in syntactic theorizing, changing from stipulated constructs to effects of Third Factor principles and the CI-requirement that demands that syntactic objects be labelled. EKS review their (2014) suggestions to extend POP(+)'s solution for intermediate movement steps in successive-cyclic A'-movement to the A-movement counterpart, without invoking principles like "Merge over Move". They consider Rizzi's (2014) attempts to deduce Criterial Freezing by means of the "shared label" conception in POP: accordingly, if e.g. a WH-phrase ends up in an interrogative C-position, the configuration is {WH, CP} = α. If α is labelled by the prominent feature on WH and CP alike, each term in α is rendered "intermediate" – and intermediate projection units cannot move, for independent reasons. EKS argue that this analysis is inconsistent with POP(+) in that it presupposes that labels (and projection levels) are part of Narrow Syntax, which, EKS contend, POP(+) in fact seeks to do away with. Secondly, referring to EKS (2015), EKS criticize that Rizzi's analysis requires additional principles for cases of filling – instead of abandoning – a criterial position, and still other cases. For interrogatives, they explicate independently motivated assumptions under which a unification of Rizzi's fulfilling a criterion with Criterial Freezing is feasible. Next to the condition that syntactic objects need a label, these are:
1) There is only one CQ in the (English) lexicon, appearing in both yes/no- and wh-interrogatives.
2) An SO the label of which is identified as the head CQ, unaccompanied by a "wh-specifier," is interpreted as a yes/no-question.
3) An SO the label of which is identified as the Q-feature, shared by the two heads CQ and WHQ, is interpreted as a wh-question.
4) English yes/no-questions require T-to-C inversion or rising (question) sentential prosody, available only in matrix clauses; when embedded, the resulting structure cannot be felicitously interpreted – such structures are gibberish (and perhaps crash) at CI.

In Merge(X,Y) = {X,Y} Chris Collins shows how the history of generative grammar can be conceived of as a "gradual unbundling of syntactic transformations". He exemplifies this research procedure with the way in which different movement types in the late 1970s were unified under one type (wh-movement) and eventually under the generic rule Move-α, which became the central and only transformation of the Government and Binding era. He highlights that these rules were quite simple (like Merge nowadays), while the explanations for the grammaticality status of sentences became – and become – more complex, because of the ways in which the simple components of the grammar interact within the architecture of the grammar as a whole. Like movement, phrase structure rules too underwent such an unbundling, turning from rewrite rules, via X-bar theory, into the current form of "simplest Merge". He shows how this unification of phrase structure rules and transformations was achieved by resorting to the single operation Merge targeting different sources: the lexicon/separate workspaces on the one hand and the interior of previously built structure on the other – the former corresponding to base-generation and the latter to movement transformations. The step towards a label-less conception of Merge, sparked by Collins (2002) and developed in Chomsky (2013, 2015), represents yet another stage of unbundling in that Merge is reduced to its bare essential form: Merge(X,Y) = {X,Y}. Collins lists 13 properties of Merge, stating that the current stage marks "the ultimate destination, in that no other simplifications are imaginable." The properties he discusses include that Merge involves no feature checking, is strictly cyclic, and involves no operation of copying or of forming chains. He delves into the question of why labels might be needed, proposes a minimal-search-based linearization procedure (which yields LCA-structures), gives a formal definition of derivations, and discusses the status of Agree, among other things.

In Features and Labeling: Label-driven Movement Aleksandra Vercauteren hones in on the ways in which movement triggers have been implemented and motivated within minimalism (feature-driven movements of various kinds).
She then draws links between ideas advanced in POP(+) and certain core assumptions of cartography/nanosyntax:2 a) the claim that the unlabellability of XP-YP structures, not features, drives movement might have a wider range of application (e.g. to grammatical objects, local focus movement or VP-fronting) if a richer – phrasal – structure is considered as the launching site; b) the idea that XP-YP structures bring the moving member to a halt if a relevant prominent feature on X and Y is shared; within cartography many such potential landing/criterial sites have been proposed over the years, again suggesting a broader application than considered in POP(+). Finally, she c) emphasizes that the idea that POP(+)'s labelling algorithm probes for features (not heads) has been anticipated by cartography and naturally links the two approaches. Aside from these issues, Vercauteren raises the interesting possibility that some XP-YP structures might be labelled neither by "symmetry-breaking" movement nor by feature sharing but remain label-less instead, and might serve as the locus of the (postsyntactic) realization of words, i.e. lexical insertion.

2 Cartography is mainly associated with researchers like Luigi Rizzi and Guglielmo Cinque, while Nanosyntax was established mainly by Michal Starke. We recognize important differences between the frameworks, even though nanosyntax might in some ways be considered a logical and radical extension of cartography (with – partly altered – ingredients of DM). Vercauteren stresses the commonalities, and so do we by using the forward slash and mentioning the frameworks in one breath.

Part two comprises contributions that tackle the issue of how, in a phase-theoretical framework, labelling interacts with cyclic Transfer. In Labeling and Other Syntactic Operations Petr Biskup addresses the question of whether labelling can be considered a proper syntactic operation, on a par with others like Internal/External Merge (IM/EM), Agree and Transfer, and be ordered with respect to them. He argues that this is indeed the case and shows that four established and quite disparate constraints on movement follow from this conception: Freezing effects, the ban on headless XP-movement, order preservation in multiple movements, and the ban on acyclic incorporation. Relying on the assumption in POP that syntactic objects do not need to be labelled for the ongoing derivation to continue (labels on syntactic objects being required only at the point of Transfer), he proposes that IM precedes labelling and that movement is indirectly feature-driven. Moreover, he adopts the idea that Transfer is a property of every Merge operation and suggests that labelling must precede Transfer. From this arrangement of syntactic operations several phenomena emerge as effects.

In Is Transfer Strong Enough to Affect Labels? Miki Obata distinguishes two conceptions of Transfer in the literature on phases, strong and weak.
Transfer is the operation that periodically hands over portions of syntactic structure, commonly taken to be the complements of phase heads, to the interfaces. She asks whether Transfer affects the syntactic representation in such a way that (a) no syntactically relevant material remains in Narrow Syntax (strong Transfer) or (b) syntactically relevant material remains part of the narrow-syntactic representation, albeit inaccessible (weak Transfer). She exemplifies view (a) with Ott’s (2011) analysis of free relatives, argues that it is based on strong Transfer, and identifies the problem that it overgenerates in wrongly predicting “stranding” (i.e. in-situ linearization) of transferred material. She contrasts this with view (b), which she exemplifies with Chomsky’s (2013) labelling analysis as well as the considerations in Obata 2009, 2010, where Transfer leaves behind a copy of the label of the transferred unit, marking the points at which transferred material can be reassembled or reconstructed. Finally, she discusses the issue of transferring entire phases, for which she suggests two instantiations: root clauses and adjunct clauses. In a phase-based system the option of transferring entire phases must be available, since root and adjunct clauses, too, are interpreted by the interfaces.

Part three goes beyond the point of Transfer and turns to the question of how labelling interacts with the sensorimotor component. The chapters in this part address how labelling affects linear order and how properties of the phonological system can be associated with labelling.

The first chapter, Dennis Ott’s Clausal arguments as syntactic satellites: A reappraisal, argues that Koster’s (1978) and Alrenga’s (2005) satellite hypothesis, which was recently criticized by Takahashi (2010) and Moulton (2013), can be maintained under the assumption that left-dislocation is analyzed as in Ott (2012b, 2014). According to the satellite hypothesis, sentential subjects are not simply the result of fronting the subject CP to Spec,CP of the matrix clause. Instead, they are the result of left-dislocation, and they thus sit in a position external to their host CP. Takahashi (2010) and Moulton (2013), however, object that this account fails to explain connectivity effects, such as NPI licensing or variable binding from the left-dislocated CP into the main clause. Ott shows that the original satellite hypothesis can be maintained under an analysis of left-dislocation in terms of a bi-clausal structure in which everything but the dislocated XP of the first clause is elided. In fact, analyzing clause-initial CP arguments as left-dislocated elements avoids certain problems that the alternative account cannot circumvent. The alternative would be to analyze these constructions as simple cases of clause-internal CP fronting. This, however, would create a {CP, CP} structure, which violates Richards’ (2010) distinctness condition on linearization. Under a left-dislocation analysis in which the host CP is elided, no violation of the distinctness condition arises.
The second chapter in this section, Michelle Sheehan’s A labelling-based account of the Head-Final Filter, argues that Greenberg’s (1963) Universal 21/Williams’ Head-Final Filter (HFF) can be subsumed under a more general constraint, the Final-over-Final Constraint (FOFC), discussed in a number of recent publications (cf. Holmberg 2000, Biberauer, Holmberg & Roberts 2008, 2014, Sheehan 2013a, Biberauer, Holmberg, Roberts & Sheehan, forthcoming). According to the original observation in Greenberg, a prenominal modifier cannot be separated from the phrase it modifies, witness the ungrammaticality of *a clever for a linguist analysis. Sheehan links the ungrammaticality of this construction and other violations of FOFC and the HFF to a linearization problem that arises with right-branching specifiers. She shows that under a copy theory of labelling the problem surfaces in those configurations where all labels and their terminal nodes are segments of the same category, provided that linearization operates over category labels rather than phrases (as is independently proposed in Sheehan 2013a, 2013b). In this configuration right-branching specifiers, as opposed to their left-branching counterparts, cannot be linearized, which means that they must either stay in the base-generated position or be fronted as extraposed reduced relatives. Interestingly, then, the ungrammaticality of HFF/FOFC violations is analyzed as
a PF-phenomenon (resulting from linearization) and thus makes it possible to keep narrow-syntactic structural relations uniform for left- and right-branching specifiers.

The final part of this volume owes its title and classification to the ambiguous nature of the grammatical term “root” and subsumes three chapters under a rhetorical rather than substantial category: open-class lexical units (roots) on the one hand, and the topmost sentential (=root) node on the other. It is one of the hallmarks of DM that the derivation starts with category-neutral roots (cf. e.g. Marantz 1997). These are taken to replace the lexical categories that figure prominently in Standard Theory and early GB analyses. However, there is still no uniform agreement on how – in which structural configuration, by virtue of what structural relations – these category-neutral roots are eventually categorized, and on how long the derivation can operate on uncategorized material. The first two papers in this section address these issues. The final paper looks at root phenomena, i.e. syntactic phenomena confined to the highest, non-embedded matrix clause, corresponding to the end of the derivation (cf. e.g. Emonds 1976, 2004, 2012).

Artemis Alexiadou & Terje Lohndal’s chapter The structural configurations of root categorization reexamines the question of what the status of roots in the derivation is. In the literature so far, at least four approaches can be distinguished, all of which make different predictions about the structural configurations that lead to root categorization. According to what may be called the traditional view of root categorization, roots are merged as complements (cf. e.g. Embick 2010 and Harley 2014 for an overview).
Alexiadou & Lohndal first reiterate prominent evidence in favor of this view, which comes from one-replacement, VO-idioms, root suppletion in Hiaki, and nominalization, and then go on to show that none of these phenomena provides conclusive evidence for the complement status of roots. Next, the idea that roots are introduced into the derivation as modifiers is scrutinized. Under this view roots could potentially remain uncategorized, as proposed e.g. in de Belder (in press). However, there is no conclusive evidence in favor of this view either. Under an alternative approach, pursued e.g. in Embick (2004), roots can be introduced into the derivation either as complements or as modifiers; this can thus be characterized as a compromise between the first two approaches. Here as well, however, the structural configurations in which roots can be merged as complements of categorizers seem to be restricted to a very limited set, one that is virtually indistinguishable from the configurations in which roots are merged as modifiers. The last view is that roots are introduced into the derivation by a special mechanism, such as self-merger (cf. Adger 2013) or merger with an empty set, also called Unary Merge by de Belder and van Craenenbroeck (in press). This last option in particular is discussed in some detail. The idea derives from the observation that some configurations seem to lead a Janus-faced life insofar as they allow for the insertion of either roots or functional vocabulary items.

This can only be accounted for under the assumption that roots are late-inserted into the derivation by a mechanism of Unary Merge. However, this account, too, leaves a number of questions unanswered (and some perhaps unanswerable). Alexiadou & Lohndal thus argue that up to now there is no conclusive evidence in favor of any of the accounts that have been suggested for the structural configurations that allow for the categorization of roots.

The chapter by Leah Bauke & Tom Roeper, How unlabelled nodes work: Morphological derivations and the subcomponents of UG operations, asks whether there is independent evidence for the role of labelling heretofore unlabelled lexical items, i.e. roots, in the course of the derivation. They argue that unlabelled nodes exist in syntax beyond the level of the root and that these have a direct impact on syntactic movement operations insofar as they can disambiguate an underdetermined semantics. So far, the most prominent arguments pointing in a similar direction come from an analysis of small clauses along the lines of Moro (2000) and subsequent literature. Bauke & Roeper extend this analysis and advance evidence from compounding and preverbal affixation in order to show how the existence of unlabelled nodes determines (morpho)syntactic structure. In fact, they argue that compound incorporation and productive recursive preverbal affixation are underivable in a bare phrase structure syntax unless unlabelled nodes are assumed to exist. This analysis thus also provides evidence that labelling is not solely an interface requirement imposed by PF legibility of syntactic structures (as originally argued in Moro 2000). Rather, it is a genuine syntactic operation that has an impact on both the PF and the LF interface.
Bauke & Roeper cast their analysis in terms of a reformulated and advanced version of the basic ideas underlying the abstract clitic hypothesis, already discussed in Roeper & Keyser (1992) and critically reevaluated in Bauke (2014).

Andreas Blümel’s chapter Exocentric root declaratives: Evidence from V2 concludes the section and, with it, the volume on Labels and Roots. In this chapter, the status of root declaratives in V2 languages is reevaluated. Contra widely held assumptions (but cf. Emonds 2004, 2012 for an alternative account that is somewhat more in line with Blümel’s analysis), Blümel argues that root declaratives in V2 languages (and possibly universally) must remain unlabelled. The stipulations of a projection-based endocentric syntax, which stem from the days of an X’-theoretic conception of syntax, are thus challenged, and it is argued that it is precisely the failure to label an XP-YP configuration in root declaratives that forces these structures to remain labelless in a system based on simplex Merge. Blümel first shows that neither of the two labelling strategies suggested in Chomsky (2013, 2015) can readily account for V2 root declaratives and then turns the argument around by proposing that V2 provides evidence for the labellessness of {XP, CPV2}. The labelling algorithm thus holds the key to explaining typical properties of V2 root declaratives. Among these are the fact that
(almost) any phrase can occupy the prefield position in V2 clauses, and that exactly one phrase – at least and at most one – must occupy this position. This link between V2 properties and a projection-free bare phrase structure syntax derived by Merge provides strong evidence in favor of abandoning standard labelling assumptions for root declarative XP-YP structures (i.e. that they must be endowed with a syntactic category). The labelless nature of these structures is shown to be due rather to third-factor, interface-driven properties of syntactic structures and thus supports the view that narrow syntax is very minimal.

We would like to end this introduction by thanking Erich Groat for his untiring energy and commitment to this project. Without his dedication and initiative, this volume would not exist.

References

Adger, David. 2013. The syntax of substance. Cambridge, MA: MIT Press.
Alexiadou, Artemis and Elena Anagnostopoulou. 2001. The subject-in-situ generalization and the role of case in driving computations. Linguistic Inquiry 32. 193–231.
Alrenga, Peter. 2005. A sentential subject asymmetry in English and its implications for complement selection. Syntax 8(3). 175–207.
Arad, Maya. 2003. Locality constraints on the interpretation of roots: The case of Hebrew denominal verbs. Natural Language & Linguistic Theory 21. 737–778.
Arad, Maya. 2005. Roots and patterns: Hebrew morpho-syntax. Dordrecht: Springer.
Bauke, Leah S. 2014. Symmetry breaking in syntax and the lexicon. Amsterdam: John Benjamins.
De Belder, Marijke. 2011. Roots and affixes. Utrecht: LOT.
De Belder, Marijke. In press. The root and nothing but the root: Primary compounds in Dutch. Syntax.
Biberauer, Theresa, Anders Holmberg and Ian G. Roberts. 2008. Structure and linearization in disharmonic word orders. In Charles B. Chang and Hannah J. Haynie (eds.), Proceedings of the 26th West Coast Conference on Formal Linguistics, 96–104. Somerville, MA: Cascadilla Proceedings Project.
Biberauer, Theresa, Anders Holmberg and Ian G. Roberts. 2014. A syntactic universal and its consequences. Linguistic Inquiry 45(2). 169–225.
Biberauer, Theresa, Anders Holmberg, Ian Roberts and Michelle Sheehan. Forthcoming. The final-over-final constraint. Cambridge, MA: MIT Press.
Blümel, Andreas. 2012. Successive cyclic movement as recursive symmetry-breaking. In Nathan Arnett and Ryan Bennett (eds.), Proceedings of WCCFL 30, 87–97. Somerville, MA: Cascadilla Proceedings Project.
Boeckx, Cedric. 2009. On the locus of asymmetry in UG. Catalan Journal of Linguistics 8. 41–53.
Boeckx, Cedric. 2014. Elementary syntactic structures. Cambridge: Cambridge University Press.
Borer, Hagit. 2005a. In name only. Structuring sense, Volume I. Oxford: Oxford University Press.
Borer, Hagit. 2005b. The normal course of events. Structuring sense, Volume II. Oxford: Oxford University Press.
Borer, Hagit. 2013. Taking form. Structuring sense, Volume III. Oxford: Oxford University Press.
Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, Noam. 1970. Remarks on nominalization. In R. Jacobs and P. Rosenbaum (eds.), Readings in English transformational grammar, 184–221. Waltham, MA: Ginn.
Chomsky, Noam. 1986. Barriers. Cambridge, MA: MIT Press.
Chomsky, Noam. 2004. Beyond explanatory adequacy. In A. Belletti (ed.), Structures and beyond, 104–131. Oxford: Oxford University Press.
Chomsky, Noam. 2005. Three factors in language design. Linguistic Inquiry 36. 1–22.
Chomsky, Noam. 2008. On phases. In R. Freidin, C. P. Otero and M.-L. Zubizarreta (eds.), Foundational issues in linguistic theory, 133–166. Cambridge, MA: MIT Press.
Chomsky, Noam. 2013. Problems of projection. Lingua 130. 33–49.
Chomsky, Noam. 2015. Problems of projection: Extensions. In E. Di Domenico, C. Hamann and S. Matteini (eds.), Structures, strategies and beyond: Studies in honour of Adriana Belletti, 1–16. Amsterdam & Philadelphia: John Benjamins.
Collins, Chris. 2002. Eliminating labels. In S. D. Epstein and T. D. Seely (eds.), Derivation and explanation in the minimalist program, 42–64. Oxford: Blackwell.
Embick, David. 2000. Features, syntax, and categories in the Latin perfect. Linguistic Inquiry 31. 185–230.
Embick, David. 2004. On the structure of resultative participles in English. Linguistic Inquiry 35. 355–392.
Emonds, Joe. 1976. A transformational approach to English syntax: Root, structure-preserving and local transformations. New York: Academic Press.
Emonds, Joe. 2004. Unspecified categories as the key to root constructions. In David Adger, Cécile de Cat and George Tsoulas (eds.), Peripheries: Syntactic edges and their effects, 75–120. Dordrecht: Kluwer.
Emonds, Joe. 2012. Augmented structure preservation and the Tensed S Constraint. In L. Aelbrecht, L. Haegeman and R. Nye (eds.), Main clause phenomena: New horizons, 21–46. Amsterdam: John Benjamins.
Epstein, Samuel D., Hisatsugu Kitahara and T. Daniel Seely. 2012. Structure building that can’t be! In Myriam Uribe-Etxebarria and Vidal Valmala (eds.), Ways of structure building, 253–270. Oxford: Oxford University Press.
Epstein, Samuel D., Hisatsugu Kitahara and T. Daniel Seely. 2014. Labeling by minimal search: Implications for successive-cyclic A-movement and the elimination of the postulate “phase”. Linguistic Inquiry 45. 463–481.
van Gelderen, Elly. 2015. Forced asymmetry and the role of features in language change. Manuscript based on a talk at the DGfS conference, March 6, 2015.
Greenberg, Joseph. 1963. Some universals of grammar with particular reference to the order of meaningful elements. In Joseph Greenberg (ed.), Universals of language, 58–90. Cambridge, MA: MIT Press.
Harley, Heidi. 2014. On the identity of roots. Theoretical Linguistics 40. 225–276.
Hauser, Marc, Noam Chomsky and Tecumseh Fitch. 2002. The faculty of language: What is it, who has it, and how did it evolve? Science 298. 1569–1579.
Holmberg, Anders. 2000. Deriving OV order in Finnish. In P. Svenonius (ed.), The derivation of VO and OV, 123–152. Amsterdam: John Benjamins.
Irwin, Patricia. 2012. Unaccusativity at the interfaces. New York University Ph.D. thesis.
Jackendoff, Ray. 1977. X’ syntax: A study of phrase structure. Cambridge, MA: MIT Press.
Katamba, Francis. 1993. Morphology. New York: Palgrave Macmillan.
Keyser, Samuel J. and Thomas Roeper. 1992. Re: The abstract clitic hypothesis. Linguistic Inquiry 23. 89–125.
Koster, Jan. 1978. Why subject sentences don’t exist. In Samuel Jay Keyser (ed.), Recent transformational studies in European languages, 53–64. Cambridge, MA: MIT Press.
Marantz, Alec. 1997. No escape from syntax: Don’t try morphological analysis in the privacy of your own lexicon. University of Pennsylvania Working Papers in Linguistics 4(2), Article 14.
Marantz, Alec. 2008. Phases and words. In Sook-Hee Choe (ed.), Phases in the theory of grammar, 191–222. Seoul: Dong In.
Moro, Andrea. 2000. Dynamic antisymmetry. Cambridge, MA: MIT Press.
Moulton, Keir. 2013. Not moving clauses: Connectivity in clausal arguments. Syntax 16. 250–291.
Obata, Miki. 2009. How to move syntactic objects bigger than a phase: On the formal nature of transfer and phasal re-assembly. Proceedings of the 27th conference of the English Linguistic Society of Japan (JELS 27). 207–216.
Obata, Miki. 2010. Root, successive-cyclic and feature-splitting internal merge: Implications for feature-inheritance and transfer. University of Michigan, Ann Arbor, Ph.D. thesis.
Ott, Dennis. 2011. A note on free relative clauses in the theory of phases. Linguistic Inquiry 42. 183–192.
Ott, Dennis. 2012a. Local instability: Split topicalization and quantifier float in German. Berlin: De Gruyter.
Ott, Dennis. 2012b. Movement and ellipsis in contrastive left-dislocation. In Nathan Arnett and Ryan Bennett (eds.), Proceedings of WCCFL 30, 281–291. Somerville, MA: Cascadilla Proceedings Project.
Ott, Dennis. 2014. An ellipsis approach to contrastive left-dislocation. Linguistic Inquiry 45. 269–303.
Richards, Norvin. 2010. Uttering trees. Cambridge, MA: MIT Press.
Rizzi, Luigi. 2014. Cartography, criteria and labeling. Unpublished manuscript, University of Geneva.
Saito, Mamoru. 2016. (A) Case for labeling: Labeling in languages without phi-feature agreement. The Linguistic Review 33. 129–175.
Seely, T. Daniel, S. D. Epstein and H. Kitahara. 2015. Explorations in maximizing syntactic minimization. Routledge Leading Linguists Series. Routledge.
Seely, T. Daniel, S. D. Epstein and H. Kitahara. 2016. Phase cancellation by external pair-merge of heads. The Linguistic Review 33(1).
Sheehan, Michelle. 2013a. Explaining the final-over-final constraint. In Theresa Biberauer and Michelle Sheehan (eds.), Theoretical approaches to disharmonic word orders, 407–444. Oxford: Oxford University Press.
Sheehan, Michelle. 2013b. Some implications of a copy theory of labeling. Syntax 16(4). 362–396.
Takahashi, Shoichi. 2010. The hidden side of clausal complements. Natural Language and Linguistic Theory 28. 343–380.
Wood, Jim. 2012. Icelandic morphosyntax and argument structure. New York University Ph.D. thesis.

Samuel David Epstein, Hisatsugu Kitahara and T. Daniel Seely

Merge, labeling and their interactions

Abstract: This paper reviews and discusses a series of papers by Epstein, Kitahara and Seely, related to Chomsky’s (2013, 2014) ‘labeling by minimal search’ analysis. After providing a brief history of ‘labels,’ some empirically (and explanatorily) advantageous consequences of Chomsky’s labeling by minimal search analysis are revealed, including that (i) it explains ‘obligatory exit’ in A-movement without any reference to Merge-over-Move, lexical arrays and subarrays, or in fact to the construct ‘phase’ (motivated in Chomsky 2000), at least suggesting the possibility of their eliminability, and (ii) it explains ‘obligatory halting’ in key instances of criterial freezing (without appeal to the analytical apparatus proposed in either Epstein 1992 or Rizzi 2014). These results are consistent with the twin (yet often implicit) goals of: (i) reducing Merge to its simplest and most unified form (with neither labels nor label projection, as (to our knowledge) first proposed in Collins 2002, Seely 2006) while (ii) concomitantly maximizing Merge’s explanatory effects (postulating as few operations as possible beyond Merge). It is important to note that this research is entirely continuous with the 65-year-old (scientific) enterprise of seeking to construct an explanatory theory of the format of descriptively adequate transformational and phrase structure rules (now unified under Merge) and to also explain the nature of the (apparent) constraints on transformational rule application, including when transformational application is obligatory (“obligatory exit”) and when it is prohibited (“freezing”), and why.

1 Introduction

This paper provides a brief (and selective) history of the nature, motivation and use of labels within the generative tradition and then explores recent developments regarding labeling, focusing on Chomsky’s (2013, 2014) labeling by minimal search analysis. More specifically, we review, and add some speculative extensions to, Epstein, Kitahara, Seely (EKS) (2014, 2015, to appear).

2 Labeling by minimal search

In this section we briefly review the recent labeling by minimal search analysis of Chomsky (2013, 2014), which provides the point of departure for EKS (2014, 2015,
to appear). EKS adopts the labeling analysis of Chomsky and then provides further positive (and unnoticed) consequences of that analysis.

2.1 A short history of labeling

Before presenting the details of Chomsky’s (2013, 2014) labeling analysis and our extensions of it, we first provide a selective history of the notion ‘label’ in generative grammar, from the PS rules of Standard Theory through X-bar theory, to Binary and Singulary Generalized Transformation, Internal and External Merge and finally to unified and simplest Merge.1 In part, this is a history of the simplification of a central aspect of syntactic theory, namely that labels were explicitly represented in the syntactic objects that constitute the representational output of the structure building mechanism(s).2 But, over time, labels and label projection were eliminated from the syntax. The structure building mechanisms have changed over the course of the development of generative grammar and, with these changes, we find different notions of label and label projection. A number of researchers, including Collins (2002) and Seely (2006), argue for the elimination of labels and labeling entirely. In Chomsky’s most recent work, however, the effects of labels, which are not explicitly represented in syntactic representations, are derived from the application of independently motivated, third-factor mechanisms (specifically minimal search), and with interesting empirical consequences.

2.1.1 PS rules in the Standard Theory

Recursive PS rules of the Standard Theory (see Chomsky 1957, 1965) provided a revolutionary solution to the cognitive paradox of discrete infinity: while the human brain is finite, the generative capacity of any I-language (representing an individual’s knowledge of language) is infinite.
A finite set of recursive PS rules (or a single recursive rule itself, see below) provided the means to generate an infinite number of mentally representable abstract structures and thus provided an explicit representation of human knowledge of syntactic structure and accounted for the fundamental “creative aspect of language use,” while playing a central role in the (re)birth of the cognitive sciences and the development of computational-representational theories of mind.

1 This discussion of the history of labeling draws extensively from EKS (to appear).
2 See Collins (this volume) for important related discussion.


A recursive structure-building mechanism of some type is necessary for any adequate theory of I-language. But of course, one central question is: “Why do we find these particular (construction-specific or category-specific) rules, and not any of an infinite number of other PS rules, or other types of rules?”3 Why, for example, does a rule like (1) have the properties it has?

(1) VP → V NP

For instance, why is the ‘mother node’ on the left labeled VP (and not some other category or, for that matter, some non-category)? And more generally still, why is there a label at all? Within Standard Theory, these questions were not asked; rather, PS rules were axiomatic and any single phrasal category could be rewritten as any sequence of categories, and thus the existence and categorial status of mother labels were pure stipulation, true by definition. So, for example, Standard Theory allowed a rule like (2)

(2) S → NP VP

in which the mother node S is not a projection of (i.e. it is categorially unrelated to) its daughters.4 This in turn raised the question: why do we find such headless phrases as S, while the major lexical phrasal categories seem to have heads, e.g., V in VP, N in NP, etc.? The (only) answer available at the time was: it seems to be true by definition, hence by stipulation, i.e. we have no explanation. Note that besides being stipulative, there is arguably a formal, or interpretive, unclarity concerning the relationship between PS rules and the PS trees generated by applying rules. For example, rule (2) contains one and only one formal symbol “S”. However, in the PS tree (3) generated by applying the rule (2), there are two entities we call “S”:

(3)      S
        / \
      NP   VP

That is, in the tree representation (3), there appears the label “S” (appearing immediately above NP and VP), yet in addition, the entire tree itself is called ‘an S’. This disparity, between the rule and the representation, has perhaps engendered confusion concerning the nature of PS generation vs. PS representation.

3 These questions were not asked at the time. We explore them in hindsight; they largely emerge with the Minimalist Program; see Epstein and Seely 2006 for discussion.
4 Note that there is an “unprojecting headed” rule like (i) in Chomsky 1981: (i) S → NP INFL VP. Here we have not only S being projected from no head, but in addition a head (INFL) which fails to project.


Furthermore, the mother nodes of (1) and (2) on the left, and phrasal category labels in general, involve, at least in one sense, ‘look ahead.’ Standard Theory appealed to ‘top down’ PS rule application, but as pointed out by Chomsky (1995b), attributing the insight to Jan Koster, such PS rules are telic in that they indicate the categories generated by the syntax that will be relevant to the interpretive components, PF and LF.5,6 As discussed in detail in EKS (to appear), such look ahead is particularly evident given the postulation in Chomsky’s (1965) Aspects of the empty ∆ node, combined with substitution transformations:7

“… suppose that (for uniformity of specification of transformational rules) we add the convention that in the categorial component, there is a rule A → ∆ for each lexical category A, where ∆ is a fixed ‘dummy symbol.’ The rules of the categorial component will now generate Phrase-markers of strings consisting of various occurrences of ∆ (marking the positions of lexical categories) and grammatical formatives.” (Aspects, p. 122)

So, consider passive: ∆ would appear in the (simplified) deep phrase marker associated with passive, as in (4):8

(4) [S [NP ∆ ] was arrested [NP the man]]

The object NP then raises via substitution to the pre-existing ∆ subject NP generated in the Deep Structure (DS), yielding:

(5) [S [NP the man] was arrested [NP the man]]

5 Chomsky (1995b) states “Some of the general constraints introduced to reduce the richness of descriptive apparatus also had problematic aspects. An example is Emonds’s influential structure-preserving hypothesis (SPH) for substitution operations. As has been stressed particularly by Jan Koster, the SPH introduces an unwanted redundancy in that the target of movement is somehow ‘there’ before the operation takes place; that observation provides one motive for nonderivational theories that construct chains by computation of LF (or S-Structure) representations. The minimalist approach overcomes the redundancy by eliminating the SPH: with D-structure gone, it is unformulable, its consequences derived … by general properties of Merge and Attract/Move” (Chomsky 1995b, 318). See below for further comment on the shift from Aspects to early minimalism.
6 Such labels are also relevant to Standard Theory’s category-specific operations in the syntax.
7 It should be noted that the ∆ node played an important role in another crucial development in Aspects; namely, the separation of the lexicon from the syntactic component.
8 We are simplifying what is meant by ‘passive’ in this context. At the time passive was analyzed as involving two transformations – one moving the deep subject into the by-phrase and the other moving the object into the vacated subject position.




In effect, ∆ is an empty and non-branching maximal projection with a purely formal status, lacking in lexical (nominal) featural content, i.e. it is a projection of no head at all, raising one of the problems noted with respect to S in rule (2) above. The DS in (4) in fact ‘preordains’ the categorial structure of what the Surface Structure (SS) will be. If such structure-preserving ∆ substitution is employed, then the label of the NP subject of S is already present at DS, ‘awaiting’ the obligatory arrival of the man. This encoding of SS into DS threatens the concept of level itself, suggesting that levels are in some sense intertwined, or non-existent (as was later postulated in Chomsky 1993, Brody 1995, Chomsky 2000, Epstein et al. 1998, Uriagereka 1999). Overall, then, recursive PS rules of the sort found in Standard Theory provided an empirically motivated, profound answer to a paradox, and solved the fundamental cognitive problem of discrete infinity. But the nature of labels and projection raised a number of important (and unanswered) questions.

2.1.2 X-bar theory: the elimination of PS rules

X-bar theory represented a major development in the history of phrase structure, and specifically for our purposes here, in the history of the notion phrasal label.9 X-bar theory attempted to provide answers to (at least) some of the questions raised by PS rules. Rather than the stipulated, hence non-explanatory, PS rules of Standard Theory, the X-bar format imposed clear restrictions on, and provided a uniform analysis of, ‘humanly possible phrase structure representation,’ eliminating PS rules and leaving only the general and uniform X-bar format as part of UG.10

9 See Chomsky 1970, Jackendoff 1977, Stowell 1981, among others; see also Chametzky 2000, Chomsky 1995a, b, and Lasnik and Lohndal 2013 for discussion.
10 There seem to be two views of X-bar theory in the literature. One is that there are PS-rules, it’s just that they are reduced to the absolute minimum (expressed in X-bar theoretic terms; thus, for example: XP → Spec Xʹ; Xʹ → X YP). The other view, which we assume, is that the X-bar template is a filter, the (implicit) assumption being that there is something like Generate-alpha that produces the syntactic objects to be filtered. Just like Move-alpha, Generate-alpha can build any phrase structure it wants, and only those satisfying the X-bar template survive. In retrospect, some general structure building rule must have been assumed, but there was no point in discussing it, since no matter how structures are built, only those X-bar compliant structures survive. Thanks to a reviewer for requesting clarification on this point. As Epstein & Seely (2002) p. 6 note: “GB, as contrasted with the Standard Theory, is traditionally assumed to be representational, characterized as a ‘virtually rule-free system’ (Chomsky 1986: 93). But ‘virtually rule free’ isn’t ‘rule free.’ Indeed GB theory did have rules, including, for example, Move-alpha and whatever generated structures that could comply with (or violate) the X-bar schema.”


 Samuel David Epstein, Hisatsugu Kitahara and T. Daniel Seely

The three central tenets of X-bar theory are endocentricity, cross-categorial uniformity, and (in the most widely adopted version) ternary levels of projection. All phrases have a lexical head, and they all have the same basic internal structure, as encoded in the X-bar template. Also, as compared with standard PS rules, another essential property of X-bar is a third level of projection, neither lexical nor a full phrase, namely X-bar. Endocentricity was assumed without exception: since some categories seemed to be endocentric (the lexical categories VP, NP, PP, AP, etc.), it was assumed that all categories, lexical and functional alike, are endocentric, thereby expressing cross-phrasal uniformity.11 “Headless” PS rules, like (2), S → NP VP, are thus eliminated, reduced to the X-bar template and thus ‘forced’ to have a lexical head.

Another crucial innovation of X-bar theory, representing a profound step in the development of the strong minimalist thesis, is the elimination of linear order from the PS component; X-bar theory specified no linear order of elements within the syntactic structure. By contrast, standard PS rules simultaneously defined two relations, dominance and precedence, and therefore the application of a single PS rule could not (in retrospect) be a primitive operation, since two relations, not one, are instantaneously created. X-bar theory takes an important step in reducing the two relations to one, and it does so by eliminating linear order, which is a property of PF and (by hypothesis) not a property of LF. X-bar theory thus disentangled “dominance” (in hindsight a misnomer, better characterized as ‘set membership’ in more recent work, avoiding the ‘confusion’ noted above concerning the difference between a label vs. the entire category the label is the name of) and precedence.
In addition, it sought to explain their existence, and the non-existence of all other relations, as required by, hence subservient to, the interfaces (dominance for semantics, precedence for phonology).12

11 Of course, the asymmetry could have been resolved by alternatively assuming that no category is endocentric. In current work the asymmetry (some phrasal categories, like VP, are endocentric and some categories, like S, are not) is simply a consequence of recursive simplest Merge – with simplicity of the generative procedure, not uniformity of its representational output, being the, or at least an, explanatory criterion (see below for further comment; and see Narita (2011)).

12 It should be noted that the elimination of word order from syntax did not happen all at once. For example, the head parameter made explicit reference to word order. The 1980s also saw the notion of directionality of government (Travis 1984). In Chomsky 2007, with the advancement of the primacy-of-CI hypothesis, however, it is clearly suggested that order is out of syntax (or that an optimal syntax is “out of order”!). It is part of externalization. And we have Chomsky’s revolutionary hypothesis, consistent with the primacy of CI, that language is primarily ‘for thought’ and not ‘for communication;’ see Chomsky (2013). For important and influential work on the ‘removal’ of linear order from the syntax, see also Kayne (1994).

Merge, labeling and their interactions 




What is the nature of a label in X-bar theory? Clearly, the mother is predetermined. Assuming binary branching, if α is non-maximal (i.e. a head or a non-maximal projection of a head), its mother will be the category of α. If α is maximal, its mother will be the category of α’s sister. Thus, with respect to the following tree representation (ignoring order):

(6) [XP YP [Xʹ X ZP ]]

since X and Xʹ are non-maximal, each will itself project.13 YP is maximal and hence its mother is the category of YP’s sister (in this case X-bar). Projection from a head (i.e. endocentricity), and the syntactic representation of projection, are taken to be central concepts of X-bar theory, defining two core relations: Spec-head and head-complement.14

Notice, however, that ∆ – the preordained landing site for movement of a maximal projection – as originally introduced in Aspects implicitly remains in the X-bar format. Under X-bar theory, the landing site of movement is often called “Spec”, but “Spec” is in effect a cover term for ∆ as well. So, we could say that ∆ was still assumed for movement under X-bar theory, i.e. X-bar consistency was a constraint also imposed on transformationally derived structures in which projection is determined by the X-bar schemata: a moving category has no chance to project, since the mother of the mover ‘landing in’ Spec is by definition not a projection of the mover.15

X-bar theory represented an important advance but raised a new set of questions: specifically, why is there projection at all, and why should it satisfy X-bar theory? Why does the mover never project (if that is in fact true)? Why are phrases endocentric (if they in fact all are)? And why are phrasal labels represented in the narrow syntax; are they in fact required syntax-internally? And has there been a continued confusion between the label vs. the category bearing the label, as discussed above regarding (3)?

13 See Muysken 1982 on relational definitions of maximal and minimal categories.

14 As we will see below, the available relations change dramatically under Chomsky’s (1995a) “Bare Phrase Structure,” particularly the Spec-head relation, which under recent analyses based on simplest Merge does not exist.

15 Note that in this discussion we consider neither head movement nor adjunction (neither of which is movement to Spec, and neither of which involves ∆), but see May 1985 for a theory of (segmental) adjunction seeking to render it X-bar consistent.


2.1.3 The initial transition from X-bar to Merge

Early minimalism brought major shifts in the architecture of the computational system for human language and initiated changes in the mechanics of structure building. Chomsky (1993), for example, eliminated DS (and paved the way for the elimination of syntax-internal levels entirely, including SS; see also Chomsky 1986). Chomsky (1993) also saw the re-introduction of the Generalized Transformation (GT), a structure building operation the output of which is required to be consistent with the X-bar schemata by definition. In the new theory, there are two distinct kinds of applications of GT. Binary GT takes two separate syntactic objects and combines them into a single object. Binary GT is thus the ‘ancestor’ of what would become External Merge. Singulary GT is the precursor of its most immediate descendant, Internal Merge, where one of the objects being made a member of a newly created set is initially contained within the other. In effect, the elegantly constrained X-bar theory, together with its stipulated (or axiomatic) properties, including its prohibition on mover projection, was taken to be a UG filter on both DS-level representations and on transformationally derived output representations, another form of unification (of DS and transformationally derived SS PS representations).

2.1.4 The eMERGEnce of bare phrase structure

While X-bar theory represented a very significant step in the continued quest for explanation, it was of course not exempt from explanatory scrutiny. Why should X-bar theory hold? Why do we find these particular relations (endocentricity, ternary projection, mover non-projection, head-complement, and Spec-head – the latter falling under the general definition of “government,” the cross-modular, unifying but quite complex sole relation; see Chomsky 1986), as opposed to an infinite number of alternative phrase structure systems?
Adhering to Minimalist method (see Chomsky (2007) “Approaching UG from Below”), we can ask: how “should” phrase structures be generated under minimalist assumptions? In “Bare Phrase Structure” (BPS), Chomsky (1995a: 396) provided an initial answer: “Given the numeration N, CHL may select an item from N (reducing its index) or perform some permitted operation on the structure it has already formed. One such operation is necessary on conceptual grounds alone: an operation that forms larger units out of those already constructed, call it Merge. Applied to two objects α and β, Merge forms the new object γ. What is γ? γ must be constituted somehow from the two items α and β; ... The simplest object constructed from α and β is the set {α, β}, so we take γ to be at least this set, where α and β are constituents of γ. Does that suffice? Output conditions dictate otherwise; thus verbal and nominal elements are interpreted differently at LF and behave differently in the phonological component ... γ must therefore at least (and we assume at most) be of the form {δ, {α, β}}, where δ identifies the relevant properties of γ, call δ the label of γ.” BPS p. 396.




Merge was introduced as the central structure building operation of the narrow syntax (NS), necessary on conceptual grounds alone, and the simplest object γ constructed from α and β by Merge was taken to be the set {α, β}. However, as stated in the above excerpt, Chomsky (1995a) assumed that the set {α, β} was in fact descriptively inadequate; it was assumed that empirical adequacy demanded some departure from the simplest assumption (the standard scientific tension between explanation and ‘empirical coverage’); that is, the set must be labeled, as in e.g. {δ, {α, β}}, where δ identifies the relevant properties of the entire set, i.e. such identification is required given output conditions imposed by the interfaces; thus, for example, {n, {α, β}} is identified as a nominal object, while {v, {α, β}} is identified as a verbal object.

Interestingly, note that in the above passage from Chomsky (1995a) the argument for labels mentions only their necessity at the interfaces, and does not mention any reason for requiring them, as had always been assumed, in the NS.16 We return to the status of labels in NS momentarily.

Chomsky (1995a, b) did not discuss exactly how Merge operates to form the labeled sets {δ, {α, β}}, but he assumes that either α or β may project (in principle), though if the wrong choice is made, deviance results. Projection, then, is a defining suboperation of Merge, but it is ‘free’ to project either the head of alpha or of beta, with the result subject to output conditions. Notice that Chomsky (1995a, b) eliminated both the ∆/Spec node of Standard Theory and X-bar theory. Projection, however, is still present; projection invariably applies by definition and is thus stipulated. Merge was defined as Merge (X, Y) → {Z, {X, Y}}, where Z is either the head H(X) of X or the head H(Y) of Y, and under this definition it was guaranteed that the label (= projected node) is either H(X) or H(Y), again by definition.
To sum up: (i) under ‘top down’ phrase structure grammar with ∆-substitution, a moving category has no chance to project, by definition; a mover (in e.g. passive) arrives in a preordained NP position whose mother node S is also pre-determined, and categorially distinct from the mover (NP); (ii) under X-bar theory with Spec/∆ substitution, a moving category has no chance to project, again by definition; (iii) under GT with ∆ now internal to it and X-bar theory as an “everywhere” output constraint on GT application (Chomsky 1993), a moving category ‘still’ has

16 And, in fact, as argued in Seely (2006) since labels, as defined in Chomsky (1995a, b), are not syntactic objects/terms, they are inaccessible to syntactic operations and are thus “syntactically inert.” Labels in Chomsky (1995a, b) are not terms (as ‘term’ is defined there) and hence (informally speaking) ‘don’t exist.’ See Seely (2006) for detailed discussion.


no chance to project, by definition; but (iv) under Merge (X, Y) → {Z, {X, Y}}, where Z is either H(X) or H(Y), either a hosting category or a moving category can project.17

Though for the first time permitting mover-projection, it is important to note, as does Seely (2006), that Merge appears at this stage to remain non-primitive in that it simultaneously creates two formal objects anew: (i) it creates a set {X, Y} that was not present in the input to Merge, and (ii) it creates a second set, {Z, {X, Y}} (where Z is identical to H(X) or H(Y); thus we would have {H(X), {X, Y}} where H(X) is the label, or {H(Y), {X, Y}} where H(Y) is the label). Note that the second set, {Z, {X, Y}}, expresses the relation ‘Z and {X, Y} are sisters’ (i.e. co-members of the set that contains them), representing the projection of Z as the label of {X, Y}. As we will see momentarily, although the label Z was, as just noted, assumed to be necessary, at the same time a definition of syntactic object/term was also necessary which had the specific intent of (correctly) excluding labels from the class of syntactic objects/terms. In effect, the theory implicitly hypothesized the necessary presence but concomitant invisibility of labels.

2.1.5 Toward Simplest Merge: the elimination of labels and projection from the theory of syntactic mental representation

The strong minimalist thesis (SMT), presented by Chomsky (1993, 1995a, b) and elaborated by Chomsky (2000) and in subsequent work, takes the computational system for human language to be a “perfect system”, meeting the interface conditions in a way satisfying third factor principles.18 This is of course not an “assertion” but a hypothesis deemed worthy of exploration on a number of methodological grounds common to normal science.19

17 Certain previous analyses of mover-projection include projection of a moving (wh-phrase) maximal projection (Donati 2006) and the projection of a moving verb at LF (Epstein 1998). See also Hornstein and Uriagereka 2002.

18 See Epstein (2007) for discussion of the idea that the theory is an “I(nternalist)-functional” (or physiological) one in the sense that the rules apply only “in order to” satisfy the interfaces. See also Chomsky (2007, 2008, 2013, 2014) for discussion of the idea that operations freely apply as long as they conform to the laws of nature. For detailed discussion of these ideas, see EKS forthcoming.

19 See Chomsky (2014) regarding the standard misguided criticism that “biological systems are ‘messy’ — so they cannot be perfect.”


Under SMT, therefore, the combinatorial operation of the generative procedure assumes (by hypothesis) the simplest formulation in what comes to be called “simplest Merge”, a set-formation device that takes X and Y and forms {X, Y}.

(7) Merge (X, Y) = {X, Y}20

To the best of our knowledge, Collins (2002) was the first within the generative tradition to propose that labels be eliminated from the representation of syntactic objects, and thus that the output of Merge (X, Y) is {X, Y} and not {Z, {X, Y}}. Taking Collins as his point of departure, Seely (2006) reanalyzes the matter derivationally, arguing that the rule simplest Merge (i.e. Merge (X, Y) creates {X, Y}) is motivated on minimalist grounds alone and that simplest Merge entails the elimination of (any type of) projection-suboperation within Merge, thereby entailing the (Collinsonian) postulation of the absence of syntactically represented labels:

“It is important to stress that, viewed derivationally, it is not labels and projection that are eliminated in and of themselves, RATHER WHAT IS ACTUALLY ELIMINATED ARE TWO SUBOPERATIONS OF THE “COMPLEX” OPERATION MERGE. It is a consequence of adopting the “simplest” version of Merge, namely, [Merge (x, y) = {x, y}], that there are no phrasal labels nor projections, i.e. it is a consequence of the simplification of Merge that phrases are represented as in [{x, y}], and not represented as in [{z, {x, y}}]. I’ll argue that this simplification of Merge is motivated on Minimalist grounds. The absence of labels is an immediate consequence of a well-motivated simplification of a fundamental, and arguably necessary, structure building (derivational) operation, namely Merge as in [Merge (x, y) = {x, y}]. In short, the question I am asking is: If indeed [{x, y}] is the “right” type of representation, what is the nature of the generative procedure from which the relevant properties of these representations could be deduced?” (Seely 2006, p. 193)
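The contrast between simplest Merge in (7) and the earlier label-projecting definition Merge (X, Y) → {Z, {X, Y}} can be sketched as a toy model. Python is used here purely for illustration; representing lexical items as strings and unordered syntactic objects as frozensets is an expository assumption, not part of either proposal:

```python
# Toy contrast: label-projecting Merge (Chomsky 1995a, b) vs. simplest Merge.
# Heads are modeled as strings, phrases as frozensets (expository choices only).

def simplest_merge(x, y):
    """Merge (X, Y) = {X, Y}: bare set formation, with no label,
    no projection suboperation, and no linear order."""
    return frozenset({x, y})

def labeled_merge(x, y, label):
    """The earlier definition, Merge (X, Y) -> {Z, {X, Y}}, where the
    label Z must be one of the merged items; modeled here as a pair."""
    assert label in (x, y), "Z must be (the head of) X or Y"
    return (label, frozenset({x, y}))

vp = simplest_merge("see", "it")            # the unlabeled set {see, it}
old_vp = labeled_merge("see", "it", "see")  # {see, {see, it}}: 'see' projects
```

Like the sets of simplest Merge, frozensets encode membership only: no projected label and no order, so `simplest_merge("see", "it")` and `simplest_merge("it", "see")` yield the same object.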

Seely (2006) argues that if Merge creates the only relations, then since labels (as in Chomsky 1995a, b) are in fact not merged, they are in no relation to anything; i.e. Seely seeks to deduce their absence from independently motivated proposals. Summarizing to this point, we’ve traced the development of labels and projection, from their original postulation in Standard Theory PS, to their hypothesized elimination. What now of Chomsky’s most recent analyses?

20 More accurately, (7) should be understood as the simplest instantiation of the combinatorial operation under SMT; a number of properties of (7) follow from 3rd factors – e.g., the No Tampering Condition and the Inclusiveness Condition (see EKS forthcoming).


3 Label elimination in Chomsky’s (2013, 2014) labeling-by-minimal-search analysis

This section reviews Chomsky’s labeling-by-minimal-search analysis, first tracing the basic ideas of the analysis (section 3.1) and then (section 3.2) briefly outlining its conceptual and empirical advantages, as presented in Chomsky 2013, 2014. This sets the stage for our review of our extensions of the analysis (in sections 4 and 5, which review EKS 2014 and EKS 2015, respectively).

3.1 What are labels for Chomsky (2013, 2014)?

Chomsky (2013, 2014) develops an important new analysis of ‘labeling,’ and provides conceptual and empirical advantages of it. The basic, intuitive idea is that ‘labeling’ is nothing other than third-factor minimal search finding relevant object-identification information within the (representationally unlabeled) set that constitutes the output of (simplest) Merge. Label projection is not part of the definition of the structure building operation Merge, and labels are, as a result, not explicitly generated, qua labels, in the output representation of the syntactic object that Merge produces.

Just what are ‘labels’, then, and what role do they play in Chomsky’s analysis? First, for Chomsky (2013, 2014) Merge is maintained in its simplest form, as advocated by Seely (2006) (as just discussed), namely, Merge (X, Y) = {X, Y}. Consonant with minimalist methodology, and in particular the strong minimalist thesis, Merge is deconstructed to only what is virtually conceptually necessary. Merge takes two (and only two) objects and puts them into the set {X, Y}, thereby creating the relation ‘member of’ for X and Y. As discussed, linear ordering is relegated to the phonology, and, as in Collins (2002) and Seely (2006), there are no syntactically encoded labels, since there is no projection as a defining property of the operation Merge.21

What, then, are ‘labels’ under Chomsky’s system? This is an important question since, as we’ve seen above, labels have an empirical motivation in that they provide the information that categorially distinguishes one phrasal category from another, differentiating a VP from an

21 More specifically, there is no representationally dedicated symbol serving as the label, and only the label, of a syntactic object.




NP, for example; and this identification is necessary, according to Chomsky, for proper interface interpretation.22 For Chomsky (2013, 2014), labeling is the process of finding the relevant information within the set, {X, Y}, which identifies the categorial status of the entire set generated by simplest Merge. Labeling is “just minimal search, presumably appropriating a third factor principle, as in Agree and other operations” (Chomsky 2013). Again, there are no labels resulting from the application of a projection-suboperation internal to Merge, representationally dedicated to labeling the set. Rather there is just minimal search. ‘Labeling’ then is simply the name given to the result of an independently motivated minimal search procedure, itself third-factor and hence not stipulated.23 Note further that Merge applies freely. Unlike earlier stages of minimalism, where operations applied ‘in order to’ satisfy output conditions, Merge is completely optional. It applies or it doesn’t, and if it does, it applies for no other reason than because ‘it can.’ Of course, Merge is subject to third-factor principles (i.e. laws of nature), but, these are not constraints built into Merge, rather Merge can’t help but conform to them. Thus, Merge (X, Y) leaves X, Y unchanged as a result of the third-factor No Tampering Condition (NTC), and by the (3rd factor)

22 Traditionally, labels were assumed to be required in the syntax; what were considered syntactic operations appealed to labels in the syntax. Consider, for example, that one and do so were argued to substitute for an N’ and a V’ respectively. Under current assumptions, it is hard to implement “one-substitution” or “do-so replacement” as transformational rules. So, by hypothesis, such proforms are simply created by Merge (selecting lexical items). Then the contrasts exhibited above come down to the question: what is wrong with e.g. merging “that” with “do so” (forming “I like this theory more than that one/*do so”) instead of “one” (forming “that one”)? Presumably a CI-interface problem, concerning the semantic interpretations of certain proforms. Labels were also required for C-selection, and their absence in Chomsky’s recent analysis raises interesting questions about how the empirical phenomena motivating C-selection are explained. Collins (2002), for instance, articulates the role of labels in various subsystems of GB theory and then attempts to derive their properties without labels (see also Seely 2006). See Hornstein and Nunes (2008) for additional important discussion.

23 A reviewer raises a series of important issues regarding Chomsky’s labeling-by-minimal-search analysis that we too find somewhat unclear and that require further research. One question has to do with the timing of labeling. We assume, with Chomsky (2013), that labeling (= minimal search) takes place as part of Transfer: “Since the same labeling is required at CI and for the processes of externalization (though not at SM, which has no relevant structure), it must take place at the phase level, as part of the Transfer operation.” Another question has to do with why labels are required. Again, we follow Chomsky (2013: 43) in assuming that “For a syntactic object SO to be interpreted, some information is necessary about it: what kind of object is it?” The interfaces must be able to distinguish, say, ‘walk’ as an event vs. an ‘entity.’ As for exactly how this information is used by which interfaces, further research is required.


inclusiveness condition, “no new objects are added in the course of computation apart from arrangements of lexical properties” (Chomsky 1993, 1995b).24

To see how Chomsky’s new labeling analysis works, let’s consider two central cases. Suppose first that the syntactic object (SO) is {H, XP}, H a head and XP not a head. Then minimal search will select H as the label, allowing the object {H, XP} to be identified as ‘an H’ at the interfaces.25 It is interesting to note that all phrases of the form {H, XP} are endocentric, not in the sense of representationally projecting the head H (as in X-bar theory), but rather in the sense that properties of the head H serve as the identifiers of the entire object {H, XP}. As with X-bar theory, then, VP = {V, ...}, NP = {N, …}, etc. will have a ‘nucleus’ where the head of the phrase (in effect) matches the category label of the phrase. Within X-bar theory this followed by stipulation. For Chomsky’s labeling analysis, it follows naturally from 3rd factor minimal search, and thus endocentricity relative to {H, XP} is deduced without the postulation of an X-bar level of projection.26

24 In early minimalism, all syntactic filters were eliminated. Only naturalized interface filters, called bare output conditions, survived. As for syntactic constraints, they were expected to reduce to principles of economy, now understood as third factor principles. But it didn’t go that smoothly. So, what we had in early minimalism were: operations (e.g. Merge, Move), third factor principles, and bare output conditions. But operations in early minimalism were very complex. If you look at the definition of Move, for example, it has various sub-clauses beginning with “only if...” In other words, all the syntactic constraints on movement were “stipulated” as part of the defining properties of Move. But in subsequent work, there was some success in reducing those defining properties of Move to third factor principles, and we now have the simplest formulation of Merge for both Merge and Move. Under the framework of Chomsky (2013, 2014), what we have are: Merge (putting aside Agree), third factor principles (labeling by minimal search, NTC, inclusiveness), and bare output conditions. It is important to note that Chomsky (2013, 2014) adopts simplest Merge; the Merge-internal constraints of the form “Merge applies only if...” are all gone. In this system, “operations can be free, with the outcome evaluated at the phase level for transfer and interpretation at the interfaces” (Chomsky 2014). If this overview is correct, then the definition of Merge/Move has changed, but free application of Merge/Move has remained constant. The shift has taken place from freely applying complex Merge/Move to freely applying simplest (unified) Merge. For more detailed discussion, see EKS forthcoming. See also Boeckx 2010 and Ott 2010 for relevant discussion.

25 Minimal search ‘looks into’ the set {H, XP} and finds two objects (the members of the set), H and XP. Only H, a lexical item that bears linguistic features, qualifies as a label, since XP, a set, does not directly bear linguistic features that could provide object-identification information; in short, only an item that directly bears linguistic features can serve as a label.

26 As discussed in detail in EKS 2015, Rizzi’s (2014) analysis of ‘halting’ – discussed below – crucially relies on the postulation of an X-bar category, which is not compatible with Chomsky’s labeling-by-minimal-search analysis, from which Rizzi (2014) seeks to deduce halting.




By contrast, suppose SO is {XP, YP}, neither member a head (recall PS rule (2) above, S → NP VP). Here minimal search is ambiguous; search finds the sets XP, YP, neither of which is a head; it then searches further, finding both the head X of XP and the head Y of YP.27 Overall, search does not find a unique element (that can provide the needed object-identification information). It is assumed that this ambiguity is intolerable; left as is (an option available under free simplest Merge), Full Interpretation (FI) is violated at the interface levels. How can this XP-YP problem for labeling be solved? Chomsky (2013) suggests, and explores the empirical consequences of, two strategies: (A) modify SO so that there is only one visible head, or (B) X and Y are identical in a relevant respect, providing the same label, which can be taken as the label of the SO.28 These strategies, in turn, have important empirical consequences, to which we return below.

To summarize so far, Chomsky’s (2013) analysis assumes that (i) labels are required, but only at the interfaces; (ii) labeling is just minimal search; and (iii) there must be a single element that serves as the ‘identifier’ of a syntactic object, since ambiguity of identification is not tolerated. For Chomsky, labels do not ‘exist’ in NS and hence can’t be referred to in NS.29, 30 In this system, minimal search identifies syntactic objects by looking at features/properties of lexical items, so it needs neither the postulation of labels (as a separate category) nor the implementation of a label identification algorithm that is independent

27 We assume that only a head (i.e. a lexical item) can provide object-identification information; a set, like {XP, YP}, cannot. This asymmetry arguably follows from the fact that a head directly bears features, whereas a set does not.

28 The basic idea is that in {XP, YP}, minimal search equally finds the heads X, Y; if these heads share some prominent feature, and specifically if the prominent feature(s) match via Agree (as in, say, phi-agreement or Q-agreement), then that shared feature counts as the label of the set {XP, YP}. Thus, in {NP, {T, {NP, {v, VP}}}}, which is of the form {XP, YP}, minimal search finds the phi features shared by the two relevant heads N and T (in finite clauses) after Agree, and thus phi is taken as the label of the set.

29 More specifically, the labeling algorithm LA (i.e. minimal search) does not engage internal to the syntax, and hence there are no ‘labels’ in NS. LA takes place at Transfer, since the same labels are required by the CI and SM interfaces. As noted in Chomsky 2014, “since the same labeling is required at CI and for the processes of externalization (though not at SM, which has no relevant structure), it must take place at the phase level, as part of the Transfer operation.”

30 As we noted in footnote 22, given the absence of labels in NS, one- or do-so-substitution faces two distinct problems: first, if we have just (unified) Merge, then there is no substitution operation. If we add a substitution operation, arguably a departure from SMT, there may still be a problem if syntactic objects are not identified in the course of a derivation (e.g. nominal for one-substitution or verbal for do-so-substitution).


from minimal search. Interestingly, this eliminates what might be seen as the last vestige of ‘construction specificity.’ Internal to the syntax, there are syntactic objects, namely sets, but the label identification algorithm (i.e. minimal search) applies only at (the transition to) the interfaces. The interfaces need to know the identity of an object and it is minimal search that allows inspection of a set’s members to find information relevant to determining the identity of the set. Furthermore, the interfaces can’t tolerate ‘conflicting information’ concerning identity, information such as ‘this is simultaneously an N and a V,’ rather the interfaces require unique identification information.31 Chomsky’s recent work then explores certain empirical consequences of this labeling by minimal search analysis; for example, it explains obligatory exit from intermediate positions of A-bar movement.
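The two labeling configurations described above, {H, XP} and {XP, YP}, can be sketched as a toy procedure. Python is used purely for illustration; modeling heads as strings, phrases as frozensets, and an Agree-shared prominent feature as an explicit argument are expository assumptions, not part of Chomsky's proposal:

```python
# Toy sketch of labeling by minimal search (Chomsky 2013). Heads are
# modeled as strings, phrases as frozensets, and a prominent feature
# shared under Agree (phi, Q) as an explicit argument; all of these
# modeling choices are expository assumptions.

def is_head(x):
    """A 'head' here is a bare lexical item (modeled as a string)."""
    return isinstance(x, str)

def label(so, shared_feature=None):
    """Minimal search into SO = {X, Y}.
    {H, XP}: the unique head H is found and identifies the object.
    {XP, YP}: search is ambiguous; labeling fails (None, an FI
    violation at the interfaces) unless the two heads share a
    prominent feature, which then serves as the label (strategy B)."""
    x, y = tuple(so)
    if is_head(x) and not is_head(y):
        return x
    if is_head(y) and not is_head(x):
        return y
    return shared_feature  # ambiguous search: only a shared feature labels

cp = frozenset({"C", frozenset({"T", "vP"})})     # {H, XP}: labeled by C
assert label(cp) == "C"

xp_yp = frozenset({frozenset({"which", "city"}),  # {XP, YP}
                   frozenset({"C", "TP"})})
assert label(xp_yp) is None                       # no shared feature: failure
assert label(xp_yp, shared_feature="Q") == "Q"    # shared Q labels the set
```

The design mirrors the text: the {H, XP} case is resolved by search alone (endocentricity without projection), while the {XP, YP} case is resolvable only by a shared prominent feature.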

3.2 Deriving obligatory exit in A-bar movement (Chomsky 2013)

Chomsky’s (2013) labeling-by-minimal-search analysis provides an interesting new account of obligatory exit from intermediate positions in successive cyclic A’-movement. Consider wh-movement as in (8) (where t stands for a full copy of a merged element):

(8) [β In which Texas city did they think [α t [C [TP the man was assassinated t ]]]]?

Suppose that the wh-phrase in which Texas city (hereon, wh-PP) is internally merged to the Spec(ifier) of the embedded C and stays there. Then the embedded clause α is of the form {XP, YP}, where XP is the wh-PP and YP is {C, TP}, as in (9), which is intended to depict the first ‘half’ of the derivation of (8), and where C in (9) is (to be) selected by think, not wonder:

(9) … (think) [α [XP in which Texas city] [YP C [TP the man was assassinated t ]]]

This is the {XP, YP} situation reviewed above. The heads X, Y are found by minimal search, resulting, potentially, in ambiguous object identification. Note that if Y (= C) bears no Q feature, then X and Y will not share any prominent agreeing feature, and object identification (i.e. label) failure will in fact result. Thus, if XP remains in this intermediate position, minimal search cannot find a label for α, since there

31 Note that this represents a shift from Chomsky 2007, 2008, where in the face of ambiguous information, either option could be chosen, with any ‘bad’ results filtered by the interface. See Chomsky’s discussion of, e.g., Donati on ‘who you like,’ as in ‘who you like is irrelevant,’ where D projects, or ‘I wonder who you like,’ where CP projects.



Merge, labeling and their interactions 


is no prominent feature (e.g. phi or Q) shared by X (the head of the wh-PP) and Y (the head of {C, TP}).

What happens if the wh-PP raises to a higher position, as in (8)? In this case, the lower copy of the wh-PP is “invisible” inside α. Consequently, minimal search “sees” only C and TP (= {H, XP}), and α is therefore labeled C; i.e. minimal search can find a unique “visible” head, namely C, as the label of α. Notice that the matrix clause β of (8) is also of the form {XP, YP}, but there is an agreeing feature shared by X and Y, namely the Q feature of the wh-PP and the Q feature presumably borne by the interrogative-mood marked C of the matrix (“direct question”); hence, Q can be the label of β. We see, then, that Chomsky’s analysis nicely accounts for the “obligatory exit” from an intermediate position of wh-movement. Importantly, exit is not in fact obligatory; rather, the wh-PP is free to remain in the intermediate position, but doing so results in labeling failure (FI violation) at the interfaces.

Chomsky’s (2013) analysis makes no appeal to a mismatch of features between think, which selects a [–Q] C, and the [+Q] wh-PP occupying the Spec of this [–Q] C (as in traditional analyses); no appeal is made to an explicit Spec-head relation, defined via “m-command” or “government” or the notion “maximal sister to an X-bar projection” in CP, nor to any of the technical devices that have been non-explanatorily invoked, like co-superscripting of Spec and head in CP, or to an S-structure level of representation (as in important prior analyses such as Rizzi’s (1997) wh-criterion or Lasnik and Saito’s (1984, 1992) S-structure condition, which blocks a [+Q] wh-phrase from occupying the Spec of a [–Q] C in English, at SS). Nor is EPP (obligatory Spec-T or Spec-C) appealed to. No such non-explanatory descriptive technicalia are invoked; nor is the central principle appealed to (minimal search) specifically linguistic, but it is rather attributed to third factor.
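The label-identification procedure just reviewed – a lone head labels {H, XP}; {XP, YP} is labeled by a prominent feature shared by the two heads found by minimal search, else labeling fails – can be rendered as a small expository sketch. The following Python fragment is purely illustrative and not part of any formal proposal: the names (Head, find_head, label), the tuple encoding of two-membered sets, and the toy feature inventory are our own assumptions.

```python
# Expository sketch of labeling by minimal search (after Chomsky 2013).
# A syntactic object is either a Head or a 2-tuple standing in for the
# (unordered) set {X, Y}; all names here are illustrative assumptions.

class Head:
    def __init__(self, cat, feats=()):
        self.cat = cat            # category, e.g. 'C', 'T', 'D'
        self.feats = set(feats)   # prominent features, e.g. {'Q'}, {'phi'}

def is_head(so):
    return isinstance(so, Head)

def find_head(so):
    """Minimal search within one phrase: locate its closest head.
    (Lower copies of moved items are assumed already 'invisible',
    i.e. absent from the structure handed to search.)"""
    if is_head(so):
        return so
    x, y = so
    if is_head(x) and not is_head(y):
        return x
    if is_head(y) and not is_head(x):
        return y
    raise ValueError("no unique closest head")

def label(so):
    """Identify the label of a syntactic object; None = labeling failure
    (a Full Interpretation violation at the interfaces)."""
    if is_head(so):
        return so.cat
    x, y = so
    if is_head(x) != is_head(y):          # {H, XP}: the head labels the set
        return (x if is_head(x) else y).cat
    if not is_head(x):                    # {XP, YP}: shared feature, or fail
        shared = find_head(x).feats & find_head(y).feats
        return next(iter(shared)) if shared else None
    return None                           # {H, H}: ambiguous

# Toy illustration of (8)/(9): interrogative C vs. declarative C.
C_q = Head('C', {'Q'})   # matrix interrogative C, bearing Q
C_d = Head('C')          # declarative C selected by 'think'
TP  = (Head('T', {'phi'}), Head('v'))              # schematic TP
whP = (Head('P', {'Q'}), (Head('D'), Head('N')))   # schematic wh-PP

print(label((C_d, TP)))         # {H, XP}: labeled 'C'
print(label((whP, (C_q, TP))))  # {XP, YP} sharing Q: labeled 'Q'
print(label((whP, (C_d, TP))))  # {XP, YP}, nothing shared: None (failure)
```

The third call mimics the intermediate position in (9): with no Q on the declarative C, the two heads share no prominent feature and no label can be identified, which is the configuration the text describes as forcing “obligatory exit.”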

4 Extensions of the labeling by minimal search analysis: EKS 2014, obligatory exit in A-movement

EKS (2014) argues that Chomsky’s labeling by minimal search analysis not only accounts for obligatory exit from intermediate position in successive cyclic A-bar movement, but that it also provides an elegant account of obligatory exit from intermediate position in A-movement. The central ideas are as follows. Consider a typical instance of successive cyclic A-movement, as in (10):

(10) a. a man is likely [TP t to be t in the room]
     b. *there is likely [TP a man to be t in the room]


 Samuel David Epstein, Hisatsugu Kitahara and T. Daniel Seely

Assume that in (10a) the DP a man has moved through the intermediate A-position, Spec of the embedded TP, on its way to the matrix subject position. Such movement is clearly acceptable (analogous to wh-movement in (8), In which Texas city did they think [α t [ C [TP the man was assassinated]]]?). But what happens if this DP moves to the intermediate position and then stays there, as in (10b), the analog of (9)? Here we would have an {XP, YP} set, namely {DP, TP}, with no chance of finding a label. Specifically, we would have:

(11) … likely {α {the, man}, {T, vP}}

Since T is infinitival in (11), it will not bear the phi features necessary for the shared prominent agreeing (phi) feature option of labeling the {XP, YP} structure of α; it is not the case that the ‘head’ of {the, man} (X of XP) and infinitival to (Y of YP) bear phi features.32 Thus α will not have a label, and a violation of Full Interpretation (FI) at the interfaces will result.

As discussed in EKS 2014, Chomsky’s labeling by minimal search analysis naturally extends to these A-movement cases, which were the central motivation for Chomsky’s 2000 Merge-over-Move analysis, including the postulate ‘phase’ (as well as lexical arrays and subarrays). Merge-over-Move is no longer statable, as PS generation and transformational rule application have been unified under simplest Merge. If this labeling analysis of (10) is viable, it suggests the possibility of eliminating the concept phase, at least to the extent that it was based on the analysis of such examples (the conceptual motivations for phases, locality and ‘chunking’ in general, remain). See EKS (2014) for a detailed review of the history of and motivation for Chomsky’s (2000) phase-based Merge-over-Move analysis of cases like (10).
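The contrast between finite and infinitival T in (10)/(11) can be made concrete in the same schematic terms. The sketch below is again purely expository (the encoding of heads as bare feature sets and the function name label_xp_yp are our assumptions): a finite T bearing phi shares phi with the subject DP, supplying a label for {DP, TP}, whereas infinitival to bears no phi, so the set in (11) goes unlabeled.

```python
# Illustrative check of the {XP, YP} phi-sharing option (Chomsky 2013,
# as applied to A-movement in EKS 2014). Each head is represented only
# by its set of prominent features -- a deliberate simplification.

def label_xp_yp(head_x_feats, head_y_feats):
    """Label an {XP, YP} set by a prominent feature shared by the two
    heads found by minimal search; None signals labeling failure
    (an FI violation at the interfaces)."""
    shared = set(head_x_feats) & set(head_y_feats)
    return next(iter(shared)) if shared else None

dp_head  = {'phi'}   # head of {the, man}: bears phi
t_finite = {'phi'}   # finite T: bears (valued) phi
t_infin  = set()     # infinitival 'to': no phi features

# (10a), final landing site: {DP, TP-finite} labels via shared phi.
print(label_xp_yp(dp_head, t_finite))   # 'phi'

# (10b)/(11): {DP, TP-infinitival} -- no shared feature, no label.
print(label_xp_yp(dp_head, t_infin))    # None
```

The second call is the configuration of (11): halting in the embedded Spec-TP leaves an {XP, YP} set whose heads share nothing prominent, hence the obligatory continuation to the matrix subject position.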

32 We assume that in XP (= ‘the man’), the element that counts as X bears phi features. There are open questions about the technical details of this assumption. If the is the head of the DP {the, man}, then this D must bear phi (at the point of minimal search); but it is the N that is assumed to inherently bear phi (see Carstens (2010) for important discussion); hence there could be N-to-D raising to get phi features on D. See Chomsky 2008 for relevant discussion, pp. 25–26. In terms of Chomsky’s (2014) assumptions, in the case of {the, man}, there must be at least three items involved: D, n, and R, where the lexicon contains R, and nominal and verbal are determined in NS. If we interpret Chomsky’s (2007) proposals in terms of Chomsky’s (2014) assumptions, then Merge constructs {n, {D, R}}, D inherits unvalued phi from n, R moves to Spec-D, and finally D moves to n. As a result, we have {<D, n>, {R, {D, R}}}. This will give the right order “the man,” and the label contains the valued phi features.




5 EKS on Criterial Freezing, the ‘halting problem’

Having just discussed ‘obligatory exit’ phenomena, we now discuss ‘obligatory halting’ phenomena, cases where a DP has moved to a position with respect to which all of the DP’s features are checked. With wh-movement, when such a position is reached, the DP is ‘frozen’ in that position; further movement results in ungrammaticality. Just as we have argued above regarding ‘obligatory exit,’ we are in fact assuming that Merge is freely applied, and if it fails to apply in ‘obligatory exit’ cases, as just argued for A-movement, then FI is violated at CI due to labelessness. So ‘obligatory exit’ is something of a descriptive misnomer. In this section we will use the terms “obligatory halt” and “freezing” in the same way – i.e. we are in fact assuming that Merge application is free, and we seek to explain the anomaly resulting from Merge application to a frozen position as an interface anomaly. Thus Merge is in fact free to apply or not, and we seek to reanalyze what have been assumed to be constraints on the application of movement as freely applied (unconstrained) movement which sometimes yields interface anomalies, as when Move is applied to a category occupying a so-called frozen position. To begin, consider (12):

(12) a. You wonder [α [XP which dog] [YP CQ [TP John likes t ]]].
     b. *Which dog do you wonder [α t [ CQ [TP John likes t ]]]?

Historically, such cases have been accounted for with constraints on movement itself.
That is, movement out of a criterial/frozen position is prohibited by a constraint barring the syntactic application of movement. For example, Epstein (1992) derives such freezing as an arguably deducible (“last resort”) effect of Chomsky’s Strong Minimalist Thesis (SMT), encapsulated as “computationally efficient satisfaction of bare output conditions.” In short, if there is no need to move, then you can’t move, and if in the syntax we have ‘already’ generated what would (or will) be a legitimate LF representation, then syntactic movement is barred, by economy of derivation. More recently, Rizzi (2014) attempts to explain such freezing phenomena, also as resulting from the inapplicability of syntactic movement to certain syntactic objects, by appeal to a particular (re-)formulation of the independently motivated hypothesis that Xʹ projections are invisible for movement – the so-called Xʹ Invisibility hypothesis. Once a phrase moves to a criterial position, it is argued that, given certain modifications of Chomsky’s (2013) labeling by minimal search analysis, movement halting can be explained, since a phrase moved to a criterial position becomes “an Xʹ projection,” hence invisible to syntactic movement. Thus, criterial freezing is explained by appeal to a constraint on syntactic rule application.



The core of Rizzi’s (2014) analysis can be illustrated as follows.33 Consider (13), where the wh-phrase which dog has raised to a ‘criterial position’ (namely, Spec of CP headed by a complementizer CQ bearing a Q-feature):

(13) [Q [Q which dog] [Q CQ [TP John likes t ]]]

For Rizzi, the result of such movement is the creation of an X-bar, namely the Q-bar “which dog,” which is then frozen by X-bar invisibility. That is, (13) represents an {XP, YP} configuration of the form {{which dog} {C, TP}}. The entire structure in (13) is labeled by Q-agreement (i.e., the head of [which dog], Q, and the head of [CQ TP] each bear Q). Thus the entire representation in (13) is labeled Q, as indicated. Importantly, this makes “which dog” a Q-bar, intermediate between the head (= Qmin) and the entire structure (13) (= Qmax). Since “which dog” becomes an X-bar in the course of the derivation, then by X-bar invisibility it cannot move further; “only maximal objects with a given label can be moved” (Rizzi 2014). Rizzi’s analysis thus seeks to (elegantly) deduce freezing from Chomsky’s {XP, YP} label-identification algorithm coupled with the independently proposed X-bar Invisibility hypothesis.

As EKS (2015) point out, however, there may be a number of conceptual and empirical disadvantages of Rizzi’s analysis. For one thing, it requires obligatory and immediate labeling in the syntax; in the example above, for instance, further movement of which dog is blocked only if we ‘know’ in the syntax that which dog is obligatorily and immediately an X-bar and hence can’t be moved (given X-bar invisibility). But such obligatory and immediate labeling is in fact inconsistent with Chomsky 2013, 2014, which allows complete non-labeling in NS. Note further that Rizzi’s analysis has X-bar invisibility as a constraint on Move, but not on Merge, which runs contrary to the attempted unification of Move/Merge.
There is also a potential empirical problem with this analysis, as it says nothing about a case like (14):

(14) *I wonder John likes this dog.

In the interface-based reanalysis of “freezing” (under what is in fact freely applied Merge) we present below, such cases and freezing are unified, in a way they cannot be under freezing-specific constraints on syntactic movement. For EKS (2015), following Chomsky 2013, 2014, movement is completely free. That is,

33 For detailed discussion of Rizzi’s (2014) analysis, and of potential conceptual and empirical disadvantages of it, see EKS 2015.




the wh-phrase in a case like (13) can in fact syntactically exit (via application of Merge) a criterial position. However, it’s argued that if it does so, either label failure or an interpretive problem at the interface will result. In short, EKS seeks to eliminate a syntactic constraint (X-bar invisibility, which they argue is not formulable in a way consistent with current theory) and reassign its empirical effects to independently motivated interpretive constraints.

Recall from our review of Chomsky that simplest Merge applies freely, subject only to third factor. Merge is an operation that constructs larger units out of those already constructed, and simplest Merge is a third-factor compliant instantiation of Merge. Thus, relative to (12a), repeated in (15a) here,

(15) a. You wonder [α [XP which dog] [YP CQ [TP John likes t ]]].34
     b. *Which dog do you wonder [α t [ CQ [TP John likes t ]]]?

nothing prohibits the (bottom-up) applications of Merge that would produce (12b), repeated in (15b). In informal terms, the wh-phrase is free to move from the intermediate Spec of CP to the higher Spec of CP position. Such Merge does not run afoul of any 3rd factor principle (like NTC or Inclusiveness). In fact, any constraint on the application of Merge that is not a 3rd factor constraint would represent a departure from the SMT, and hence would require substantial empirical support. As Chomsky (1998) states, “one plausible element of optimal design is that there are no constraints on application of operations.” For EKS, freezing of the sort in (12)/(15) is not the result of a constraint on the application of Merge. Rather, EKS argues that the contrast results from independently motivated morpho-phonological and CI requirements for properly interpreting clauses, specifically clauses whose labels are identified as either the interrogative complementizer CQ (yes/no-questions) or the Q-feature shared by the two heads CQ and WHQ (wh-questions) – i.e.
the shared prominent (Q) feature option of Chomsky’s labeling analysis. EKS argue that ‘obligatory syntactic halt’ in wh criterial position is the only way to satisfy these requirements. In short, wh-movement from wh criterial position (freely applied simplest Merge) is allowed to apply in NS, but if it does, independently motivated morpho-phonological and/or CI requirements are violated.

34 Note that subscripts (XP, YP) are used here only for ease of reference, indicating the {XP, YP} situation for labeling.



As pointed out above, EKS adopts the assumption that every syntactic object must be labeled at CI (Chomsky 2013).35 EKS then proposes the following (minimum) assumptions concerning CQ: (i) there is only one CQ in the (English) lexicon, appearing in both yes/no- and wh-interrogatives (Chomsky 1995b); (ii) a CP with the label CQ, unaccompanied by a “wh-specifier,” is interpreted as a yes/no-question at CI; while (iii) a CP with the label Q, when Q is shared by the two heads CQ and WHQ (the latter being the head-feature of a wh-phrase in “Spec-C”), is interpreted as a wh-question at CI (Chomsky 2013).36

Thus, in a typical yes/no-question structure, such as (16)

(16) [α CQ [TP John likes this dog]]

the label of the CP will be CQ by minimal search. Thus, the CP will be interpreted as a yes/no-question. However, as a language-particular property of English, it is assumed that in order to actually be interpreted as a (direct) matrix yes/no-interrogative, either T-to-C inversion or rising (question) sentential prosody is required.37 Thus, (16) will have to ‘surface’ as Does John like this dog? or John likes this DOG? (with question intonation).38 Again, this morpho-phonological requirement is a language-particular property of English. Now, consider the following case in which (16) is embedded:

(17) *You wonder [α CQ [TP John likes this dog]].

35 In Chomsky 2013, 2014, labeling cannot be required in NS; e.g. merger of T and vP must be allowed, yet vP (= {XP, YP}) has no label. By contrast, in order to effect syntactic freezing under X-bar invisibility, Rizzi must obligatorily generate labels and projection types, including D-bar, which to some extent presupposes obligatory and immediate labeling in NS.

36 Why would CQ appear in both yes/no-questions and wh-questions? This might be explained under analyses of wh-questions in which they are interpreted as a family of yes/no-questions. So, for example, what did you buy is interpreted as something like: “Did you buy a car? Answer me yes or no; Did you buy a pen? Answer me yes or no,” etc. We are indebted to Ezra Keshet for this idea and for valuable discussion of issues relevant here.

37 Presumably, one or the other is needed as an overt indicator of the otherwise undetectable presence of CQ, as Chomsky (personal communication) notes.

38 A reviewer points out that there may be a difference between the syntax of yes/no-questions with subject-aux inversion and the syntax of yes/no-questions with rising intonation, noting “that the latter does not license NPIs, unlike the former: Does John have any money? vs. *John has any money? (rising intonation).” Thus, “it might not be the case that both kinds of yes-no questions have a +q C.” We leave this interesting issue open here.




In (17), there is a CQ unaccompanied by a “wh-specifier.” α is then labeled CQ and hence interpreted as a yes/no-question at CI. But in (17) there is a morpho-phonological problem with this state of affairs: in embedded clauses in English, T-to-C is simply unavailable, as is rising intonation. Thus, α in (17), though required to be interpreted as a yes/no-question, in fact cannot be interpreted as a yes/no-question. That is, with Chomsky (2014), we assume that when embedded, a yes/no-question, interpreted in concert with the structure above it, yields a composed representation that is “gibberish, crashing at CI” (Chomsky 2014; see also Chomsky 1995b). Leaving aside whether it is “crashing,” i.e. whether some yet-to-be-proposed unvalued feature appears at an interface, one possibility regarding its status as gibberish is as follows: the CP headed by CQ is itself interpreted as a yes/no-question and so would be interpreted as “Answer me this: Does John like this dog?” – that is, a performative request, made of the speaker’s interlocutor, for a specific kind of information. As such, embedding it, as in “I wonder John left,” yields an interpretation like “I wonder ‘Answer me this: Did John leave?’” This is anomalous to the extent that one cannot wonder a request for information. Given this analysis, (16) violates only the English morpho-phonological requirement (if neither T-to-C raising nor rising intonation is applied), while (17) violates the morpho-phonological requirement and is gibberish at CI.

As EKS (2015) points out, this morpho-phonological, CI analysis of (16) and (17) naturally extends to the classic criterial freezing cases considered above, repeated here:39

(18) a. You wonder [α [which dog] [ CQ [TP John likes t ]]].
     b. *Which dog do you wonder [α t [ CQ [TP John likes t ]]]?
The converse, however, does not hold – that is, analyses of freezing cases like (18b), including Rizzi’s (2014), do not extend to (16), which lacks a wh-phrase of any kind, thereby exempting it from a freezing analysis and entailing that (18b) and (17) cannot be unified. Under the labeling analysis of Chomsky (2013), in (18a), at CI, the label of α is the Q-feature, shared by the two heads, namely CQ and the operator WHQ, and this label Q, accompanied by a “wh-specifier,” is interpreted as a wh-question (an indirect one in (18a)) at CI.

39 Epstein (1992) and Rizzi (2014) discuss other cases of “freezing” beyond the core case examined here. Determining the predictive content of the analysis proposed here regarding all such freezing cases requires further research.



In (18b), however, minimal search fails to identify the Q-feature (shared by the two heads CQ and WHQ) as the label of α, because the operator WHQ (= t) in α is “invisible” to minimal search. That is, Chomsky (2013) takes WHQ to be inside α if and only if every occurrence of WHQ is a term of α. Thus, after wh-movement into the matrix clause, the copy of WHQ in α is “invisible” to minimal search when it searches α for its label-identification (see EKS 2012 for further empirical support of this analysis). Therefore, the analysis proposed here asserts that the embedded clause α in (18b) cannot be interpreted as a wh-question, because which dog in the “specifier” of the embedded CQ is “invisible” to minimal search. It instead predicts that the label of α is CQ (recall that α appears to minimal search as [CQ TP]), and although selection is thereby satisfied, as wonder does select CQ, α cannot be interpreted as a wh-question.

So what interpretation does (18b) receive? EKS (2015) argues that α in (18b) receives a yes/no-question interpretation. Recall that a CP with the label CQ, unaccompanied by a “wh-specifier,” is interpreted as a yes/no-question at CI. The hypothesized problems with (18b) are then that T-to-C is unavailable, as is rising intonation, in English embedded clauses, and that when embedded, the larger construction resulting from the embedding, containing a yes/no-question as a term, is gibberish (and perhaps crashing) at CI. In short, semantic anomaly at CI results from interpreting an SO that in part means “wonder a (performative) request” (by contrast, of course, interpreting a structure that denotes “wondering if a proposition is true or false” is semantically non-anomalous).

Summarizing, we made the following assumptions concerning English CQ:

(i) There is only one CQ in the (English) lexicon, appearing in both yes/no- and wh-interrogatives.
(ii) Every syntactic object (SO) must be labeled at CI.
(iii) An SO the label of which is identified as the head CQ, unaccompanied by a “wh-specifier,” is interpreted as a yes/no-question.
(iv) An SO the label of which is identified as the Q-feature, shared by the two heads CQ and WHQ, is interpreted as a wh-question.
(v) English yes/no-questions require T-to-C inversion or rising (question) sentential prosody, available only in matrix clauses; when embedded, the resulting structure cannot be felicitously interpreted – such structures are gibberish (and perhaps crash) at CI.

(i)–(v) are all independently motivated, and to explain both apparent “obligatory syntactic halt” in wh criterial position and cases like “I wonder John left,” nothing more seems to be needed. We argued that there is no need to invoke an NS-specific




halting constraint; the “halting” effect, observed in (18b), naturally follows from the independently needed morpho-phonological, CI analysis.40, 41

40 In Japanese, unlike English, raising from wh criterial position appears to be permissible. Consider (i) (from Takahashi 1993):

(i) Nani-o Taroo-wa [Hanako-ga t katta ka] siritagatteiru no
    what-ACC Taroo-TOP Hanako-NOM bought Q want-to-know Q
    ‘What does Taroo want to know whether Hanako bought?’

Given that (i) converges and is interpretable at CI, we suggest that the interrogative complementizer CQ and the counterpart of “whether” are homophonous in Japanese; they are pronounced as ka. Thus, in (i), ka is not an interrogative complementizer CQ; rather, it is the Japanese counterpart of “whether,” as the translation indicates.

41 The labeling analysis developed here sheds new light on partial wh-movement. Consider the following German data (from Sabel 2000):

(i) a. [β Was [ CQ meinst du [α wen [ C Peter Hans t vorgestellt hat ]]]]?
       WH think you.NOM who.ACC P.NOM H.DAT introduced has
       ‘Who do you think Peter has introduced to Hans?’
    b. [β Was [ CQ meinst du [α wem [ C Peter t die Leute vorgestellt hat ]]]]?
       WH think you.NOM who.DAT P.NOM the people.ACC introduced has
       ‘To whom do you think Peter has introduced the people?’

It is generally assumed that was is not a wh-phrase; it is a wh-expletive that functions as a scope marker, and the wh-phrase wen/wem “who.ACC/who.DAT” is interpreted at the matrix CP, thanks to this wh-expletive, even though the wh-phrase is located in the embedded CP. From the labeling point of view, however, if the wh-phrase headed by the WHQ remained in α and appeared there at CI, a labeling failure would result, contrary to fact. So, what is going on? One possibility is that even though the WHQ (or the phrase containing it) can remain, violating FI at CI, in (ia, b) the WHQ (or the phrase containing it) can choose an option of moving out, allowing α to be labeled.

Pursuing this possibility, what is left behind by such movement may in fact be only the pronominal material of the wh-phrase, including phi and Case; it is no longer the wh-phrase headed by the WHQ. One possible implementation of this might be to apply Obata and Epstein’s (2011) “Feature-splitting Internal Merge” hypothesis. In this regard, Dutch provides an interesting case. Instead of wie “who,” the pronominal element die can appear in α, as in (ii) (from Boef 2013):

(ii) a. Ze vroeg wie jij denkt [α wie het gedaan heeft]
        she asked who you think who it done has
        ‘She asked who you think has done it.’
     b. Ze vroeg wie jij denkt [α die het gedaan heeft]
        she asked who you think DEM it done has
        ‘She asked who you think has done it.’

If the structure of this A’ pronoun is analyzed as “WHQ + pronominal material,” then the WHQ (or a phrase containing it) moves out of α to form the label Q of the matrix clause, leaving its pronominal content behind, and such non-wh, non-Q pronominal content gets pronounced as die in Dutch, leaving the door open for a way to circumvent a labeling failure in the embedded clause. See Obata 2016 for detailed discussion.



6 Summary

In this paper we have reviewed a number of recent papers – EKS 2014, EKS 2015, and EKS (to appear) – tracing first the history of labels from the PS rules of the Standard Theory to Chomsky’s recent labeling by minimal search analysis. Labels have gone from being stipulated (and thus non-explanatory) constructs of PS rules to being nothing other than the result of third-factor minimal search. Chomsky’s recent labeling by minimal search analysis was explored, and the extensions of it proposed in EKS 2014 and EKS 2015 were presented. We’ve seen that Chomsky’s analysis accounts for ‘obligatory exit’ in successive cyclic A-bar and A-movement. In both instances, the mover continues out of an intermediate position to avoid label failure. We’ve also seen that ‘obligatory halt,’ i.e. freezing, can be accounted for by interface problems – once in a ‘criterial position,’ further movement would induce e.g. CI anomaly (and hence ‘gibberish’). Most importantly, all of these positive empirical results are obtained appealing only to simplest Merge operating within ‘natural laws’ (3rd factor principles such as minimal search, NTC, and Inclusiveness).

Acknowledgments: We thank the volume editors and conference organizers, Leah Bauke, Andreas Blümel, and Erich Groat, for helpful comments. We are indebted to Noam Chomsky, Chris Collins, and Ezra Keshet for valuable discussion. We also thank the audience of the Labels and Roots Workshop at the Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft, Philipps-Universität Marburg, Germany 2014, and particularly Stefanie Bode, Hagit Borer, Dennis Ott, and Marc Richards. Finally, many thanks to an anonymous reviewer for insightful comments and suggestions that helped improve the final version.

References

Boeckx, Cedric. 2010. A tale of two minimalisms: Reflections on the plausibility of crash-proof syntax, and its free-merge alternative. In Michael T. Putnam (ed.), Exploring crash-proof grammars, 89–104. Amsterdam: John Benjamins.
Boeckx, Cedric. 2011. Approaching parameters from below. In Cedric Boeckx and Anna-Maria Di Sciullo (eds.), The biolinguistic enterprise: New perspectives on the evolution and nature of the human language faculty. Oxford: Oxford University Press.
Boef, Eefje. 2013. Partial wh-movement revisited: A microcomparative perspective. Paper presented at Toward a theory of syntactic variation, Bilbao, Basque Country, June 5–7, 2013.
Brody, Michael. 1995. Lexico-logical form: A radically minimalist theory (Linguistic Inquiry Monograph 27). Cambridge, MA: The MIT Press.




Carstens, Vicki. 2010. Implications of grammatical gender for the theory of uninterpretable features. In Michael T. Putnam (ed.), Exploring crash-proof grammars, 31–57. Amsterdam: John Benjamins.
Chametzky, Robert. 2000. Phrase structure: From GB to minimalism. Oxford: Wiley-Blackwell.
Chomsky, Noam. 1957. Syntactic structures. Berlin: Mouton.
Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge, MA: The MIT Press.
Chomsky, Noam. 1970. Remarks on nominalization. In Roderick Jacobs and Peter Rosenbaum (eds.), Readings in English transformational grammar, 184–221. Waltham, MA: Blaisdell.
Chomsky, Noam. 1981. Lectures on government and binding (Studies in Generative Grammar 9). Berlin: Mouton de Gruyter.
Chomsky, Noam. 1991. Some notes on economy of derivation and representation. In Robert Freidin (ed.), Principles and parameters in comparative grammar (Current Studies in Linguistics 20), 417–454. Cambridge, MA and London: The MIT Press.
Chomsky, Noam. 1993. A minimalist program for linguistic theory. In Kenneth Hale and Samuel Jay Keyser (eds.), The view from Building 20: Essays in linguistics in honor of Sylvain Bromberger, 1–52. Cambridge, MA: The MIT Press.
Chomsky, Noam. 1995a. Bare phrase structure. In Gert Webelhuth (ed.), Government and binding theory and the minimalist program: Principles and parameters in syntactic theory, 385–439. Oxford: Blackwell.
Chomsky, Noam. 1995b. The minimalist program. Cambridge, MA: The MIT Press.
Chomsky, Noam. 1998. Some observations on economy in generative grammar. In P. Barbosa, D. Fox, P. Hagstrom, M. McGinnis, and D. Pesetsky (eds.), Is the best good enough?, 115–127. Cambridge, MA: The MIT Press.
Chomsky, Noam. 2000. Minimalist inquiries: The framework. In Roger Martin, David Michaels, and Juan Uriagereka (eds.), Step by step: Essays on minimalist syntax in honor of Howard Lasnik, 89–155. Cambridge, MA: The MIT Press.
Chomsky, Noam. 2007. Approaching UG from below. In Uli Sauerland and Hans-Martin Gärtner (eds.), Interfaces + recursion = language?, 1–29. Berlin: Mouton de Gruyter.
Chomsky, Noam. 2008. On phases. In Robert Freidin, Carlos P. Otero and Maria Luisa Zubizarreta (eds.), Foundational issues in linguistic theory: Essays in honor of Jean-Roger Vergnaud, 133–166. Cambridge, MA: The MIT Press.
Chomsky, Noam. 2013. Problems of projection. Lingua 130: 33–49.
Chomsky, Noam. 2014. Problems of projection: Extensions. Unpublished manuscript, MIT. Published 2015 in Elisa Di Domenico, Cornelia Hamann and Simona Matteini (eds.), Structures, strategies and beyond: Studies in honour of Adriana Belletti (Linguistics Today 223). Amsterdam: John Benjamins.
Collins, Chris. 2002. Eliminating labels. In Samuel D. Epstein and T. Daniel Seely (eds.), Derivation and explanation in the minimalist program, 42–64. Oxford: Blackwell.
Collins, Chris. This volume. Merge(X,Y) = {X,Y}.
Donati, Caterina. 2006. On wh-head movement. In L. L.-S. Cheng and N. Corver (eds.), Wh-movement: Moving on, 21–46. Cambridge, MA: The MIT Press.
Epstein, Samuel D. 1992. Derivational constraints on A’-chain formation. Linguistic Inquiry 23: 235–259.
Epstein, Samuel D. 1998. Overt scope marking and covert verb second. Linguistic Inquiry 29: 181–229.
Epstein, Samuel D. 2007. On i(internalist)-functional explanation in minimalism. Linguistic Analysis 33: 20–53.



Epstein, Samuel D., E. Groat, R. Kawashima, and H. Kitahara. 1998. A derivational approach to syntactic relations. Oxford: Oxford University Press.
Epstein, Samuel D., H. Kitahara, M. Obata and T. D. Seely. 2013. Economy of derivation and representation. In M. den Dikken (ed.), The Cambridge handbook of generative syntax. Cambridge: Cambridge University Press.
Epstein, Samuel D., H. Kitahara, and T. D. Seely. 2012. Structure building that can’t be! In Myriam Uribe-Etxebarria and Vidal Valmala (eds.), Ways of structure building, 253–270. Oxford: Oxford University Press.
Epstein, Samuel D., H. Kitahara and T. D. Seely. 2014. Labeling by minimal search: Implications for successive cyclic A-movement and the elimination of the postulate “phase”. Linguistic Inquiry 45: 463–481.
Epstein, Samuel D., H. Kitahara, and T. D. Seely. 2015. *What do we wonder is not syntactic? In Explorations in maximizing syntactic minimization (Routledge Leading Linguists 22, Carlos Otero, series ed.), 222–240. Routledge.
Epstein, Samuel D., H. Kitahara and T. D. Seely. 2016. From Aspects’ ‘daughterless mothers’ (aka delta nodes) to POP’s ‘motherless’ sets (aka non-projection): A selective history of the evolution of Simplest Merge. In Angel J. Gallego and Dennis Ott (eds.), 50 Years Later: Reflections on Chomsky’s Aspects (MIT Working Papers in Linguistics 77), 99–112.
Epstein, Samuel D., H. Kitahara, and T. D. Seely. Forthcoming. Is the faculty of language a “perfect solution” to the interface conditions? In James McGilvray (ed.), The Cambridge companion to Chomsky, 2nd edn. Cambridge: Cambridge University Press.
Epstein, Samuel D. and T. D. Seely. 2006. Derivations in minimalism. Cambridge: Cambridge University Press.
Hornstein, N. and J. Nunes. 2008. Adjunction, labeling, and bare phrase structure. Biolinguistics 2: 57–86.
Hornstein, Norbert and Juan Uriagereka. 2002. Reprojections. In Samuel D. Epstein and T. Daniel Seely (eds.), Derivation and explanation in the minimalist program. Oxford: Blackwell.
Jackendoff, Ray. 1977. X-bar syntax: A study of phrase structure. Cambridge, MA: The MIT Press.
Kayne, Richard. 1994. The antisymmetry of syntax. Cambridge, MA: The MIT Press.
Kitahara, Hisatsugu. 1997. Elementary operations and optimal derivations. Cambridge, MA: The MIT Press.
Lasnik, Howard and T. Lohndal. 2013. Brief overview of the history of generative syntax. In M. den Dikken (ed.), The Cambridge handbook of generative syntax. Cambridge: Cambridge University Press.
Lasnik, Howard and Mamoru Saito. 1984. On the nature of proper government. Linguistic Inquiry 15: 235–289.
Lasnik, Howard and Mamoru Saito. 1992. Move α: Conditions on its application and output. Cambridge, MA: The MIT Press.
May, Robert. 1985. Logical form: Its structure and derivation. Cambridge, MA: The MIT Press.
Muysken, P. 1982. Parametrizing the notion ‘Head’. Journal of Linguistic Research 2: 57–75.
Narita, Hiroki. 2011. Phasing in full interpretation. Harvard University PhD thesis.
Obata, Miki. 2016. Unlabeled syntactic objects and their interpretation at the interfaces. In Christopher Hammerly and Brandon Prickett (eds.), Proceedings of the 46th Annual Meeting of the North East Linguistic Society, volume 3, 63–70. Amherst, MA: University of Massachusetts Graduate Linguistics Student Association.




Obata, M., M. Baptista and S. D. Epstein. 2015. Can crosslinguistically variant grammars be formally identical? Third factor underspecification and the possible elimination of parameters of UG. Lingua 156: 1–16.
Obata, M. and S. D. Epstein. 2011. Feature-splitting internal merge: Improper movement, intervention, and the A/A′ distinction. Syntax 14: 122–147.
Ott, Dennis. 2010. Grammaticality, interfaces, and UG. In Michael T. Putnam (ed.), Exploring crash-proof grammars, 89–104. Amsterdam: John Benjamins.
Richards, Marc. 2008. Two kinds of variation in a minimalist system. In Fabian Heck, Gereon Müller and Jochen Trommer (eds.), Varieties of competition, 133–162. Linguistische Arbeitsberichte 87. Universität Leipzig.
Rizzi, Luigi. 1997. The fine structure of the left periphery. In Liliane Haegeman (ed.), Elements of grammar: Handbook of generative syntax, 281–337. Dordrecht: Kluwer.
Rizzi, Luigi. 2014. Cartography, criteria and labeling. Unpublished manuscript, University of Geneva.
Sabel, Joachim. 2000. Partial wh-movement and the typology of wh-questions. In Uli Lutz, Gereon Müller and Arnim von Stechow (eds.), Wh-scope marking, 409–446. Amsterdam: John Benjamins.
Seely, T. Daniel. 2006. Merge, derivational c-command, and subcategorization in a label-free syntax. In Cedric Boeckx (ed.), Minimalist essays, 182–217. Amsterdam: John Benjamins.
Stowell, Tim. 1981. Origins of phrase structure. MIT dissertation.
Takahashi, Daiko. 1993. Movement of wh-phrases in Japanese. Natural Language and Linguistic Theory 11: 655–678.
Travis, Lisa. 1984. Parameters and effects of word order variation. MIT dissertation.
Uriagereka, Juan. 1999. Multiple spell-out. In Samuel D. Epstein and Norbert Hornstein (eds.), Working minimalism. Cambridge, MA: The MIT Press.

Chris Collins

Merge(X,Y) = {X,Y}

Abstract: This paper explores the history, properties and implications of the syntactic operation Merge(X,Y) = {X,Y}.

1 Unbundling

The history of generative syntax is characterized by the gradual unbundling of syntactic transformations into different components (operations, conditions, phrase structure rules, rules of interpretation). These components interact in complex ways to yield the observable syntactic data. A good example of such unbundling can be found in Chomsky 1977: 110, where it was argued that “…we can eliminate from the grammar rules of comparative deletion, topicalization, clefting, object-deletion and ‘tough movement,’ rules for adjective and adjective-qualifier complements, and others, in favor of the general rule of wh-movement…”. The general rule of wh-movement was given as follows (pg. 85): Move wh-phrase into COMP. Data was then accounted for in terms of the general rule of wh-movement, phrase structure rules, locality constraints, the strict cycle and rules of interpretation (the “predication rule”). Although the rule of wh-movement was simple, the explanation of the grammaticality status of particular sentences was quite complicated, due to the complex interactions of the various components.

Chomsky 1977: 72 had only two general rules: Move wh-phrase and Move NP. Later, these were collapsed into one general rule Move-α (Chomsky 1980: 145).

Mechanisms to generate phrase structure have undergone a parallel process of unbundling. In Chomsky 1965, phrase structure was generated by rewrite rules. Such rewrite rules bundled three different kinds of information: (a) linear order, (b) hierarchical grouping, and (c) syntactic category labels. Later work unbundled the information found in rewrite rules into different components. X’-theory imposed general constraints on phrase structure rules through what were called “phrase structure rule schema” (see Jackendoff 1977: 33).
As Travis (1989: 264) explains: “Phrase structure rules also encoded two disparate types of relationships: dominance relations and precedence relations…These different relations are being teased apart in the GB framework. Dominance relations are restricted by X-bar theory. The ordering of non-heads with respect to one another is restricted by subcomponents of the grammar such as Case theory (Stowell 1981). The order of
non-heads with respect to heads is restricted by one of the first parameters proposed, the headedness parameter…” (see Fukui 2001 for a brief history of X’-theory).

The clear break with the phrase structure grammar tradition occurred when Chomsky (1995: 296) proposed the operation Merge: “One such operation is necessary on conceptual grounds alone: an operation that forms larger units out of those already constructed, call it Merge.” (see Freidin and Lasnik 2011: 5, fn. 12: “Phrase structure rules remained the only grammatical device for generating phrase structure until the advent of Merge in the mid-1990s.”)

However, early minimalism still had quite a bit of bundling. For example, all of Chomsky’s early writings on minimalism defined Move/Attract as some combination of operations. For example, Chomsky (1995: 297) defined Attract as “incorporating the MLC and Last Resort”. Chomsky (2001: 10, see also Chomsky 2000: 101, 138) defined Move as the combination of Agree, Merge and pied-piping. Crucially, the fact that Move was a combination of operations underpinned the proposal that Move was preempted by Merge (explaining certain facts about the distribution of expletives).

Chomsky 2004: 110 (see also Epstein 1999: 320) made the crucial observation that there is just one operation, Merge, with two subcases: internal and external Merge. Merge(X,Y) is external Merge if X and Y are separate objects. Merge(X,Y) is internal Merge if either X contains Y or Y contains X. In effect, phrase structure rules and transformations collapsed into a single operation.

Even though the output of Merge in early minimalism was unspecified for linear order, it still bundled in syntactic category information: Merge(X,Y) = {X, {X, Y}}, where X is the label of the syntactic object formed by Merge (see Chomsky 1995: 243). An immediate question that arose was how to determine the label. For movement, the relevant principle was that the target projects (pg. 259). For Merge (not collapsed with Move at that point) the question was left open: “I will suggest below that labels are uniquely determined for categories formed by the operation Move α, leaving the question open for Merge,…” (pg. 244). Collins (1997: 65), Chomsky (2000: 135, footnote 101) and Collins 2002 propose (1) as the definition of Merge.

(1) Merge(X,Y) = {X,Y}

For Collins 2002 the main motivation for this simplification in the definition of Merge was given as follows: “The question is whether the result of the operation Merge(V, X) is {V, X} or {V, {V, X}}. The fact that Merge(V, X) combines two elements into a larger phrase is a necessary part of any theory of human language. The assumption that {V, {V, X}} is formed rather than {V, X} (that a label is chosen), goes way beyond what is necessary for grammar to make ‘infinite use of finite means.’” (pg. 43). Collins (2002) then showed how various generalizations
of X’-theory, subcategorization and the Minimal Link Condition would work in a label-free theory. However, he did not touch upon the issue of word order, to which I shall return in section 5 below. Seely (2006) gave a completely different set of arguments for eliminating labels from the output of Merge. Perhaps the most compelling reason was that given the definition of c-command assumed by Seely, “…labels cannot participate in syntactic operations; they are syntactically inert.” (pg. 189) So even if there were labels, they could not play any role in syntactic derivations. Seely (2006: 195, 206) also raised a number of difficult questions that labels (as part of the syntactic object formed by Merge) raise: In {see, {see, Mary}}, does the label see have the same selectional properties as the see which is the sister of Mary? Can the label undergo head movement? Why is the label never pronounced? All of these questions dissolve under simplest Merge. The equation Merge(X,Y) = {X,Y} can be seen as the endpoint in the unbundling of syntactic transformations and phrase structure rules. The operation Merge gives rise to hierarchical syntactic structure. It contains no information about linear order or syntactic category (other than the syntactic category information found in the lexical items themselves). Nor does it contain any information about when syntactic transformations must apply and when they are blocked from applying (unlike earlier formulations of transformational rules). The equation Merge(X,Y) = {X,Y} seems to be the ultimate destination, in that no other simplifications are imaginable. Furthermore, adopting Merge(X,Y) = {X,Y} can serve as a guide to the development of other parts of the theory. Working backwards from (1), it should now be possible to make significant progress in formulating Transfer. In this paper, I assume the general background of Collins and Stabler 2016. 
In addition to Merge, there is an operation which takes syntactic objects and creates structures interpretable at the interfaces. This operation is Transfer, and it has two components: TransferSM and TransferCI. TransferSM (also known as Spell-Out or externalization) yields a structure interpretable at the SM Interface. TransferCI yields a structure interpretable at the CI Interface. A central goal of this paper is to show what the consequences of adopting Merge(X,Y) = {X,Y} are for the definition of TransferSM. It is possible that Transfer itself should be unbundled. A first step would be to dissociate TransferCI and TransferSM, which would then apply independently in a derivation. I do not take up this issue here for brevity’s sake.

As Chomsky (2014a: 2) notes: “Naturally, one seeks the simplest account of UG. One reason is just normal science: it has long been understood that simplicity of theory is essentially the same as depth of explanation. But for language there is an extra reason: UG is the theory of the biological endowment
of the language faculty, and each complication of UG therefore poses a barrier to some eventual account of the origin of language, to the extent that this can be attained.” Given this goal, consider the Strong Minimalist Thesis from Berwick and Chomsky (2011: 3) (see also Chomsky 2013: 38): “…the principles of language are determined by efficient computation and language keeps to the simplest recursive operation, Merge, designed to satisfy interface conditions in accord with independent principles of efficient computation.” So according to the SMT, the only structure building operation of UG is Merge. This formulation raises the question of the place of Transfer in UG (see Collins and Stabler 2016 who take Transfer to be part of UG). A natural question to ask is just how much of Transfer needs to be stipulated as part of UG and how much follows from principles of computational efficiency and the properties of the interfaces themselves. In other words, one can ask whether Transfer itself can be unbundled. The explicit statement of TransferSM that I give in section 5 might eventually help to resolve this issue, although I will not pursue it here (see Berwick and Chomsky 2011: 38, 40 for discussion).

2 Properties of Merge

I will now be more explicit in my treatment of Merge. First, I give a formal definition of Merge in (2):

(2) Given any two syntactic objects A, B, Merge(A,B) = {A,B}

In Collins and Stabler 2016 it was stipulated that A and B be distinct. From the point of view of the current paper, such a stipulation is an example of the bundling of stipulations into the definition of Merge. If A and B must be distinct, then that distinctness should follow from independent principles; it should not be stipulated as an independent part of Merge (see Collins 1997: 81 for one attempt; see Adger 2013 and Guimarães 2000 for arguments against distinctness). According to (2), A and B are syntactic objects. The recursive definition of syntactic object is given below. I return to the definition of lexical item in section 6.

(3) X is a syntactic object iff
 i. X is a lexical item, or
 ii. X is a set of syntactic objects.

I will now enumerate some of the properties of (2).
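Before turning to the enumeration, note that definitions (2) and (3) can be rendered directly as a toy implementation. The following sketch (not part of the original text) models lexical items as Python strings and all other syntactic objects as frozensets; it is an illustrative simplification, not a full formalization of Collins and Stabler 2016.

```python
# Sketch of definitions (2) and (3): lexical items are modeled as strings,
# and every other syntactic object is a frozenset of syntactic objects.

def merge(x, y):
    """Merge(X,Y) = {X,Y}: form the set containing exactly X and Y."""
    return frozenset({x, y})

def is_syntactic_object(x):
    """Definition (3): X is a lexical item, or a set of syntactic objects."""
    if isinstance(x, str):                     # clause (3i): a lexical item
        return True
    return (isinstance(x, frozenset)           # clause (3ii): a set...
            and all(is_syntactic_object(m) for m in x))  # ...of syntactic objects

# The output of Merge is itself a syntactic object, so it can serve as an
# input to Merge again (iterability).
vp = merge("see", merge("the", "man"))
assert vp == frozenset({"see", frozenset({"the", "man"})})
assert is_syntactic_object(vp)
```

Note that nothing in this rendering records a label, a linear order, or a trigger for the operation; the set {X,Y} is all that Merge produces.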




First, Merge is iterable. Since the output of Merge is a syntactic object, it can serve as the input to Merge. If Merge(X,Y) = Z, then it is possible for Z, as a syntactic object, to be an argument of the Merge operation. I return to iterability in section 7 when I define a syntactic derivation.

Second, Merge is a binary operation, taking two arguments (essentially following Kayne 1983; see Collins 1997: 76 for a possible explanation).

Third, Merge is commutative: Merge(X,Y) = Merge(Y,X), since {X,Y} = {Y,X} (contra Collins and Stabler 2016). However, Merge is not associative, since Merge(X, Merge(Y,Z)) = {X, {Y,Z}}, which is not in general the same as Merge(Merge(X,Y), Z) = {{X,Y}, Z}. Nor is Merge idempotent, since Merge(X,X) = {X}, which is not equal to X.

Fourth, the output of Merge is unspecified for linear order. I assume, following Chomsky 1995, that order is imposed at TransferSM, in a way detailed in section 5. Of course, it would be possible to define ordered Merge: Merge(X,Y) = 〈X,Y〉, where 〈X,Y〉 is an ordered pair (see Fukui 2001: 399 for a proposal). However, such a definition would constitute bundling, since two different kinds of information would be encoded in the output of Merge. Furthermore, in this case, it is natural to assume that order is imposed by TransferSM to create a structure interpretable at the SM Interface. See section 5 below for specific proposals on how linearization takes place.

Fifth, as already discussed, there is no label encoded in the output of Merge (see Collins 2002 and Seely 2006 for the original arguments). An immediate consequence of this is that it is not possible to define specifier or complement, which are both defined in terms of labels (see Collins and Stabler 2016 for formal definitions; see also Adger 2013: chapter 2 for a discussion of specifiers and labelling).

Sixth, Merge is untriggered. In order to calculate Merge(X,Y) there is no need for any feature to be checked (e.g., subcategorization features, EPP features, etc.). See Collins and Stabler 2016 for discussion. If it turns out that feature checking must take place for certain instances of Merge(X,Y), this would have to be forced by some independent principle of computational efficiency or some independent property of the interfaces, not by the definition of Merge in (2).

Seventh, given the definition of Merge, counter-cyclic internal or external Merge is not possible. In fact, any kind of replacement of terms is impossible. This point was first made clear by Collins 1997: 84 (modifying ideas of Watanabe 1995): “However, such a replacement operation would greatly complicate Merge, and so it is to be avoided. A provision would have to be added to the effect that if Merge(α,β) = γ, where α is embedded in another constituent, α must be replaced by γ. Since this provision complicates the definition of Merge and prevents us from accounting for the cycle in terms of the independently needed LCA, there is no reason to believe replacement is possible.” Similarly, late merger and tucking-in are not possible. Both would require complications in (2).
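The algebraic properties just listed (commutativity, failure of associativity, failure of idempotence) can be checked mechanically under a set-based model of (2). The function merge below is an illustrative stand-in for the operation, not part of the original text.

```python
# Checking the algebraic properties of Merge(X,Y) = {X,Y} under a
# set-based model (an illustrative sketch).

def merge(x, y):
    return frozenset({x, y})

# Commutative: Merge(X,Y) = Merge(Y,X), since {X,Y} = {Y,X}.
assert merge("x", "y") == merge("y", "x")

# Not associative: {X, {Y,Z}} is in general distinct from {{X,Y}, Z}.
assert merge("x", merge("y", "z")) != merge(merge("x", "y"), "z")

# Not idempotent: Merge(X,X) = {X}, which is not equal to X.
assert merge("x", "x") == frozenset({"x"})
assert merge("x", "x") != "x"
```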


The fact that Merge cannot give rise to any kind of replacement of terms is now referred to as the No Tampering Condition (see Chomsky 2005: 11). However, there is no need to stipulate an independent No Tampering Condition. The NTC follows as a theorem from the definition of Merge, since Merge is by definition unable to replace terms. Similar remarks hold for the Extension Condition and Inclusiveness (see Collins and Stabler 2016, and Collins 2014 for discussion). These should be understood as theorems about the system, not actual filters or conditions that need to be stipulated (either as part of UG or as general properties). Given the definition of Merge, it is impossible for covert movement (QR) to be counter-cyclic. Cyclic QR is consistent with the framework of Groat and O’Neil 1996, where after internal Merge of α, the lower occurrence of α, but not the higher occurrence, would be spelled out. Implementing this would require modifications to (13) and (15) below that I will not pursue here for reasons of space.

Eighth, Merge is the only mechanism available for building structure. It replaces both phrase structure rules and transformations (Chomsky 2004: 110). In particular, there is no bundling in the definition of Move (= Merge + Agree), for the simple reason that there is no operation Move; there is only Merge. Even though Merge has two different cases (internal Merge and external Merge), there is just one operation.

Ninth, there is no provision for the segment/category distinction in Merge. In other words, Merge does not produce adjunction structures of the type: [XP YP XP] (YP is adjoined to XP). Merge cannot produce head adjunction structures either. Chomsky (2004: 177) argues that an operation of Pair-Merge is needed to account for the properties of adjunction: Pair-Merge(X, Y) = 〈X, Y〉 (X adjoined to Y). However, Pair-Merge is a completely different operation from Merge, and would have to be stipulated as an independent operation of UG (going against the SMT as defined in section 1).

Tenth, there is no operation Copy (for early discussions of this issue, see Bobaljik 1995: 47, Groat 1997: 41; contra Nunes 2004: 89). In the remainder of this paper, I will use the word “copy” to mean one of the two occurrences created by internal Merge. But even though I use the word “copy”, there is no copying operation. The existence of copies in this precise sense follows from (2).

Eleventh, there is no provision for the creation of “traces” in the definition of Merge. Once again the absence of traces follows from the definition of Merge (and the additional assumption that there are no other structure building operations); there is no need for an independent stipulation banning traces.

Twelfth, there is no provision for indices on the copies in the definition of Merge. The fact that Merge does not introduce indices falls under the inclusiveness condition (Chomsky 1995: 225, 228). But once again, there is no need to
stipulate the inclusiveness condition as part of UG. Rather, it follows as a theorem from the definition of Merge. Thirteenth, there is no notion of Chain (as a sequence of occurrences) (contra Chomsky 1995: 43, 250, 2000: 114, 116, 2001: 39, 2008), nor is there an operation Form-Chain (contra Collins 1994 and Nunes 2004). Postulating Chains or Form-Chain would once again go way beyond the definition of Merge in (2). The consequence of points eleven, twelve and thirteen is that TransferSM and TransferCI must be formulated without reference to traces, indices or chains. Thus formulating Merge as (2) provides us with a clear constraint on formulating Transfer. I will meet this challenge for TransferSM in section 5. An important future challenge for minimalist syntacticians/semanticists is to write a formal definition of TransferCI meeting these criteria.
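The claim that “copies” are just multiple occurrences of a single object, created by internal Merge with no Copy operation, no traces and no indices, can be illustrated in the same set-based model. The sketch below is mine, and the lexical items are placeholders.

```python
# Internal Merge without a Copy operation: the same object simply occurs
# at two positions in the resulting set structure (an illustrative sketch).

def merge(x, y):
    return frozenset({x, y})

def occurrence_count(so, x):
    """Count the positions at which x occurs inside so."""
    if so == x:
        return 1
    if isinstance(so, frozenset):
        return sum(occurrence_count(m, x) for m in so)
    return 0

vp = merge("fall", "john")      # external Merge: {fall, john}
tp = merge("will", vp)          # external Merge: {will, {fall, john}}
clause = merge("john", tp)      # internal Merge: tp contains "john"

# "john" now has two occurrences, yet nothing was copied, indexed or chained.
assert occurrence_count(clause, "john") == 2
```

The two occurrences exist purely in virtue of the set structure; any distinction between them (e.g. “final” vs. “non-final”) must be computed from that structure, which is what section 5 does.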

3 Chomsky’s labeling algorithm

Chomsky (2008, 2013) proposes that Merge(X,Y) = {X,Y}, and that labels are identified via one of a small number of principles, the labeling algorithm. I summarize Chomsky’s (2013, 2014a, 2014b) system below, using the functional notation for labels introduced in Collins and Stabler 2016.

(4) If SO = {H, XP} where H is a head and XP is not a head, then Label(SO) = H.

As an example of (4), if Merge(see, {the, man}) = {see, {the, man}}, then Label({see, {the, man}}) = see.

(5) If SO = {XP, YP} and neither is a head, then
 a. if XP is a lower copy, Label(SO) = Label(YP).
 b. if Label(XP) and Label(YP) share a feature F by Agree, Label(SO) = 〈F, F〉.

An example to illustrate (5b) is given in (6):

(6) Who does John see?

Here the label of (6) will be 〈Q, Q〉, since that feature is shared by who and the interrogative complementizer. An immediate consequence of (5b) is that the label given by (6) is not sufficient to determine linear order at TransferSM. The label is 〈Q, Q〉, and there is no asymmetry between who and the rest of the clause that would allow one to determine which comes first in linear order. So like Collins 2002 and Seely 2006, Chomsky 2013 does not address the issue of labels and linear order.
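A toy rendering of the labeling algorithm in (4)–(5) is given below. This is my own sketch, not Chomsky’s or Collins and Stabler’s formalization: lexical items are dicts with a name and a feature set, complex syntactic objects are 2-tuples standing in for {X,Y}, and copy status is supplied directly, since internal Merge and Agree are not modeled. The {H, H} bottom case, where a “weak” head (loosely, a root) fails to label, is my simplifying stipulation; all lexical entries are illustrative.

```python
# Toy labeling algorithm for (4)-(5): an illustrative sketch only.

def is_head(so):
    return isinstance(so, dict)

def feats(lab):
    return lab["feats"] if isinstance(lab, dict) else set()

def label(so, lower_copies=()):
    if is_head(so):
        return so
    x, y = so
    # (4): in {H, XP}, the head H labels.
    if is_head(x) and not is_head(y):
        return x
    if is_head(y) and not is_head(x):
        return y
    # Bottom case {H, H}: stipulate that a "weak" head does not label.
    if is_head(x) and is_head(y):
        if x.get("weak") and not y.get("weak"):
            return y
        if y.get("weak") and not x.get("weak"):
            return x
        raise ValueError("{H, H} with no weak member: not handled here")
    # (5a): a lower copy is invisible to labeling.
    if any(x is c for c in lower_copies):
        return label(y, lower_copies)
    if any(y is c for c in lower_copies):
        return label(x, lower_copies)
    # (5b): {XP, YP} is labeled by a feature F shared by the two labels.
    shared = feats(label(x, lower_copies)) & feats(label(y, lower_copies))
    if shared:
        f = min(shared)
        return (f, f)                 # the label <F, F>
    raise ValueError("unlabelable {XP, YP}")

# (4): Label({see, {the, ...}}) = see
see = {"name": "see", "feats": set()}
the = {"name": "the", "feats": set()}
man = {"name": "man", "feats": set(), "weak": True}   # a root-like head
n = {"name": "n", "feats": set()}
vp = (see, (the, (n, man)))
assert label(vp)["name"] == "see"

# (5b): a wh-phrase and an interrogative clause sharing Q label as <Q, Q>
which = {"name": "which", "feats": {"Q"}}
c_q = {"name": "C", "feats": {"Q"}}
t = {"name": "T", "feats": set()}
wh_phrase = (which, (n, man))
question = (wh_phrase, (c_q, (t, vp)))
assert label(question) == ("Q", "Q")
```

As the text notes, the 〈Q, Q〉 output of (5b) supplies no asymmetry between the two phrases, so nothing here determines their linear order.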


From this short overview, it is clear that Chomsky assumes that labels are not encoded in the output of Merge (see Collins 1997, Chomsky 2000, Collins 2002, Seely 2006). However, even though labels are not so encoded, they are still identified by the labeling algorithm, and the labels play a role at the interfaces.

(7) Chomsky 2013: The output of Merge is label-free. Labels are determined by a labeling algorithm, and play a role at the interfaces.

Chomsky justifies the labeling algorithm by claiming that (pg. 43): “…there is a fixed labeling algorithm LA that licenses SOs so that they can be interpreted at the interfaces…”. But then the question arises as to why one needs a labeling algorithm at all. Why isn’t the structure of the SO, in combination with the definitions of TransferSM and TransferCI, sufficient to yield the correct interface representations? Pursuing this point, I argue that, given independently needed assumptions about TransferSM, a labeling algorithm (an independent operation whose results feed both TransferCI and TransferSM) would be redundant. I attempt to implement this program in section 5 below.

4 Why are labels needed?

The question that needs to be addressed first is what role labels play in syntactic theory. There are in principle three different areas where labels could play a role:

(8) (a) internal to the syntactic computation;
 (b) at TransferCI;
 (c) at TransferSM

As for (8a), it has been suggested that Merge is constrained by c-selectional information (see Collins 2002 and Collins and Stabler 2016 for references; see Seely 2006 for much relevant discussion of c-selection in a label-free framework). I will not discuss this any further here. If c-selectional requirements do constrain the output of Merge, it is not by virtue of the definition of Merge, as shown in section 2. Of course, all the perennial questions about the scope of c-selection, and its relation to s-selection, remain (see Collins 2002 for references).

In a series of publications, Chomsky assumes that labels are needed at the interfaces:

(9) “Applied to two objects α and β, Merge forms the new object K, eliminating α and β. What is K? K must be constituted somehow from the two items α and β; … The simplest object constructed from α and β is the set {α, β}, so we take K to involve at least this set, where α and β are the constituents of K.
Does that suffice? Output conditions dictate otherwise; thus verbal and nominal elements are interpreted differently at LF and behave differently in the phonological component. K must therefore at least (and we assume at most) be of the form {γ, {α, β}}, where γ identifies the type to which K belongs, indicating its relevant properties. Call γ the label of K.” (Chomsky 1995: 243)

(10) “For a syntactic object SO to be interpreted, some information is necessary about it: what kind of object is it? Labeling is the process of providing that information. Under PSG and its offshoots, labeling is part of the process of forming a syntactic object SO. But that is no longer true when the stipulations of these systems are eliminated in the simpler Merge-based conception of UG. We assume, then, that there is a fixed labeling algorithm LA that licenses SOs so that they can be interpreted at the interfaces, operating at the phase level along with other operations.” (Chomsky 2013: 43)

(11) “Since the same labeling is required at CI and for the processes of externalization (though not at SM, which has no relevant structure), it must take place at the phase level, as part of the Transfer operation.” (Chomsky 2014a: 4)

What information is needed at the CI Interface (see (8b))? I assume hierarchical information is important: {{the, man}, {bite, {the, dog}}} clearly has a different interpretation at the CI Interface than {{the, dog}, {bite, {the, man}}}. In fact, in at least one introductory semantics textbook (Heim and Kratzer 1998), semantic rules of interpretation are formulated solely in terms of hierarchical structure. Since nobody has given a concrete formal definition of TransferCI, it is difficult to evaluate the claim that labels are needed in addition to hierarchical structure. The above quotes from Chomsky suggest that for a SO to be interpreted at the CI Interface, there is a need to know what kind of SO it is. For example, if the label of an SO is a question complementizer, then one knows that SO is an interrogative CP, and that information would be useful at the CI Interface. I will leave this issue aside, and focus uniquely on TransferSM in the rest of this paper.

As for (8c), labels have traditionally played an important role in linear ordering. In Principles and Parameters, specifiers, complements and adjuncts all have different ordering properties. For example, in English, specifiers precede heads, and complements follow heads. Adjuncts have a freer word order than either specifiers or complements. These notions are defined in terms of labels (see Collins and Stabler 2016 for formal definitions). See Dobashi 2003 for a label-free phase-based approach to phonological phrasing.


5 TransferSM

In this section, I will first present the version of TransferSM (also known as Spell-Out or externalization) found in Collins and Stabler 2016 (see Uriagereka 1999 for the original Multiple Spell-Out proposal). The version of TransferSM in (13) below embodies three general ideas. First, the spell-out of a lexical item is a sequence of phonological segments (Phon), where a lexical item is a triple of features: LI = 〈Sem, Syn, Phon〉. I return to the definition of lexical items in section 6. Second, to transfer a lower copy of an element that has been internally merged, the lower copy is simply ignored. There is no separate operation of Delete that deletes lower copies. I assume that ignoring the lower copy is the result of a principle of computational efficiency: if a syntactic object has two or more occurrences, it is only spelled out at one of them. A further question is why the lower copy, instead of the higher copy, is ignored (unless covert movement is involved; see section 2, point seven). I do not pursue this question here. Third, specifiers precede heads, and heads precede complements. This last assumption shows the dependence of (13) on labels, since specifiers and complements are defined in terms of projections of a head.

Before presenting (13), I comment on a few technical details. First, Transfer is a two-place function, since it spells out a syntactic object SO with respect to the phase that the SO is contained in. For example, suppose the phase is Phase = {that, {John, ran}}. Then one has TransferSM(Phase, {John, ran}). Such a system allows extraction to the edge of a phase (and successive cyclic movement), exactly as in Chomsky 2004. Second, the symbol “^” is the concatenation symbol. X^Y means that X is followed linearly by Y and they are adjacent. Third, the notion of “lower copy” is formalized in the following definition (basically c-command):

(12) A ∈ {A,B} is final in SO iff there is no C contained in (or equal to) SO such that A ∈ C, and C contains {A,B}. Otherwise, A is non-final in SO.

So an occurrence of A is final if it is not c-commanded by any other occurrence of A. Otherwise, an occurrence of A is non-final. A non-final occurrence of A is what is also called a “lower copy”. With this background, TransferSM is defined as follows (slightly modified from Collins and Stabler 2016):

(13) TransferSM (First Version)
For all syntactic objects SO such that either SO = Phase or SO is contained in Phase, TransferSM(Phase, SO) is defined as follows:
 a. If SO is a lexical item, LI = 〈Sem, Syn, Phon〉, then TransferSM(Phase, SO) = Phon;
 b. If SO = {X,Y} and X and Y in SO are final in Phase, and if Y is the complement of X, then TransferSM(Phase, SO) = TransferSM(Phase, X)^TransferSM(Phase, Y).
 c. If SO = {X,Y} and X and Y in SO are final in Phase, and if X is the specifier of Y, then TransferSM(Phase, SO) = TransferSM(Phase, X)^TransferSM(Phase, Y).
 d. If SO = {X,Y} and X in SO is final in Phase but Y is not, then TransferSM(Phase, SO) = TransferSM(Phase, X);
 e. If SO = {X,Y} where both X and Y in SO are non-final in Phase, then TransferSM(Phase, SO) = the empty sequence ɛ.

No provision has been made for the linearization of adjuncts, head movement or QR in this definition. I do not pursue these issues here.

Clause (13a) determines how lexical items are spelled out (including functional heads). At the SM Interface, only the phonological features of the lexical item are relevant; the others are ignored. The crucial clauses are (13b,c), which say that a head precedes the complement, and the specifier precedes the head. Of course, in order for clauses (13b,c) to work, one needs a definition of complement and specifier, which are defined in terms of labels. These definitions are given in Collins and Stabler 2016.

An issue arises, given (13d,e), about the distinction between copies and repetitions. I follow the general program given in Groat 2013 (see Collins and Stabler 2014 for formalization). If X and Y are identical within a phase, they are treated as copies. If X and Y are identical, but are spelled out in different phases, they are treated as repetitions.

Clauses (13d,e) determine the spell-out of lower copies. They are simply ignored by the algorithm. An important aspect of (13d,e) is that they do not make reference to indices or chains of any kind. If an occurrence Y is c-commanded by an identical occurrence in a phase, then the lower occurrence is non-final. In that situation the lower occurrence of Y is not pronounced. (13), in conjunction with the definition of “final” in (12), also accounts for remnant movement, as shown in Collins and Stabler 2016.

An example from Collins and Stabler 2016 shows how the system works (note that fall is unaccusative):

(14) Phase = {that, {John, {will, {fall, John}}}}
TransferSM(Phase, {fall, John}) = TransferSM(Phase, fall) = /fall/

The drawback of definition (13) is that it relies on the notion of label (since it uses the label-based notions of specifier and complement), and that goes
beyond the equation in (2). Therefore, I propose a label-free definition of TransferSM in (15). The definition in (15) below is structured around the following general ideas. First, the spell-out of a lexical item is a sequence of phonological segments (Phon). Second, to transfer a lower copy of an element that has been internally merged, the lower copy is simply ignored. These two principles are identical to what is found in definition (13). Third, in a structure SO = {X,Y} where X is a lexical item and Y is not (it is a phrase), X precedes Y. I take this principle to be a principle of computational efficiency, reflecting the fundamental left-right asymmetry imposed by the SM Interface. The principle is: the most accessible lexical item (identified with minimal search) is spelled-out to the left. In this case, there is an ordering established by accessibility with minimal search (X is a lexical item, accessible with minimal search, Y is not), and this ordering is mapped to linear precedence. Fourth, in a structure SO = {X,Y} where both X and Y are complex constituents (non-heads) (and neither is a lower copy), a problem arises. There is no obvious way to linearly order X and Y. I suggest a principle where if Y dominates X, then X is linearly ordered before Y. I will call this the IM (Internal Merge) Ordering Principle (IMOP). If neither X nor Y dominates the other, the structure cannot be spelled-out. The question is whether the IMOP can be reduced to independent principles of computational efficiency. Suppose SO = {X, Y}, where X is contained in Y. The relation “X is contained in Y” is a partial order with minimal elements (the lexical items). The relation “X precedes Y” is a total order with a smallest element (the Phon of the first lexical item in the utterance). So the IMOP trades one ordering relation (“is contained in”) for another (“precedence”) (much as in Kayne 1994 where asymmetric c-command maps to linear order). 
Since this is the simplest mapping between the two orders, by computational efficiency, it is the one used by TransferSM. Return to SO = {X,Y} where X does not dominate Y, and Y does not dominate X (there is no internal Merge). In this case, there is no ordering relation between X and Y that can be mapped to linear precedence. One possibility would be to order X and Y according to size (e.g., total number of lexical items dominated by X). But this would involve a kind of counting, and I assume that UG does not permit this kind of operation. I assume that TransferSM has no parameters. Therefore, as in Kayne 1994, there is no head parameter. The apparent effects of the head parameter are due to various instances of internal Merge.




Given this background, the revised label-free TransferSM is given below:

(15) TransferSM (Second Version)
For all syntactic objects SO such that either SO=Phase or SO is contained in Phase, TransferSM(Phase, SO) is defined as follows:
a. If SO is a lexical item, LI = 〈Sem, Syn, Phon〉, then TransferSM(Phase, SO) = Phon;
b. If SO={X,Y} and X and Y in SO are final in Phase, where X is a lexical item and Y is not, then TransferSM(Phase, SO) = TransferSM(Phase, X)^TransferSM(Phase, Y).
c. If SO = {X,Y} and X and Y in SO are final in Phase, where X is contained in Y, then TransferSM(Phase, SO) = TransferSM(Phase, X)^TransferSM(Phase, Y).
d. If SO={X,Y} and X in SO is final in Phase but Y is not, then TransferSM(Phase, SO) = TransferSM(Phase, X).
e. If SO={X,Y} where both X and Y in SO are non-final in Phase, then TransferSM(Phase, SO) = the empty sequence ɛ.

I have also not defined the notion of phase, and some care needs to be taken that defining phases does not depend on the notion of labels (e.g., the maximal projection of C or v*). I do not pursue this issue here.

Clauses (15d,e) are identical to (13d,e). Clause (15b) states that if a head and a non-head are merged, then the head precedes the non-head. Clause (15b) is the label-free version of clause (13b). Clause (15c) states that if X has undergone internal Merge, it is linearized to the left of the constituent Y out of which it has moved. Clause (15c) replaces clause (13c).

An issue looming in the background is how, given SO = {X, Y}, TransferSM or TransferCI finds out whether X is contained in Y. If X has K nodes (where by "node" I mean X itself and any set or lexical item contained in X), and Y has N nodes, then checking whether X is contained in Y could take in the worst case N*K steps (one checks every node of Y to see if it is identical to X). I do not pursue this issue here.

One similarity between Chomsky's system and the one in (15) is what happens with externally merged specifiers.
Suppose SO = {DP, vP} where both DP and vP are final in the phase dominating SO. In Chomsky’s system, such a structure cannot be labeled. In my system, such a structure cannot be linearized. To see this consider all the clauses of (15). (15a) is not relevant, since {DP, vP} is not a lexical item. (15b) is not relevant since neither DP nor vP is a lexical item. (15c) is not relevant, since DP has been externally merged. (15d,e) are not relevant, since by assumption


DP and vP are final in the phase. So there is no condition in (15) that is relevant. Furthermore, without labels, it is unclear how any further condition could be added to (15) that would linearize SO = {DP, vP}. There is simply no asymmetry to latch on to. I will make the assumption that if there is a SO for which TransferSM does not yield a value (not even an empty string), the derivation crashes.

A criticism of my analysis is that it involves two different ordering statements, (15b) and (15c). In the best of possible worlds, these two statements would be unified. In fact, Kayne 1994 proposes a unification of the head-complement case with the specifier-head case in terms of asymmetric c-command. Basically, the LCA maps asymmetric c-command onto linear order. However, Kayne's approach rests on some problematic assumptions about phrase structure and c-command, in particular the crucial definition (3) of Kayne (1994: 16), which draws a distinction between categories and segments that is not available with simplest Merge (see section 2).

Chomsky's (2013) system blocks (16b) below:

(16) a. They thought JFK was assassinated in which Texas city?
     b. *They thought [α in which Texas city [C [JFK was assassinated]]]?
     c. In which Texas city did they think that JFK was assassinated?

Sentence (16a) illustrates wh-in-situ in English. Since the embedded C in (16b) lacks a Q feature, there is no way that α can be labeled, creating a problem at the interfaces. Sentence (16c) is acceptable with successive cyclic movement, since by (5a) the intermediate copy is ignored for labeling. On my account, there would be no problem with α at the SM Interface. In particular, TransferSM would linearize (16b). A natural place to look for an explanation of the ungrammaticality of (16b) in my system is at TransferCI. I leave the issue open here.
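To make the case analysis in (15) concrete, the clauses can be rendered as a short program. This is my own illustrative sketch, not part of the paper's formalization: lexical items carry only a Phon string, syntactic objects are two-membered frozensets, the concatenation operator ^ is modeled as space-separated string concatenation, and "final in Phase" is computed by threading the set of c-commanding occurrences through the recursion (within a phase, structural identity counts as copyhood, as in the Groat 2013 program the paper adopts).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LI:
    phon: str  # only Phon matters at the SM Interface

def contains(y, x):
    """True iff x is (properly) contained in y."""
    return isinstance(y, frozenset) and any(z == x or contains(z, x) for z in y)

def transfer_sm(so, commanders=()):
    """Sketch of the label-free Transfer_SM in (15). `commanders` collects the
    occurrences c-commanding `so`; an occurrence is final iff no identical
    occurrence c-commands it (the paper's definition of "final")."""
    if isinstance(so, LI):                                  # (15a)
        return so.phon
    x, y = tuple(so)
    x_final = all(c != x for c in commanders + (y,))
    y_final = all(c != y for c in commanders + (x,))
    if x_final and not y_final:                             # (15d): skip the lower copy
        return transfer_sm(x, commanders + (y,))
    if y_final and not x_final:
        return transfer_sm(y, commanders + (x,))
    if not x_final and not y_final:                         # (15e): empty sequence
        return ""
    if isinstance(x, LI) != isinstance(y, LI):              # (15b): head precedes non-head
        first, second = (x, y) if isinstance(x, LI) else (y, x)
    elif contains(y, x):                                    # (15c): moved X precedes Y
        first, second = x, y
    elif contains(x, y):
        first, second = y, x
    else:   # e.g. an externally merged {XP, YP}: no clause applies
        raise ValueError("SO cannot be linearized")
    parts = (transfer_sm(first, commanders + (second,)),
             transfer_sm(second, commanders + (first,)))
    return " ".join(p for p in parts if p)
```

Running `transfer_sm` on the phase in (14) yields "that John will fall": the lower occurrence of John inside {fall, John} is c-commanded by an identical occurrence and is therefore ignored, exactly as in (14). Merging two bare lexical items, as in {the, man}, raises an error, matching the crash discussed in section 6.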

6 Merge and the lexicon

The formulation of TransferSM in (15) raises the question of what happens when two lexical items are merged. For example, what happens in SO = {the, man}? (15) does not provide for any way to linearize such structures, so there should be a crash at TransferSM.

The question is related to the issue of the definition of lexical items. Collins and Stabler 2016 define a lexical item in the following way (see Chomsky 1995: 230) (SEM-F, SYN-F and PHON-F are universal sets of semantic, syntactic and phonological features, respectively; PHON-F* is the set of sequences of segments built up from PHON-F).




(17) A lexical item is a triple: LI = 〈Sem, Syn, Phon〉, where Sem and Syn are finite sets such that Sem ⊆ SEM-F, Syn ⊆ SYN-F, and Phon ∈ PHON-F*.

A problem with this definition is that it entails a structure (the lexical item) that is built by some operation other than Merge. In other words, some mechanism combines the three sets of features, and that mechanism is not Merge. This state of affairs seems undesirable for two reasons. First, humans have an unlimited capacity to learn and to coin new lexical items, just like they have an unlimited capacity to form new phrases, suggesting that lexical items are also created by Merge. Second, adding a new mechanism (to form lexical items) would increase the complexity of UG, going against the SMT as defined in section 1.

I suggest that the three sets of features in (17) are combined by Merge. Limiting syntactic features to the categorial features (v, n, a, p), I propose the following representations (Phon = /dɔg/ and Sem = DOG):

(18) a. Merge(/dɔg/, DOG) = {/dɔg/, DOG}
     b. Merge(n, {/dɔg/, DOG}) = {n, {/dɔg/, DOG}}

On this view, n simply has no phonological features, so that TransferSM(Phase, n) = ∅ (the empty string). I leave aside the question of why other combinations are not possible, e.g., Merge(n, /dɔg/), etc.

A natural question to ask is whether functional categories such as the are decomposed in the same way as dog. For closed-class, functional categories, I will assume that there are only syntactic features (Syn), which are also semantically interpreted. So the definite article will have the following representation, where DEF is a set of syntactic features (possibly a singleton set):

(19) Merge(/ðə/, DEF) = {/ðə/, DEF}

Recall that the notion of lexical item played a role in TransferSM. I suggest the following replacement:

(20) A functional head is a syntactic object of the form {Phon, Syn} or a categorizer (n, v, p, a, …).

Lastly, the definition of TransferSM with the revised definitions is given below.
The relevant new conditions are given below:

(21) a. If SO = {X, Phon} (X = Sem or Syn), then TransferSM(Phase, SO) = Phon;
     b. If SO={X,Y} and X and Y in SO are final in Phase, where X is a functional head and Y is not, then TransferSM(Phase, SO) = TransferSM(Phase, X)^TransferSM(Phase, Y).


The SO = {the, dog} will now be linearized in the following way:

(22) TransferSM(Phase, {{/ðə/, DEF}, {n, {/dɔg/, DOG}}})
     = TransferSM(Phase, {/ðə/, DEF}) ^ TransferSM(Phase, {n, {/dɔg/, DOG}})
     = /ðə/ ^ TransferSM(Phase, n) ^ TransferSM(Phase, {/dɔg/, DOG})
     = /ðə/ ^ ε ^ /dɔg/
     = /ðə/ ^ /dɔg/

In order for Merge to operate as in (18) and (19), one of the two definitions in (2) and (3) needs to change. Consider (2). A sequence of phonological segments Phon is not a syntactic object, so Merge(Phon, Syn) would be undefined according to (2). One possibility is to redefine (2) to allow merger of more than just syntactic objects. Another possibility is to redefine (3) so that Phon counts as a syntactic object.

The output of Merge in (18a) is similar to what is called a "root" in Distributed Morphology (Embick and Noyer 2007: 295). However, in my system, there is no primitive notion of "root". Furthermore, there is no notion of Vocabulary Insertion (Embick and Noyer 2007: 297). The rule of Vocabulary Insertion is quite complicated, necessitating many auxiliary assumptions (e.g., a list of vocabulary items, the placeholder Q for the phonological exponent, conditions on insertion, etc.), making it unclear how Vocabulary Insertion would fit into the SMT (see section 1). Similarly, in the system of this paper (based on simplest Merge), there is no possibility of post-syntactic insertion of ornamental morphemes (agreement, case, theme vowels; see Embick and Noyer 2007: 305) or post-syntactic lowering (Embick and Noyer 2007: 319). There is only Merge, and all "morphology" must ultimately be explained in terms of Merge.

To take a simple example, Embick and Noyer (2007: 298–299) claim that there are two vocabulary items, one for the regular plural (z ↔ [pl]) and one for the irregular plural ([pl] ↔ -en in the context of the roots OX and CHILD). These two vocabulary items are in competition to be inserted at the same terminal node specified with the abstract morpheme [pl].
But in my I-language, both oxen and oxes (and even ox) are possible plural forms, so there can be no competition. Furthermore, in my I-language there is a clear difference between oxens (marginal) and oxesen (completely ungrammatical) (the difference was pointed out to me by Richard Kayne on January 16, 2012), which suggests that the regular plural and –en do not even occupy the same position in the DP. These facts together suggest that there are two different morphemes {Syn, Phon}={pl1, z} and {Syn, Phon}={pl2, en} each of which can be merged. Similar remarks can be made about the CI side. Operations which produce interpretations that mimic the results of Merge should be avoided (e.g., implicit arguments that are present semantically but not syntactically, quantifiers or




operators present in the semantics but not the syntax, or type shifting rules). I do not pursue these issues for reasons of space.
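The derivation of /ðə/ ^ /dɔg/ in (22) can be simulated with a small sketch. The class names Phon/Sem/Syn, the merge helper and spell_out are my own stand-ins, not the paper's notation; ordering is reduced to the two clauses in (21), with the finality machinery of (15) omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Phon:
    seg: str          # a sequence of phonological segments

@dataclass(frozen=True)
class Sem:
    val: str          # semantic features, e.g. DOG

@dataclass(frozen=True)
class Syn:
    val: str          # syntactic features: categorizers (n, v, ...) or DEF

def merge(x, y):
    """Merge(X,Y) = {X,Y}."""
    return frozenset({x, y})

def is_functional_head(so):
    """(20): a functional head is {Phon, Syn} or a bare categorizer."""
    if isinstance(so, Syn):
        return True
    return (isinstance(so, frozenset) and len(so) == 2
            and any(isinstance(z, Phon) for z in so)
            and any(isinstance(z, Syn) for z in so))

def spell_out(so):
    """(21a): {X, Phon} spells out as Phon; a bare categorizer spells out as
    the empty string; (21b): a functional head precedes its sister."""
    if isinstance(so, Syn):
        return ""
    if isinstance(so, Phon):
        return so.seg
    phons = [z for z in so if isinstance(z, Phon)]
    if phons:                                   # (21a)
        return phons[0].seg
    x, y = tuple(so)
    if is_functional_head(x) and not is_functional_head(y):
        head, rest = x, y                       # (21b)
    elif is_functional_head(y) and not is_functional_head(x):
        head, rest = y, x
    else:
        raise ValueError("no ordering clause applies")
    return " ".join(p for p in (spell_out(head), spell_out(rest)) if p)
```

Building the as in (19) and dog as in (18) and merging them reproduces the computation in (22): the categorizer n contributes the empty string, and the {Phon, Syn} determiner precedes its sister.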

7 Derivations

It is not possible to study Merge without specifying how it enters into derivations. As noted in section 2, the output of Merge is a syntactic object, and hence the output of Merge can also be one of the input arguments to Merge. This makes it possible to define a derivation in simple terms. Because of point seven in section 2, derivations will be bottom-up, creating larger and larger structures as the derivation progresses.

(23) A derivation D from lexicon L is a finite sequence of steps such that each step is one of the following:
     i. A lexical item
     ii. Merge(X,Y) = {X,Y}, where both X and Y are syntactic objects that appear earlier in the derivation (either as lexical items or as the output of Merge).

Even though derivations have a finite length, there are an unlimited number of derivations, allowing for an account of the fact that language makes "infinite use of finite means". There is no need for a numeration or a lexical array. There is also no need for an operation Select (see Collins and Stabler 2016), although listing the lexical item as its own line of the derivation (as in (24a,b) below) could be viewed as selection of a lexical item out of the lexicon. For example, given the definition in (23), the following is a derivation:

(24) a. see
     b. John
     c. Merge(see, John) = {see, John}

The definition in (23) contrasts with the usual definition of a derivation found in the minimalist literature as a sequence of workspaces, where each workspace is a set of syntactic objects. On the sequence-of-workspaces definition, Merge(X,Y) creates the syntactic object {X,Y} and adds it to the workspace. Crucially, the two syntactic objects X and Y are removed from the workspace (Chomsky 1995: 226, 243; see Collins and Stabler 2016 for discussion and references). As Collins and Stabler 2016 have shown, implementing the sequence-of-workspaces definition requires a great deal of stipulation and complexity. Hence, the


simpler definition in (23), not involving a sequence of workspaces, should be preferred on minimalist grounds (see the discussion of the SMT in section 1).

One issue with (23) is the status of Transfer in the derivation. There are two possibilities. One is to view Transfer as an operation that happens automatically when a strong phase has been built. On this view, there is no reason to modify (23): Transfer simply occurs when the conditions are met. The second (more common) way is to view Transfer as an operation ordered amongst Merge operations in a derivation. On this view, the definition in (23) must be altered to include Transfer:

(25) A derivation D from lexicon L is a finite sequence of steps, such that each step is one of the following:
     i. A lexical item
     ii. Merge(X,Y) = {X,Y}, where both X and Y are syntactic objects that appear earlier in the derivation (either as lexical items or as the output of Merge).
     iii. Transfer(Phase, SO) = , where Phase is a syntactic object that appears earlier in the derivation, and SO is contained in (or equal to) Phase.

I do not tackle the Assembly Problem here. That is, I do not show how the outputs of TransferSM throughout the derivation are assembled correctly. See Collins and Stabler 2016 for extensive discussion of this issue. I also do not tackle the definition of phases in a label-free system.

These simple definitions of a derivation have far-reaching implications for syntactic analysis, including the issue of two-peak structures (originally studied by Collins 1997: 83, Groat 1997: 101–104, and more recently by Epstein, Kitahara and Seely 2012), sideward movement and head movement (see in particular Bobaljik and Brown 1997). I will not pursue these issues for reasons of space.
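Definition (23) can be checked mechanically. In the following sketch (my own; the paper provides no code), lexical items are plain strings and a merged SO is a two-membered frozenset; a sequence qualifies as a derivation iff every step is either a lexical item or the Merge of two earlier steps.

```python
def is_derivation(steps, lexicon):
    """Checks definition (23): each step is a lexical item of `lexicon`, or
    Merge(X,Y) = {X,Y} where X and Y both appear earlier in the sequence."""
    earlier = []
    for step in steps:
        ok_lexical = step in lexicon                            # (23i)
        ok_merge = (isinstance(step, frozenset) and len(step) == 2
                    and all(part in earlier for part in step))  # (23ii)
        if not (ok_lexical or ok_merge):
            return False
        earlier.append(step)
    return True
```

The derivation in (24) passes, a sequence that merges an item never introduced on its own line fails, and internal Merge (re-merging a step that is already contained in another step) needs no extra machinery, since both inputs appear earlier in the sequence.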

8 Agree

Merge(X,Y) = {X,Y} is indispensable as a syntactic operation. It is what allows the language faculty to generate and parse an unlimited number of expressions, given the finite means of the human mind/brain. The null hypothesis is that there are no other operations. An additional operation OP would go against the SMT as defined in section 1.

Consider Agree from this perspective. Agree creates dependencies that are similar to the dependencies created by internal Merge. With Agree, some feature




set (e.g., 3SG) has two different occurrences, exactly like a syntactic object that has undergone internal Merge. Given this background, I propose (26) (see Seely 2014 for a similar conclusion from a different perspective):

(26) There is no operation Agree in UG.

The effects of Agree need to be captured by Merge. For example, subject-verb agreement could be analyzed as follows. I make the simplifying assumption that the phi-features of the DP are located on D (written the[phi]). It would be more accurate to say that the phi-features of the DP are distributed throughout the DP (e.g., the number feature is found in the Num head).

(27) "The man falls."
     a. the[phi]
     b. man
     c. Merge(the[phi], man)
     d. fall
     e. Merge(fall, {the, man}) = {fall, {the, man}}
     f. T
     g. Merge(phi, T) = {phi, T}
     h. Merge({phi, T}, {fall, {the, man}}) = {{phi, T}, {fall, {the, man}}}
     i. Merge({the, man}, {{phi, T}, {fall, {the, man}}}) = {{the, man}, {{phi, T}, {fall, {the, man}}}}

The SO = {the, man} has phi-features. These phi-features are also merged with T, creating a dependency. This instance of Merge is neither internal nor external Merge, but rather sidewards Merge (as in the Bobaljik and Brown 1997 analysis of head movement). It is notable that the simple definition of derivations in section 7 does not preclude sidewards movement, although I will not pursue the issue here.

Support for this way of looking at agreement concerns the No Tampering Condition. Suppose that T has unvalued phi-features (uphi) that are valued. Suppose furthermore that T is contained in SO. Then after valuation, T has changed and SO has changed: Agree has created new syntactic objects, which we can call T' and SO'. This is a violation of the NTC, as defined formally in Collins and Stabler 2016. Of course, it is possible to avoid the conclusion that Agree violates the NTC if one changes the definition of the NTC.
Alternatively, one might want to limit the scope of the NTC to Merge, claiming that Agree is part of Transfer and that Transfer is not subject to the NTC. However, all these issues simply evaporate if Agree is not part of UG, and clearly the derivation in (27) does not violate any version of the NTC. In order for Merge to operate as in (27) one of the two definitions in (2) and (3) needs to change. Consider (2). phi is not a syntactic object, so Merge(phi, T)


would be undefined according to (2). One possibility is to redefine (2) to allow merger of phi. Another possibility is to redefine (3) so that phi is counted as a syntactic object. The issues are similar to those raised in section 6. This approach to agreement leaves open many questions. For example, how are the two occurrences of phi interpreted by TransferSM and TransferCI? Also, what triggers agreement in the first place? Lastly, how is the structure in (27i) linearized? I will not address these issues here for reasons of space.
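On this view, an Agree-style dependency is nothing more than one feature set having multiple occurrences, just like an internally merged phrase. A small sketch (my own illustration; the phi-feature set is flattened to the string "3sg", and structural identity stands in for occurrence identity) makes the point by counting occurrences in the structures corresponding to (27h,i):

```python
def occurrences(so, x):
    """Counts the occurrences of x in so (so itself plus anything it
    contains); identity of occurrences is just structural identity."""
    count = 1 if so == x else 0
    if isinstance(so, frozenset):
        count += sum(occurrences(z, x) for z in so)
    return count
```

Both dependencies come out as multiple occurrences of a single object, created by Merge alone: phi is shared between D and T in (27h), and the subject DP has two copies in (27i), with no Agree operation anywhere.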

9 Conclusion

In this paper, I have explored the history, properties and implications of the syntactic operation Merge(X,Y) = {X,Y}. I sketched the historical development of the equation Merge(X,Y) = {X,Y}, claiming that it is the final destination in a process of unbundling that has taken place since the beginning of generative grammar. Then, I presented thirteen properties of Merge. I discussed the consequences of simplest Merge for the definition of TransferSM (leaving open the status of TransferCI). I showed how the definition of TransferSM can be modified to linearize a syntactic structure without reference to labels. I discussed the consequences of simplest Merge for the nature of the lexicon, where I proposed that lexical items are created by Merge. I argued that there is no operation Agree in UG, and showed how Merge fits into a simple (perhaps the simplest) definition of a derivation.

Acknowledgements

I would like to thank Yoshi Dobashi, Bob Freidin, Erich Groat, Richard Kayne, Dennis Ott, Daniel Seely and an anonymous reviewer for helpful comments on an earlier version of this paper.

References

Adger, David. 2013. A syntax of substance. Cambridge, MA: The MIT Press.
Berwick, Robert and Noam Chomsky. 2011. The biolinguistic program: The current state of its development. In Anna Maria Di Sciullo and Cedric Boeckx (eds.), The biolinguistic enterprise. Oxford: Oxford University Press.




Bobaljik, Jonathan. 1995. In terms of Merge. In Rob Pensalfini and Hiroyuki Ura (eds.), Papers on minimalist syntax, 41–64. Cambridge, MA: MITWPL.
Bobaljik, Jonathan and Samuel Brown. 1997. Head movement and the extension requirement. Linguistic Inquiry 28.2. 345–356.
Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge, MA: The MIT Press.
Chomsky, Noam. 1977. On wh-movement. In Peter Culicover, Thomas Wasow and Adrian Akmajian (eds.), Formal syntax, 71–132. New York: Academic Press.
Chomsky, Noam. 1980. Rules and representations. New York: Columbia University Press.
Chomsky, Noam. 1995. The minimalist program. Cambridge, MA: The MIT Press.
Chomsky, Noam. 2000. Minimalist inquiries. In Roger Martin, David Michaels and Juan Uriagereka (eds.), Step by step: Essays on minimalist syntax in honor of Howard Lasnik, 89–155. Cambridge, MA: The MIT Press.
Chomsky, Noam. 2001. Derivation by phase. In Michael Kenstowicz (ed.), Ken Hale: A life in language, 1–52. Cambridge, MA: The MIT Press.
Chomsky, Noam. 2004. Beyond explanatory adequacy. In Adriana Belletti (ed.), Structures and beyond, 104–131. (Originally published as: Chomsky, Noam. 2001. Beyond explanatory adequacy. MIT Occasional Papers in Linguistics 20. MIT Working Papers in Linguistics.)
Chomsky, Noam. 2005. Three factors in language design. Linguistic Inquiry 36. 1–22.
Chomsky, Noam. 2008. On phases. In Robert Freidin, Carlos P. Otero and Maria Luisa Zubizarreta (eds.), Foundational issues in linguistic theory, 133–166. Cambridge, MA: The MIT Press.
Chomsky, Noam. 2013. Problems of projection. Lingua 130. 33–49.
Chomsky, Noam. 2014a. Problems of projection: Extensions. MIT manuscript.
Chomsky, Noam. 2014b. Lectures on syntax at MIT. (http://whamit.mit.edu/2014/06/03/recent-linguistics-talks-by-chomsky/)
Collins, Chris. 1994. Economy of derivation and the generalized proper binding condition. Linguistic Inquiry 25. 45–61.
Collins, Chris. 1997. Local economy. Cambridge, MA: The MIT Press.
Collins, Chris.
2002. Eliminating labels. In Samuel David Epstein and T. Daniel Seely (eds.), Derivation and explanation in the minimalist program, 42–64. Oxford: Blackwell.
Collins, Chris. 2014. Why formalize? Faculty of Language (blog). (http://facultyoflanguage.blogspot.com/2014/02/where-norbert-posts-chriss-revised-post.html)
Collins, Chris and Edward Stabler. 2014. A formalization of a phase-based approach to copies versus repetitions. NYU and UCLA manuscript.
Collins, Chris and Edward Stabler. 2016. A formalization of minimalist syntax. Syntax 19.1. 43–78.
Dobashi, Yoshihito. 2003. Phonological phrasing and syntactic derivation. Ithaca, NY: Cornell University doctoral dissertation.
Embick, David and Rolf Noyer. 2007. Distributed morphology and the syntax-morphology interface. In Gillian Ramchand and Charles Reiss (eds.), The Oxford handbook of linguistic interfaces. Oxford: Oxford University Press.
Epstein, Samuel D. 1999. Un-principled syntax: The derivation of syntactic relations. In Samuel David Epstein and Norbert Hornstein (eds.), Working minimalism, 317–345. Cambridge, MA: The MIT Press.
Epstein, Samuel D., Hisatsugu Kitahara and Daniel T. Seely. 2012. Structure building that can't be. In M. Uribe-Etxebarria and V. Valmala (eds.), Ways of structure building, 253–270. Cambridge: Cambridge University Press.


Freidin, Robert and Howard Lasnik. 2011. Some roots of minimalism in generative grammar. In Cedric Boeckx (ed.), Linguistic minimalism, 1–26. Oxford: Oxford University Press.
Fukui, Naoki. 2001. Phrase structure. In Mark Baltin and Chris Collins (eds.), The handbook of contemporary syntactic theory. Malden, MA: Blackwell.
Groat, Erich. 1997. A derivational program for syntactic theory. Cambridge, MA: Harvard University doctoral dissertation.
Groat, Erich. 2013. Is movement part of narrow syntax? (handout). Goethe University Frankfurt, May 22, 2013.
Groat, Erich and John O'Neil. 1996. Spell-out at the LF interface. In Werner Abraham, Samuel David Epstein, Höskuldur Thráinsson and C. Jan-Wouter Zwart (eds.), Minimal ideas, 113–139. Amsterdam: John Benjamins.
Guimarães, Max. 2000. In defense of vacuous projections in bare phrase structure. In G. Maximiliano, L. Meroni, C. Rodrigues and I. San Martin (eds.), University of Maryland Working Papers in Linguistics 9. 90–115.
Heim, Irene and Angelika Kratzer. 1998. Semantics in generative grammar. Oxford: Blackwell.
Jackendoff, Ray. 1977. X'-syntax: A study of phrase structure. Cambridge, MA: The MIT Press.
Kayne, Richard. 1983. Connectedness and binary branching. Dordrecht: Foris Publications.
Kayne, Richard. 1994. The antisymmetry of syntax. Cambridge, MA: The MIT Press.
Nunes, Jairo. 2004. Linearization of chains and sideward movement. Cambridge, MA: The MIT Press.
Seely, T. Daniel. 2006. Merge, derivational c-command, and subcategorization in a label-free syntax. In Cedric Boeckx (ed.), Minimalist essays. Amsterdam: John Benjamins.
Seely, T. Daniel. 2014. Agreement is the shared prominent feature option of POP's labeling algorithm. Eastern Michigan University manuscript.
Stowell, Timothy. 1981. Origins of phrase structure. Cambridge, MA: MIT doctoral dissertation.
Travis, Lisa. 1989. Parameters of phrase structure. In Mark Baltin and Anthony Kroch (eds.), Alternative conceptions of phrase structure. Chicago: Chicago University Press.
Uriagereka, Juan. 1999. Multiple spell-out. In Samuel D. Epstein and Norbert Hornstein (eds.), Working minimalism, 251–282. Cambridge, MA: The MIT Press.
Watanabe, Akira. 1995. Conceptual basis of cyclicity. In Rob Pensalfini and Hiroyuki Ura (eds.), Papers on minimalist syntax, 269–291. Cambridge, MA: MITWPL.

Aleksandra Vercauteren

Features and labeling: Label-driven movement

Abstract: In this paper I explore the consequences for movement of the Labeling Algorithm as proposed by Chomsky (2013) in combination with a cartographic approach to syntax. With the Labeling Algorithm, agreement is a prerequisite for halting, in the sense that agreement is necessary for a syntactic object obtained by XP-YP merge to be labeled. I argue that a cartographic hierarchy of the clause provides the necessary halting places for several XP movement operations, since they provide the necessary functional heads for labeling through agreement. The cartographic approach also benefits from a theory with the Labeling Algorithm, as it permits us to maintain that there is a universal, highly articulated hierarchy of strictly ordered functional heads while allowing for cross-linguistic variation in word order patterns and optional movement operations: the functional projections are only exploited when movement occurs; they (or features on them) do not trigger movement themselves.

1 Introduction: The 'triggers' for movement
DOI 10.1515/9781501502118-004

The question of why syntactic displacement exists and what its triggers are, if any, has occupied a central place in the Generative Enterprise. Several hypotheses concerning movement have been advanced. In Chomsky (1995), movement is assumed to be a costly operation, and due to economy considerations, it should thus only occur when necessary. This situation is known as movement by Last Resort. More concretely, it is assumed that movement occurs because uninterpretable features need to become interpretable before they are shipped off to the interfaces for the derivation to converge. Since, in this framework, agreement is assumed to be restricted by locality, checking uninterpretable features will often require constituents to move.

The main weakness of this approach to movement is that positing uninterpretable features in all movement contexts risks rendering the notion of uninterpretable feature empirically vacuous, as agreement is very often invisible. Additionally, agreement does not always trigger movement: take for instance the well-known difference between English and Romance verbs. Both exhibit agreement; nevertheless, only the latter can move. These observations led to a theory of movement


in which movement and agreement were disarticulated. Several alternatives to movement-for-feature-checking have been advanced. For instance, it has been argued that only EPP features or strong features trigger movement. Stipulating such additional types of features does not, however, explain why movement exists; it merely redescribes the facts in a different terminology.1

In later theories, the restrictions on movement were relaxed, and movement, an instance of merge, was argued to come for free (Chomsky 2001). According to the Free Merge hypothesis, movement does not need any triggers, although the presence of a trigger is of course not excluded. For instance, movement can be obligatory when the lack of movement gives rise to a syntactic structure that cannot be interpreted and/or linearized. This idea has been explored by several authors, including Moro (2000), who argues that movement is obligatory when the lack of movement gives rise to a symmetric, and thus not linearizable, structure. In the Nanosyntactic framework, too, movement occurs because of interface requirements: it is assumed that words do not spell out terminals but little pieces of syntactic structure. When there is no word available in the Lexicon that corresponds to the output of syntax, movement is obligatory.2

Chomsky's (2013) view on the role of labels in the computation also has some interesting implications for movement. Chomsky (2013) argues that labels are a necessary requirement for syntactic objects to be interpretable at the interfaces; the output of syntax thus needs to have a label. Labels are determined through Minimal Search: the head that is closest to the node to be labeled will provide the label for the whole structure. When two heads are equidistant to the new node, a label can be found through agreement: the agreeing feature provides the label for the new syntactic object.
If no label can be found for lack of agreement, movement is obligatory in order for one of the equidistant heads to exit the competition for labeling. Under this approach to labels, the relation between movement and agreement is relaxed in comparison to previous theories of movement. Nevertheless, agreement continues to be relevant in the sense that it is needed to ensure halting. This will be discussed in more detail in section 2.

The aim of this paper is to discuss the implications of this approach to labeling for movement from a cartographic point of view: I will show that cartography and the Labeling Algorithm (LA henceforth) can benefit from each other. In the

1 See Boeckx (2010) for a discussion of this issue.
2 Up to now, Nanosyntax investigates only the sub-word level. It is not clear to me whether and how its conception of Spell-Out-driven movement can be applied to the sentence level.



Features and labeling: Label-driven movement 


cartographic framework (Cinque 1999, Rizzi 1997, Cinque & Rizzi 2008, Rizzi 2013 a.o.), it is assumed that the clausal structure consists of a series of hierarchically organized, highly specialized projections, each of which encodes one specific property of clause structure. More specifically, each head is assumed to correspond to one single feature. As is discussed in section 3, this approach to clause structure provides a great number of positions in which labeling can proceed through agreement. Additionally, the implications of the LA for movement and halting permit us to capture variation in word order patterns without having to abandon the idea that the clausal hierarchy is universal. Then, in section 4, I will explore Chomsky's (2013) suggestion that the LA seeks features only, a natural consequence of the cartographic approach to clausal structure: in the cartographic framework, it is assumed that each feature corresponds to one terminal node, i.e., there is one feature per head (Cinque & Rizzi 2008). Under this view, the 'closest head' which provides the label of a syntactic object coincides with the 'closest feature', and the assumption that only features provide labels naturally follows. I will show that a cartographic approach to syntax has several advantages in the light of the LA. More particularly, I show that a cartographic approach widens the potential applications of label-driven movement.

2 Labeling, movement and halting

In Chomsky's most recent interpretation of the role of labeling in syntax (Chomsky 2013), syntactic objects need a label in order to be interpretable at the interfaces.3 Labels are determined according to the following labeling algorithm (LA henceforth):

(1) α receives the label of the closest head, with H1 the closest head of α iff:
    (i) α contains all occurrences of H1 and
    (ii) there is no H2 such that α contains H2 and H2 c-commands H1.

According to this definition, labeling is determined by Minimal Search. Note that Minimal Search is also relevant for other syntactic operations such as Agree. Hence, no additional stipulations are needed.

3 See Adger (2013) and Cecchetto & Donati (2015) for an opposing view. According to these authors, labels are primarily necessary within the syntactic computation, and not relevant at the interfaces. Collins (2002), on the other hand, argues that labels are not a part of grammar, while Citko (2008) and Hornstein (2009) argue that labels may be absent in some particular syntactic contexts.


 Aleksandra Vercauteren

Three merger configurations are possible:
– Merge (X, YP)4: the merger of a head with a non-head.
– Merge (XP, YP): the merger of two non-heads.
– Merge (X, Y): the merger of two heads.

The first situation is straightforward: when a head is merged with a non-head, α is labeled by the head. The second situation is more problematic: when a non-head merges with a non-head, a labeling conflict arises, since the heads of both XPs are equidistant to α. This labeling conflict can be resolved in two different ways: move or agree. Presumably, moving a constituent makes the lower copy invisible to the LA (Chomsky 2013: 44), in a way similar to what happens in intervention contexts: the foot of a chain does not give rise to intervention, only the head does. Alternatively, XP and YP agree with each other and the agreeing feature provides the label of the new syntactic object. Note that two classes of elements can provide a label: heads and features. Related to this observation, Chomsky (2013: 45) suggests that the LA seeks features only. Chomsky (2013) limits the discussion of labeling a syntactic object created by head-head merge to cases where a root is merged with a categorial head. In this case, it is presumably the categorial feature that will label the new head, since a root "does not qualify as a label" (Chomsky 2013: 47); see Alexiadou & Lohndal (2014) for empirical evidence. It is not clear what happens in other cases of head-head merge, such as for instance v-T merger.

The LA as proposed by Chomsky (2013) has two important implications for the theory of movement: a labeling conflict can force movement of XPs, and agreement becomes a necessity for halting instead of for movement, as it used to be in earlier versions of the Minimalist Program. Given the LA, movement is forced in case the merger of two non-heads gives rise to a structure that cannot be labeled, for lack of agreement.
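The division of labor just described can be rendered as a toy search procedure. The sketch below is purely expository and entirely my own: the Python encoding, the Node structure and the invented feature labels stand in for syntactic objects, and head-head merge is simply left undecided, as in the text.

```python
# Toy sketch of the Labeling Algorithm, for illustration only (my own
# encoding, not Chomsky's or this paper's formal apparatus).
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    head: Optional[str] = None         # non-None: this object is a bare head
    features: frozenset = frozenset()  # features visible on the phrase

def label(x: Node, y: Node) -> Optional[str]:
    """Label {x, y}: in {X, YP} the lone head labels; in {XP, YP} a shared
    (agreeing) feature labels; None signals a labeling conflict."""
    if (x.head is None) != (y.head is None):    # {X, YP}: the head labels
        return x.head if x.head is not None else y.head
    if x.head is None and y.head is None:       # {XP, YP}: label via agreement
        shared = x.features & y.features
        return min(shared) if shared else None  # None: one XP must move on
    return None                                 # {X, Y}: left open in the text

wh_phrase = Node(features=frozenset({"Q"}))     # a wh-phrase, e.g. 'which book'
c_int = Node(features=frozenset({"C", "Q"}))    # an interrogative C-system
c_decl = Node(features=frozenset({"C"}))        # a non-interrogative C-system

print(label(wh_phrase, c_int))   # shared Q labels the new node: halting is possible
print(label(wh_phrase, c_decl))  # no shared feature: no label, movement is forced
```

On this toy encoding, a wh-phrase merged with an interrogative C-system yields a label through the shared Q-feature, while merger with a non-interrogative C-system yields none, mimicking forced onward movement.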
This approach to movement has the advantage that it does not assume agreement as a prerequisite for movement. This is a problematic assumption for the reasons discussed in the introduction. In more recent theories, movement, an instance of merge, is assumed to come for free: it occurs because it can. However, this assumption highly overgenerates, since not all instances of movement give a grammatical result. Movement thus has to be restricted. The LA does exactly this: movement can only occur if there is a landing place where the moved constituent can find

4 For ease of exposition, I will use XP to refer to elements that are not heads in order to distinguish heads from syntactic objects that received their label from a head. This is a purely notational option.


a label; otherwise the newly formed structure will not be interpretable at the interfaces. The LA thus allows us to maintain that move comes for free, while restricting its applications. Additionally, there are contexts in which movement is obligatory, and mechanisms are needed to ensure movement in these cases. With the LA, movement will be forced in case of a labeling conflict. The following example, taken from Rizzi (2012: 5, ex. 20b, c), illustrates the LA at work. In English, a wh-element cannot stop in the intermediate C-system selected by non-interrogative verbs, as illustrated in (2)a; it has to move further to the higher C-system, as in (2)b:

(2) a. *You think [α [whichQ book] [C [Bill read __]]]?
    b. [β [WhichQ book] [Q [you think [α __ C [Bill read __]]]]]?5



c. …think [α → CP [QP whichQ book] [CP C [IP Bill read __ ]]]
(tree diagram for (2)c: QP and C compete for labeling α; once QP moves on, α is labeled CP)



When [which book] merges with the intermediate CP a labeling conflict arises between a Q-element and a non-Q-element: as a consequence, one of the constituents is forced to move further, in this case QP. After QP moves out of α, it exits the competition for labeling, and α is labeled by C.6

5 The examples do not take into account verb raising to C, which is not relevant for the current discussion.
6 An important question in relation to this derivation is why it is QP that moves further and not CP, a question also raised for a similar context by Chomsky (2013: 44, fn. 36). We could assume this is due to economy principles: the presence of one movement chain is more economical than that of two. Since QP has reached the intermediate CP through movement (it originates in VP), it has already created a movement chain. CP, on the other hand, has not moved yet. If it were to move to solve the labeling conflict, a new movement chain would be created. If QP moves instead, an existing chain is enlarged rather than a new one created, a more economical and hence more desirable option. This logic, however, does not always apply. Take for instance the situation in which a subject DP is merged with vP. Neither of the two XPs is part of a movement chain, so in theory either one can move. The choice between the two will have consequences for the further derivation. I refer to Chomsky (2013) for a brief illustration of this matter.


In indirect interrogatives, on the other hand, the wh-constituent ends up in the intermediate C-system, as in (3). When the wh-constituent reaches the intermediate CP, the new node can be labeled QP, since the QP agrees with the interrogative C-head. (3) a. John wonders [α [whichQ book] [Q [Bill read __]]]. b.

…wonders [α → QP [QP whichQ book] [CP CQ [IP Bill read __ ]]]
(tree diagram for (3)b: α is labeled QP through Q-agreement between the wh-phrase and the interrogative C-head)

With the LA, the relation between movement and agreement changes in comparison with agreement-triggered movement: agreement is a prerequisite for the labeling of an XP-YP complex and thus for halting. A consequence of this is that all movement of non-heads has to be assumed to target an agreement position. This straightforwardly follows from a cartographic approach to syntax, as I will discuss and illustrate in section 3. Furthermore, not all instances of obligatory movement can be reduced to a labeling conflict. More particularly, only ‘Spec-to-Spec’ movement is a candidate for label-driven movement.7 Take a look at the following structure: (4)

[α ZP [XP X YP]]

The LA can account for why ZP moves: α cannot be labeled, since Z and X are equidistant to it. The solution, in the absence of agreement, is for either ZP or XP to move. X and YP, on the other hand, will never be forced to move because of a labeling conflict, so we need other mechanisms to account for why these elements sometimes have to move (i.e., why they cannot remain in their first merge position). In

7 The term specifier is used in a purely descriptive manner, no a priori specifiers are assumed. Specifiers are non-heads merged with non-heads.




section 4, I will discuss how a revision of what is traditionally considered to be a head augments the inventory of syntactic objects that can be forced to move because of a labeling conflict. I wish to underline that I do not argue that all movement should be reduced to a labeling conflict, and I do not exclude that the causes of obligatory movement may be various. I am simply evaluating which types of movement can be reduced to a labeling conflict.

3 The LA and agreement

Given the assumption that labels are needed at the interfaces in order for a syntactic object to be interpretable (Chomsky 2013: 43), agreement is a necessary prerequisite for several instances of XP movement, not in the sense that movement is triggered by agreement, but in the sense that agreement is needed in order to ensure halting. In other words: if you move, make sure you find a place to stop. Given that labeling can be done by a head or through agreement, we expect that all XPs should (i) be merged with a head or (ii) agree with their sister, as illustrated below.

(5) a. {FP F, XP}
    b. {FP XPF, YPF}

These two available options for merging an XP imply that, when an XP moves, it can only stop moving (the 'halting problem') if it agrees with the syntactic object it is merged with. Although agreement is quite standardly assumed to be involved in the derivation of wh-interrogatives (see example (3) above), the existence of agreement in several other instances of A'-movement is controversial (see Aboh 2010 for a discussion). Take fronting, for instance. The term fronting is used here in the sense of XP movement to the left periphery of the clause for scope-discourse purposes, such as focus fronting (a), VP fronting (b) or topicalization (c):

(6) a. IL TUO LIBRO ho letto [-] (non il suo)
       the your book have.1s read  not the his
       'Your book I read (not his).'
    b. I said that I would solve this problem, and solve this problem I certainly did [-].
    c. To Mary, I gave a book about linguistics [-].

The relevant point of the derivation is the one in which the fronted constituent merges at the root of the clause, an instance of XP-YP merger. Given the LA, this


gives rise to a labeling conflict because both the head of the fronted constituent and the C head are equidistant to the node to be labeled: (7)

[α → ?? [DP il tuo libro] [CP C [TP ho letto …]]]
(tree diagram for (7): the D and C heads are equidistant to α, so no label can be determined)



If labels are needed for interpretability, and if labeling can be accomplished through agreement, we need to assume that the fronted constituent agrees with (a head in) the CP. This is only possible if C has discourse related features such as topic and focus features, or if, as is assumed in the cartographic framework, there are dedicated scope-discourse heads in the left periphery of the clause with which the fronted constituent can agree. For instance, a fronted constituent with a focus feature could be assumed to merge with a syntactic object containing a Foc head, assumed to be equivalent to a focus feature (see section 4), in which case the focus feature provides the label for the newly created syntactic object through agreement: (8)

[α → FocP [DPfoc il tuo libro] [FocP Foc [CP ho letto …]]]
(tree diagram for (8): the shared focus feature labels α as FocP)

The variety of functional heads that are argued to exist in the cartographic framework thus provides the necessary halting places for several instances of XP-movement, and for XP merger in general. Additionally, a cartographic approach to syntax is not only advantageous for a theory including the LA; the opposite is also true. One of the main weaknesses of cartography is that it is not entirely clear how the universal hierarchy is exploited from a cross-linguistic point of view. In standard cartographic theory, movement is assumed to be a last resort operation: along the lines of Chomsky (1995), movement is triggered by the need to check uninterpretable features. This system, however, turns out to be too rigid to account for language variation and optional scope-discourse movement. In the case of focus fronting, for instance, it is assumed that the focalized constituent has a focus feature that needs to be checked in a Spec-head




configuration with a focus head (Rizzi 1997, Belletti 2004 a.o.), hence requiring overt or covert displacement of the focalized constituent. Although focalized constituents can undergo movement in languages such as Italian (Rizzi 1997), and even seem to do so consistently in languages such as Hungarian (Brody 1990, Kiss 1998; but see Szendröi 2003 or Kiss 2010 for opposing views), focalized constituents cannot be assumed to consistently undergo displacement: some foci simply do not move (Costa 2004, Vercauteren 2015). It is not clear how to account for this variability in a system where all uninterpretable features need to be licensed in a local configuration with the licensing head. An alternative to focus-feature-triggered movement is to assume that focalized constituents move because they can, along the lines of Chomsky's (2001) approach to movement. However, given the LA, if a focalized constituent moves, it has to find a label in its landing place. This is where the functional focus heads come in: they provide the necessary features to ensure labeling through Agree. With free movement in combination with the LA, one of the basic insights of the cartographic framework can be maintained, namely the observation that cross-linguistic word order patterns seem to be reducible to a universal hierarchy (Cinque 1999, Cinque & Rizzi 2008 a.o.). The role of the universal hierarchy can, however, be weakened: the functional heads are present from a cross-linguistic point of view, but they are not necessarily exploited. More specifically, several functional 'projections', such as scope-discourse projections, are only exploited in case a moving XP needs a place to find a label. This approach would explain, for instance, why the left periphery is a typical position for focalized constituents, while also accounting for the fact that focus does not need to surface in this position.
Note that a theory of syntax with the LA under current assumptions requires that several features be present twice in order for labeling through agreement to be possible. For instance, in order for the focalized constituent in (8) to be able to halt, a second focus feature is needed on a head in the clausal spine. The question whether this type of feature doubling is desirable is extremely relevant, as discussed in detail by Starke (2004), who argues that there is no such thing as an empty functional head, because this would be a redundancy. Alternatively, he proposes that the features of lexical items themselves function as the 'functional heads' in the clausal spine. In this system, the one adopted in Nanosyntax, agreement and movement are entirely separated. Movement is triggered by Spell-Out: if a syntactic object does not comply with an a priori functional hierarchy, Spell-Out will not succeed. For instance, if a given constituent has a Q-feature, it will have to be merged in the CP of the clause in order to comply with the universal functional sequence. Of course, this view on movement presupposes an a priori universal hierarchy, part of Universal Grammar, a claim that is highly controversial (see Ramchand


& Svenonius 2014, Rizzi 2013 a.o.). Additionally, such an approach is too rigid, for the same reasons as those discussed above concerning movement in the standard cartographic framework: it is not clear how cross-linguistic variation can be accounted for, or how optional movement can fit into such a theory (but see Starke 2011 for some suggestions). I will thus maintain that it is not a problem that many features are present twice in order for the derivation to converge. If feature doubling in the sense described above does turn out to be a wrong or unnecessary assumption for syntactic theory, then the LA in its current terms has to be abandoned and an alternative account of halting has to be found. A solution could lie at the interfaces: what exactly do the interfaces need from syntax? It might be the case that not all syntactic objects need a label, that labels are not needed at the interfaces (see for instance Citko 2008, Hornstein 2009 and Cecchetto & Donati 2015), or that labels are not needed at all (Collins 2002). As this issue is well beyond the scope of this paper, I will not discuss it any further. As is clear from the brief discussion in this section, the LA as formulated by Chomsky (2013) combines very well with a cartographic approach to syntax. On the one hand, a cartographic clausal spine straightforwardly provides a series of necessary halting places for XP merger; on the other hand, the LA makes it possible to account for word order variation while maintaining the advantages of the highly articulated functional hierarchy that cartographists argue to exist.

4 Labeling by features

Given that the combination of cartography and the LA has some interesting advantages, we might wonder whether a cartographic view on syntax has other advantages in combination with the LA. More particularly, I will evaluate the consequences of the cartographic hypothesis that there is no substantial difference between heads and features: heads are features (Cinque & Rizzi 2008). Under this view, we can eliminate one of the classes of objects that can provide a label to a syntactic structure. As illustrated in section 2, in case of XP-YP merger, agreeing features provide the label of the new node, while in case of X-YP merger, it is the head X that does. If we change our view on what is considered to be a head, we can straightforwardly reduce the class of labeling elements to features. Apart from eliminating a class of labeling objects, under this view several instances of head-head merger can be subsumed under X-YP merger, allowing for labeling under Minimal Search, and several instances of 'head' movement can be assumed to be forced by a labeling conflict, on a par with XP movement.




4.1 Features are hierarchically organized

As referred to above, in Chomsky's (2013) proposal both heads and features are able to provide labels for syntactic objects. An important question is thus whether we can unify the class of 'labelers', perhaps by eliminating either heads or features as possible labelers. Indeed, Chomsky (2013) suggests that it might be the case that only features provide labels for syntactic objects. Since features are also the relevant elements when it comes to probe-goal relations, and more specifically agreement relations, this would be quite natural. If only features can provide the necessary labels for syntactic objects, an important question is: which features? For instance, a noun can have several features, like a categorial n-feature, ϕ-features, [Def], [Num], etc. Which feature will ultimately label the whole NP? Since the LA is an instance of Minimal Search, the answer could lie in the way features are organized in syntax. There are two main views: according to the traditional view, features are bundled together in functional heads and in lexical items; according to cartographic views, features are hierarchically organized throughout. I will argue that the second view is not only more adequate, but also has advantages for the LA. Traditionally, Lexical Items, which lexicalize heads, are assumed to be bundles of features. As Chomsky suggests in his paper on phases (2001: 101), a language L selects features from the set of all features available in human language and assembles them in the Lexicon as Lexical Items. In other words, all the elements contained in the Lexicon consist of features. In frameworks such as Nanosyntax (see Caha 2009; Starke 2009 a.o.) and Distributed Morphology (Halle & Marantz 1993, 1994; Harley 1994 a.o.)
and in work by many authors working in the cartographic framework (see Cardinaletti & Starke 1999; Poletto 2000, 2006; Cinque & Rizzi 2008 a.o.), it is assumed that words (and morphemes) are assembled by the same syntactic operations that derive "higher" levels of the clause: merge of features is binary and gives rise to hierarchical structures. Lexical Items are thus little pieces of syntactic structure, even if they do not always look like it, maybe due to the fact that words are phases (see the l-syntax of Hale & Keyser 1993 and Marantz 2001). One advantage of such an approach is that we apply the same hierarchical logic throughout syntax: it is standardly assumed, not only in the cartographic framework, that there is an ordered hierarchy of heads at the clausal level, with C on top of T on top of v on top of V, and never in a different order.8 If these functional heads are hierarchically ordered, why wouldn't features be? There

8 See Ramchand & Svenonius (2014) for a discussion of this basic hierarchy, and the need to explain why it is as it is.


is evidence for the idea that lexical items have internal, hierarchical structure, since they can be morphologically decomposed. I refer to the relevant literature for evidence (Harris 1995, Ramchand 2008, Caha 2009, Taraldsen 2009, Alexiadou 2010 and many others). Hence, I will assume that Lexical Items, and thus several of those elements which we standardly call heads, are complex sets of features with an internal syntax. Following the Nanosyntactic view, I will assume that these little pieces of structure can be stored in the Lexicon as wholes, reducing computational burden in the same way as the 'traditional' Lexicon does.9 Given the assumption that features are hierarchically organized in the way we standardly assume heads are, the opposition between heads and features basically dissolves, and features become equivalent to heads. This is exactly what is assumed in the cartographic framework: each head in the clausal spine is assumed to correspond to one single feature. In the previous section I argued that such an approach to clause structure is beneficial for a theory with the LA under its current terms. In the remainder of this section I examine what the consequences are for 'head'-movement and Lexical Insertion.

4.2 Heads and labeling

Returning to the hypothesis that only features provide labels for syntactic objects: if we assume that features are joined in bundles without any internal hierarchy, we would expect all features of a head to be equally able to label a structure, since all the features of the head would be equidistant to α. This would provoke labeling conflicts each time a head with several features is involved, not a welcome result.10 If we assume that features are hierarchically organized, and thus function the way we standardly assume heads do, i.e. with one feature corresponding to each terminal (Cinque & Rizzi 2008), Minimal Search will decide which feature gets to label the syntactic object: the feature that is closest to α. We can thus reformulate the LA as follows:

(9) α receives the label of the closest feature, with F1 the closest feature of α iff:
    (i) α contains all occurrences of F1 and
    (ii) there is no F2 such that α contains F2 and F2 c-commands F1.
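As a purely expository aside (my own toy model, not part of the paper's formal proposal), the effect of the revised LA on feature spines can be mimicked in a few lines: a 'word' is built by merging single features bottom-up over a root, and its label simply tracks the most recently merged, hence closest, feature. The feature names below are invented for illustration.

```python
# Toy model of the revised LA: lexical "heads" are spines of single
# features, so the topmost (last-merged) feature labels the whole word.

def build_word(root, *features):
    """Merge features bottom-up over a root; return (label, features borne).
    A bare root yields label None, reflecting that roots cannot label."""
    label = None
    borne = set()
    for f in features:      # e.g. root BE -> v -> perf, for 'been'
        borne.add(f)
        label = f           # each new feature is now closest to the top node
    return label, borne

been_label, been_feats = build_word("BE", "v", "perf")
playing_label, playing_feats = build_word("PLAY", "v", "prog")
guitar_label, guitar_feats = build_word("GUITAR", "n", "def")

print(been_label)                    # the perfective feature labels 'been'
print(playing_label)                 # the progressive feature labels 'playing'

# Merging two such words is XP-YP merger: with no shared feature there is
# no label, the configuration that forces one of the two objects to move.
print(playing_feats & guitar_feats)  # empty intersection: labeling conflict
```

Merging two feature spines thus behaves exactly like XP-YP merger in the earlier sections: without a shared feature there is no label, which is the configuration that forces movement in the derivations discussed below.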

9 I thank a reviewer for asking me to clarify the status of the Lexicon under this view on lexical items. I refer to Starke (2011) for more details. 10 Of course, this would not be a problem in case labeling is established by Agree: it is quite clear that agreement can involve several features (for instance number, gender and person), all of which would provide the label. However, it is not clear to me whether all lexical items are ultimately in an agreement configuration.


This small change provides a way to determine which features count for the labeling algorithm by referring to the notion of Minimal Search only, and it has potentially interesting consequences: several instances of head-head merge actually involve an XP, in which case labeling is straightforward, unlike under the 'labeling by a head' hypothesis. Additionally, given that several elements that are traditionally assumed to be heads are actually XPs, several instances of head movement can be argued to be due to a labeling conflict. As the operation of merge is currently conceived, the only configuration in which two bare features are merged, resulting in head-head merge, would be the merger of a root with a categorial feature. Following a suggestion by Chomsky (2013), elaborated by Alexiadou & Lohndal (2014), roots have no capacity for labeling, since they consist of conceptual information only, irrelevant for syntax.11 The root being incapable of providing a label, the only remaining candidate is the categorial feature, as illustrated below:

(10) [α → CatP [categorial feature] [root]]
(tree diagram: the root cannot label, so the categorial feature labels the root-category complex)

For each syntactic object subsequently formed by the merger of a feature with this root-category complex, labeling is straightforward. The new feature which is merged with a complex syntactic object will inevitably provide the label, since it is the highest one, hence the closest to α: (11)

[F3P F3 [F2P F2 [F1P F1 [CatP [categorial feature] [root]]]]]
(tree diagram for (11): each newly merged feature is the closest one to the node it creates and therefore labels it)

I point out in passing that this way of looking at things also has another advantage. Since Chomsky (2001: 37–38), there has been discussion as to whether

11 See Marantz (1997, 2001: 10) and other authors working in Distributed Morphology, see also the ‘naked roots view’ discussed in Ramchand (2008: 11). The same solution for the head-head merge problem is put forward by Chomsky (2015).


head movement should be conceived of as part of syntax, the main question being why head movement is possible (see Hornstein 2009 and Roberts 2010 a.o. for a discussion). In the nanosyntactic approach which I refer to here, 'head movement' is actually dispensed with, since what is usually called head movement is not really head movement at all. It is a case of regular XP movement, with the XP being a syntactic object built of hierarchically ordered features. Let us briefly discuss the implications of the LA in relation to the reanalysis of heads as atomic features, focusing on the syntax of the verbs in the following sentence:

(12) The boy has been playing the guitar.

The verb been is derived by merging a root expressing the concept be, a categorial feature [v] indicating that the word is a verb, and a feature [perfective] indicating the aspect. The verb playing is built in a similar way, but with a feature [progressive] instead of the [perfective] feature. The aspectual features label the whole syntactic object. (13)

[PerfP perfective [VP V be]]
(14) [ProgP progressive [VP V play]]
(tree diagrams for (13) and (14): the aspectual feature is topmost and labels each verb)

Then the verb playing and the direct object DP are merged. Here, a labeling conflict arises: ProgP (the verb playing) merged with the object DP cannot be labeled, since there is no common feature, so one of the two constituents is forced to move; in this case, ProgP moves. (15)

[vP v [α → DP ProgP DP]]
(tree diagram for (15): once ProgP moves on, its lower copy is invisible to the LA and α is labeled DP)


Once vP is completed, functional heads, or features under the current view, are merged, and the syntactic objects already derived are merged in the right place, i.e. the place where they can be labeled. Recall that at this level we are mainly dealing with merging XPs, so in order to get a label, there has to be a feature shared by the two syntactic objects being merged. Assuming that Cinque's (1999) hierarchy is correct, the clausal spine contains a progressive head and a perfective head. These provide the necessary label, through agreement, for the syntactic objects formed by the merger of the ProgP playing and the PerfP been:12 (16)

[…P … [PerfP [PerfP perfective [VP V be]] [PerfP Perf [ProgP [ProgP progressive [VP V play]] [ProgP Prog [VP … [DP the guitar]]]]]]]
(tree diagram for (16): been halts in the specifier of the spinal Perf feature and playing in that of the spinal Prog feature, each node labeled through agreement)

Once everything is merged in the right place, all nodes have a label and the structure can be Spelled Out. Of course, this is nothing more than an illustration with several shortcomings. For instance, lexical verbs do not move in English, so apparently playing finds a label at its first merge position in vP. The example works much better with Romance languages, since in these languages all verbs (can) move to TP. My main point here is that a theory in which it is assumed that features are hierarchically organized permits us to reduce some instances of ‘head’-movement to labeling conflicts, and as such to widen the application of label-driven movement.

12 Of course, for the revised version of the LA to work, we have to assume that all levels of the clause are as fine-grained as the features themselves, with the TP field as in Cinque's (1999) hierarchy or even more detailed; otherwise there would be labeling problems. This view is defended in Cinque & Rizzi (2008).


An additional advantage of this approach to heads is that 'complement'-movement can also be reduced to a labeling conflict. Note that 'complement' here is to be understood as an XP that is the sister of a head. Complements, being merged with a head, are never in a conflicting labeling situation under the original formulation of the LA, since the head they are merged with will always be the closest head to the node to be labeled. So at first sight, complements present us with a configuration where movement cannot be triggered by a labeling conflict. However, under the current view, according to which several elements that are traditionally considered to be heads are actually phrases, several instances of 'complement'-movement can be reduced to labeling conflicts. Take VP-fronting, for instance. This is a construction in which the sister of T (or of some other functional head in the TP-field) moves to the root of the clause:

(17) I said that I would solve this problem, and [VP solve this problem] [TP I certainly did [VP -]].

The discussion of features and heads above, however, changes this perspective: some heads are not really heads but little pieces of structure, so several syntactic objects that we have considered to be complements are not really complements in the standard meaning of the term: strictly speaking, they are not sisters of heads but of XPs. The element that is traditionally assumed to lexicalize the T head, for instance, would consist of at least a root, a categorial feature, some tense feature and some phi-features: it is an XP. For simplicity's sake, I will label this element TP in the structure below, although it probably has a different label.13 The main point is that it is not an atomic element. The vP is merged with it, which causes a labeling conflict in the absence of agreement, and vP moves in order to resolve this conflict. (18)

… vP

CP

solve this problem C

α TP

vP

did

13 Given that the person features are rightmost in verbal morphology, and given the Mirror Principle, the element in T is probably a PersonP.

Features and labeling: Label-driven movement 




As became clear from the discussion in this section, if we take seriously the idea of applying a uniform logic throughout the computation, which should be the goal of any formal theory of language, and if we thus assume that features are hierarchically organized and assembled in the same manner as higher level syntactic objects, the LA can also be assumed to lie at the basis of some instances of ‘head’ movement and ‘complement’ movement. It thus permits us to widen the application of label-driven movement, a welcome result.

4.3 Lexical insertion

The attentive reader might have noticed that the internal syntax of the 'heads' in the previous section does not correspond to their surface structure. Take the ProgP for instance, in which the [progressive] feature is merged with the VP and forms a ProgP. After linearization, this would be spelled out as ingplay, with the progressive morphology preceding the root, which is clearly wrong. The right feature order can be derived through movement of the V-root complex. This movement does however give rise to a labeling conflict that cannot be solved through agreement.

(19) [α [VP play] [ProgP prog ⟨VP⟩]]

The question arises how this element is labeled. It clearly is labeled, since neither VP nor ProgP is forced to move further: they are Spelled Out as a unit. Several possibilities come to mind: (i) the LA is not an instance of Minimal Search; (ii) labels are not always required; and (iii) words are special.

The first hypothesis implies that labeling in case of XP-YP merge is not necessarily problematic, since the label is not provided by the closest head but through some other mechanism. It is however not immediately clear what this mechanism would be. What we certainly do not want in our theory is to establish an ad hoc distinction between 'labeling heads' and 'non-labeling heads'. Another option would be to assume that labeling also 'comes for free': in the configuration above, for instance, both the V head and the Prog head are equally able to label the whole structure. Of course, each choice will have consequences for the further computation. This sort of approach to labeling is explored for instance by Citko (2008), who argues that either one of two merged objects can provide a label, alongside having a 'mixed' label provided by both or no label at all. One consequence of the


‘free labeling’ hypothesis is of course that movement can never be the consequence of a labeling conflict, since labeling conflicts would simply not arise, and the LA loses a lot of its appeal. The second hypothesis has been explored by several authors, see Collins 2002, Citko (2008) and Hornstein (2009) for instance. Collins (2002) argues that labels are not a part of grammar, while Citko (2008) and Hornstein (2009) maintain that labels are a part of grammar but are not always necessary. Hornstein (2009) for instance argues that only arguments need labeling in order to be integrated into the clause, while adjuncts do not. The syntactic object that results from the merger of an adjunct with an XP can thus remain labelless. The question of whether labels are always necessary depends a great deal on one’s assumptions concerning the role of labels. Their role also determines when they are necessary: during the computation itself (as argued by Hornstein 2009 and Cecchetto & Donati 2010 for instance) or only for Spell-Out, as in Chomksy (2013). However, this is not the place to discuss all these hypotheses in detail, since the aim of this paper is to evaluate the LA from a cartographic point of view. I refer to the relevant literature for more details. Concerning the third option, namely that words are special, I will discuss two possibilities: words are special because lexical insertion can save label-less objects or words are special because word-internal movement is a post-syntactic PF operation. We could assume that words can save label-less objects in the sense that, if there is a lexical item available in the lexicon that can spell out the syntactic structure, labeling does not have to occur through Minimal Search, for instance because words stored in the Lexicon come with a label. Then the question arises which label the word will have. We could assume that all its internal features are visible on its label, as is standardly assumed. 
This would make it look like a feature bundle instead of like a syntactic structure. This however implies an ad hoc exception to labeling through Minimal Search, not a welcome result. As an alternative we could assume that only one of the features labels the whole word, although it is not clear how to determine which feature that would be in a principled way. The hypothesis that words are special because lexicalization can save label-less objects is thus not very promising.

The second hypothesis, namely that words are special because word-internal movement is a PF operation, is more promising, and a standard assumption: lexical insertion is assumed to be a post-syntactic operation. Since these word-internal movement operations only take place for lexical insertion to be possible (Spell-Out driven movement in the Nanosyntax framework), they can very well be assumed to take place post-syntactically. Since this is post-Spell-Out




movement, and given the assumption that labels are needed for syntactic objects to be interpretable at the interfaces, i.e., when they are spelled out, we can assume that the lack of a label resulting from post-Spell-Out movement is not problematic, and thus that syntactic objects like the one in (19) can remain labelless.

In this section I discussed a potential issue for the view on features defended here in combination with the LA, namely lexical insertion. I showed that potential labeling conflicts do not arise if we assume that movement operations that are required in order for lexical insertion to be possible occur after Spell-Out, and are purely PF operations, as is standardly assumed.

5 Conclusion

In this paper I explored the LA in combination with a cartographic approach to syntax. First I argued that a cartographic hierarchy of the clause provides the necessary halting places for several XP movement and merger operations, since they provide the necessary functional heads for labeling through Agree. Additionally, the LA permits us to maintain one of the basic insights of cartography, namely that there is a universal, highly articulated hierarchy of strictly ordered functional heads, while allowing for cross-linguistic variation in word order patterns and optional movement operations: the functional heads are only exploited when movement occurs; they (or features on them) do not trigger movement themselves.

After showing the strength of a cartographic approach to syntax in combination with the LA for general word order patterns, I turned to the sub-word level. I illustrated that a cartographic approach to words, as in Distributed Morphology or in Nanosyntax, according to which words have internal structure that is derived by the same mechanisms as higher-level syntactic objects, augments the number of contexts in which a labeling conflict can be assumed to force movement. Under the original formulation of the LA (Chomsky 2013), only 'Spec-to-Spec' movement could be assumed to be triggered by a labeling conflict, as only XPs merged with YPs could ever be in a conflicting labeling situation. If we assume that features are hierarchically organized throughout, and thus also on the sub-word level, numerous instances of 'head'-movement can also be subsumed under label-driven movement. Additionally, 'complement'-movement can also be due to a labeling conflict. The net outcome of this discussion is thus that cartography and the LA benefit from each other.


Acknowledgments: I thank Liliane Haegeman for the helpful and thought-provoking comments on the first version of this paper, and Eric Lander for discussion of the nanosyntactic ideas included here. I also thank the audience of the WoSS (Madrid 2013) and the DGfS workshop on Roots and Labeling (Marburg 2014). Finally, I thank two anonymous reviewers for very interesting and relevant comments. Needless to say, I am entirely responsible for all the shortcomings in this paper. The ideas presented here were born as part of research funded by FWO13/ASP_H/258.

References

Aboh, E. 2010. Information structuring begins with the numeration. Iberia 2(1).
Adger, D. 2013. A syntax of substance. Cambridge, MA: MIT Press.
Alexiadou, A. & T. Lohndal. 2014. The structural configurations of categorization. DGfS workshop on roots and labeling. Marburg, Germany.
Benincà, P. & C. Poletto. 2004. Topic, focus and V2: Defining the CP sublayers. In L. Rizzi (ed.), The structure of CP and IP, 52–75. Oxford: Oxford University Press.
Boeckx, C. 2010. Defeating lexicocentrism. ICREA/Universitat Autònoma de Barcelona.
Caha, P. 2009. The nanosyntax of case. Tromsø: University of Tromsø Ph.D. dissertation.
Cardinaletti, A. & M. Starke. 1999. The typology of structural deficiency: A case study of the three classes of pronouns. In H. van Riemsdijk (ed.), Clitics in the languages of Europe, 145–233. Berlin/New York: Mouton de Gruyter.
Cecchetto, C. & C. Donati. 2010. On labeling: Principle C and head movement. Syntax 13(2). 241–278.
Cecchetto, C. & C. Donati. 2015. (Re)labeling. Cambridge, MA: The MIT Press.
Chomsky, N. 1995. The minimalist program. Cambridge, MA: The MIT Press.
Chomsky, N. 2001. Derivation by phase. In M. Kenstowicz (ed.), Ken Hale: A life in language, 1–52. Cambridge, MA: The MIT Press.
Chomsky, N. 2013. Problems of projection. Lingua 130. 33–49.
Chomsky, N. 2015. Problems of projection: Extensions. In E. Di Domenico, C. Hamann & S. Matteini (eds.), Structures, strategies and beyond: Studies in honour of Adriana Belletti, 1–16. Amsterdam: John Benjamins.
Cinque, G. 1999. Adverbs and functional heads: A cross-linguistic perspective. New York: Oxford University Press.
Cinque, G. & L. Rizzi. 2008. The cartography of syntactic structures. Studies in Linguistics, CISCL Working Papers 2. 42–58.
Citko, B. 2008. Missing labels. Lingua 118. 907–944.
Collins, C. 2002. Eliminating labels. In S. Epstein & T. D. Seely (eds.), Derivation and explanation in the minimalist program, 42–65. Oxford: Blackwell.
Costa, J. 2004. Subject positions and interfaces: The case of European Portuguese. Berlin: Mouton de Gruyter.
Hale, K. & S. J. Keyser. 1993. On argument structure and the lexical expressions of syntactic relations. In K. Hale & S. J. Keyser (eds.), The view from building 20: Essays in linguistics in honor of Sylvain Bromberger, 53–110. Cambridge, MA: MIT Press.




Halle, M. & A. Marantz. 1993. Distributed morphology and the pieces of inflection. In K. Hale & S. Keyser (eds.), The view from building 20, 111–176. Cambridge, MA: The MIT Press.
Harley, H. 1994. Hug a tree: Deriving the morphosyntactic feature hierarchy. In A. Carnie & H. Harley (eds.), MITWPL 21: Papers on phonology and morphology, 275–288. Cambridge, MA: MITWPL.
Marantz, A. 2007. Phases and words. In S.-H. Choe (ed.), Phases in the theory of grammar, 191–222. Seoul: Dong-In.
Poletto, C. 2000. The higher functional field: Evidence from Northern Italian dialects. New York: Oxford University Press.
Poletto, C. 2006. Doubling as economy. Working Papers in Linguistics 16. 211–235.
Ramchand, G. 2008. Verb meaning and the lexicon: A first phase syntax. Cambridge: Cambridge University Press.
Ramchand, G. & P. Svenonius. 2014. Deriving the functional hierarchy. Language Sciences 46. 152–174.
Rizzi, L. 1997. The fine structure of the left periphery. In L. Haegeman (ed.), Elements of grammar: Handbook in generative syntax, 281–337. Dordrecht: Kluwer.
Rizzi, L. 2012. Labeling and criteria. EALING 2012 – Blaise Pascal Lectures. Paris.
Rizzi, L. 2013. Notes on cartography and further explanation. Probus 25(1). 197–226.
Starke, M. 2004. On the inexistence of specifiers and the nature of heads. In A. Belletti (ed.), Structures and beyond: The cartography of syntactic structures, 252–267. New York: Oxford University Press.
Starke, M. 2011. Towards elegant parameters: Language variation reduces to the size of lexically stored trees.
Taraldsen, T. 2009. Lexicalizing number and gender in Colonnata.

Petr Biskup

Labeling and other syntactic operations

Abstract: This article proposes certain modifications to the minimalist system, among which labeling plays a prominent role. It argues for a specific model of cyclic Transfer, where every operation Merge constitutes a phase. The operation labeling is a prerequisite of Transfer and can be delayed. This allows syntactic objects to escape from a phase. Because of the early Transfer, movement is triggered by a greedy feature on the moving syntactic object. It is shown that the proposed system can straightforwardly derive the following movement phenomena: freezing effects, order preservation in multiple movement, the prohibition of headless XP-movement and the ban on acyclic combinations of incorporations. It is also shown that the proposal has certain advantages over Chomsky's minimalist system.

1 Introduction

Chomsky (2013) argues that the operation labeling can be delayed. In his proposal, labels are determined by a labeling algorithm that operates at the phase level along with other operations. This means that first the phase structure is built and then – at the phase level – the whole phase is labeled. For instance, in the case of successive-cyclic movement, labeling of a CP with a moving syntactic object in Spec, CP must wait at least until the next phase head v merges and attracts the moving syntactic object to its edge. Only then can the complement CP (and the higher phrases) be labeled and consequently be properly interpreted at the interfaces.

In this article, I follow Chomsky (2013) in assuming that labeling operates at the phase level. Since in my proposal every operation Merge constitutes a phase, as we will see below, labeling (and Transfer) should operate after every Merge. This, however, would block syntactic movement. Therefore, I also follow Chomsky (2013) in that labeling can be delayed. It does not have to happen immediately after Merger of the appropriate syntactic object; it follows movements related to the to-be-labeled syntactic object. Since labeling is necessary for interpretation, it means that movement feeds the operation labeling and labeling in turn feeds Transfer. To maintain advantages of the cyclic proposal, I assume that the moving syntactic object does not wait for a probe. Instead, movement is triggered by an uninterpretable feature on the moving syntactic object.

DOI 10.1515/9781501502118-005


The first motivation for setting up the system in this way is that it provides us with a straightforward analysis of several movement phenomena. It will rule out ill-formed sentences with freezing effects, acyclic incorporations or headless XP-movement and derive order preservation effects in multiple movement. The proposed system also has other benefits. For instance, it does not employ the Phase Impenetrability Condition (PIC) and null phase heads; it does not stipulate special properties of phases and it can derive successive-cyclic movement targeting every phrase on its path.

The article is organized as follows. Section 2 briefly introduces the four constraints on movement. Section 3 discusses properties of the proposed system and advantages it has over Chomsky's approach. Section 4 demonstrates how the proposed system works and how it derives the four movement phenomena discussed in section 2.

2 The phenomena

2.1 Freezing effects

It has been observed that extraction out of moved syntactic objects is degraded; see, for instance, Ross (1967), Postal (1972), Wexler & Culicover (1980), Freidin (1992), Diesing (1992), Collins (1994), Müller (1998), Boeckx (2008), Gallego (2009), Lohndal (2011). Some definitions of freezing phenomena can be found in Ross (1967: 160, Frozen Structure Constraint), Wexler & Culicover (1980: 119, Freezing Principle), Diesing (1992: 128, Revised Extraction Constraint), Müller (1998: 22, Freezing). Generally, freezing can be schematized as in (1), where angle brackets mark copies.

(1) * … β … [αP … ⟨β⟩ …] … ⟨αP⟩ …

The following example, taken from Lasnik & Park (2003: 651), demonstrates the freezing effect for English subjects. As opposed to extraction from the subject in situ in (2a), with the subject position occupied by there, extraction from the moved subject is ungrammatical, as shown in (2b).

(2) a.  [Which candidate]1 were there [posters of t1] all over the town?
    b. *[Which candidate]1 were [posters of t1]2 t2 all over the town?

As far as extraction from objects is concerned, consider the Czech example in (3). Czech is an SVO language and allows left branch extraction. Therefore, when the direct object occurs in situ, as in (3a), extraction of the possessive čí is possible. However, when the object is moved, as in (3b), the extraction of čí is ungrammatical.



(3) a.  Čí1 Pavel políbil [t1 sestru]?
        whose Pavel kissed sister
        'Whose sister did Pavel kiss?'
    b. *Čí1 Pavel [t1 sestru]2 políbil t2?
        whose Pavel sister kissed

The following example from Postal (1972: 213) shows the freezing effect for extraction from a topicalized PP in English. In contrast to extraction from the in situ position in (4a), extraction from the topicalized PP in the lowest clause in (4b) or in the higher clause in (4c) is ungrammatical.

(4) a.  Who1 do you believe Mary thinks Joan talked [to t1]?
    b. *Who1/Whom1 do you believe Mary thinks [to t1]2 Joan talked?
    c. *Who1/Whom1 do you believe [to t1]2 Mary thinks Joan talked?

Similarly, extraction from the topicalized VP in German is bad, as shown in (5), taken from Müller (1998: 20). Example (5a) demonstrates that VPs can undergo topicalization and (5b) shows that movement of the embedded direct object was to the matrix clause is grammatical. Although both movements exist in German, extraction of was out of the topicalized VP is not possible, as shown in (5c).

(5) a.  Ich denke [CP [VP das Buch gelesen]1 hat keiner t1].
        I think the book read has no.one
        'I think no one read the book.'
    b.  Was2 denkst du [CP t2 hat keiner [VP t2 gelesen]]?
        what think you has no.one read
        'What do you think no one read?'
    c. *Was2 denkst du [CP [VP t2 gelesen]1 hat keiner t1]?
        what think you read has no.one

There are also some counterexamples to freezing; see, for instance, Abels (2007), Neeleman & Van de Koot (2010) (consider also discussions of various counterexamples in Broekhuis (2006), Boeckx (2008), Gallego (2009), Müller (2010)). In this article, I take the position that freezing is a real phenomenon and that in examples (2)–(5) there are genuine freezing effects.

2.2 The prohibition of headless XP-movement

Takano (2000) discusses restrictions on remnant movement and observes that remnant movement of a syntactic object is not possible if its head moved out. He proposes the following generalization in (2000: 146): "Remnant movement of α is impossible if the head of α has moved out of α". This is schematized in (6).


(6) * [XP … t1 …] … X1 …

In the German example (7a), from Haider (1990: 96), VP undergoes remnant movement and contains the copy of the finite verb, which has moved to the second position. (7b) then shows the control verb-second sentence with the topicalized subject.

(7) a. *[Ihr ein Buch t1]2 gab1 Hans t2.
        her a book gave Hans
    b.  Hans gab ihr ein Buch.
        Hans gave her a book
        'Hans gave her a book.'

The same phenomenon can be observed in English; consider (8), where remnant movement of Anna a book is ungrammatical.

(8) a.  Hans gave Anna a book.
    b. *[Anna t1 a book]2 Hans gave1 t2.

According to Takano (2000: 145), adjectival phrases show a parallel behaviour. He assumes a Larsonian shell structure with the adjectival head moving from the lower AP to the higher AP, as shown in (9a). Example (9b) shows that remnant movement of the lower AP results in ungrammaticality.

(9) a.  Mary is [AP grateful1 [AP to John [A' t1 for his help]]].
    b. *It's [to John t1 for his help]2 that Mary is grateful1 t2.

2.3 Order preservation in multiple movement

It is well known that certain multiple movements preserve the base order; see Rudin (1988), Vikner (1990), Johnson (1991), Grewendorf (2001), Müller (2001), Williams (2003), Fox & Pesetsky (2005), among others. Order preservation effects – as schematized in (10) – can be found, for instance, with wh-movement in Bulgarian, pronoun fronting in German, clitic movement in Czech or object shift in Scandinavian.

(10) [… XP1 YP2 … t1 t2 …]

The following example, taken from Rudin (1988: 472–473), demonstrates the shape conservation effect with multiple wh-movement in Bulgarian. The wh-subject must precede the wh-object.

(11) a.  Koj1 kogo2 vižda t1 t2?
         who whom sees
         'Who sees whom?'
     b. *Kogo2 koj1 vižda t1 t2?
         whom who sees

As to shape conservation effects in multiple clitic movement, consider the Czech example below. Since the indirect object precedes the direct object in base word order in Czech (Veselovská 1995, Kučerová 2007, Biskup 2011), (12) shows that clitic movement must preserve the pre-movement order.

(12) a.  Pavel ti1 ji2 představil t1 t2.
         Pavel you.dat her.acc introduce
         'Pavel introduced her to you.'
     b. *Pavel ji2 ti1 představil t1 t2.
         Pavel her.acc you.dat introduce

Another case of order preservation can be found in the object shift example in (13), from Müller (2001: 288, originally Vikner 1990). It shows that in Danish the shifted pronouns preserve their base word order.

(13) a.  Peter viste hende1 den2 jo t1 t2.
         Peter showed her it indeed
         'Peter indeed showed her it.'
     b. *Peter viste den2 hende1 jo t1 t2.
         Peter showed it her indeed

2.4 The ban on acyclic incorporation

Baker (1988) argues that acyclic combinations of incorporations are not allowed. Consider the derivation in (14), where the higher head Y incorporates into the head X before the incorporation of the lower head Z (Baker (1988) uses right adjunction). According to Baker, the reason for the ill-formedness of derivations like (14) is that the higher trace (t1) blocks antecedent government of the lower trace (t2).

(14) [X [X [X X Y1] Z2] [YP … t1 … [ZP … t2 …]]]
     (movement 1: incorporation of Y into X; movement 2: incorporation of Z into X)


The government analysis can explain the ungrammatical status of the Tuscarora example in (15), from Baker (1988: 364, originally Williams 1976). Specifically, (15) shows that preposition incorporation (with the preposition glossed as appl(icative)) cannot feed noun incorporation (of the prepositional complement child). According to Baker, the complement can never appear inside the verbal complex. The example can only receive the interpretation where the two objects – children and him – are switched, as shown in the translation. Baker's reasoning, however, cannot be applied in the minimalist approach because there is no notion of government.

(15) *Waˀ-khe-yat-wir-ahninv-ˀ-θ.
      past-1sS/3O-refl-child-buy-asp-appl
      'I sold him to the children.' (ok as 'I sold the children to him.')

3 Assumptions

The computational system works with syntactic objects, which consist of three sets of features: phonological, semantic and syntactic (formal); see, for instance, Chomsky (1995a: 394, 2001: 10), Collins and Stabler (2011: 1–2). Syntactic objects are combined by the operation Merge, which can be external or internal. In this respect, I follow Chomsky (2007: 8, 2008: 138–139, 2013: 8) and assume that Merge applies to two syntactic objects and creates a new syntactic object, a set containing the original objects, with no projection or order, as shown in (16).

(16) Merge
     Merge(α, β) = {α, β}.

That is different from the earlier version of Merge, where labeling was part of the process of forming a syntactic object. For instance, in Chomsky (1995a: 396–397, 2000: 133, 2001: 3), Merge creates a new syntactic object with a label, which is identical to one of the two original syntactic objects: Merge(α, β) = {α, {α, β}}, if α projects. This Merge – in contrast to (16) – is composed of two different operations, the set-constructing operation that combines α and β and the operation labeling, which constructs the superset with the label α (cf. Gärtner 2002: 64, Boeckx 2008: 84–85, Hornstein and Nunes 2008, Carnie 2010: 265).1 Because of the simpler form of Merge, Chomsky (2008, 2013) proposes a new labeling mechanism. Labels are

1 There is no consensus in the literature as to whether or not labels are necessary in the derivation and how labeling works; see, e.g., Collins (2002), Seely (2006), Cecchetto and Donati (2010), Collins and Stabler (2011), Adger (2013).




determined by a fixed labeling algorithm based on minimal search, which operates at the phase level.

According to Chomsky (2005, 2007), the third factor in growth of language in the individual, that is, principles not specific to the faculty of language, includes principles of efficient computation, and efficient computation in turn requires some version of strict cyclicity. The idea behind the notion of cyclic computation is that what has been derived is not accessible to later stages of the derivation. This is ensured by phases in the late minimalist framework (Chomsky 2000 et seq.). Once a phase is transferred, it is mapped to the interfaces and then "forgotten". If it is correct that phases reduce computational load, they should be as small as possible. This is explicitly stated by Chomsky in his (2005: 17) article: "What objects constitute phases? They should be as small as possible, to minimize computational load". For this reason, I assume that every operation Merge produces a phase, as stated in (17) (cf. Epstein and Seely (2002), Biskup (2013), also Bošković (2007a), Müller (2010) for the proposal that every maximal projection is a phase, or Matushansky (2006) for a strongly cyclic Spell-Out needed for head movement).

(17) Phase Formation
     Merge constitutes a phase.

Now we need to define a phase, which is done in (18).

(18) Phase
     A phase is a syntactic object that is transferred to the interfaces.

The idea that every operation Merge constitutes a phase in the derivation can be taken to be the null hypothesis since no justification of particular phase projections is necessary (cf. Epstein and Seely (2002), Bošković (2007a)). Chomsky (2000 et seq.) assumes that vP and CP are phases, possibly DP (Chomsky 2004, 2007).
However, there are problems with the special status of these phases, for instance, with their propositional status, with their phonetic isolability, with the correlation between phases and the transferred domain, with reflections of successive-cyclic movement on the phase head; see Epstein and Seely (1999, 2002), Abels (2003: chap. 2), Boeckx and Grohmann (2007), Müller (2010), among others. Therefore, I define phases in terms of Transfer and do not run the risk of stipulation. Taken together, (17) and (18) say that the Merge operation produces a syntactic object that is transferred to the interfaces. It has been argued that successive-cyclic movement targets every phrase on its path; see Manzini (1994), Takahashi (1994), Agbayani (1998), Bošković (2002a), Boeckx (2003), Müller (2004, 2010), Boeckx and Grohmann (2007). This is ensured in the phase model if every Merge forms a phase, as suggested above and as demonstrated in (21) below.
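The definitions in (16)–(18) can be given a schematic rendering. The Python sketch below is purely expository and not part of the proposal; the names merge and is_phase are my own. It shows Merge as bare two-membered set formation, with every output of Merge counting as a phase and hence as a candidate for Transfer.

```python
# Expository sketch of (16)-(18), not part of the original proposal.
# A frozenset captures the claim that Merge imposes no projection and
# no order: {alpha, beta} has no head and no linear arrangement.

def merge(alpha, beta):
    """(16): Merge(α, β) = {α, β} -- bare set formation."""
    return frozenset([alpha, beta])

def is_phase(obj):
    """(17)/(18): every object formed by Merge is a phase,
    i.e. a candidate for Transfer to the interfaces."""
    return isinstance(obj, frozenset) and len(obj) == 2

# A toy derivation with lexical items as plain strings.
vp = merge("v", "VP")    # a phase
cp = merge("C", vp)      # also a phase: nested set {C, {v, VP}}
assert is_phase(vp) and is_phase(cp)
```

The frozenset is chosen only because it is unordered and immutable, mirroring the set-theoretic definition in (16); nothing in the sketch hinges on Python specifics.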


The current proposal is more parsimonious than Chomsky's phase system because it does not employ the PIC and null phase heads. Here is why. Chomsky's model needs to define phases (see, e.g., Chomsky (2000: 106)), which is done in (18) in my proposal, and also determine which syntactic objects are phases (vP, CP), which is done in (17) here. Chomsky's model also assumes Transfer, but in contrast to my analysis it also proposes the PIC. The crucial difference is that in the current proposal phases are defined in terms of Transfer. Given that every Merge operation constitutes a phase and that a phase is a syntactic object that is transferred to the interfaces, the whole merged constituent is transferred (not only the phase complement), which means that the PIC is redundant.

Concerning the null phase head, in Chomsky's phase model, where the transferred part of the phase differs from the phase itself, only the complement of the highest phase head is transferred, not the head itself and its edge. Therefore, some special null phase head needs to be merged on top of the built structure to transfer the rest of the sentence and to ensure the correct interpretation of the sentence at the interfaces (see Biskup (2014) for a proposal along this line). Such a null phase head is not necessary in the current proposal since, given (18), it is the whole phase that is transferred.

Coming back to the labeling operation, I follow Chomsky (2013: 43) and assume that labeling operates at the phase level. Given that every Merge constitutes a phase, as stated in (17), one expects that labeling operates after every Merge (but see the discussion of delayed labeling below). The question arises what the relation between labeling and Transfer is.
Since labeling licenses syntactic objects so that they can be interpreted at the interfaces (see Chomsky (2013: 43) and also Boeckx (2008: 84), Ott (2011: 63), Blümel (2013: 34) for relating asymmetry/labeling to mapping to the external systems), ideally, Transfer of a syntactic object should happen after the object was labeled; consider (19).

(19) Transfer
     Only labeled syntactic objects are transferred.

Labeling and Transfer could also take place at the same time because Chomsky (2013) is not specific about their relation; he only argues that labels are necessary for interpretation at the interfaces and that labeling takes place at the phase level, as Transfer and other syntactic operations except external Merge. However, if we assume the standard minimalist system with labeling taking place at the same time as Transfer, it will happen that some derivations will not be interpreted because Transfer can also transfer syntactic objects that cannot be labeled. In contrast, (19) ensures that Transfer does not transfer unlabeled (unlabelable) syntactic objects and that derivations will not crash because of the lack of a label.




Inspired by Moro (2000), Chomsky (2008, 2013) proposes that in the case of the syntactic object {XP, YP}, there are two ways in which the element can be labeled. Either X and Y share some prominent feature, which is taken as the label of the syntactic object, or, crucially, the syntactic object {XP, YP} is modified so that there is only one visible head, that is, one of the phrases is moved. I interpret this as meaning that labeling generally follows movements related to the labeled syntactic object. This is a more economical option than to assume that labeling can also apply before movement. If labeling applied before movement, then it could be unsuccessful – in the case of {XP, YP} – which given (19) would make the derivation crash. Further, it could also happen, in a case different from the above one, that a syntactic object containing an uninterpretable feature relevant for movement will be labeled and transferred, which will again cause the derivation to crash.

At this point, the question arises how long labeling (and consequently Transfer) is going to wait for the appropriate movement. Given the cyclic nature of the proposal – every Merge constitutes a phase – the moving syntactic object should not wait for a (remote) probe; hence, movement should be of the greedy type. Otherwise strict cyclicity of the proposal would be lost or substantially weakened. Therefore, I assume that movement is triggered by an uninterpretable feature on the moving syntactic object (see Bošković 2007a). For ease of exposition, in what follows, I use the general "uF", which stands for various features inducing movement. This feature forces the appropriate syntactic object to move immediately. Having said this, I now formulate labeling, as shown in (20).

(20) Labeling
     For every phase, labeling of a phase follows movements triggered by uF contained in the phase.
According to (20), a phase (i.e., the merged constituent) is labeled after all movements triggered by the movement feature uF present in the phase happened. This allows moving syntactic objects to escape from the to-be-transferred phase. Thus, labeling of a phase does not have to happen immediately after Merger of that phase. It also means that labeling can affect not only the newly built phase but also syntactic objects contained in that phase, similarly to Chomsky (2013). Recall that according to Chomsky, first the phase structure is built and then, at the phase level along with other operations, the whole phase is labeled. Let us now demonstrate with an abstract example how the proposal works. The syntactic objects α and β are externally merged and β bears the movement feature uF, as shown in (21a). Given the Phase Formation in (17), the merged syntactic object is a phase and given Phase in (18), the syntactic object must be transferred to the interfaces. Further, because of Transfer in (19), Transfer waits for labeling. As discussed above, the movement feature uF forces its bearer (here β) to


move, as demonstrated in (21b). With respect to the lower copy, I assume that the uninterpretable feature becomes inactive – is deleted – upon movement, in order not to cause the derivation to crash at the semantic interface. Then, given Labeling in (20), the phase is labeled and consequently transferred to the interfaces, as illustrated in (21c). In the next step, γ is merged from the lexical array, as in (21d), and uF forces β to move, as in (21e), and uF on the lower copy is deleted. Given Transfer (19) and Labeling (20), the phase {β, {α, β}} can be labeled and transferred, as shown in (21f). The same also holds for the phase {γ, {β, {α, β}}}; see (21g). Since derivations are assumed to proceed in the bottom-up fashion, labeling and Transfer of the phase {β, {α, β}} must precede labeling and Transfer of the phase {γ, {β, {α, β}}}.

(21) a. {α, β[uF]}
     b. {β[uF], {α, β}}                β moves; uF on the lower copy is deleted
     c. the phase {α, β} is labeled and transferred
     d. {γ, {β[uF], {α, β}}}           γ is merged
     e. {β[uF], {γ, {β, {α, β}}}}      β moves; uF on the lower copy is deleted
     f. the phase {β, {α, β}} is labeled and transferred
     g. the phase {γ, {β, {α, β}}} is labeled and transferred
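For readers who find it helpful, the bookkeeping in (21) can be replayed mechanically. The following toy sketch is my own illustration, not part of the proposal; the function names and the set encoding are assumptions. It implements greedy uF-driven movement, labeling after movement, and Transfer of labelable phases only, and reproduces the sequence (21a)–(21g):

```python
# Toy derivation engine for (21) under assumptions (17)-(20): every Merge
# forms a phase, a greedy [uF] forces its bearer to move immediately,
# labeling waits for uF-driven movement, and only labeled phases transfer.

def lex(name, uF=False):
    return {"head": name, "uF": uF, "kids": None, "transferred": False}

def show(so):
    if so["kids"] is None:
        return so["head"] + ("[uF]" if so["uF"] else "")
    a, b = so["kids"]
    return "{%s, %s}" % (show(a), show(b))

def merge(a, b, log):
    so = {"head": None, "uF": False, "kids": (a, b), "transferred": False}
    log.append("Merge: " + show(so))
    return so

def raise_mover(root, mover, log):
    """Internal Merge of the [uF]-bearing mover to the edge; the uF on the
    lower copy is deleted (rendered inactive), as assumed in section 3."""
    lower = dict(mover)
    lower["uF"] = False
    def swap(so):                      # replace the old copy by the inactive one
        if so["kids"] is None:
            return
        so["kids"] = tuple(lower if k is mover else k for k in so["kids"])
        for k in so["kids"]:
            swap(k)
    swap(root)
    new_root = {"head": None, "uF": False, "kids": (mover, root),
                "transferred": False}
    log.append("Move %s (uF on lower copy deleted): %s"
               % (mover["head"], show(new_root)))
    return new_root

def has_uF(so):
    if so["kids"] is None:
        return so["uF"]
    return any(has_uF(k) for k in so["kids"])

def label_and_transfer(so, log):
    """(19)/(20): bottom-up, label and transfer every phase without active uF."""
    if so["kids"] is None or so["transferred"]:
        return
    for k in so["kids"]:
        label_and_transfer(k, log)
    if not has_uF(so):
        so["transferred"] = True
        log.append("Label+Transfer: " + show(so))

# The derivation in (21): α merged with β[uF], later γ; β raises at each phase.
log = []
alpha, beta, gamma = lex("α"), lex("β", uF=True), lex("γ")
root = merge(alpha, beta, log)        # (21a)
root = raise_mover(root, beta, log)   # (21b)
label_and_transfer(root, log)         # (21c): {α, β} is transferred
root = merge(gamma, root, log)        # (21d)
root = raise_mover(root, beta, log)   # (21e)
label_and_transfer(root, log)         # (21f)/(21g)
for line in log:
    print(line)
```

The printed log mirrors (21a)–(21g) step by step: the inner phase is transferred only once its active uF-bearer has left, and already-transferred phases are never inspected again.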




In this way, the moving syntactic object moves up until it reaches its ultimate landing position and its uninterpretable feature is valued. Thus, labeling can be delayed in the sense that it does not have to happen immediately after the appropriate Merge operation. Since phases are defined in terms of Transfer in (18) (and the timing of Transfer is irrelevant for the phase status), delayed phases are phases as well. Given the early Transfer in the proposed system, the operation Agree must not be constrained by phases (and the operation Transfer). There are various approaches to long-distance Agree and its exceptional behaviour in the literature, ranging from cyclic agreement analyses, through percolation approaches, to proposals treating agreement as a post-syntactic process; see, for instance, Stjepanović and Takahashi (2001), Legate (2005), Bošković (2007a, 2007b), Bobaljik (2008), Müller (2010), Biskup (2012), Richards (2012). I remain agnostic here about which of these analyses should be adopted; in what follows, I will just assume that Agree is not restricted by Transfer. To sum up this section, the operation Merge forms syntactic objects – phases – that are transferred to the interfaces. The operation labeling feeds Transfer, and movement in turn feeds labeling. We have already seen some benefits of the proposed system. In the next section, I will demonstrate that the current proposal can straightforwardly derive the four movement phenomena introduced in section 2.

4 The analysis

4.1 Freezing effects

In section 2.1., we saw several examples of ungrammatical extraction from moved syntactic objects. This section aims to answer the question why such an extraction is bad. Recall from the preceding discussion that movement is triggered by the greedy feature uF, which forces its bearer to move immediately. Combining this with the assumption that derivations proceed in the bottom-up fashion provides us with the answer to the “why” question above. Specifically, as shown in (22a), the embedded syntactic object (β) bearing the movement feature must move out of the dominating syntactic object (α) before the dominating element itself moves. Thus, it cannot happen that the higher syntactic object moves first and only then the embedded syntactic object; the derivation schematized in (22b) will always be ungrammatical. It is also not possible to add the movement feature to the embedded β after movement of α since it would violate the Inclusiveness Condition.


(22) a.  step 1: β[uF] moves out of the dominating α[uF]; step 2: α moves
     b. *step 1: α[uF] (containing β[uF]) moves; step 2: β is extracted from the moved α



Let us now demonstrate how the proposal works in the case of example (2), repeated below as (23). The relevant part of the derivation proceeds as in (24) (for ease of exposition, I will use the standard category labels in trees).

(23) a.  [Which candidate]1 were there [posters of t1] all over the town?
     b. *[Which candidate]1 were [posters of t1]2 t2 all over the town?

The noun candidate is merged with the determiner which, which bears the movement feature (that could be the usual wh-feature or Q-feature). Since the feature has the pied-piping property in (23), which does not move itself and the preposition of is merged.2 After that, which pied-pipes candidate across of, and the movement feature on the lower copy of which candidate is deleted. Given Labeling in (20) and Transfer in (19), the lower copy of which candidate is labeled and transferred (which could not happen before the movement). The phase {of, {which, candidate}} is also labeled and transferred since there is no movement feature in the phase that could trigger movement. The labeling algorithm does not look for movement features in transferred phases because, being transferred (that is, labeled), they cannot contain a movement feature. Hence, in (24) only the head of must be inspected. PP and the higher copy of which candidate, however, cannot be labeled and transferred because they contain an undeleted/active movement feature. In the next step, the noun posters is merged and the movement feature forces which candidate to move across posters, with the consequence that the movement feature on the lower copy of which candidate is deleted. Now the copy of which candidate in PP is labeled and transferred. The same also holds true for the PP and N’ since they do not contain any (active) movement feature that could trigger movement; see Labeling and Transfer again.

2 I do not assume any particular theory of pied-piping here, but see, e.g., Horvath (2006), Heck (2008), Cable (2010).


(24) [DP [DP which candidate]2 [D’ D[uF] [NP [DP which candidate]1 [N’ posters [PP [DP which candidate] [P’ of [DP which candidate]]]]]]]
     step 1: which candidate moves across the null D to the edge of DP; step 2: D[uF] pied-pipes the whole NP





Next, the (phonetically null) determiner is merged. Suppose that the determiner bears the movement feature and that its sister should be pied-piped. The fact that the determiner asymmetrically c-commands which candidate is crucial for the order of movements. Given the bottom-up assumption, which candidate must move first. So, it moves across the determiner in accordance with the Extension Condition, as illustrated with step 1 in (24). After this, the lower copies of which candidate and NP are labeled and transferred. Crucially, only then can the determiner with its movement feature pied-pipe the whole NP, as illustrated with step 2. Thus, we can conclude that extraction from a moved syntactic object is not possible because of the bottom-up application of the impatient movement features.
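The ordering logic behind the freezing account can be made concrete with a small sketch. This is my own illustration, not the paper's formalism; the function name and tree encoding are assumptions. If greedy [uF]-bearers apply strictly bottom-up, the deepest mover always extracts first, so a constituent can never move before a mover it contains has left:

```python
# Toy illustration of (22): with greedy [uF] and bottom-up rule application,
# deeper movers always move first, so extraction out of an already-moved
# constituent (the (22b) order) can never be generated.

def movement_order(tree):
    """tree: nested tuples of strings; a string ending in '[uF]' bears the
    movement feature. A tuple's first element names the containing phrase,
    so ('α[uF]', ...) is a phrase α that itself has to move."""
    movers = []
    def walk(node, depth):
        if isinstance(node, str):
            if node.endswith("[uF]"):
                movers.append((depth, node))
        else:
            for kid in node:
                walk(kid, depth + 1)
    walk(tree, 0)
    # deepest first = bottom-up, immediate ("impatient") application
    return [name for _, name in sorted(movers, key=lambda p: -p[0])]

# (22a): β[uF] is properly contained in α[uF]; β must extract before α moves.
tree = ("α[uF]", ("X", "β[uF]"))
print(movement_order(tree))   # ['β[uF]', 'α[uF]']
```

The reverse order of (22b), moving α first and subextracting β afterwards, is simply not derivable in this regime: by the time α is eligible to move, its interior has already been labeled and transferred.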

4.2 The prohibition of headless XP-movement

In section 2.2., I discussed the constraint on headless phrasal movement. I presented some examples showing that remnant movement of a syntactic object is not possible if its head moved out of the phrase. What is the reason for this behaviour? The rationale is demonstrated in (25). Suppose that the lexical item α merges with the phrase β and that α bears the movement feature, as illustrated in (25a). This Merge operation constitutes a phase but given Labeling – and


the presence of the movement feature –, labeling and Transfer cannot happen yet. The movement feature forces α to move and the movement feature on the lower copy is deleted, as shown in (25b). After that, the phase is labeled as α since only α is an atomic element; consider (25c). Then, Transfer takes place. From this it is obvious that the headless syntactic object cannot move. Since the moving syntactic object always takes its movement feature along and the movement feature on the lower copy is deleted, the remnant does not contain a feature that could trigger movement. In this way, we receive the effect of Chomsky’s (1995b: 304) and Takano’s (2000) condition that only the head of a chain enters into the operation Move. Given the Inclusiveness Condition, it is also not possible to add a new movement feature to the remnant constituent, which would trigger its movement.

(25) a. {α[uF], β}
     b. {α[uF], {α, β}}    α moves; uF on the lower copy is deleted
     c. the phase {α, β} is labeled α and transferred

Furthermore, it does not help if two (or more) movement features are put on the moving head because the head takes all its syntactic features along (cf. Chomsky 1995b: 265). Thus, there will never be a situation where one movement feature or syntactic feature remains on the lower copy and another one moves together with the head, which seems to partially derive effects of the principle of lexical integrity (cf. Di Sciullo & Williams 1987: 49, Anderson 1992: 84, Bresnan 2001: 92, Spencer 2005: 81). To be more concrete, consider example (8b), repeated below as (26).

(26) *[Anna t1 a book]2 Hans gave1 t2.

The relevant part of the derivation of (26) is demonstrated in (27). The verb gave merges with the DP a book, as shown in (27a). Since gave bears the movement feature, it must move and the movement feature on the lower copy is deleted, as in (27b). After this step, the phase is labeled and transferred, as shown in (27c). Then Anna is merged, as in (27d), gave moves across it and the movement feature on the lower copy is deleted, as demonstrated in (27e). Thus, on the copies of gave, there is no active movement feature which could pied-pipe the syntactic object [Anna a book]. The derivation proceeds with labeling and transferring the syntactic object {gave, {gave, a book}}; see (27f). The same also happens to the syntactic object {Anna, {gave, {gave, a book}}} because it contains no active movement feature anymore; consider (27g).

(27) a. {gave[uF], [DP a book]}
     b. {gave[uF], {gave, [DP a book]}}             gave moves; uF on the lower copy is deleted
     c. the phase {gave, a book} (the VP) is labeled and transferred
     d. {Anna, {gave[uF], {gave, a book}}}          Anna is merged
     e. {gave[uF], {Anna, {gave, {gave, a book}}}}  gave moves; uF on the lower copy is deleted
     f. the phase {gave, {gave, a book}} is labeled and transferred
     g. the phase {Anna, {gave, {gave, a book}}} is labeled and transferred

Some counterexamples to Takano’s generalization can be found in Fanselow (1991), Lenerz (1995), Müller (1998), Abels (2003). For instance, Müller (1998: 260) proposes that in the following example, the verb gibt has raised prior to remnant VP topicalization.

(28) (Ich glaube) [[VP2 Kindern Bonbons t1] gibt man besser nicht t2].
     I believe children sweets gives one better not
     ‘I believe that one should rather not give children sweets.’


However, there are data calling the remnant analysis into question. Example (29) shows that V2 movement strands the particle in German. Example (30), taken from Haider (1990: 96), shows that when the particle is topicalized together with the object, the sentence is ungrammatical, which is unexpected in the light of the remnant analysis of (28).3

(29) Hans schlug ein Buch auf.
     Hans opened a book on
     ‘Hans opened a book.’

(30) *[Ein Buch auf] schlug Hans.
      a book on opened Hans

The counterexamples to Takano’s generalization, in fact, can be analyzed in terms of movement of a syntactic object headed by a phonetically null head. For instance, (31) can receive an analysis under which the moved putative remnant is a projection of an applicative head, as shown in (32) (cf. Fanselow (1993) and Müller (2005) for analyses of remnants in terms of a covert verb).

(31) [Kindern Bonbons] gibt man besser nicht.
     children sweets gives one better not
     ‘One should rather not give children sweets.’

(32) [[DP Kindern] [[DP Bonbons] Appl[uF]]]2 [gibt t2]
     (the covert applicative head, bearing [uF], pied-pipes its whole phrase across the verb gibt)

The applicative head, bearing the movement feature, is merged with the direct object Bonbons. Since its movement feature has the pied-piping property, the head does not move by itself and the indirect object Kindern is merged. The syntactic object cannot be labeled and transferred because of the presence of the movement

3 There are also verb-particle combinations that allow such constructions; see, e.g., Müller (2002).




feature. Then, the verb gibt is merged, the applicative head pied-pipes the whole constituent across it, as illustrated in (32), with the consequence that the movement feature on the lower copy is deleted and the copy is labeled and transferred. So, the applicative phrase moves successive-cyclically to its ultimate landing position.4 To sum up, since movement is triggered by the movement feature on the moving head and the head takes all its syntactic features along, and since the movement feature is deleted upon movement, the lower copy of the head has no movement feature, is transferred and cannot pied-pipe other constituents.

4.3 Order preservation in multiple movement

This section is concerned with the question of why certain multiple movements show shape preservation effects. In the ideal case, shape preservation effects – as discussed, for instance, in section 2.3. – should be derived in the same way. Another question, closely related to the preceding one, is why order preservation often arises with syntactic objects belonging to the same class: wh-elements, pronouns, clitics, etc. I propose that the reason for this is that the syntactic objects move to the same projection. If we combine this proposal with the assumptions from section 3 that movement is triggered immediately by the movement feature on the moving element and that derivations proceed in the bottom-up fashion, then it is obvious why it must be this way. As shown in (33), the moving syntactic objects move successive-cyclically across the higher elements; the lower syntactic object always moves first and then the higher syntactic object skips over the lower one in accordance with the Extension Condition. If the syntactic objects value their movement features with the same head, as with Y in (33), we receive the shape conservation effect.

(33) [WP1 UP2 [Y [t1 t2 [X [t1 t2 ...]]]]]

To be more specific, the Bulgarian wh-movement example in (11), for convenience repeated below as (34), shows that the wh-subject must precede the wh-object. The relevant parts of the derivation of (34) are shown in (35).

4 An answer to the question of why such a derivation is not possible in English could be based on the different shape of vP in English and German. For instance, one can propose in line with Haider (2000, 2013) that in VO languages like English, also in the case of the applicative/covert verb phrase, the head must raise, which brings about the same result as (27).


(34) a.  Koj1 kogo2 vižda t1 t2?
         who whom sees
         ‘Who sees whom?’
     b. *Kogo2 koj1 vižda t1 t2?
         whom who sees

(35) [CP koj1 [CP kogo2 [C’ C [TP koj1 [TP kogo2 [T’ T [vP koj1 [vP kogo2 [vP koj1 [vP kogo2 [v’ v [VP kogo2 [VP V kogo2]]]]]]]]]]]]]

The wh-object kogo with its movement feature moves across V, v and the wh-subject koj (leaving aside the derivation of the verb). The movement feature of the subject forces it to skip back over kogo. This is important because this step restores the original hierarchical relation (and the order) between the two elements. The wh-object, however, cannot skip over koj again before Merger of a new syntactic object (and the same also holds for the subject). Note that such an assumption is also necessary in Chomsky’s system because edge features – which drive the operation Merge and are not deleted – could trigger Merger of one and the same syntactic object repeatedly. Then, T is merged and given the bottom-up assumption, the object kogo moves across it and then the higher koj. Since these movements observe the Extension Condition, the subject ends up in a position from where it c-commands the object. Such movement steps are also repeated after Merger of C. What is important is that both wh-elements end up in the same projection, that is, their movement features are valued by the same head, here C. This derives the shape conservation effect as in (34a). The ungrammatical sentence in (34b) is derivable only if some condition – the Extension Condition or the bottom-up assumption – is violated. There are also cases of multiple wh-movement without order preservation effects; consider the Czech example in (36).

(36) a. Kdo koho vidí?
        who whom sees
        ‘Who sees whom?’
     b. Koho kdo vidí?
        whom who sees
        ‘Who sees whom?’

From the discussion above we conclude that here we are dealing with movement to different projections. This seems to be correct because according to Rudin (1988) Bulgarian and Romanian wh-elements have their ultimate landing position in CP, whereas in Czech, Polish and Serbo-Croatian only the




first wh-element ends up in CP. Thus, example (36b) will receive the following analysis.

(37) [CP koho2 [C’ C [TP kdo1 [TP koho2 [T’ T [vP kdo1 [vP koho2 [vP kdo1 [vP koho2 [v’ v [VP koho2 [VP V koho2]]]]]]]]]]]]

The derivation proceeds like (35); only the final step is different. The wh-subject kdo and the wh-object koho move to TP, preserving their hierarchical relation, but then only one of the syntactic objects moves further because it has a movement feature that cannot be valued in TP; in (37) it is the object. More concretely, since wh-phrases are inherently focused, I assume that the driving force for movement of wh-phrases to TP is a focus movement feature (cf. Horvath (1986), Stjepanović (1999), Bošković (2002b)).5 In addition to this feature, the wh-object koho also has a wh-movement feature, which is responsible for the last step, movement to CP. As to the wh-subject kdo, although it occurs in Spec,TP, it can be interpreted as a wh-word via Agree with C. Note that overt movement is not necessary for interpretation of questions/wh-words, as, for instance, Japanese, Korean, Chinese, French or multiple wh-questions in English show (for an Agree analysis of questions, see e.g. Watanabe (2006), Haida (2007), Vlachos (2014), and for an overview of approaches to wh-phrases in situ see Cheng (2003)).

The proposal is supported by Junghanns’ (2002) analysis of Czech reflexive clitics, under which reflexive clitics are generated higher than non-reflexive clitics. We know from section 2.3. that in the case of pronominal clitics, the dative clitic must precede the accusative clitic, as shown in (38), originally (12). This is expected if the clitics have their ultimate landing position in the same projection because in Czech the indirect object is generated higher than the direct object.

(38) a.  Pavel ti1 ji2 představil t1 t2.
         Pavel you.dat her.acc introduce
         ‘Pavel introduced her to you.’
     b. *Pavel ji2 ti1 představil t1 t2.
         Pavel her.acc you.dat introduce

What is interesting is that reflexive clitics always precede the pronominal clitics. This is demonstrated in example (39), where the dative clitic jí follows the reflexive accusative clitic se.

5 Nothing hinges on the assumption that the final position of the wh-movement is TP here. CP could also be split into more projections, including FocP, as e.g. in Rizzi (1997).


(39) a.  Pavel se1 jí2 představil t1 t2.
         Pavel self.acc her.dat introduce
         ‘Pavel introduced himself to her.’
     b. *Pavel jí2 se1 představil t1 t2.
         Pavel her.dat self.acc introduce

If Junghanns (2002) is correct and reflexive clitics are merged higher than non-reflexive clitics in the clausal structure and if all clitics end up in the same projection, then the ordering in (39) is expected. Moreover, it seems that generally the order of clitics in the clitic cluster corresponds to the positions of clitics in the clausal structure in Czech; consider the clitic order in (40).

(40) question clitic li, modal, auxiliary, reflexive, pronominal dative, pronominal accusative

The proposed analysis is also supported by the different behaviour of scrambling and object shift. Scrambled syntactic objects in languages like German, Russian and Czech can occur in various projections and do not show order preservation effects (e.g. Müller (1995), Bailyn (1995), Veselovská (1995), Biskup (2011)), as predicted by the current analysis. In contrast, object shift, for instance, in Danish and Icelandic shows order preservation effects (Vikner 1990, Collins and Thráinsson 1996) and the object shift position is fixed (Vikner 1994), again as expected. Consider the following examples demonstrating the difference in the flexibility of landing sites of scrambling and object shift.

(41) Gestern hat Peter (das Buch) ohne Zweifel (das Buch) nicht (das Buch) gelesen.
     yesterday has Peter (the book) without doubt (the book) not (the book) read
     ‘Yesterday, Peter certainly did not read the book.’
     (German, Vikner 1994: 493)

(42) I gær las Pétur (bókina) eflaust (*bókina) ekki tV (bókina).
     yesterday read Pétur (the book) doubtlessly (the book) not (the book)
     ‘Yesterday, Peter certainly did not read the book.’
     (Icelandic, Vikner 1994: 494)

(43) I går læste Peter (den) uden tvivl (*den) ikke tV (*den).
     yesterday read Peter (it) without doubt (it) not (it)
     ‘Yesterday, Peter certainly did not read the book.’
     (Danish, Vikner 1994: 494)


As to order preservation effects, consider the examples below, which show that in contrast to object shift in Icelandic and Danish, scrambling in German is not order preserving.

(44) Peter hat (das Buch) dem Lehrer (das Buch) sicherlich gezeigt.
     Peter has (the book) the teacher (the book) certainly shown
     ‘Peter certainly did not show the book to the teacher.’

(45) Ég lána (*bækurnar) Maríu (?bækurnar) ekki.
     I lend (the.books) Maria (the.books) not
     ‘I did not lend the books to Maria.’
     (Collins and Thráinsson 1996: 406, 409)

(46) Peter viste (*den) hende (den) jo.
     Peter showed (it) her (it) indeed
     ‘Peter indeed showed her it.’
     (Danish, Müller 2001: 288, originally Vikner 1990)

To summarize this section, if moving syntactic objects end up in the same projection, then, given the bottom-up assumption, the Extension Condition and impatient movement features, order preservation effects arise. If they land in different projections, their order can also be reversed.
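The leapfrogging schema in (33) can be mimicked with a tiny sketch. This is my own illustration; the function name and the list encoding are assumptions, not part of the analysis. Per cycle, the lower mover fronts first and the higher one then crosses it, so two movers that share a final landing projection surface in their base order, while an extra final step for just one of them reverses it:

```python
# Toy model of (33)/(35): greedy movers applying bottom-up under the
# Extension Condition restore their base order at every cycle.

def leapfrog(edge, cycles):
    """edge: movers ordered top-to-bottom, e.g. ['koj', 'kogo'].
    Per cycle (one newly merged head), the lower mover moves first and the
    higher one then crosses it, so each cycle reverses the pair twice."""
    for _ in range(cycles):
        for mover in reversed(list(edge)):   # bottom-up: lowest mover first
            edge.remove(mover)
            edge.insert(0, mover)            # Extension Condition: new edge
    return edge

# (34a): subject koj and object kogo both end up in CP; order is preserved.
print(leapfrog(["koj", "kogo"], cycles=3))   # ['koj', 'kogo']

# (36b): only the object takes the extra wh-step to CP; order is reversed.
order = leapfrog(["kdo", "koho"], cycles=2)
order.remove("koho")
order.insert(0, "koho")
print(order)                                 # ['koho', 'kdo']
```

However many cycles intervene, the shared-projection case never changes the relative order, which is the shape conservation effect; only a movement feature that singles out one of the movers for a further step can reverse it.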

4.4 The ban on acyclic incorporation

In section 2.4., we saw that according to Baker (1988), acyclic incorporations like (47), where the second movement reaches down more deeply into the structure than the first one does, are excluded because the higher trace blocks antecedent government of the lower trace. In what follows, I show how such derivations are excluded by the current proposal.

(47) *[XP [X Z2 [X Y1 X]] [YP t1 [ZP t2]]]
     movement 1: Y[uF] incorporates into X; movement 2: Z[uF] incorporates into the derived X, reaching more deeply into the structure than movement 1


Because of the bottom-up assumption and the presence of movement features on the heads Z and Y, Z – which is asymmetrically c-commanded by Y – cannot move after Y.6 More concretely, the movement feature forces Z to move immediately after Merger of Y, resulting in feature deletion and Transfer of the phase ZP in accordance with Labeling (20) and Transfer (19). Only then can Y (together with Z) move and incorporate into the newly merged head X. In the case of Baker’s ungrammatical example (15), repeated below as (48), the prepositional complement child corresponds to Z in (47), the preposition/applicative morpheme corresponds to Y and the verb buy corresponds to the head X. Given the argumentation above, when the noun child with its movement feature merges with the preposition, it must incorporate into it; it cannot undergo movement as late as after the preposition incorporation.

(48) *Waˀ-khe-yat-wir-ahninv-ˀ-θ.
     past-1sS/3O-refl-child-buy-asp-appl
     ‘I sold him to the children.’ (ok as ‘I sold the children to him.’)

To sum up this discussion, acyclic incorporations can be excluded without recourse to the notion of government. This can be achieved under the assumption that derivations proceed in the bottom-up fashion and that movement is triggered by features forcing the moving element to move immediately.

5 Conclusion

I have proposed some modifications to the minimalist system that, on the one hand, can rule out certain types of ill-formed sentences – for instance, with acyclic incorporation or headless XP-movement – and, on the other hand, can derive grammatical sentences with effects like order preservation in multiple movement. We have seen that the proposed system also has other benefits, such as not using the PIC and null phase heads or not stipulating specific projections as phases.

Acknowledgements: For comments and discussions, I would like to thank the participants of the GGS 39 conference (May 10–12, 2013), the Frankfurt syntax colloquium (June 26, 2013), the Leipzig grammar colloquium (January 29, 2014),

6 This reasoning holds independently of whether standard head movement, reprojection movement, or a combination of both is adopted.




the DGfS workshop on labels and roots (March 5–7, 2014), and the Göttingen LinG colloquium (July 2, 2014). Special thanks go to Andreas Blümel and Erich Groat for their helpful suggestions and detailed comments.

References

Abels, Klaus. 2003. Successive cyclicity, anti-locality, and adposition stranding. Storrs: University of Connecticut dissertation.
Abels, Klaus. 2007. Towards a restrictive theory of (remnant) movement. Linguistic Variation Yearbook 7. 57–120.
Adger, David. 2013. A syntax of substance. Cambridge, MA: MIT Press.
Agbayani, Brian. 1998. Feature attraction and category movement. Ph.D. dissertation, University of California at Irvine.
Anderson, Stephen R. 1992. A-morphous morphology. Cambridge: Cambridge University Press.
Bailyn, John F. 1995. A configurational approach to Russian “free” word order. Ithaca, NY: Cornell University dissertation.
Baker, Mark C. 1988. Incorporation: A theory of grammatical function changing. Chicago: University of Chicago Press.
Biskup, Petr. 2011. Adverbials and the phase model. Amsterdam & Philadelphia: John Benjamins.
Biskup, Petr. 2012. Agree, move, selection, and set-merge. In Artemis Alexiadou, Tibor Kiss & Gereon Müller (eds.), Local modelling of non-local dependencies in syntax, 111–133. Berlin & Boston: Walter de Gruyter.
Biskup, Petr. 2013. On the NTC and labeling. Universität Leipzig manuscript.
Biskup, Petr. 2014. For, zu and feature inheritance. In Anke Assmann, Sebastian Bank, Doreen Georgi, Timo Klein, Philipp Weisser & Eva Zimmermann (eds.), Topics at Infl, 423–439. Universität Leipzig: Linguistische Arbeitsberichte 92.
Blümel, Andreas. 2013. Propagating symmetry. Case studies in exocentric syntax. Frankfurt am Main: Johann Wolfgang Goethe University dissertation.
Bobaljik, Jonathan David. 2008. Where’s phi? Agreement as a postsyntactic operation. In Daniel Harbour, David Adger & Susana Béjar (eds.), Phi-theory: Phi-features across modules and interfaces, 295–328. Oxford: Oxford University Press.
Boeckx, Cedric. 2003. Islands and chains: Resumption as stranding. Amsterdam: John Benjamins.
Boeckx, Cedric. 2008. Bare syntax. Oxford: Oxford University Press.
Boeckx, Cedric & Kleanthes K. Grohmann. 2007. Remark: Putting phases in perspective. Syntax 10. 204–222.
Bošković, Željko. 2002a. A-movement and the EPP. Syntax 5. 167–218.
Bošković, Željko. 2002b. Multiple wh-fronting. Linguistic Inquiry 33. 351–383.
Bošković, Željko. 2007a. On the locality and motivation of Move and Agree: An even more minimal theory. Linguistic Inquiry 38. 589–644.
Bošković, Željko. 2007b. Agree, phases, and intervention effects. Linguistic Analysis 33. 54–96.
Bresnan, Joan. 2001. Lexical functional syntax. Oxford: Blackwell Publishers.


Broekhuis, Hans. 2006. Extraction from subjects: Some remarks on Chomsky’s ‘On phases’. In Hans Broekhuis, Norbert Corver & Riny Huybregts (eds.), Organizing grammar. Linguistic studies in honor of Henk van Riemsdijk, 59–68. Berlin & New York: Mouton de Gruyter.
Cable, Seth. 2010. The grammar of Q: Q-particles, wh-movement and pied-piping. Oxford: Oxford University Press.
Carnie, Andrew. 2010. Constituent structure. Oxford: Oxford University Press.
Cecchetto, Carlo & Caterina Donati. 2010. On labeling: Principle C and head movement. Syntax 13. 241–278.
Cheng, Lisa Lai-Shen. 2003. Wh-in-situ. Glot International 7: 4 and 5. 103–109; 129–137.
Chomsky, Noam. 1995a. Bare phrase structure. In Gert Webelhuth (ed.), Government and binding theory and the minimalist program, 383–439. Oxford, Cambridge: Blackwell.
Chomsky, Noam. 1995b. The minimalist program. Cambridge, MA: The MIT Press.
Chomsky, Noam. 2000. Minimalist inquiries: The framework. In Roger Martin, David Michaels & Juan Uriagereka (eds.), Step by step: Essays on minimalist syntax in honor of Howard Lasnik, 89–156. Cambridge, MA: MIT Press.
Chomsky, Noam. 2001. Derivation by phase. In Michael J. Kenstowicz (ed.), Ken Hale: A life in language, 1–52. Cambridge, MA: MIT Press.
Chomsky, Noam. 2004. Beyond explanatory adequacy. In Adriana Belletti (ed.), Structures and beyond, 104–131. Oxford: Oxford University Press.
Chomsky, Noam. 2005. Three factors in language design. Linguistic Inquiry 36. 1–22.
Chomsky, Noam. 2007. Approaching UG from below. In Uli Sauerland & Hans-Martin Gärtner (eds.), Interfaces + recursion = language?, 1–29. Berlin: Mouton de Gruyter.
Chomsky, Noam. 2008. On phases. In Robert Freidin, Carlos P. Otero & Maria Luisa Zubizarreta (eds.), Foundational issues in linguistic theory. Essays in honor of Jean-Roger Vergnaud, 133–166. Cambridge, MA: MIT Press.
Chomsky, Noam. 2013. Problems of projection. Lingua 130. 33–49.
Collins, Chris. 1994. Economy of derivation and the Generalized Proper Binding Condition. Linguistic Inquiry 25. 45–61.
Collins, Chris. 2002. Eliminating labels. In Samuel David Epstein & T. Daniel Seely (eds.), Derivation and explanation in the minimalist program, 42–64. Oxford: Blackwell Publishers.
Collins, Chris & Edward P. Stabler. 2011. A formalization of minimalist syntax. Available at http://ling.auf.net/lingbuzz/001691.
Collins, Chris & Höskuldur Thráinsson. 1996. VP-internal structure and object shift in Icelandic. Linguistic Inquiry 27 (3). 391–444.
Diesing, Molly. 1992. Indefinites. Cambridge, MA: MIT Press.
Di Sciullo, Anna Maria & Edwin Williams. 1987. On the definition of word. Cambridge, MA: MIT Press.
Epstein, Samuel David & T. Daniel Seely. 1999. SPEC-ifying the GF “subject”: Eliminating A-chains and the EPP within a derivational model. Ms., University of Michigan and Eastern Michigan University.
Epstein, Samuel David & T. Daniel Seely. 2002. Rule applications as cycles in a level-free syntax. In Samuel David Epstein & T. Daniel Seely (eds.), Derivation and explanation in the minimalist program, 65–89. Oxford: Blackwell Publishers.
Fanselow, Gisbert. 1991. Minimale syntax. University of Passau habilitation thesis.
Fanselow, Gisbert. 1993. The return of the base generators. Groninger Arbeiten zur germanistischen Linguistik 36. 1–74.




Fox, Danny & David Pesetsky. 2005. Cyclic linearization of syntactic structure. Theoretical Linguistics 31. 1–46.
Freidin, Robert. 1992. Foundations of generative syntax. Cambridge, MA: MIT Press.
Gallego, Ángel J. 2009. On freezing effects. Iberia: An International Journal of Theoretical Linguistics 1. 33–51.
Gärtner, Hans-Martin. 2002. Generalized transformations and beyond. Reflections on minimalist syntax. Berlin: Akademie-Verlag.
Grewendorf, Günther. 2001. Multiple wh-fronting. Linguistic Inquiry 32. 87–122.
Haida, Andreas. 2007. The indefiniteness and focusing of wh-words. Berlin: Humboldt University dissertation.
Haider, Hubert. 1990. Topicalization and other puzzles of German syntax. In Günther Grewendorf & Wolfgang Sternefeld (eds.), Scrambling and barriers, 93–112. Amsterdam: John Benjamins.
Haider, Hubert. 2000. OV is more basic than VO. In Peter Svenonius (ed.), The derivation of VO and OV, 45–67. Amsterdam: John Benjamins.
Haider, Hubert. 2013. Symmetry breaking in syntax. Cambridge: Cambridge University Press.
Heck, Fabian. 2008. On pied-piping. Wh-movement and beyond. Berlin: Mouton de Gruyter.
Hornstein, Norbert & Jairo Nunes. 2008. Adjunction, labeling, and bare phrase structure. Biolinguistics 2.1. 57–86.
Horvath, Julia. 1986. Focus in the theory of grammar and the syntax of Hungarian. Dordrecht: Foris.
Horvath, Julia. 2006. Pied-piping. In Martin Everaert & Henk van Riemsdijk (eds.), The Blackwell companion to syntax, Volume III, 569–630. Malden, MA: Blackwell Publishing.
Johnson, Kyle. 1991. Object positions. Natural Language and Linguistic Theory 9. 577–636.
Junghanns, Uwe. 2002. Untersuchungen zur Syntax und Informationsstruktur slavischer Deklarativsätze. Universität Leipzig: Linguistische Arbeitsberichte 80.
Kučerová, Ivona. 2007. The syntax of givenness. Cambridge, MA: MIT dissertation.
Lasnik, Howard & Myung-Kwan Park. 2003. The EPP and the subject condition under sluicing. Linguistic Inquiry 34. 649–660.
Legate, Julie Anne. 2005. Phases and cyclic agreement. In Martha McGinnis & Norvin Richards (eds.), Perspectives on phases (MIT Working Papers in Linguistics 49), 147–156. Cambridge, MA: MIT Working Papers in Linguistics.
Lenerz, Jürgen. 1995. Klammerkonstruktionen. In Joachim Jacobs, Arnim von Stechow, Wolfgang Sternefeld & Theo Vennemann (eds.), Syntax, Vol. II, 1266–1276. Berlin: Mouton de Gruyter.
Lohndal, Terje. 2011. Freezing effects and objects. Journal of Linguistics 47. 163–199.
Manzini, Maria Rita. 1994. Locality, minimalism, and parasitic gaps. Linguistic Inquiry 25. 481–508.
Matushansky, Ora. 2006. The status of head-movement in linguistic theory. Linguistic Inquiry 37 (1). 69–109.
Moro, Andrea. 2000. Dynamic antisymmetry. Cambridge, MA: MIT Press.
Müller, Gereon. 1995. A-bar syntax. A study in movement types. Berlin & New York: Mouton de Gruyter.
Müller, Gereon. 1998. Incomplete category fronting: A derivational approach to remnant movement in German. Dordrecht: Kluwer.
Müller, Gereon. 2001. Order preservation, parallel movement, and the emergence of the unmarked. In Géraldine Legendre, Jane Grimshaw & Sten Vikner (eds.), Optimality-theoretic syntax, 279–313. Cambridge, MA: MIT Press.

116 

 Petr Biskup

Müller, Gereon. 2004. Phase impenetrability and wh-intervention. In Arthur Stepanov, Gisbert Fanselow & Ralf Vogel (eds.), Minimality effects in syntax, 289–325. Berlin: Mouton de Gruyter. Müller, Gereon. 2010. On deriving CED effects from the PIC. Linguistic Inquiry 41. 35–82. Müller, Stefan. 2002. Com­plex pred­i­cates: Ver­bal com­plex­es, re­sul­ta­tive con­struc­tions, and par­ti­cle verbs in Ger­man. Stan­ford: CSLI Pub­li­ca­tions. Müller, Stefan. 2005. Zur Analyse der scheinbar mehrfachen Vorfeldbesetzung. Lin­guis­tis­che Berichte 203. 297–330. Neeleman, Ad & Hans van de Koot. 2010. A local encoding of syntactic dependencies and its consequences for the theory of movement. Syntax 13. 331–372. Ott, Dennis. 2011. Local instability: The syntax of split topics. Cambridge, MA: Harvard University dissertation. Postal, Paul M. 1972. On some rules that are not successive cyclic. Linguistic Inquiry 3. 211–222. Richards, Marc. 2012. Probing the past: On reconciling long-distance agreement with the PIC. In Artemis Alexiadou, Tibor Kiss & Gereon Müller (eds.), Local modelling of non-local dependencies in syntax, 135–154. Berlin & Boston: Walter de Gruyter. Rizzi, Luigi. 1997. The fine structure of the Left Periphery. In Liliane Haegeman (ed.), Elements of grammar. A handbook in generative syntax, 281–337. Dordrecht: Kluwer. Ross, John Robert. 1967. Constraints on variables in syntax. Cambridge, MA: MIT dissertation. Rudin, Catherine. 1988. On multiple questions and multiple WH fronting. Natural Language and Linguistic Theory 6. 445–501. Seely, T. Daniel. 2006. Merge, derivational c-command, and subcategorization in a label-free syntax. In Cedric Boeckx (ed.), Minimalist essays, 182–217. Amsterdam: John Benjamins. Spencer, Andrew. 2005. Word-formation and syntax. In Pavol Štekauer & Rochelle Lieber (eds.), Handbook of word-formation, 73–97. Amsterdam: Springer. Stjepanović, Sandra. 1999. 
What do second position cliticization, scrambling, and multiple wh-fronting have in common? Storrs, CT: University of Connecticut dissertation. Stjepanović, Sandra & Shoichi Takahashi. 2001. Eliminating the Phase iImpenetrability Condition. Kanda University of International Studies mansucript. Takahashi, Daiko. 1994. Minimality of movement. Storrs, CT: University of Connecticut dissertation. Takano, Yuji. 2000. Illicit remnant movement: An argument for feature-driven movement. Linguistic Inquiry 31. 141–156. Veselovská, Ludmila. 1995. Phrasal movement and X-morphology: Word order parallels in Czech and English nominal and verbal projections. Olomouc: Palacký University dissertation. Vikner, Sten. 1990. Verb movement and the licensing of NP positions in the Germanic languages. Genève: Université de Genève dissertation. Vikner, Sten. 1994. Scandinavian object shift and West Germanic scrambling. In Norbert Corver & Henk van Riemsdijk (eds.), Studies on Scrambling, 487–517. Berlin: Mouton de Gruyter. Vlachos, Christos. 2014. Wh-Inquiries into Modern Greek and their theoretical import(ance). Journal of Greek Linguistics 14. 212–247. Watanabe, Akira. 2006. The Pied-Piper feature. In Lisa Lai-Shen Cheng & Norbert Corver, (eds.), WH-movement: Moving on, 47–70. Cambridge, MA: MIT Press. Wexler, Kenneth & Peter W. Culicover. 1980. Formal principles of language acquisition. Cambridge, MA: MIT Press. Williams, Edwin. 1976. A grammar of Tuscarora. New York: Garland. Williams, Edwin. 2003. Representation theory. Cambridge, MA: MIT Press.

Miki Obata

Is Transfer strong enough to affect labels?

Abstract: This paper explores how strong the Transfer operation (Chomsky 2004) is. Is it strong enough to eliminate syntactic representations completely from the narrow syntax? Or does Transfer only make the transferred domain inaccessible to further operations in the narrow syntax (i.e. weak Transfer)? By considering the structure-building of adjuncts, this paper demonstrates that Transfer needs to be weak enough to leave some pieces in the narrow syntax for further application of Merge and the Labeling Algorithm.

1 Introduction

Transfer is an operation to send syntactic representations constructed in the narrow syntax to the Sensory-Motor (SM) and Conceptual-Intentional (CI) interfaces. Although the operation has been discussed in various contexts since Chomsky (2004), some of its mechanisms are still unclear. One of those issues is how strong Transfer is. The goals of this paper are to provide an overview of strong and weak versions of Transfer and also to present additional evidence in favor of a weak version of Transfer by focusing on labeling. The paper is organized as follows: Section 2 clarifies some differences between Spell-Out and Transfer and raises the problems this paper focuses on. Section 3 reviews several previous works regarding strong and weak Transfer. Section 4 presents additional evidence to support weak Transfer and discusses theoretical consequences. Section 5 mentions some remaining problems and concludes the study.

2 Framing the issues: Spell-Out and Transfer

In the Standard Theory (Chomsky 1965) and Government and Binding theory (Chomsky 1981), there was no specific operation to send representations to semantic and phonological computations; rather, two syntactic levels (Deep Structure and Surface Structure) served directly as the input to those computations. Since Chomsky's (1995) Minimalist Program, those two syntactic levels of representation have been abolished. This shift indicates that syntactic operations are applied not in order to construct syntactically independent representations (i.e. at Surface Structure) but in order to construct representations computed


in the semantic component. That is, syntactic derivations are executed aiming for the semantic component. On the way, phonological features are sent to the phonological component by Spell-Out, which is illustrated in (1a). Chomsky (2000: 118) argues: “…in the course of construction of LF, an operation Spell-Out delivers the structure already formed to the phonological component, which converts it to PF. If lexical items express Saussurean arbitrariness in the conventional way, then Spell-Out ‘strips away’ the true phonological features, so that the derivation can converge at LF…” In addition, Uriagereka (1999) suggests that Spell-Out is applied not once but multiple times, so-called “multiple Spell-Out,” as in (1b). Chomsky (2000, 2001) also employs Uriagereka’s system and proposes that narrow-syntax representations are spelled out phase by phase, a phase being a unit/cycle of linguistic derivation (i.e. vP and CP).

(1) a. Spell-Out (Chomsky 1995)
       [Diagram: a single derivation from the Numeration, with one Spell-Out point branching off to PF before the derivation reaches LF.]
    b. Multiple Spell-Out (Uriagereka 1999, Chomsky 2000, 2001)
       [Diagram: a derivation from the Numeration/Lexical Array with multiple Spell-Out points, each branching off to PF, before the derivation reaches LF.]

Under these Spell-Out systems, it is clear what is sent from narrow syntax and what is left behind in narrow syntax after Spell-Out. That is, Spell-Out sends representations bearing phonological features from narrow syntax to the phonological component. On the other hand, representations without phonological features are left behind in narrow syntax and later reach the semantic component. In Chomsky (2004), Spell-Out is replaced with Transfer, which sends representations in narrow syntax to the semantic component and to the phonological component at the same time, as illustrated in (2):

(2) Transfer (Chomsky 2004)
    [Diagram: a derivation from the Lexical Array with multiple Transfer points, each branching off simultaneously to PHON and SEM.]




Under this system, the semantic component is not the goal to which the derivation is directed, but the place to which representations are sent from narrow syntax. In other words, it is not clear what is left behind in narrow syntax after Transfer, in contrast to after Spell-Out. This paper pursues one of the unclear issues regarding Transfer: How strong is Transfer? If it is strong enough to delete narrow syntax representations completely, nothing is left in narrow syntax. If it is weak enough to keep narrow syntax representations, something is left in narrow syntax. The next section overviews some relevant previous works focusing on the problem of how strong Transfer is.

3 Previous works: Strong vs. weak Transfer

3.1 Strong Transfer

Ott (2011) explains differences between free relatives in (3) and wh-clauses in (4) by appeal to ‘powerful’ Transfer.

(3) I eat what you cook.
(4) I wonder what you cook.

Since the complement position of eat can be filled only with a DP, what you cook in (3) should be a DP. On the other hand, the complement position of wonder requires a CP marked with [+Q], so what you cook in (4) should be a CP. How can they be differentiated in the course of the derivation? Ott (2011) assumes two kinds of C: C in a wh-clause has both unvalued phi-features and interpretable Q-features, while C in free relatives has only unvalued phi-features (i.e. no interpretable features). Based on Chomsky’s (2008) feature-inheritance system, unvalued phi-features on both kinds of C are inherited by T. Let us see how wh-clauses are derived:

(5) The derivation of wh-clauses (cf. (4))
    Step 1: [CP what C[Q] [TP you cook ]]  TRANSFER
    Step 2: [CP what C[Q] ]
    Step 3: wonder [CP what C[Q] ]

At Step 1, unvalued phi-features are inherited by T and C keeps only interpretable [Q]. At Step 2, TP is transferred. Note that Ott (2011) assumes that the transferred TP is completely deleted from narrow-syntax representations by applying strong Transfer.


At Step 3, the matrix V and CP are merged. That is, the verb wonder takes [+Q] CP as its complement. On the other hand, free relatives are derived as follows:

(6) The derivation of free relatives (cf. (3))
    Step 1: [CP what C[--] [TP you cook ]]  TRANSFER
    Step 2: [CP what ]
    Step 3: [__ what ]
    Step 4: eat [DP what ]

In this derivation, strong Transfer plays a crucial role. At Step 1, unvalued phi-features are inherited by T and C has no features. At Step 2, C as well as TP undergoes Transfer. Why is C included in the transferred domain? Ott (2011: 186) argues: “assuming the logic of Full Interpretation, a phase head that does not bear any interpretable feature after inheritance ought to be removed from the workspace along with its complement upon Transfer.” This is why both C and TP are deleted from narrow syntax by strong Transfer. Only the wh-phrase what remains in the workspace. Since C is removed from narrow syntax by Transfer, the CP label is also lost, as seen in Step 3. At Step 4, the matrix V is merged. Since only the DP (i.e. what) is visible at the next higher phase, the complement position of eat is filled with the DP. That is, strong Transfer is crucial for this analysis: Transfer is strong enough to remove a phase head (and its complement) completely from narrow syntax, which causes the CP label to disappear. In addition to Ott (2011), Narita (2011) and Epstein, Kitahara and Seely (2012) also employ strong Transfer, and it plays decisive roles in their analyses.

3.2 Weak Transfer

How does weak Transfer work, in contrast to strong Transfer? Weak Transfer only makes certain domains inaccessible to operations in narrow syntax, and all the elements/features making up representations in narrow syntax are preserved as is after Transfer. Chomsky (2013) argues that representations do not disappear after Transfer, because representations which are already transferred can undergo Internal Merge (pied-piping) as parts of bigger phrases, based on Obata’s (2009, 2010) analysis:

(7) a. Whose claim that John bought the book did Mary believe?
    b. That John bought the book was denied.




In (7a), the DP object containing TP and VP undergoes wh-movement. In (7b), the CP subject containing TP and VP is passivized. In both cases, the moved phrases undergo Transfer twice before movement. If the derivation of (7a) is executed by strong Transfer, ungrammatical output is overgenerated as follows:

(8) Step 1: [DP whose claim that [TP John bought the book ]]  TRANSFER
    Step 2: Mary (did) believe [whose claim that [TP ]]
    Step 3: [DP whose claim that [TP ]]1 did Mary believe t1

(9) Resulting output: *Whose claim that did Mary believe John bought the book?

Putting aside vP phases for the sake of argument, TP undergoes strong Transfer at Step 1. As a result, all the features contained in TP are completely deleted from narrow syntax, as seen in Step 2. Then, the higher phase is introduced and the entire DP is moved to Spec-CP at Step 3. (9) is the resulting output obtained from this derivation. The TP within the DP is deleted from narrow syntax by strong Transfer before the entire DP undergoes Internal Merge to Spec-CP, so that neither the semantic component nor the phonological component can get the information that the transferred TP is in fact contained within the DP at the matrix Spec-CP. Therefore, the output in (9) is obtained under strong Transfer. How is this problem solved? Transfer needs to leave some ‘clue’ in narrow syntax, which lets the interfaces know where the transferred TP is. If Transfer only makes transferred domains inaccessible (i.e. narrow syntax still keeps transferred representations), the transferred TP in (8) is never lost and can be pronounced within the fronted DP. The discussion here demonstrates that Transfer has to be weak enough to leave representations in narrow syntax. In addition, Bošković (2007) suggests that Agree can probe into transferred domains. This indicates that representations are left behind after Transfer.

Obata (2009, 2010) suggests that Transfer leaves a copy of labels, not entire representations, and those labels play crucial roles in reconstructing transferred pieces. In the case of (8), only the TP label is left behind after Transfer. The TP label undergoes wh-movement contained within the DP. Since the TP label is at Spec-CP, the transferred TP representations can be inserted into the right position.1 Although I do not further discuss the issue of whether labels or

1 One might wonder what happens if multiple TPs are waiting for reconstruction. In this case, TPs need to be distinguished by appeal to e.g. indices, as pointed out by a reviewer. However, this problem never arises if reconstruction of transferred pieces takes place at the interfaces every time Transfer is applied (i.e. cyclically).


entire representations are left after Transfer, one clear conclusion from the discussion here is that Transfer is not strong enough to delete labels and representations from narrow syntax.

4 Problems: Transferring entire phases

This section raises another problem with strong Transfer by focusing on the case of entire phases being transferred, and discusses some theoretical consequences, including labeling.

4.1 Root and adjunct clauses

Since Chomsky (2000), it has been assumed that Spell-Out/Transfer applies only to the complement of a phase head. However, it is obvious that entire phases need to be transferred in some cases. Root CP is one of those cases: root C and its edge positions are never included in the complement of any phase head. As long as it is true that root C and its edge are pronounced and participate in semantic interpretation, the entire CP must somehow be transferred. With respect to this issue, Obata (2009, 2010) suggests that Transfer must send as many representations as possible, which can be deduced from an economy condition, in that the workspace of the next phase is minimized by sending as many representations as possible and the number of applications of Transfer can be reduced (e.g. in the case of transferring root CP). Another case is adjunct CP. In fact, root CP and adjunct CP share some of the same properties. First, both CPs are unselected. Root CP is not selected by any verb, in contrast to complement CP, since there is no higher structure. Also, adjunct CP is independent of the c-selection and s-selection imposed by selecting verbs. Second, no element can be extracted out of either root or adjunct CPs. There is no landing site higher than root CP, so it is impossible to move any element out of root CP. Also, adjunct CPs form islands: extraction out of the islands renders the sentences ungrammatical. These two points, at least, are shared between root CP and adjunct CP and imply the possibility that the entire adjunct CP undergoes Transfer, not only the complement of adjunct C. Why is Transfer limited to the complement of a phase head? In the case of CP phases, there seem to be two reasons. One is that the edge of CP works as an escape hatch for successive-cyclic (inter-phasal) movement. The other reason is that the selectional requirement is satisfied by merging a selecting verb and




its complement C. Also, there are some cases in which a selecting verb specifies C’s phonological realization, which is similar to T-D and V-D relations for specifying the morpho-phonological realization of Case. (10) is an example from Cape Verdean Creole:

(10) Cape Verdean Creole
     a. Joao pensa ki/*ma/*Ø Maria kunpra libru.
        John think C Mary bought book
        “John thinks Mary bought the book.”
     b. Joao fra-m ma/*ki/*Ø Maria kunpra libru.
        Joao told+me C Maria bought book
        “John told me Mary bought the book.”
        (Obata and Baptista 2009)

In Cape Verdean Creole, selecting verbs decide C’s phonological realization. The verb think in (10a) forces C to be realized as ki. On the other hand, the illocutionary verb tell in (10b) requires C to be realized as ma. That is, the problems of escape hatches and selectional requirements imply that Transfer applies only to the complement of a phase head for convergent derivation. Remember that neither root CP nor adjunct CP is relevant to these two problems: neither root nor adjunct CPs provide escape hatches for successive-cyclic movement, because no extraction is allowed out of them, and they are also free from selectional requirements. Also, Uriagereka (1999) explains adjunct islands by applying Spell-Out to the entire adjunct CP. Since there is no reason to leave a phase head and its edge position behind in the case of root and adjunct CPs, the entire CPs are transferred, by the economy condition.

4.2 Strength of Transfer and labeling

What does the discussion in the last section tell us about the strength of Transfer? If the discussion above is on the right track, Transfer sends entire adjunct CPs to the interfaces at once. Under strong Transfer, that indicates that representations contained in adjunct CPs are completely removed from narrow syntax. That is, there is no way to merge adjunct CP with certain syntactic objects. If Transfer is weak enough to leave representations (Chomsky 2013) or only copies of labels (Obata 2009, 2010), on the other hand, this problem never arises. With respect to the Labeling Algorithm discussed in Chomsky (2013, 2014), syntactic objects generated by Merge are presupposed. In other words, if there is no dependency constructed by Merge, the Labeling Algorithm does not go into operation. Strong Transfer makes


it impossible not only to merge adjunct CP with certain syntactic objects but also to apply the Labeling Algorithm.2 Based on the present discussion, it can be concluded that Transfer needs to be weak enough to leave some element/features in narrow syntax.

5 Remaining problems and conclusion

This paper mainly focused on the issue of how strong Transfer is and also considered several theoretical implications of the proposed analysis. Although strong Transfer provides some insightful analyses, it faces several serious problems at the same time. Especially in the case of adjunct CP, it appears plausible that the entire phase undergoes Transfer. If strong Transfer is employed in this case, adjunct CP is never merged with certain syntactic objects and a problem of undergeneration arises. Based on this discussion, this paper demonstrated that Transfer needs to be weak enough to leave some pieces in narrow syntax for further application of Merge and the Labeling Algorithm. Before concluding the study, two remaining problems are pointed out below. First, if weak Transfer leaves whole representations (not only copies of labels) in narrow syntax, the Phase Impenetrability Condition (PIC) needs to be independently postulated.3 On the other hand, strong Transfer can nullify the PIC, because no element is left in narrow syntax after Transfer. That is, one extra assumption is needed under weak Transfer. The second problem is the labeling of adjuncts. Although the discussion here mentions Chomsky’s (2013, 2014) Labeling Algorithm, the labeling of adjuncts is another unclear issue under this algorithm. One possibility based on the proposals in this paper is that syntactic objects already transferred are invisible to the Labeling Algorithm. Therefore, an adjunct CP does not become a label when it is merged with e.g. vP. Chomsky himself mentions that there are some heads which cannot be labels (e.g. Root, Conjunction, etc.), and Saito (2013, 2014) also suggests that Japanese Case renders Case-marked DP invisible

2 Hornstein and Nunes (2008) demonstrate that adjuncts need to be concatenated with certain syntactic objects and that labeling of adjuncts is optional. Even under this system, strong Transfer makes concatenation impossible, so that it is predicted that sentences including adjunct CPs are never derived.
3 As summarized in Section 3.2, Obata (2009, 2010) demonstrates that Transfer leaves a copy of labels while representations are completely deleted from narrow syntax. Under this view, the PIC can be nullified (contra Chomsky’s weak Transfer).




for the Labeling Algorithm. It appears possible that these heads invisible to the Labeling Algorithm can be explained in relation to Transfer, but further careful consideration is required.

Acknowledgments: Aspects of this work were presented at the Workshop on Labels and Roots (March 2014) and the syntax/semantics support group at the University of Michigan (March 2014). I would like to thank the audiences at those meetings for insightful comments and suggestions. Also, I am very grateful to Noam Chomsky, Samuel Epstein, Masayuki Ikeuchi, Hisatsugu Kitahara, Will Nediger, Daniel Seely and Shigeo Tonoike for extremely valuable and helpful discussion, suggestions and comments.

References

Bošković, Željko. 2007. Agree, phases and intervention effects. Linguistic Analysis 33: 54–96.
Chomsky, Noam. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, Noam. 1995. The minimalist program. Cambridge, MA: MIT Press.
Chomsky, Noam. 2000. Minimalist inquiries: The framework. In R. Martin, D. Michaels and J. Uriagereka (eds.), Step by step: Essays on minimalist syntax in honor of Howard Lasnik, 89–155. Cambridge, MA: MIT Press.
Chomsky, Noam. 2001. Derivation by phase. In M. Kenstowicz (ed.), Ken Hale: A life in language, 1–52. Cambridge, MA: MIT Press.
Chomsky, Noam. 2004. Beyond explanatory adequacy. In A. Belletti (ed.), Structures and beyond: The cartography of syntactic structures, Volume 3, 104–131. Oxford: Oxford University Press.
Chomsky, Noam. 2007. Approaching UG from below. In U. Sauerland and H.-M. Gärtner (eds.), Interfaces + recursion = language?, 1–29. New York: Mouton de Gruyter.
Chomsky, Noam. 2008. On phases. In R. Freidin, C. P. Otero and M. L. Zubizarreta (eds.), Foundational issues in linguistic theory: Essays in honor of Jean-Roger Vergnaud, 133–166. Cambridge, MA: MIT Press.
Chomsky, Noam. 2013. Problems of projection. Lingua 130: 33–49.
Chomsky, Noam. 2014. Problems of projection: Extensions. MIT, unpublished manuscript.
Epstein, Samuel, Hisatsugu Kitahara and Daniel Seely. 2012. Structure building that can’t be! In M. Uribe-Etxebarria and V. Valmala (eds.), Ways of structure building, 253–270. Oxford: Oxford University Press.
Hornstein, Norbert and Jairo Nunes. 2008. Adjunction, labeling, and bare phrase structure. Biolinguistics 2: 57–86.
Narita, Hiroki. 2011. Phasing in full interpretation. Harvard University Ph.D. thesis.
Obata, Miki. 2009. How to move syntactic objects bigger than a phase: On the formal nature of transfer and phasal re-assembly. Proceedings of the English Linguistic Society of Japan (JELS) 27: 207–216.
Obata, Miki. 2010. Root, successive-cyclic and feature-splitting internal merge: Implications for feature-inheritance and transfer. University of Michigan, Ann Arbor, Ph.D. thesis.


Obata, Miki and Marlyse Baptista. 2009. Complementizer alternation in Cape Verdean Creole: New evidence for spec-head agreement. Poster presented at the 83rd Annual Meeting of the Linguistic Society of America (LSA 2009), San Francisco, CA.
Ott, Dennis. 2011. A note on free relative clauses in the theory of phases. Linguistic Inquiry 183–192.
Saito, Mamoru. 2013. Case and labeling in a language without φ-feature agreement. Nanzan University, unpublished manuscript.
Saito, Mamoru. 2014. Kukoozoo-ni okeru bunpookaku-no yakuwari [The role of case in phrase structure]. Handout presented at Keio Linguistic Colloquium.
Uriagereka, Juan. 1999. Multiple spell-out. In S. D. Epstein and N. Hornstein (eds.), Working minimalism, 251–282. Cambridge, MA: MIT Press.

Dennis Ott

Clausal arguments as syntactic satellites: A reappraisal

Abstract: This paper defends the Satellite Hypothesis of clause-initial CPs (sentential subjects and fronted complement clauses) proposed by Koster (1978) and Alrenga (2005). According to this hypothesis, clause-initial CPs are not simply fronted to the edge of the superordinate clause, but are in fact left-dislocated and hence structurally external to their host. This hypothesis has recently come under fire because of its failure to account for connectivity effects (Takahashi 2010, Moulton 2013). I show in this paper that once the Satellite Hypothesis is couched in terms of Ott’s (2014) theory of left-dislocation, it avoids these problems and persists as a plausible representation for peripheral sentential arguments. The fact that clause-initial CPs must occur in dislocated (rather than fronted) position is tentatively related to a linearization failure arising at the root when the fronted XP and its complement are non-distinct in category.

1 Introduction

Koster (1978) famously argued that clausal and nominal subjects are rather different creatures, beyond their categorial distinctness. Specifically, he showed that sentential subjects (SSs) have various properties in common with left-dislocated constituents, leading him to conclude that SSs are likewise clause-external ‘satellites,’ linked to a clause-internal (overt or covert) proform. According to this Satellite Hypothesis (SH), the SS in (1a) is syntactically represented as shown in (1b), analogously to the left-dislocation in (2).

(1) a. Dat hij komt is duidelijk.
       that he comes is clear
       ‘That he will come is clear.’
    b. [[CP dat hij komt ]i [[NP e ]i [IP is duidelijk ]]]

(2) Dat hij komt, dat is duidelijk. (Dutch)
    that he comes that is clear
    ‘That he will come is clear.’


Alrenga (2005) generalizes Koster’s SH to all kinds of clause-initial argumental CPs, i.e. SSs and fronted complement clauses. According to Alrenga, (3a) has the representation in (3b):

(3) a. That the Giants would lose, John never expected.
    b. [[CP that the Giants would lose ]i [CP [NP e ]i [IP John never expected tNP ]]]

Like the SS in (1a), the fronted clause in (3a) is linked to a clause-internal null NP (3b), which in this case has undergone A̅-movement. The SH has drawn heavy criticism from two different angles. First, it has been shown that a rigidly syntactic interpretation of the SH, according to which the grammar categorically enforces the clause-external representation of sentential arguments, is too strong; rather, the SH captures a violable preference related to performance factors. A second line of criticism has drawn attention to the fact that the structures in (1b) and (3b) are fundamentally incompatible with certain syntactic facts, in particular connectivity effects. I will argue that this criticism is invalidated once the SH is paired with a particular theory of left-dislocation (LD) that uses somewhat richer (but independently motivated) representations. Specifically, the ellipsis approach to LD pioneered in Ott 2012, 2014, 2015 holds that dislocated constituents are surface fragments of what is underlyingly a full clause, parallel to the following ‘host clause.’ According to this amended version of the SH, the biclausal representation of (1a) after PF-deletion is as follows:

(4) [CP1 [CPΣ dat hij komt]i [tΣ is duidelijk]] [CP2 dati is duidelijk]

Based on data from German and English, I will show that this implementation of the SH rebuts the criticisms raised against the original Koster/Alrenga proposal. The paper is structured as follows. In section 2, I summarize the core facts that led Koster and Alrenga to postulate the (extended) SH, as well as their particular implementation. In section 3 I summarize a number of problems for this analysis highlighted by Takahashi (2010) and Moulton (2013). In section 4 I proceed to show that these problems are avoided by adopting Ott’s biclausal ellipsis analysis of LD. Amended in this way, the revised SH correctly accounts for the relevant facts while preserving Koster’s central insight. Section 5 offers some speculations as to why clausal arguments are represented grammatically as dislocated satellites when in clause-initial position. Section 6 concludes.




2 Clause-initial clauses as syntactic satellites

2.1 A category-mismatch puzzle and its solution

Building on observations going back to Ross (1967), Kuno (1973), Emonds (1976) and others, Koster (1978) made two related but logically distinct claims:
1. SSs must surface clause-initially.
2. Clause-initial SSs behave like left-dislocated XPs.
I will postpone the discussion of the first claim to section 2.2 and focus for now on the second. Koster observes a number of asymmetries between NP subjects and clausal subjects, which render the distribution of the latter akin to that of left-dislocated XPs. One such asymmetry is that NP subjects but not SSs permit A̅-movement to cross them:

(5) a. John, the story shouldn’t have bothered.
    b. *John, that the Giants lost the World Series shouldn’t have bothered.

(6) a. What does this prove?
    b. *What does that he will come prove?

Similarly, SSs typically don’t permit subject–aux inversion, e.g. in questions:

(7) a. Did Joop’s getting fired surprise you?
    b. *Did that Joop got fired surprise you?
    (de Vries 2010)

Verb movement in V2 languages behaves analogously; see Lohndal 2014. Furthermore, SSs are typically degraded in embedded contexts:

(8) a. Although the house’s emptiness depresses you, it pleases me.
    b. ?*Although that the house is empty depresses you, it pleases me.

SSs cannot be demoted in passives:

(9) a. That the children are always late shows the necessity of discipline.
    b. *The necessity of discipline is shown by that the children are always late.
    c. The necessity of discipline is shown by the fact that the children are always late.

Koster’s conclusion is that SSs are not ‘true’ structural subjects but rather syntactic ‘satellites’ that bind an empty subject, as shown in Fig. 1. He takes this structure to be equivalent to the structure of LD as in (10), modulo deletion of the connecting pronoun.


 Dennis Ott

[CP [CP dat hij komt]i [IP [DP e]i ti [VP is duidelijk]]]

Fig. 1: The Satellite Hypothesis.

(10) a. My father, he won’t come today.
     b. Dat hij komt, dat is duidelijk. (Dutch)
        that he comes that is clear
        ‘That he will come is clear.’

Indeed, much like what we saw for SSs, no A̅-movement or head movement can cross dislocated constituents, and LD is a root phenomenon:

(11) a. *Won’ti, my father, he ti come today?
     b. *Wer, den Mann, kennt den? (German)
        who the man knows him
        ‘Who knows the man?’

(12) a. Peter sagt, den Mann, den kennt er. (German)
        Peter says the man him knows he
        ‘Peter says that he knows the man.’
     b. *Peter bedauert, den Mann, den kennt er nicht. (German)
        Peter regrets the man him knows he not
        ‘Peter regrets that he doesn’t know the man.’

German sagen ‘say’ but not bedauern ‘regret’ permits pseudo-embedded root-clause objects (‘embedded V2’), and hence left-dislocation is possible with the former but not the latter. According to the SH, then, SSs are in fact left-dislocated constituents, accounting for the effects seen above. We also explain straightforwardly why SSs are only permitted where NP subjects are permitted, but not otherwise (data from Alrenga 2005):

(13) a. That sucks.
     b. That the Giants lost the World Series sucks.

(14) a. *That seems.
     b. *That the Giants lost the World Series seems.




The reason is that the actual, structural subject is the nominal empty placeholder, as per the SH. Koster suggests in passing that the analysis extends to clause-initial object clauses, i.e. that (15a) should be taken to be parallel to (15b):

(15) a. Dat hij komt betreur ik. (Dutch)
        that he comes regret I
        ‘That he will come, I regret.’
     b. Dat hij komt, dat betreur ik.

There is indeed good reason to assume that the SH covers clause-initial CPs of any kind, as we will see presently. The original SH claimed that SSs are satellites at Surface Structure, and Koster provided no direct evidence for his assumption that they are base-generated in rather than raised to this position.1 Evidence supporting this assumption is adduced by Webelhuth (1992, chapter 3) and Alrenga (2005), who point out that traces of fronted clauses generally behave like NPs, not like CPs. To see this, consider the following contrast. German glauben ‘believe’ permits both NP and CP complements, as shown in (16); like NP complements, a CP complement can be fronted (16c). By contrast, sich freuen ‘to be glad’ selects only CP but not DP complements (17); however, it does not permit a clausal complement to be fronted, as shown in (17c):

(16) a. Er glaubt das nicht.
        he believes that not
     b. Er glaubt nicht dass sie kommt.
        he believes not that she comes
     c. Dass sie kommt glaubt er nicht.
        ‘He doesn’t believe that (she comes).’

(17) a. *Ich freue mich das.
        I am glad refl that
     b. Ich freue mich dass Hans krank ist.
        I am glad refl that Hans ill is
     c. *Dass Hans krank ist freue ich mich.
        ‘I’m glad that Hans is sick.’
        (German; Webelhuth 1992)

1 As pointed out by Alrenga (2005), a movement version of the SH is in fact assumed by Emonds (1976) and Stowell (1981); see also section 3 below.


Such correlations suggest that complement clauses can only appear in fronted position when the predicate licenses an NP trace. Alrenga (2005) gives parallel examples from English:2

(18) a. It had been expected that John would be unqualified.
     b. That had been expected.
     c. That John would be unqualified had been expected.

(19) a. It had been objected that John would be unqualified.
     b. *That had been objected.
     c. *That John would be unqualified had been objected.

(20) a. Most baseball fans hoped/felt/wished/insisted/reasoned that the Giants would win the World Series.
     b. *Most baseball fans hoped/felt/wished/insisted/reasoned that.
     c. *That the Giants would win the World Series was hoped/felt/wished/insisted/reasoned (by most baseball fans).

In short, only predicates that select both clauses and NPs allow for a clausal complement to be fronted, suggesting that fronted clauses are in some sense related to an NP position.3 This generalization has been stated in various forms; two examples are given below:

(21) The Moved Clausal Complement Generalization (Takahashi 2010)
     A clausal complement is allowed to move only if its base-generated position is one in which a DP is allowed to appear.

(22) The DP Requirement (Moulton 2013)
     The gap of a fronted CP (sentential subject or topic) must be a DP.

2 I set aside here the structure of it-extraposition, although it is not entirely clear that the object CP is a complement of the verb in these cases. This uncertainty is not damaging to the point at hand, however, since object clauses can clearly occur in complement positions, compare Mary expected that John would be unqualified, Mary objected that John would be unqualified, etc.

3 An anonymous reviewer points out that peripheral CP-arguments can have nominal qualities even when not displaced, e.g. in coordination cases discussed in Sag et al. 1985:

(i) a. You can depend on my assistant and that he will be on time.
    b. *You can depend on that he will be on time.

A speculative extension of the analysis developed in this paper could handle such cases by assuming that what is being coordinated in (ia) is the sentence You can depend on my assistant and the left-dislocation That he will be on time, that you can depend on, with topic-drop and deletion of CP2. The first coordinate could not be represented in this way, explaining why (as Sag et al. show) it cannot be clausal in cases like (ia). On the other hand, it is not obvious that such an analysis could be reconciled with the prosodic properties of the construction. I will not investigate the issue here.




Evidently, the above generalizations are descriptive statements, begging the question of why this state of affairs should obtain. Alrenga (2005) argues that the answer (or rather, a first step towards an answer) is Koster’s SH. As suspected by Koster, fronted object clauses, too, are satellites; their clause-internal empty placeholder NP moves from object position to the edge of the clause, much like the empty operator in Chomsky’s (1977) analysis of topicalization:

(23) a. That the Giants would lose, John never expected.
     b. [[CP that the Giants would lose ]i [CP [NP e ]i [IP John never expected tNP ]]]

This explains straightforwardly why cases like (20c) wind up unacceptable: unlike its in situ counterpart in (20a), the fronted clause is a clause-external satellite, but in this case no placeholder NP is licensed clause-internally (as evidenced by (20b)). We arrive at the following schematic picture: it appears that the structure in (24a) is ruled out, and the relevant strings are parsed as involving the more elaborate structure in (24b). Call this the Extended Satellite Hypothesis (ESH), summarized schematically below.

(24) Extended Satellite Hypothesis
     a. *[CP [CP …] T [vP …]]
     b. [CP [CP …]i [CP/TP [DP e ]i T [vP …]]]

Bear in mind that Koster and Alrenga both take the structure in (24b) to be directly parallel to LD configurations, i.e. satellite CPs are identical to left-dislocated CPs as in the following:

(25) a. That the Giants would lose, that John never expected.
     b. Dat hij komt, dat is duidelijk. (Dutch)
        that he comes that is clear
        ‘That he will come is clear.’

This concludes our discussion of the second claim of Koster’s SH. Before moving on, we need to briefly digress to address the first claim, according to which SSs must surface clause-initially.

2.2 The (E)SH as a performative preference

Recall from the previous section that Koster’s original formulation of the SH comprised two distinct claims, viz. the claim that SSs cannot occur in non-initial positions and the claim that clause-initial clauses are satellite constituents. A number of researchers have pointed out that the first of these claims is too


strong as it stands (see Davies and Dubinsky 2010 for a comprehensive review). Once prosodic and information-structural factors are taken into account, SSs are acceptable in clause-medial positions in many cases, contrary to Koster’s assumption. Consider, for instance, the following examples from Delahunty (1983):

(26) a. It seems that that Fred left early so bothered all of the people who have been waiting for him that they now refuse to do business with him.
     b. To what extent did that Fred failed to show up anger those of his devoted fans who had waited by the stage door since dawn of the previous day?

(27) Bill wants that Fred lied to be obvious to everyone.

With regard to complement clauses, there is typically a strong preference for extraposition; however, this preference is not absolute either. De Vries (2010) points out that the following examples are quite acceptable in Dutch with proper focus intonation (the same holds for German):

(28) a. (?)Ons heeft [dat Joop ontslagen is] in het geheel niet verbaasd. (Dutch)
        us has that Joop fired is in the whole not surprised
        ‘That Joop has been fired didn’t surprise us at all.’
     b. (?)Of [dat Joop ontslagen is] een vergissing was, is nog maar de vraag.
        if that Joop fired is a mistake was is yet but the question
        ‘Whether it is a mistake that Joop was fired is still questionable.’

Dryer (1980) investigates the crosslinguistic distribution of sentential arguments and finds that there is a strong tendency for subject and object clauses to occur in peripheral positions. Dryer emphasizes that this distribution is not categorical but merely a strong preference; for instance, in situ CP objects in SOV languages are possible, but their peripheral position is often favored by discursive factors.
Dryer (1980, 126) states the following “universal hierarchy of preferred positions for sentential NPs [sic]”:

(29) sentence final > sentence initial > sentence internal

Dryer assumes that (29) is largely a reflection of processing factors, citing Grosu and Thompson 1977. There is, then, good reason to conclude that the first claim of Koster’s SH is too strong, and that an amalgam of factors involving at least prosodic and parsing considerations rather than syntactic constraints sensu stricto yields the preference for peripheral positioning of clausal arguments; see Davies




and Dubinsky 2010 for discussion.4 I will not attempt to disentangle the relevant factors here but assume the weak version of the (E)SH to be essentially correct without further argument. Note, however, that this weakening leaves claim 2 untouched, i.e. it does not absolve us from the need to answer the question why clausal arguments, when they appear clause-initially, are grammatically represented as ‘satellites’ rather than simply as fronted CPs. That is, why does a peripheral CP appear to be connected to a clause-internal empty NP rather than to a trace/copy, as brought out by the evidence (partially) reviewed in the preceding section? This question arises regardless of whether grammatical or performance factors are responsible for Dryer’s observation summarized in (29), and I will sketch a possible answer to it in section 5. First, however, we need to clarify what it means to be a syntactic ‘satellite,’ and this task is taken up in the following two sections.

3 Problems for the Koster/Alrenga implementation of the ESH

3.1 Root adjunction and CP–NP binding

Recall the structure assigned to clauses with initial CPs according to the ESH:

(30) [CP [CP …]i [CP/TP [DP e ]i T [vP …te …]]]

The CP satellite, construed as an adjunct to what would otherwise be the root, is said to bind an empty operator at the edge of the host clause.5 This assumption is not unproblematic, however. As pointed out by de Vries (2010), the premise that a CP can bind an NP below it is at variance with the general ‘structure-preservingness’ of movement and binding relations. Just like the trace left behind by CP cannot ‘switch’ to category NP,6 a base-generated satellite CP should not be able to bind NP either.

4 At least in English, there appears to be abundant inter-speaker variation with regard to the effects mentioned above; see Lohndal 2014 for a survey.

5 The analysis is analytically equivalent to Cinque’s (1990) analysis of Clitic Left-dislocation as involving binding chains. Note that it is not crucial that the correlate always moves to the left edge, although I assume this here for ease of exposition.

6 Although this is in fact the analysis presented in Webelhuth 1992. Assuming the Copy Theory of movement, it is not obvious how it could be implemented, or even stated coherently.


In addition, at least for languages like Dutch and German, permitting adjunction to root CP amounts to a violation of the V2 constraint. In other words, the ESH implies that any sentence involving a left-peripheral clausal argument is in fact a V3 structure. This, of course, is a general problem for analyses that take left-dislocated constituents to be adjuncts to CP (and no less for analyses that split up CP into distinct projections, as per Rizzi 1997).

Furthermore, it is left open by this analysis what exactly the empty NP is. This is a particularly pressing problem as it does not appear to correspond to any known empty category occurring in configurations described by (30): English, Dutch, German etc. are not pro-drop languages and do not permit ellipsis of full argument NPs either. It thus appears that Koster’s and Alrenga’s empty NP is an essentially construction-specific element, licensed (in some way that is not spelled out) by the adjoined satellite. I argue in section 4.2 below that once constructions with initial clauses are analyzed as bi-sentential, the problem vanishes, as the optional omission of the resuming NP can be assimilated to the well-known phenomenon of Topic Drop.

3.2 Connectivity effects

The central claim of the ESH is that peripheral clausal arguments are extra-sentential constituents: being base-generated in a peripheral position, they receive neither case nor θ-role. More generally, we expect sentential arguments in peripheral positions to not exhibit any kind of connectivity into the following clause. Unfortunately, this prediction is false: fronted CPs do show connectivity into their host, exactly like A̅-fronted nominal arguments. The following examples are from Takahashi 2010 and Moulton 2013, supplemented with a German example:

(31) a. That some student from hisi class cheated, I think that [every professor]i brought out.
     b. That hei’ll end up looking like his father, [every young man]i expects.
     c. (i) That anyone would take offence, I did not expect.
        (ii) *That anyone would take offence, I expected.
     d. Dass seini Garten der schönste ist, glaubt [jeder Gärtner]i. (German)
        that his garden the most beautiful is believes every gardener
        ‘Every gardener believes his garden to be the most beautiful one.’

The examples show that variables/NPIs inside the initial CP argument can be bound/licensed by clause-internal binders/licensors. (Note that Condition A/B




connectivity cannot be readily tested for fronted finite clauses; Condition C effects will be discussed in section 4.3 below.) This is not what the SH leads us to expect. Consider the structure of (31b) according to the SH:

(32) [[CP that he’ll end up looking like his father]i [CP [NP e ]i every young man expects tNP ]]

Evidently, since the CP satellite is base-generated in its extra-sentential position, no connectivity is expected, as it is not c-commanded by the QP binder at any stage of the derivation. The fact that a satellite-internal element can be bound by a clause-internal element thus suggests the presence of a lower copy of the fronted CP left behind by movement. This would entail that satellites do originate clause-internally after all, accounting for connectivity but re-introducing the category-mismatch puzzle of section 2.1.

We are thus facing a somewhat paradoxical situation. While initial clausal arguments appear to be grammatically represented as satellites, they show connectivity into their host clauses nonetheless. There are two plausible options. Either the connectivity facts show that the Koster/Alrenga hypothesis is simply wrong: initial CPs must originate clause-internally after all. This is the route taken by Takahashi (2010), who argues that moved clausal arguments are dominated by a DP-shell (see also Davies and Dubinsky 2002, 2010 and Hartman 2012). This assumption is necessary for the satellite clause and its trace to match in category, given the facts reviewed in section 2.1.

(33) a. That the Giants would lose, John never expected.
     b. [CP [DP ∅ [CP that the Giants would lose]] John never expected tDP ]

While it is undeniably the case that clausal arguments in many languages wear their clausal nature on their morphosyntactic sleeves (see Hartman 2012 for a review), there is no evidence for the putative additional projection in languages like English, German, or Dutch.
Moreover, as pointed out by Moulton (2013), the availability of the shell structure must be constrained such that it is available only in case CP moves, for otherwise we would incorrectly predict that unmoved clauses can appear in NP position, e.g. as complements of prepositions:

(34) a. I spoke about *(the fact) that he left.
     b. *There is no indication of that she arrived yet.

(35) a. *The necessity of discipline is shown by that the children are always late.
     b. *John likes that Peter kissed Mary.

Without this additional complication we would also falsely predict in situ complement clauses to be opaque for extraction, as this would constitute a violation of the Complex-NP Constraint. Takahashi (2010) stipulates his empty D accordingly,


but this merely begs the question. At the very least, then, the DP-shell hypothesis must postulate an empty D or N head whose properties are not independently motivated.7

The other option, pursued in Moulton 2013, is to maintain the ESH but reject the idea that connectivity – in particular, variable binding as in (31b) and NPI-licensing as in (31c) – has a syntactic basis; that is, reconstruction can be achieved by semantic means alone and does not require c-command. This claim comes at the cost, of course, of nullifying the many cases in which connectivity effects do seem to have a syntactic basis. I take it that for this reason alone, it is worth exploring a less radical route that preserves a conservative notion of connectivity as a reflex of structure.

An important detail remains to be mentioned. Both Takahashi and Moulton capitalize on the fact that a particular type of connectivity effect, namely reconstructed Condition C violations, fails to obtain with fronted object clauses. This is shown by the following contrast:

(36) a. That Ms. Browni would lose Ohio, shei never expected.
     b. *Shei never expected that Ms. Browni would lose Ohio.

Moulton takes this to be a vindication of his strategy to maintain the original ESH (with base-generation of CP satellites) and amend it with semantic reconstruction, which does not apply in cases like (36a). Takahashi attributes the absence of Condition C effects to Late Merger of the clausal complement to the moving empty D-head. That is, what moves is in fact the empty determiner only; its complement merges countercyclically to the moved D and is therefore, crucially, not represented in trace position:

(37) a. [D ∅ ] shei never expected 〈D〉
     b. [D ∅ [CP that Ms. Browni would lose Ohio]] shei never expected 〈D〉

I assume here that Late Merge as a countercyclic operation is a theoretical device that should ideally be eschewed from the theory of UG.
While this is in and of itself not a reason to reject Takahashi’s analysis, I take it to be sufficient motivation to look for an alternative. In what follows, I will therefore propose an analysis that reconciles the SH with connectivity (as well as anti-connectivity as in (36a)) while maintaining the conservative position that connectivity has a syntactic basis and countercyclic Late Merge is unavailable.

7 It is noteworthy in this connection that part of Koster’s motivation for the SH was the desire “to do without dubious PS rules like NP → S.”




4 Satellites are sentential fragments

4.1 The ellipsis approach to left-dislocation

As noted above, the ESH assimilates sentences containing satellite CPs to LD constructions. Both Koster and Alrenga assume that LD involves a base-generated adjunct (the dislocated XP), resumed by a clause-internal pro-form. As we saw, this analysis can account for a variety of otherwise puzzling effects relating to the apparent nominal nature of CP traces, but runs afoul of the fact that satellites show connectivity into the clause they precede.

Unlike Takahashi (2010) but similar to Moulton (2013), I propose to preserve the ESH, but in a revised form. While Moulton combines the ESH as implemented by Koster and Alrenga with a radically non-syntactic theory of connectivity, I will argue instead that the SH can be maintained once a more refined theory of LD is adopted.

The paradoxical situation of satellites showing connectivity into their host clauses is by no means specific to sentence-initial CP arguments. Rather, it obtains quite generally in LD constructions in languages like German and Dutch, as has been known since seminal work of Vat (1981) and others (see Alexiadou 2006). Consider the following example of a left-dislocated NP:

(38) Seineni Garten, den pflegt [jeder Gärtner]i. (German)
     his.Acc garden it.Acc tends every gardener
     ‘Every gardener tends his garden.’

As indicated, the dislocated NP is case-marked so as to match its correlate, and furthermore the NP-internal possessive pronoun is interpreted as being bound by the host-internal QP. This illustrates the schizophrenic nature of LD, as it is not easily amenable to a movement analysis: the host clause contains a correlate instead of a gap (and is hence syntactically complete by itself), and the dislocated NP is separated from it by a prosodic break, akin to an extra-clausal parenthetical. Yet, the dislocated NP exhibits connectivity, just like a fronted NP connected to a gap.8

To explain connectivity in LD, Ott (2012, 2014, 2015) argues that LD is in fact a bisentential construction. To illustrate, (38) is underlyingly equivalent to the following sequence of (syntactically unconnected) root clauses:

(39) [CP1 [NP seinen Garten]i [pflegt jeder Gärtner ti]] [CP2 denk [pflegt jeder Gärtner tk ]]

8 Case connectivity and variable binding are not the only signs of reconstruction in LD; see Ott 2014, 2015 for extensive discussion of connectivity effects in LD.


Evidently, case-marking and connectivity are now accounted for, simply as a consequence of the NP seinen Garten being embedded in a clause that parallels the clause containing its correlate. Given the parallelism of CP1 and CP2, the surface pattern of (38) can be derived from (39) by backward deletion:

(40) [CP1 [NP seinen Garten]i [∆ pflegt jeder Gärtner ti]] [CP2 denk [A pflegt jeder Gärtner tk ]]

This deletion in CP1 is recoverable due to the fact that the deleted domain is given (or e-given in Merchant’s 2001 sense): under existential closure, the deleted constituent ∆ is equivalent to the constituent labeled A in CP2, ∃x.every gardener tends x. Parallelism of CP1 and CP2 is thus a prerequisite for recoverability of deletion in CP1.9 The two clauses are doubly anaphorically linked: the pronoun inside CP2 anaphorically resumes the satellite surviving deletion in CP1, and CP1 is cataphorically linked to CP2 as a result of ellipsis. With CP1 and CP2 each mapping onto separate intonation phrases, the prosodic break separating the satellite from its host follows automatically. Satellite connectivity and satellite externality are reconciled.

The PF-deletion operation in (40) is not specific to LD. Ott’s claim is essentially that left-dislocated XPs are derivationally equivalent to fragment responses in Merchant’s (2004) analysis, analogously to B’s reply in the following:

(41) A: Was pflegt [jeder Gärtner]i? (German)
        what tends every gardener
        ‘What does every gardener tend?’
     B: Seineni Garten.
        his.Acc garden
        ‘His garden.’

Note that we find the same connectivity here as in the LD case (38). Following Merchant (2004), we can attribute this to the unpronounced syntactic structure in B’s response parallel to A’s question:

(42) [CP [NP seinen Garten]i [pflegt jeder Gärtner ti]] →PF [CP [NP seinen Garten]i [pflegt jeder Gärtner ti]]

9 Here and throughout, I assume (following Merchant 2004) without further argument that ellipsis remnants A̅-move to the edge of their clause to enable deletion of the resulting remnant constituent (∆ in (40)) at PF. This is by no means an innocent assumption; see Ott and Struckmeier to appear for recent discussion.


There is, thus, good evidence to analyze left-dislocated constituents as sentential fragments, on a par with other superficially non-sentential utterances whose properties nevertheless betray underlying clausal syntax. We are now in a position to reappraise the ESH from this new angle. Couched in terms of Ott’s analysis, we assume that sentences with initial satellite CPs are composed of two root clauses (call them CP1 and CP2), the linearly first of which is elliptical (a qualification will be introduced in section 4.3 below); call this the Revised Satellite Hypothesis (RSH), illustrated in fig. 2.

[CP1 CPΣ [C′ α tΣ β]] [CP2 NP [C′ α tNP β]]

Fig. 2: The Revised Satellite Hypothesis (RSH).

CPΣ is the satellite CP, which is embedded within CP1, which is underlyingly parallel to CP2, the host clause, in which the null or overt correlate stands in for and discourse-anaphorically resumes CPΣ (indicated by the dashed arrow in fig. 2). The following examples provide real-life illustrations:

(43) a. Dat hij komt, (dat) is duidelijk. (Dutch)
        that he comes that is clear
        ‘That he will come is clear.’
     b. [CP1 [CPΣ dat hij komt ]i [∆ tΣ is duidelijk] ] [CP2 (dati/∅i) [A t is duidelijk ]]

(44) a. That the Giants would lose, (that) John never expected.
     b. [CP1 [CPΣ that the Giants would lose ]i [∆ John never expected tΣ] ] [CP2 (thati/∅i) [A John never expected t]]

The CP satellite (the remnant of the elliptical CP1) is anaphorically related to its correlate in CP2. Crucially, this correlate – (a cognate of) that – is of category NP. The analysis thus automatically solves the category-mismatch puzzle, in essentially the way envisioned by Koster: the actual clause-internal argument is the correlate of left-dislocated XPs. Crucially, it is ‘structure-preserving,’ requiring no CP-to-NP trace conversion or other artificial devices, since it locates the CP satellite and its NP correlate in separate clauses. As in the analysis of (38) above, the NP is simply a free pronoun discourse-anaphorically resuming the clausal remnant of CP1, exactly as in the following:

(45) [Everyone here takes good care of their garden.]i At least thati’s what they say.


As will be shown in section 4.2 below, the optional non-overtness of this pronoun in the configuration in fig. 2 is expected on this analysis as well.10 It is worth emphasizing again that while the representations assigned to sentences containing satellite CPs may appear somewhat cumbersome, the analysis adds nothing to what has to be assumed to be generated anyway. CPΣ in (44a) corresponds derivationally to the fragment answer in the following (assuming, with Merchant 2004 and others, that such short answers are derived by PF-deletion):

(46) A: What did John not expect?
     B: That the Giants would lose.

The RSH inherits the bulk of the explanatory power of the ESH. Specifically, the explanation for the facts reviewed in section 2.1 remains essentially unchanged. For ellipsis to be licit, and hence for the surface pattern described by the SH to be possible, CP1 and CP2 must be parallel. This will be possible only if each

10 Note that the analysis naturally accounts for number agreement with coordinated CPs, assuming that in this case the covert pronoun corresponds to a plural form:

(i) a. That he’ll resign and that he’ll stay in office seem at this point equally possible. (McCloskey 1991)
    b. [CP1 [CPΣ [that he’ll resign] and [that he’ll stay in office]]i …] [CP2 (thosei) [seem at this point t equally possible]]

Speakers tend to accept singular agreement as well, showing that it depends on the choice of the covert correlate. Unlike English, German does not allow plural agreement with coordinated CPs (iia). On the present analysis, this follows straightforwardly from the absence of an appropriate correlate: as shown in (iic), there is no plural pro-form that could occur (overtly or covertly) at the edge of CP2.

(ii) a. Dass Fritz lacht und dass Ingrid weint hat/*haben mich nicht überrascht.
        that Fritz laughs and that Ingrid cries has have me not surprised
     b. Dass Fritz lacht und dass Ingrid weint, das hat mich nicht überrascht.
        that Fritz laughs and that Ingrid cries that has me not surprised
     c. *Dass Fritz lacht und dass Ingrid weint, die(se) haben mich nicht überrascht.
        that Fritz laughs and that Ingrid cries those have me not surprised
     ‘That Fritz is laughing and Ingrid is crying hasn’t/haven’t surprised me.’ (German)

These facts thus strongly support the claim, defended further in section 4.2 below, that the SS is linked to a potentially covert placeholder correlate within its host. Thanks to an anonymous reviewer for pointing out the above facts.




clause contains a predicate that can select both CP (in CP1) and NP (in CP2). This condition is met in (47) but not in (48):

(47) a. That John would be unqualified had been expected.
     b. [CP1 [CPΣ that John would be unqualified] [had been expected tΣ]] [CP2 (that) [had been expected t]]

(48) a. *That John would be unqualified had been objected.
     b. [CP1 [CPΣ that John would be unqualified] [had been objected tΣ]] *[CP2 (that) [had been objected t]]

As regards the category–mismatch puzzle, the RSH does not differ substantially from the original (E)SH; its main virtue and advantage, as will be discussed presently, is that it avoids the problems pointed out in section 3.11 Note also that we have so far not answered the question why this prima facie more complicated representation must be assigned to sentences involving clause-initial clauses rather than a simpler one; we return to this question in section 5.12

11 A remark on a potential problem introduced by the analysis is in order. On the RSH, CPΣ within CP1 is an argument of the same predicate occurring in CP2, where it combines with CPΣ’s correlate. In apparent contradiction to this assumption, Alrenga (2005) contends that certain predicates which select NP only (and hence disallow CP complements) permit satellite CPs nonetheless. His examples and judgments are the following:

(i) a. This formulation of the rule {expresses/captures/reflects/brings out} the fact that these nouns behave differently.
    b. *This formulation of the rule {expresses/captures/reflects/brings out} that these nouns behave differently.
    c. That these nouns behave differently is {expressed/captured/reflected/brought out} by this formulation of the rule.

The acceptability of (ic) would thus be a problem for the biclausal analysis, since CP1 in these cases would correspond directly to the illicit (ib). However, the native speakers I informally consulted readily accepted (ib), suggesting that these predicates can in fact take clausal complements at least for some speakers. For speakers who agree with Alrenga’s judgment, only the ‘hanging-topic’ parse to be discussed in section 4.3 below should be available. I leave the issue open here, pending further systematic empirical investigation of cases like (ib).

12 Hartman (2012) rejects the (E)SH for the reason that peripheral CP arguments lack the information-structural properties of left-dislocated ‘topics’ (but see Miller 2001 for a partially opposing view). This is not a valid argument, however, in case CP dislocation is forced (rather than optional), in which case no particular pragmatic marking is expected (cf. Baker 1996 on LD in Mohawk); see section 5 below. Ott (2015) notes that the information-structural import of LD varies crosslinguistically, i.e. satellites are not directly associated with any particular discourse role.


 Dennis Ott

Before moving on, however, let us briefly note one advantage of the RSH over the (E)SH. Recall that, modulo the general qualifications discussed in section 2.2, movement operations displacing host-internal constituents never cross the satellite CP (as was shown in (6b) for wh-movement and in (7b) for subject–aux inversion). On the RSH this follows automatically, simply because the satellite is structurally properly external to the host. We return to these cases in the following section.

Let us now explore some consequences of this analysis. In the following section, I show that the apparent optionality of the satellite's NP correlate reduces to the optionality of Topic Drop, obviating the need for a construction-specific proform. In section 4.3 I address the connectivity effects discussed by Takahashi (2010) and Moulton (2013), which are straightforwardly accounted for by the RSH.

4.2 Sentential arguments and Topic Drop

Recall that one problematic aspect of the ESH is its reliance on an essentially construction-specific empty category, viz. the empty NP bound by the satellite CP. According to the RSH, this empty category is a deleted proform occupying the prefield (Spec-C) of CP2. In this section, I show that omission of the CP2-initial pronoun indeed reduces to the phenomenon known as Topic Drop (TD, Huang 1984; cf. Ross's 1982 Pronoun Zap), as implied by this analysis. A well-known property of TD in German is that it is most natural with nominative and accusative proforms but marked for oblique ones. The following illustrate this discrepancy:

(49) a. (Das) bringt doch nichts.
  that brings prt nothing
  'It's in vain.'
 b. (Den) kenn' ich nicht.
  him know I not
  'I don't know him.'

(50) a. *(Dessen) bin ich mir bewusst.
  that.gen am I refl aware
  'I'm aware of that.'
 b. *(Darüber) hab ich mich gewundert.
  about that have I refl wondered
  'I was surprised about that.' (German)



Clausal arguments as syntactic satellites: A reappraisal 


As pointed out by Berman (1996) (see also Oppenrieder 1991, 291f.), we find the exact same restriction at work with fronted object clauses:

(51) Dass die Erde rund ist, …
  that the earth round is
  'That the earth is round …'

 a. (das) hat ihn gewundert.
  that has him surprised
  '(that) surprised him.'
 b. (das) hat er nicht gewusst.
  that has he not known
  '(that) he didn't know.'
 c. *(dessen) war sie sich bewusst.
  of that was she refl aware
  '(that) she was aware of.'
 d. *(darüber) hat sie sich gewundert.
  about that has she refl wondered
  '(that) surprised her.' (German)

The above facts strongly suggest that optional omission of the pro-form resuming clause-initial clauses reduces to TD.¹³ Assuming the ellipsis approach to LD, this makes perfect sense: the satellite clause is a properly host-external constituent, so that the resuming pronoun occupies Spec-C of CP2, where it can be dropped freely:

(52) [CP1 [CPΣ dass die Erde rund ist] … ] [CP2 (dasi/∅i) … ]

Assuming that a similar process is available in English for the pro-form that, yielding the analogous alternation shown in (53), no special empty category connecting clausal satellites to their hosts is necessary.

(53) That he was in danger, (that) no boy would ever believe. (Moulton 2013)

13 Oppenrieder notes one exceptional case: topic drop of fronted R-pronouns as in (ia) is possible; however, he finds it to be illicit in conjunction with CP-fronting:
(i) a. (Da) weiß ich nichts von.
  there know I nothing of
  'I don't know that.'
 b. Dass uns Herr Brösele besuchen will, *(da) weiss ich nichts von.
  that us Mr. Brösele visit wants.inf there know I nothing of
  'I don't know that Mr. Brösele is planning to visit us.'
To my ear, this judgment is too strong, and (ib) is in fact quite acceptable, as expected. Pending further clarification of the issue, I set it aside here.


The RSH thus avoids the need for a construction-specific empty category, by reducing constructions involving clausal satellites to LD and optional TD. This is made possible by the structural separation of the satellite from the host.¹⁴ Note that TD can only apply to pro-forms fronted to the prefield, i.e. it is precluded by some other constituent moving to this position:

(54) a. (Das) weiß ich nicht.
  that know I not
  'I don't know (that).'
 b. Woher weißt du *(das)?
  from where know you that
  'How did you learn *(that)?' (German)

It follows that where Spec-C of CP2 is occupied by a wh-phrase, the satellite must be resumed overtly:

(55) a. That he will come, what does *(that) prove?
 b. Dass die Erde rund ist, woher weißt du *(das)?
  that the earth round is from where know you that
  'How did you learn that the earth is round?' (German)

The satellite CP necessarily surfaces to the left of the fronted wh-phrase (recall (6b)), the latter being properly contained within CP2 while the former is juxtaposed in discourse. (By the same token, subject–aux inversion cannot cross a satellite CP, recall (7b): any host-internal head movement is necessarily confined to CP2.) This leftmost positioning is a further property in which satellite CPs match left-dislocated elements (misleadingly referred to as 'Topics' in many cartographically-inspired analyses); see Ott 2014, 2015 for discussion.

14 Dislocation coupled with TD is likely to also be involved in what has been called emphatic topicalization (ET) by Bayer (1984, 2001) and others. In ET, an XP is fronted within a clausal complement that itself appears fronted:
(i) A Audo dass da Xaver kafft hot glaub-e net.
  a car that the Xaver bought has believe-I not
  'As for a car, I don't believe that Xaver bought one.'
On the present approach, such a construction would involve topicalization within CP1 (made possible by the suspension of the Doubly-filled COMP Filter in Bavarian), which is linked to CP2 by a silent correlate (Bavarian des 'that'):
(ii) [CP1 [NP a Audo]k dass da Xaver tk kafft hot]i [CP2 (desi/∅i) glaub-e net]
I leave an extension of the present analysis to ET, suggested by an anonymous reviewer, to future work.




4.3 Connectivity explained

Recall that the main problem for the original (E)SH is the fact that it base-generates clause-initial clauses in their peripheral surface position, which was shown in section 3 to be at variance with the connectivity effects observed in cases of CP-fronting. Some relevant examples are given/repeated below:

(56) a. That some student from hisi class cheated, I think that [every professor]i brought out.
 b. That hei'll end up looking like his father, [every young man]i expects.
 c. Dass seini Garten der schönste ist, glaubt [jeder Gärtner]i.
  that his garden the most beautiful is believes every gardener
  'Every gardener believes his garden to be the most beautiful one.' (German)

(57) a. (i) That anyone would take offence, I did not expect.
  (ii) *That anyone would take offence, I expected.
 b. Dass auch nur ein Student die Aufgabe korrekt lösen würde hat {kein / *jeder} Professor erwartet.
  that even a single student the exercise correctly solve would has no every professor expected
  'No/Every professor expected that even a single student would be capable of correctly solving the exercise.' (German)

The examples in (56) show binding of a variable pronoun embedded within the satellite CP by a host-internal QP. Those in (57) show that NPIs contained within satellites can be licensed by host-internal negation/negative QPs (for further examples, see Takahashi 2010, Moulton 2013). Note that we cannot test for Condition A/B reconstruction since the satellites themselves correspond (roughly) to binding domains. The ESH has no way of accounting for such effects on the assumption that connectivity effects of this kind require syntactic reconstruction to and concomitant interpretation in a position occupied by a trace/copy of the satellite CP.
Note, however, that the assumption that fronted CPs move to the prefield would undermine the ESH, in particular its central claim that the satellite binds a clause-internal NP operator that ‘stands in’ for it for purposes of case/θ-assignment.


The RSH, couched in terms of the ellipsis approach to LD, rectifies this situation. It allows us to have our cake and eat it, too: the CP satellite is in a properly host-external position but behaves as though it were embedded within the host, by virtue of being embedded within the parallel CP1. To illustrate, consider (31d), repeated in (58), where reconstruction of the satellite CP permits variable binding, and its analysis under the RSH in (59).

(58) Dass seini Garten der schönste ist, glaubt [jeder Gärtner]i.
  that his garden the most beautiful is believes every gardener
  'Every gardener believes his garden to be the most beautiful one.' (German)

(59) [CP1 [CPΣ dass seini Garten …]k [glaubt [jeder Gärtner]i tΣ]] [CP2 (dask) glaubt jeder Gärtner]

Contrary to appearances, the possessive pronoun sein is bound not by the host-internal QP but by its counterpart within the underlyingly parallel elliptical CP1, exactly as in the fragment answer shown in (41). The net result is that the satellite CP behaves as though it were structurally embedded within its host clause, while in actual fact merely being discourse-anaphorically connected to it. No additional assumptions are necessary, and only independently motivated operations of Ā-movement and clausal ellipsis are required. In this way, the RSH reconciles Koster's original approach with the satellite connectivity uncovered by subsequent work.

Recall from section 3, however, that with regard to reconstruction the situation is somewhat more complex than this. In particular, both Takahashi (2010) and Moulton (2013) observe that Condition C effects do not appear to obtain under satellite-CP reconstruction. Some relevant examples are repeated below:

(60) That Texas would be a surprise was always possible, but …
 a. that Ms. Browni would lose Ohio, (that) shei never expected.
 b. *shei never expected that Ms. Browni would lose Ohio.

(61) a. Dass Romneyi Florida verlieren würde, (das) hatte eri nie gedacht.
  that Romney Florida lose would that had he not thought
  'That Romney would lose Florida, he had not expected at all.'




 b. *Eri hatte nie gedacht dass Romneyi Florida verlieren würde.
  'He had not expected at all that Romney would lose Florida.' (German)

(62) a. Was Hansi nicht kennt, (das) isst eri nicht.
  what Hans not knows that eats he not
  'What Hans doesn't know, (that) he doesn't eat.'
 b. *Eri isst nicht was Hansi nicht kennt.
  'He doesn't eat what Hans doesn't know.' (German)

Importantly, no such obviation is observed with left-dislocated NPs, which always elicit Condition C violations in the relevant configurations:

(63) a. *Diesen Artikel über Romneyi, den hat eri nie gelesen.
  this.Acc article about Romney that has he never read
  'He never read this article about Romney.' (German)
 b. *Annekei d'r broer, die vindt zei wel aardig.
  Ann her brother him finds she PRT nice
  'Ann likes her brother.' (Dutch; Vat 1981)

While the examples in (60)–(62) by themselves might lead us to suspect that reconstruction in LD is generally optional, (63) shows that this cannot be the case. I propose that the difference between left-dislocated clauses and left-dislocated NPs is due to the fact that CPs are propositional whereas NPs are not. A dislocated NP will always be embedded within a surrounding elliptical clause,¹⁵ since by itself it is not a propositional unit. This yields reconstruction effects, including Condition C violations. But CP satellites are different: they are propositional units by themselves. As a result, I claim, the syntactic structure I have labeled CP1, parallel to the host clause, is optional when CPs are dislocated, whereas it is obligatorily present when a non-clausal constituent is dislocated. To illustrate, while (64) can only be assigned the structure in (64a), (65) is ambiguously associated with both structures in (65a) and (65b):

(64) *Diesen Artikel über Romneyi, den hat eri nie gelesen.
  this.Acc article about Romney that has he never read
  'He never read this article about Romney.' (German)

15 Unless it is a ‘hanging topic’ with invariant default/nominative case (see Cinque 1977, Alexiadou 2006), in which case it exhibits no connectivity.


 a. Option 1: parallel structure + deletion ⇒ Condition C violation
  [CP1 [NP diesen Artikel über Romneyk]i [hat erk nie tNP gelesen]] [CP2 deni hat er nie gelesen]
 b. Option 2: bare NP satellite ⇒ not available
  *[NP diesen Artikel über Romneyk]i [CP2 deni hat erk nie gelesen]

(65) Dass Romneyi Florida verlieren würde, das hatte eri nicht erwartet.
  that Romney Florida lose would that had he not expected
  'Romney had not expected to lose Florida.' (German)
 a. Option 1: parallel structure + deletion ⇒ Condition C violation
  [CP1 [CPΣ dass Romneyk Florida verlieren würde]i [hatte erk nicht tΣ erwartet]] [CP2 dasi hatte er nicht erwartet]
 b. Option 2: bare CP satellite ⇒ Condition C obviation
  [CPΣ dass Romneyk Florida verlieren würde]i [CP2 dasi hatte erk nicht erwartet]

Given the availability of the structure in (65b), in which the initial CP is akin to a hanging topic (see footnote 15), Condition C is bled. That is, on this 'WYSIWYG' parse the satellite in (65) is as much a free-standing unit as the first sentence in the following sequence:

(66) [Romneyi hat Florida verloren.]k Dask hatte eri nicht erwartet.
  'Romneyi lost Florida. Hei hadn't expected that.'

No such alternative structure is available for (64), the case-marked NP being structurally dependent on the functional infrastructure of a clausal host. It is thus not the case that reconstruction is optional per se; rather, what is optional in the particular case of CP satellites is the elided syntactic structure supporting connectivity.¹⁶

One final puzzle remains to be addressed. Moulton (2013) notes that in cases like (67), the reconstruction behavior of the satellite CP is schizophrenic: while the

16 One would, of course, like to see some independent evidence for this possibility. The ideal testing ground should be clausal fragment responses; however, in this case discourse factors render the relevant cases virtually untestable, given the infelicity of switching from pronoun (in the antecedent question) to name (in the response): What did shei not expect? – #That Mrs. Browni would lose Ohio. For lack of a better test, I therefore have to leave open at this point the question whether there is independent evidence for bare that-CP fragments.




pronoun within the satellite can receive a bound-variable interpretation under reconstruction, the satellite-internal R-expression is free to corefer with the coindexed host-internal pronoun:

(67) a. That hei might be too old for Mrs. Brownk, I don't think shek would want [any man]i to believe.
 b. *I don't think shek would want [any man]i to believe that he might be too old for Mrs. Brownk.

A similar pattern seems to obtain in German counterparts of Moulton's examples:

(68) a. ?Dass eri zu alt sei für Frau Müllerk, (das) möchte siek keinen Manni glauben machen.
  that he too old was for Ms. Müller that wants she no man believe make
 b. *Siek möchte keinen Manni glauben machen dass eri zu alt sei für Frau Müllerk.
  'She doesn't want to make any man believe that he was too old for Ms. Müller.' (German)

It is cases of this kind that lead Moulton to abandon altogether the idea that connectivity for variable binding is based on syntactic reconstruction. I would like to propose an alternative: the above examples are structurally ambiguous, conflating the two scenarios illustrated in (65). That is, I propose that the satellite CP in (67) can be parsed either as a free-standing 'hanging topic' CP (voiding the Condition C violation) or as a remnant of deletion, in which case variable binding is achieved under CP1-internal reconstruction. The following show the two possible parses of the satellite CP according to this proposal:

(69) a. Option 1: parallel structure + deletion ⇒ pronoun bound, Condition C violation
  [CP1 [CPΣ that hei might be too old for Ms. Brownk] [I don't think shek would want any mani to believe tΣ]]
 b. Option 2: bare CP satellite ⇒ Condition C obviation, pronoun unbound
  [CPΣ that hei might be too old for Ms. Brownk]

The parse in (69a) supports reconstruction, enabling the bound-variable reading of the satellite-internal pronoun. The parse in (69b) does not support any connectivity, thus voiding a Condition C violation in connection with the


satellite-internal R-expression. Both structures are available and are procedurally conflated in (67) and analogous examples, giving rise to the schizophrenic behavior discovered by Moulton.¹⁷

The idea that a single string can be associated with two different structures and as a result yield 'ambiguous' reconstruction effects is neither incoherent nor unheard of. For instance, sentences exhibiting so-called Lebeaux effects similarly bleed reconstruction for Condition C while still permitting idiomatic interpretations. Consider the following (from Larson and Hornstein 2012):

(70) a. Which picture that Billi hated did hei say that Mary took?
 b. Which habit that Billi hated did hei say that Mary finally kicked?

The complex wh-phrases here exhibit connectivity (for idiom interpretation) and anti-connectivity (for Condition C) simultaneously. In such cases, too, the grammar associates the string ambiguously with two structures, yielding the observed simultaneous occurrence of obviation and reconstruction. My claim is that this effect is analogous to the schizophrenic reconstruction behavior of satellite CPs noted by Moulton, as illustrated by (67).

17 Iatridou (1991) discusses connectivity in conditionals, which show remarkably similar behavior. Specifically, the protasis appears to reconstruct into the apodosis for purposes of variable and reflexive/reciprocal binding:
(i) a. [Every boy]i gets upset if hisi mother is late.
 b. If hisi mother is late, [every boy]i gets upset.
(ii) a. If pictures of himselfi are on sale, Johni will be happy.
 b. If pictures of [each other]i are on sale, [John and Bill]i are happy.
At the same time, Condition C violations are obviated:
(iii) a. *Shei yells at Bill if Maryi is hungry.
 b. If Maryi is hungry, shei yells at Bill.
The data could be taken to suggest that at least in cases like (i) and (ii), the protasis is a sentence fragment connected to an optionally covert then in the host (see Ebert et al. 2014 for recent discussion, from a different point of view, of the similarities between if-clauses and left-dislocated elements):
(iv) [CP1 [CPΣ if his mother is late]i [every boy gets upset tΣ]] [CP2 (theni/∅i) [every boy gets upset]]
(Compare Q: In which case would [every boy]i get upset? – A: If hisi mother were late.)
Where reconstruction within an elliptical CP1 would cause a Condition C violation, the protasis could be parsed as a free-standing root clause:
(v) [CPΣ if Maryi is hungry]k [CP2 (thenk/∅k) shei yells at Bill]
Somewhat unexpectedly from this point of view, however, Iatridou claims that Condition C blocks reconstruction for variable binding, unlike what we saw in (67). I leave further investigation of this issue to future work.




4.4 A ban on remnant-CP fronting

Let us consider one final perk of the RSH. Taking left-peripheral CPs to be left-dislocates provides a straightforward solution to a puzzle noticed by Müller (1998): while extraction from complement clauses is possible (71a), and while complement CPs can marginally be fronted out of wh-islands (71b), a combination of the two conditions is impossible: remnant complement CPs created by extraction into the matrix cannot be fronted (71c).¹⁸

(71) a. Ich weiß nicht, weni sie gesagt hat [tiʹ dass Fritz ti liebt].
  I know not who she said has that Fritz loves
  'I don't know who she said Fritz loves.'
 b. ??[Dass Fritz Caroline liebt]i weiß ich nicht [ob er ti zugeben würde]
  that Fritz Caroline loves know I not if he admit would
  'I don't know if he would admit that Fritz loves Caroline.'
 c. *[t′i Dass Fritz ti liebt]k weiß ich nicht [weni sie tk gesagt hat]
  that Fritz loves know I not who she said has
  'I don't know who she said that Fritz loves.' (German)

Cases like (71c) are virtually uninterpretable, quite unlike both (71b) and structurally (seemingly) analogous instances of remnant-VP fronting out of a wh-island:

(72) ??[ti Auf den Mund geküsst]k weiß ich nicht [weni sie tk hat].
  on the mouth kissed know I not who she has
  'I don't know who she kissed on the mouth.' (German)

We are now in a position to understand this effect. It is a well-known fact that while languages like German in principle permit remnant categories to be fronted, stranded material cannot be associated with a left-dislocated XP. Thus, while remnant-VP fronting is permissible, remnant-VP left-dislocation is not (data from Ott 2014):

(73) a. Zugegeben hat er nicht dass er falsch lag.
  admitted has he not that he wrongly laid

18 Müller explains the restriction by postulating a ban on unbound intermediate traces. I will not discuss this alternative here.


 b. *Zugegeben, das hat er nicht dass er falsch lag.
  admitted that has he not that he wrong laid
  'He hasn't admitted that he was wrong.' (German)

As discussed in Ott 2014, the ellipsis approach to left-dislocation straightforwardly explains this discrepancy: while the fronted remnant VP in (73a) occupies the edge of its host, the dislocated remnant VP in (73b) is a clause-external element. Consequently, there is no way to associate the stranded clause with its internal-argument position, and the host clause by itself is ill-formed for lack of a matrix predicate:

(74) *Das hat er nicht dass er falsch lag.
  intended: 'He didn't admit that he was wrong.'

The effect illustrated by (71c) now finds an analogous explanation. Since the satellite CP is not part of the structure of its host, the latter is necessarily ill-formed by itself:

(75) *[CP1 [CPΣ dass Fritz ti liebt]i …] [CP2 (dask) weiß ich nicht [weni sie tk gesagt hat]]

As in (73b), there is no way to associate the stranded wh-phrase within CP2 with its base position inside the satellite, since this would require some sort of intersentential rightward movement. The only alternative parse, equally illicit, is one on which the wh-phrase has been subextracted from (the base position of) the das correlate.¹⁹ Thus, on the assumption that the fronted object CP in (71c) is necessarily covertly dislocated (i.e., that it is a satellite), the otherwise puzzling ban on remnant-CP preposing discovered by Müller follows automatically from the RSH without further stipulations.

5 Why are initial clauses satellites?

I have argued that the original (E)SH can and should be maintained in the light of the facts reviewed in section 2.1, but that it must be supplemented with a theory of LD that accounts for the connectivity effects characteristic of this construction. I then proceeded to show that a revised form of the SH, drawing on Ott's theory of LD, avoids the problems observed for the original SH. Specifically, this theory of LD holds that dislocated XPs are sentential fragments derived by ellipsis under

19 In addition, the fronted CPΣ within CP1 containing a wh-trace is not a permissible ellipsis fragment, since this would require deletion of the extracted wh-phrase and hence of a focus, barred by general conditions on recoverability. Given that it is syntactically incomplete, CPΣ could not be parsed as a free-standing CP either.




parallelism of two clauses. The obviation of Condition C effects with clausal satellites prompted me to claim that the unpronounced structure embedding the dislocated XP is optional for clausal satellites but obligatory otherwise. Having now specified a syntactic representation for clausal satellites, we are left with the why-question mentioned at the outset: Why are left-peripheral CPs dislocated (clause-external) rather than fronted (to Spec-C), and what rules out the simple fronting option? That is, why can we only have (76b) but not (76a) whenever a CP is at the left edge of a higher CP?

(76) a. *[CP CPi … ti …]
 b. [CP1 CP …] [CP2 …]

I would like to suggest two possible answers to this why-question, without attempting to conclusively decide between the two.

Prosody. One plausible possibility is that the structural 'outsourcing' of CP-arguments accommodates prosodic preferences. More specifically, assuming a strict matching of root clauses in syntax and intonation phrases (ιPs) in the prosodic representation—as per, e.g., Gussenhoven's 2004 Align(S, ι) and Selkirk's 2011 Match(Clause, ι)—dislocation of sentential subjects and fronted object clauses straightforwardly permits mapping satellite and host onto separate ιPs, rather than prosodically integrating the former within the latter. In this way, the prosody provides a strong incentive to resort to the representation in (76b) while effectively discarding that in (76a). CP1 and CP2 being structurally independent root clauses, each will be mapped onto a separate ιP:

(77) ([CP1 CPΣ ∆])ιP ([CP2 …])ιP

The important point here, then, is that dislocation of CPs as opposed to simple fronting to Spec-C of the host automatically permits their phrasing as separate intonation units. For purposes of assigning a prosodic contour, this is arguably the naturally preferential option.
Assuming this to be on the right track, we expect that prosodic/phrasal weight of initial clauses will be a causal factor in the variability adumbrated in section 2.2, in that the externalization of heavy CPs as satellites is virtually obligatory whereas lighter CPs are more readily prosodically integrated into the matrix; see Davies and Dubinsky 2010 for related observations. Analogous effects are found with non-clausal left-dislocated constituents; see Ott 2015 and references therein for data and discussion. Labeling. I offer the following as an alternative or potentially additional syntactic explanation for why CP arguments are construed as satellites when in initial position: the structure resulting from internally merging a CP to the root clause is not linearizable. To see this, consider what happens when a CP (sentential subject or complement clause) is internally merged to the edge of CP, i.e. the transition from (78) to (79).


(78) [CP C [TP … CP … ]]

(79) [ CP [CP C [TP … CP … ]]]

Following Chomsky (2013), I assume that merging C and TP yields a constituent labeled by C, i.e. (what we refer to as) CP (this corresponds to what is traditionally labeled Cʹ, unavailable in Chomsky's dynamic-labeling framework). Internal Merge of a lower (subject or object) CP with this root CP, as required for deletion of the remainder of the clause (but see footnote 9), yields {CP,CP}, as in (79). I claim that this structure cannot be linearized, since the two CPs are nondistinct in the sense of Richards (2010): a linearization statement 〈CP,CP〉 is uninformative and hence dismissed at the PF-interface. Assume, with Richards (cf. also Fox and Pesetsky 2005), that linearization statements generated in the course of the derivation refer to category labels (e.g., 〈DP,VP〉). Assume furthermore Richards's (2010) Distinctness Condition on linearization:

(80) Distinctness
 If a linearization statement 〈α,α〉 is generated, the derivation crashes.

While Richards assumes that linearization statements express the linear order of hierarchically ordered constituents, at the level of the root it is necessary to linearize two hierarchically equivalent constituents (CP and whatever has been merged to its edge). It then follows that whenever what is merged to the root CP-to-be in (78) is of the same category, i.e. CP, the resulting structure in (79) will violate Distinctness, giving rise to the linearization statement 〈CP,CP〉. If, by contrast, some αP (for α ≠ C) is merged (a noun phrase, say), it can be linearized to the left of CP in accordance with (80): 〈αP,CP〉. If these speculations are on the right track, they furnish an explanation for why a clausal argument cannot occupy the edge of another CP, providing an incentive for dislocation.²⁰ Note, however, that the RSH as I have advocated it in this paper claims that CPΣ does in fact internally merge to the edge of a clause, namely Spec of CP1.

20 It could be objected at this point that sentential subjects in languages like English do not necessarily occupy the leftmost edge of root CP, but rather Spec-T (recall from section 2.2 that the SH must be weakened such as to permit this option at least in principle). The distinction between CP and TP is largely a relic of phrase-structure grammar, however; on current terms, esp. when Chomsky’s (2007) feature inheritance is adopted, the two layers effectively collapse into one. At least in subject-initial contexts, a subject is then plausibly located at the edge of root CP (C comprising the features traditionally associated with C and T) even in a language like English.


(81) [CP1 CPΣ [∆ … tΣ …]] [CP2 …]

Why, then, does this not lead to the same impasse as the derivation we just excluded, i.e. generating CPΣ within its host and raising it? I suggest that the answer lies in the fact that CP1 is elliptical. While the structure {CPΣ,CP1} is unlinearizable per se, consider what happens once one of the sisters undergoes ellipsis:

(82)

{ CPΣ, CP1 } (with CP1 elided)

In this case, no linearization statement is generated at all at this level, since one of the two CPs is silent: the violation of the Distinctness Condition that would otherwise ensue is voided by deletion.²¹ Note that either of the two CPs can delete to void linearization failure. But given that only CP1 is parallel to the subsequent host, deleting it rather than CPΣ is the only felicitous (recoverable) option in the configurations under consideration here. By contrast, deletion of CPΣ could not be resolved against the host clause and is thus excluded by the general requirement of recoverability. Deletion of either CP can then be assumed to be free from the point of view of syntax. Note how this differs from fronting of some category other than CP:

(83)

[ αP [CP C [TP … αP … ]]]



In this case, the two members of the set making up the root node are distinct, and hence permit linearization 〈αP,CP〉. This accounts for the fact that fronted NPs, PPs, VPs, etc. do not require their complement to undergo ellipsis, and hence for the fact that these categories can occur in simple fronting contexts, whereas clause-initial CPs are necessarily dislocated. I have outlined two speculative explanations for the unusual behavior of peripheral clausal arguments, one capitalizing on the prosodic status of clauses, the other on their categorial identity with the structure that embeds them. I will not attempt to decide between these options here, and they are not mutually exclusive. Further research will hopefully elucidate these matters.

21 See Takahashi 2004 for related ideas in the context of pseudogapping.


6 Conclusion

In this paper, I have defended a slightly revised version of Koster's (1978) Satellite Hypothesis, extended by Alrenga (2005), which holds that left-peripheral clausal arguments are extra-sentential constituents, related to their host clauses by an anaphoric clause-internal placeholder element. I have shown that this view of clause-initial clauses can be maintained once the possibility of unpronounced syntactic structure is acknowledged, an automatic result of the theory of LD advanced in Ott 2014. I have also offered some speculations as to why clause-initial clauses are 'evicted' from their host clauses rather than occupying a clause-internal edge position, relating the phenomenon to Richards's theory of Distinctness.

Acknowledgments: Thanks to the volume editors for their patient support, and to the audience at the Labels and Roots workshop as well as an anonymous reviewer for helpful feedback.

References

Alexiadou, Artemis. 2006. Left-dislocation (including CLLD). In The Blackwell companion to syntax, ed. Martin Everaert and Henk van Riemsdijk, 668–699. Oxford: Blackwell.
Alrenga, Peter. 2005. A sentential subject asymmetry in English and its implications for complement selection. Syntax 8:175–207.
Baker, Mark. 1996. On the structural position of themes and goals. In Phrase structure and the lexicon, ed. J. Rooryck and L. Zaring, 7–34. Dordrecht: Kluwer.
Bayer, Josef. 1984. COMP in Bavarian syntax. The Linguistic Review 3:209–274.
Bayer, Josef. 2001. Asymmetry in emphatic topicalization. In Audiatur vox sapientiae, ed. Caroline Féry and Wolfgang Sternefeld, 15–47. Berlin: Akademie Verlag.
Berman, Judith. 1996. Topicalization vs. left-dislocation of sentential arguments in German. In The proceedings of the LFG '96 conference, ed. Miriam Butt and Tracy Holloway King. Stanford, CA: CSLI Publications.
Chomsky, Noam. 1977. On WH-movement. In Formal syntax, ed. Peter W. Culicover, Thomas Wasow, and Adrian Akmajian, 71–132. San Diego, CA: Academic Press.
Chomsky, Noam. 2007. Approaching UG from below. In Interfaces + recursion = language?, ed. Uli Sauerland and Hans-Martin Gärtner, 1–29. Berlin/New York: Mouton de Gruyter.
Chomsky, Noam. 2013. Problems of projection. Lingua 130:33–49.
Cinque, Guglielmo. 1977. The movement nature of left-dislocation. Linguistic Inquiry 8:397–412.
Cinque, Guglielmo. 1990. Types of A-bar dependencies. Cambridge, MA: MIT Press.
Davies, William D., and Stanley Dubinsky. 2002. Functional architecture and the distribution of subject properties. In Objects and other subjects, ed. William D. Davies and Stanley Dubinsky, 247–280. Dordrecht: Kluwer.
Davies, William D., and Stanley Dubinsky. 2010. On the existence (and distribution) of sentential subjects. In Hypothesis A/hypothesis B, ed. Donna B. Gerdts, John C. Moore, and Maria Polinsky, 211–228. Cambridge, MA: MIT Press.



Clausal arguments as syntactic satellites: A reappraisal 

 159

Delahunty, Gerald. 1983. But sentential subjects do exist. Linguistic Analysis 12:379–398.
Dryer, Matthew S. 1980. The positional tendencies of sentential noun phrases in Universal Grammar. Canadian Journal of Linguistics 25:123–195.
Ebert, Christian, Cornelia Ebert, and Stefan Hinterwimmer. 2014. A unified analysis of conditionals as topics. Linguistics and Philosophy 37:353–408.
Emonds, Joseph. 1976. A transformational approach to English syntax. San Diego, CA: Academic Press.
Fox, Danny, and David Pesetsky. 2005. Cyclic linearization of syntactic structure. Theoretical Linguistics 31:1–45.
Grosu, Alexander, and Sandra A. Thompson. 1977. Constraints on the distribution of NP clauses. Language 53:104–151.
Gussenhoven, Carlos. 2004. The phonology of tone and intonation. Cambridge: Cambridge University Press.
Hartman, Jeremy. 2012. Varieties of clausal complementation. Doctoral Dissertation, MIT.
Huang, C.-T. James. 1984. On the distribution and reference of empty pronouns. Linguistic Inquiry 15:531–574.
Iatridou, Sabine. 1991. Topics in conditionals. Doctoral Dissertation, MIT.
Koster, Jan. 1978. Why subject sentences don't exist. In Recent transformational studies in European languages, ed. Samuel Jay Keyser, 53–64. Cambridge, MA: MIT Press.
Kuno, Susumu. 1973. Constraints on internal clauses and sentential subjects. Linguistic Inquiry 4:363–385.
Larson, Bradley, and Norbert Hornstein. 2012. Copies and occurrences. Ms., University of Maryland.
Lohndal, Terje. 2014. Sentential subjects in English and Norwegian. Syntaxe et Sémantique 15:81–113.
McCloskey, James. 1991. There, it, and agreement. Linguistic Inquiry 22:563–567.
Merchant, Jason. 2001. The syntax of silence. Oxford: Oxford University Press.
Merchant, Jason. 2004. Fragments and ellipsis. Linguistics & Philosophy 27:661–738.
Miller, Philip H. 2001. Discourse constraints on (non-)extraposition from subject in English. Linguistics 39:683–701.
Moulton, Keir. 2013. Not moving clauses: Connectivity in clausal arguments. Syntax 16:250–291.
Müller, Gereon. 1998. Incomplete-category fronting. Dordrecht: Kluwer.
Oppenrieder, Wilhelm. 1991. Von Subjekten, Sätzen und Subjektsätzen. Tübingen: Niemeyer.
Ott, Dennis. 2012. Movement and ellipsis in Contrastive Left-dislocation. In Proceedings of WCCFL 30, ed. Nathan Arnett and Ryan Bennett, 281–291. Somerville, MA: Cascadilla Proceedings Project.
Ott, Dennis. 2014. An ellipsis approach to Contrastive Left-dislocation. Linguistic Inquiry 45:269–303.
Ott, Dennis. 2015. Connectivity in left-dislocation and the composition of the left periphery. Linguistic Variation 225–290.
Ott, Dennis, and Volker Struckmeier. To appear. Particles and deletion. Linguistic Inquiry.
Richards, Norvin. 2010. Uttering trees. Cambridge, MA: MIT Press.
Rizzi, Luigi. 1997. The fine structure of the left periphery. In Elements of grammar, ed. Liliane Haegeman, 281–337. Dordrecht: Kluwer.
Ross, John Robert. 1967. Constraints on variables in syntax. Doctoral Dissertation, MIT.


Ross, John Robert. 1982. Pronoun-deleting processes in German. Paper presented at the annual meeting of the LSA, San Diego, CA.
Sag, Ivan A., Gerald Gazdar, Thomas Wasow, and Steven Weisler. 1985. Coordination and how to distinguish categories. Natural Language & Linguistic Theory 3:117–171.
Selkirk, Elisabeth. 2011. The syntax–phonology interface. In The handbook of phonological theory, ed. John Goldsmith, Jason Riggle, and Alan C. L. Yu, 435–484. Oxford: Blackwell.
Stowell, Timothy A. 1981. Origins of phrase structure. Doctoral Dissertation, MIT.
Takahashi, Shoichi. 2004. Pseudogapping and cyclic linearization. In Proceedings of NELS 34, ed. Keir Moulton and Matthew Wolf, 571–585. Amherst, MA: GLSA.
Takahashi, Shoichi. 2010. The hidden side of clausal complements. Natural Language and Linguistic Theory 28:343–380.
Vat, Jan. 1981. Left-dislocation, connectedness, and reconstruction. Groninger Arbeiten zur germanistischen Linguistik 20:80–103.
de Vries, Mark. 2010. Empty subjects and empty objects. In Structure preserved, ed. Jan-Wouter Zwart and Mark de Vries, 359–366. Amsterdam: John Benjamins.
Webelhuth, Gert. 1992. Principles and parameters of syntactic saturation. Oxford: Oxford University Press.

Michelle Sheehan

A labelling-based account of the Head-Final Filter

Abstract: Greenberg's (1963) 'Universal 21', later rediscovered as Williams' (1982) 'Head-Final Filter' (HFF), bans anything from intervening between the head of a prenominal modifier and the phrase which it modifies, as in the following infamous example from Abney (1987):

(1) *a [NP [AdjP proud of his son] man].

This paper proposes that the HFF should be considered an instance of the more general Final-over-Final Condition (see Holmberg 2000, Biberauer, Holmberg & Roberts 2008, 2014, Sheehan 2013a, Sheehan, Biberauer, Holmberg & Roberts forthcoming) and that both should be attributed to a linearization problem associated with right-branching non-atomised specifiers. This problem arises if (i) all labels and their terminal nodes are segments of the same category, and (ii) linearization is sensitive to c-command relations between categories rather than phrases, as proposed by Sheehan (2013a, b). Crucially, by hypothesis, category c-command only mediates linearization where the combination of local selection-based PF parameters fails, meaning that (i) harmonic head-initial/head-final orders are easier to linearize and (ii) head-final phrases in specifier positions differ from their head-initial counterparts in failing to trigger a linearization challenge.

DOI 10.1515/9781501502118-008

1 Introduction: The role of labels in linearization

In the context of the Minimalist Program, there has been fairly extensive discussion of (i) whether labels are needed and (ii) to the extent that they are, how they can be generated (see amongst others Chomsky 1995, 2013, Collins 2002, Seely 2006, Hornstein 2009). One domain which has received little discussion in this context, but in which labeling might play a key role, is linearization. While the structure-building operation Merge is generally taken to be fundamentally symmetrical (yielding binary sets), the word order patterns attested across the world's languages are clearly not symmetrical. Much careful research since Greenberg has revealed a number of signature word order asymmetries which pose a substantial challenge for a symmetrical mapping of hierarchical structure to linear order (see Biberauer & Sheehan 2012, 2013 for an overview). To mention just two examples, it seems to be the case that wh-movement is always to the left


and never to the right (Kayne 1994), and that the order of modifiers of N respects a variant of Greenberg’s Universal 20, being much more limited in the prenominal as compared to the postnominal field (Cinque 2005). It is a challenge to provide a principled account of such asymmetries whilst maintaining the idea that (symmetrical) Merge is the core structure-building mechanism.1 There are at least three different ways to do this, all of which have been explored in the (fairly) recent literature. Firstly, one could maintain the idea that phrase structure and the mapping to PF are both symmetric and attribute word order asymmetries to something independent, such as a requirement that movement be leftwards (see Abels and Neeleman 2012). Secondly, one could make the mapping to PF sensitive to an asymmetrical relation (i.e. asymmetric c-command) and then force phrase structure to become asymmetrical despite the fact that Merge itself is not. This would involve imposing additional movement operations (assuming traces do not count for linearization) (Moro 2000, Chomsky 1995, 2013) or additional applications of spell-out (see Uriagereka 1999, Nunes and Uriagereka 2000, cf. Sheehan 2013b) to eliminate points of symmetry. Finally, there is the possibility of maintaining the idea that Merge itself is symmetrical but reintroducing asymmetries into phrase structure via labels. In this last case what labels do, essentially, is to take a symmetrical structure and make its c-command relations asymmetrical, meaning that these asymmetric c-command relations can then be used to determine linear order (in the original spirit of Kayne 1994). While most recent accounts pursue the first two kinds of solutions, we will see below that there are strong empirical reasons to favour the third possibility, relying on labels. In fact, early proponents of Antisymmetry adopted a version of the third approach described above. 
Even in a pre-Merge era, Kayne (1994) was forced to make a distinction between projection to category and projection to segment in labelling so as to ensure that branching specifiers could be linearized (see Abels and Neeleman 2012, Sheehan 2013b). His definitions of c-command led to a predicted ban on multiple specifiers, which in turn led to the fruitful cartographic approach (Rizzi 1997, Cinque 1999). Some more recent antisymmetric approaches do not require a ban on multiple specifiers but nonetheless require a similar distinction between segments and categories in order to permit the linearization of complex specifiers (see Biberauer, Holmberg and Roberts 2014: 207 who stipulate that only categories can c-command and take categories to be heads and XPs but not X-bar projections). As we shall see

1 Indeed, for this reason, some people have rejected the idea that merge is symmetrical (Fortuny 2008, Zwart 2011), an alternative we leave to one side here.


below, such approaches, although fairly widely adopted, are somewhat problematic, as there is no principled account of when labeling involves projection to segment and when it involves projection to category (see also Abels and Neeleman 2012). As such, in this chapter we explore a different version of option 3 whereby labeling always involves segmentation. Essentially, then, all terminal and non-terminal copies of a given lexical item are segments of a single category. This simpler approach to labeling, in conjunction with a category-based definition of c-command and a version of the LCA, serves to derive a number of word order asymmetries as well as Huang's (1982) Condition on Extraction Domain (Sheehan 2013c) and certain kinds of extraposition (Sheehan 2010). Most crucially for present purposes, though, the approach provides a unified account of Universal 21 (Greenberg 1963)/the Head-Final Filter (Williams 1981) and the Final-over-Final Condition (Biberauer, Holmberg & Roberts 2014), whereby both result from the linearization challenge posed by right-branching specifiers.

In section 2, I introduce the proposed copy theory of labeling and mention some of its benefits. Section 3 then introduces the word order asymmetries under discussion, Greenberg's Universal 21/the Head-Final Filter (HFF) and the Final-over-Final Condition (FOFC), in some detail. Section 4 considers the evidence that the HFF, like FOFC, is a universal despite a number of apparent counterexamples. Section 5 considers the challenges involved in unifying the two asymmetries, particularly in relation to the analysis of FOFC in Biberauer, Holmberg and Roberts (2014). Section 6 offers a unified account based on the copy theory of labeling and compares it to the most influential previous accounts of the HFF. Finally, section 7 concludes and raises some questions for future research.

2 A copy theory of labelling

Sheehan (2013b, c) proposes a copy theory of labelling (CoL) whereby labels and terminal nodes of the same lexical item form a single multi-segment category. Consider the following example:

(1) [Y X [Y Y Z]]


Whereas in Bare Phrase Structure, Y projects to two distinct categories in (1) (the terminal node and the combined X-bar/XP category), under CoL all three projections of Y form a single category. As Sheehan shows, this forces us to a revised category-based definition of c-command along the following lines:

(2) A c-commands B iff A and B are categories, A ≠ B, A does not partially dominate B, and any category that totally dominates A also totally dominates B. (Sheehan 2013b: 370)

Total and partial dominance are defined as follows (Sheehan 2013b: 370):

(3) Total dominance: a category X totally dominates a category Y iff X ≠ Y and the sum of the paths from every terminal segment of Y to the root includes every nonterminal segment of X exactly once.

(4) Partial dominance: a category X partially dominates a category Y iff X ≠ Y and the path from every segment of Y to the root includes a copy of X but X does not totally dominate Y.

The main conceptual advantage of this approach to labelling over previous approaches is that all projection is uniform. In other approaches (Kayne 1994, Chomsky 1995, BHR 2014) it has to be stipulated that where a terminal merges with its complement, this gives rise to projection of a distinct category, but where this object further merges with a specifier/adjunct and projects, this gives rise to projection of a segment. As Abels (2010) and Abels and Neeleman (2012) note, these two distinct kinds of projection are stipulated to hold exactly where they need to in order to make specifiers asymmetrically c-command heads and complements, but there is no principled reason why the label of a head and its complement should not be a segment rather than a new category, or why, where two phrases merge, this gives rise to segmentation rather than category projection.2 The approach pursued here eliminates this distinction and takes all labelling to involve segmentation.
All of the copies of Y in (1) are therefore true copies in that they are segments of a single category.

2 In subsequent work, Chomsky (2000, 2004) reformulates this distinction so that it involves two distinct kinds of Merge: set Merge and pair Merge. Whereas set Merge gives rise to a symmetrical set, pair Merge creates an ordered pair. This provides an alternative solution to the problem set out above: the existence of an asymmetrical Merge operation creates structures which are inherently asymmetrical. I abstract away from this possibility here as it seems unnecessary. Labels are arguably required for independent reasons and as they can introduce the necessary asymmetries into phrase structure, simple (set) Merge is arguably enough (see also Epstein, Kitahara & Seely 2012, Oseki 2014).
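Definitions (2)–(4) are mechanical enough to be checked by machine. The following sketch is an illustration only — the tree encoding and function names are assumptions of the sketch, not part of Sheehan's formal apparatus. It implements total dominance, partial dominance and category c-command over tree (1), treating all same-labelled nodes as segments of one category:

```python
# Sketch of definitions (2)-(4), checked against tree (1): [Y X [Y Y Z]].
# Node labels stand for categories; all same-labelled nodes are segments
# of a single category. Encoding and names are this sketch's assumptions.

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

def all_nodes(root):
    yield root
    for child in root.children:
        yield from all_nodes(child)

def segments(root, cat):
    """All segments (nodes) of category cat."""
    return [n for n in all_nodes(root) if n.label == cat]

def ancestors(node):
    while node.parent is not None:
        node = node.parent
        yield node

def totally_dominates(root, x, y):
    """(3): the summed leaf-to-root paths of y's terminal segments contain
    every nonterminal segment of x exactly once."""
    if x == y:
        return False
    nonterminal_x = [n for n in segments(root, x) if n.children]
    if not nonterminal_x:  # assumed: only categories with nonterminal segments dominate
        return False
    path_sum = [a for seg in segments(root, y) if not seg.children
                for a in ancestors(seg)]
    return all(path_sum.count(n) == 1 for n in nonterminal_x)

def partially_dominates(root, x, y):
    """(4): every segment of y has an x-labelled ancestor, short of total dominance."""
    if x == y or totally_dominates(root, x, y):
        return False
    return all(any(a.label == x for a in ancestors(seg))
               for seg in segments(root, y))

def c_commands(root, a, b, cats):
    """(2): A != B, A does not partially dominate B, and every category
    totally dominating A also totally dominates B."""
    return (a != b
            and not partially_dominates(root, a, b)
            and all(totally_dominates(root, c, b)
                    for c in cats if totally_dominates(root, c, a)))

# Tree (1): Y projects over head Y and complement Z, with specifier X.
tree1 = Node('Y', [Node('X'), Node('Y', [Node('Y'), Node('Z')])])
cats = ['X', 'Y', 'Z']
```

On tree (1), these definitions make the specifier X asymmetrically c-command the category Y (Y only partially dominates X, so Y does not c-command X back), while Y totally dominates — and hence may still c-command — its complement Z.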


Assume, furthermore, that the copying operation involved in movement also involves segmentation and (re)Merge, as in the following tree, where X is first merged with Y and then with W:

(5) [W X [W W [Y X [Y Y Z]]]]

Under standard approaches, where copying involves cloning, there are two distinct categories X in (5), each with distinct c-command domains. As such, it will be necessary to delete one of the copies of X in (5) in order for linearization to proceed (see Nunes 2004). If both copies of X are segments of a single category, however, no deletion is required, as the whole (discontinuous) category X has only one c-command domain.3 As X is partially dominated by W and totally dominated by the same categories as W (the empty set), it follows that X asymmetrically c-commands W. The effect of movement, then, is to extend the c-command domain of a category without the need for deletion. There is a derivational stage at which W asymmetrically c-commands X and a later derivational stage at which X asymmetrically c-commands W. Assuming that, all else being equal, linearization targets the later stage of the derivation, it follows that X will precede W (if its position is determined by asymmetric c-command).

As Sheehan (2013a, b) shows, CoL removes, without further stipulation, certain idiosyncratic points of symmetry which arise under Bare Phrase Structure (the bottom pair problem discussed by Chomsky 1995, Richards 2004, and the specifier-contained-in-complement problem discussed by Hallman 2004 and Barrie 2005). When taken in conjunction with a version of the LCA, the CoL also makes novel, empirically supported predictions. Sheehan proposes the following modified version of the LCA, whereby asymmetric c-command maps to precedence only as a last resort. Under this approach, the order between categories which stand in a c-selection relation is determined via PF-parameters: a category X which c-selects a category Y is specified to precede or follow Y in the PF component (see also

3 This makes the CoL approach to movement similar to multidominance accounts of movement, whereby a single constituent can simply occupy multiple positions.


Richards 2004). This avoids the need to posit anti-locality-violating comp-to-spec movement in the syntax simply to derive head-final orders (see Sheehan 2013b for arguments in favour of this move). Linearization therefore relies first on c-selection-based head parameters and then, where necessary, uses asymmetric c-command to fill in missing precedence commands:

(6) Revised LCA
(i) If a category A c-commands and c-selects a category B, then A precedes or follows B at PF.
(ii) If no order is specified between A and B, even transitively by (i), then A precedes B at PF if A asymmetrically c-commands B.

The basic effect of (6) is that harmonic word orders can be linearized without the need to consider non-local asymmetric c-command relations. Asymmetric c-command will step in, however, wherever disharmonic orders are concerned or where unselected categories (specifiers/adjuncts) are present. Consider the following example of a harmonically head-initial specifierless structure:

(7) [WP WP [XP XP [YP YP Z]]]
    by selection: W > X, X > Y, Y > Z = W > X > Y > Z

In such a structure, all categories are specified to precede (at PF) the category which they c-command and select, as indicated by subscript P. Crucially, the copy theory of labelling clarifies the fact that it is only the order of the selecting and selected categories which is specified by this parameter. Nonetheless, the sum of all the PF-parameters in a harmonic system serves to provide a single unambiguous order of categories by transitivity: W>X>Y>Z. In such cases, W c-commands Y and Z and also precedes them, but this information is not, by hypothesis, required to linearize the categories in (7). If X>Y and Y>Z, then it follows that X>Z, irrespective of any direct c-command relation between X and Z (cf. Fox and Pesetsky 2005 for discussion). This fact becomes more salient when we consider harmonically head-final orders such as that in (8), where all categories are specified to follow their selected complement at PF, as indicated by subscript F, though this information plays no role in the narrow syntax:

(8) [WF WF [XF XF [YF YF Z]]]
    by selection: X > W, Y > X, Z > Y = Z > Y > X > W

In (8), once again, we obtain a total linear order of categories by the sum of locally defined PF-parameters: Z>Y>X>W. In such cases, despite the fact that W still asymmetrically c-commands Y and Z, it fails to precede them. The fact that W c-commands Y and Z is irrelevant, as local PF-parameters based on c-selection are sufficient to order all the categories in (8).

As Sheehan (2010, 2013b, c) shows, however, the CoL makes different predictions from other approaches in relation to the linearization of specifiers and adjuncts. Most notably, it means that what Uriagereka (1999) terms 'the induction step' of Kayne's (1994) LCA, relying on dominance, is untenable:

(9) A more explicit version of Kayne's LCA (based on Uriagereka 1999: 252)
(a) Base step: If α asymmetrically c-commands β, then α precedes β.
(b) Induction step: If α precedes β and α completely dominates ɣ, then ɣ precedes β.

Step (b) of (9) is untenable under CoL because a single category can both dominate and c-command another category. In (1), for example, Y both totally dominates and c-commands Z. The application of (b) would thus create a paradox whereby Z must precede itself. Sheehan argues, however, that this is a virtue rather than a sin, as Uriagereka has independently argued that the elimination of (b) from the LCA renders the latter more explanatory, in that it serves to derive Huang's (1982) Condition on Extraction Domain (CED). Following Uriagereka (1999) and Nunes and Uriagereka (2000), I propose that complex right-branching phrases must be linearized and atomized prior to being externally merged as specifiers/adjuncts in order to avoid the linearization problem they pose. As Uriagereka points out, this derives the basic specifier/complement asymmetry of the CED from the LCA: externally merged right-branching specifiers/adjuncts cannot be targeted for subextraction because they are syntactic atoms which lack internal structure. Right-branching complements, however, are not atomized:

(10) a. What kind of films do you like [watching t]? (Complement)
b. *What kind of films does [watching t] annoy you? (Specifier)
c. *?What kind of films do you eat popcorn [watching t]? (Adjunct)
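Returning to the harmonic structures in (7) and (8), the division of labour in (6) can be checked mechanically. The following minimal sketch (its encoding and function names are assumptions of the sketch, not the chapter's formalism) closes the selection-based precedence pairs under transitivity and confirms that step (i) of the Revised LCA alone yields a total order, so asymmetric c-command is never consulted:

```python
# Step (i) of the Revised LCA in (6) applied to the harmonic trees (7) and (8):
# selection-based precedence pairs, closed under transitivity, already order
# every pair of categories.

def transitive_closure(pairs):
    """Close a set of precedence pairs (a, b) = 'a precedes b' under transitivity."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def is_total(order, categories):
    """True if every distinct pair of categories is ordered one way or the other."""
    return all((a, b) in order or (b, a) in order
               for a in categories for b in categories if a != b)

def precedence_string(order, categories):
    """Read off the single linear order determined by a total precedence set."""
    ranked = sorted(categories,
                    key=lambda c: sum((c, b) in order for b in categories),
                    reverse=True)
    return ' > '.join(ranked)

cats = ['W', 'X', 'Y', 'Z']
# (7): each head is specified to precede its selected complement (subscript P).
head_initial = transitive_closure({('W', 'X'), ('X', 'Y'), ('Y', 'Z')})
# (8): each head is specified to follow its selected complement (subscript F).
head_final = transitive_closure({('X', 'W'), ('Y', 'X'), ('Z', 'Y')})
```

Both parameter sets come out total: (7) yields W > X > Y > Z and (8) yields Z > Y > X > W, with no appeal to step (ii).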


Under the CoL, atomization is triggered because right-branching specifiers are unlinearizable. Consider the following slightly simplified example. I assume that the causer 'PRO watching films' is externally merged in spec vP, and I take it to be a TP for ease of exposition.

(11) [v [T PRO [T T [watching watching films]]] [v v [annoy annoy you]]]
     (i) by selection: T > watching, watching > films, v > annoy, annoy > you
     (ii) by asymmetric c-command: T > v, T > annoy, T > you

The linearization problem posed by (11) is the following: it is not possible to linearize watching or films with respect to v, annoy or you, as no c-selectional or c-command relations hold between these categories. While T precedes watching (by selection) and films, annoy and you (by asymmetric c-command), watching and films are not ordered with respect to annoy or you by either (i) or (ii). Sheehan (2013) proposes that in such contexts, the only possibility is atomization of the complex specifier. Once atomized, the whole specifier PRO watching films behaves like a single category and thus c-commands annoy and you, allowing linearization to take place. If atomization also blocks subextraction, then this also explains why externally merged specifiers are subject to the CED (Uriagereka 1999). Crucially, right-branching specifiers differ from right-branching complements in that the latter do not pose any such linearization challenge:

(12) [like like [C C [T PRO [T T [watching watching films]]]]]
     (i) by selection: like > C, C > T, T > watching, watching > films
     (ii) by asymmetric c-command: like > PRO, C > PRO, PRO > T, PRO > watching, PRO > films


Even if null categories such as C, T and PRO need to be linearized, (12) still poses no linearization challenge for the Revised LCA in conjunction with the CoL. Either a selection or an asymmetric c-command relation holds between all categories in (12), allowing a total linear order to be established. The CED therefore follows from a basic geometric difference between complements and specifiers, more or less as proposed by Uriagereka (1999) in relation to BPS. Where the CoL differs from BPS (and Uriagereka's 1999 approach) is in relation to (i) head-final specifiers and (ii) derived head-initial specifiers. Whereas, from the BPS perspective, all complex specifiers are expected to be subject to atomization and hence to the CED, the CoL makes more nuanced predictions. Firstly, consider head-final specifiers, which at least in Turkish and Japanese are reported to permit subextraction, in apparent violation of the CED:

(13) [Opi [Ahmet-in ti kırma-sı]-nın beni üz-dü-ğü] bardak
     Ahmet-gen break.inf-3sg-gen I.acc sadden-past-3sg glass
     'The glass that [Ahmet's breaking t] saddened me.' (Turkish, Kural 1997: 502)

This follows under CoL from the fact that head-final specifiers can be linearized without the need for atomization. Consider the following (with English glosses and PF order represented visually for presentation purposes):

(14) [v [D Ahmet's [D [breaking glass breaking] D]] [v [sadden me sadden] v]]
     (i) by selection: breaking > D, glass > breaking, sadden > v, me > sadden
     (ii) by asymmetric c-command: Ahmet's > D, Ahmet's > breaking, Ahmet's > glass, Ahmet's > v, Ahmet's > sadden, Ahmet's > me, D > v, D > sadden, D > me

A crucial difference between (11) and (14) is that the latter, unlike the former, can be linearized without the need for atomization. This is because, while breaking and glass fail to c-command or be c-commanded by v, sadden or me, they are specified to precede D, and D must precede v, sadden and me. As a result, a total linear order between categories can be established without the specifier being atomized: Ahmet's>glass>breaking>D>me>sadden>v. The prediction, then, is that externally merged, consistently head-final specifiers need not be strong islands, unlike externally merged head-initial specifiers: a prediction which appears to hold in Turkish and Japanese.
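The contrast between (11) and (14) can likewise be verified mechanically. The sketch below is an illustration only: the category names and precedence pairs are transcribed from the (i)/(ii) annotations of (11) and (14), with PRO set aside as in the simplified discussion. It applies the two steps of the Revised LCA and reports which category pairs remain unordered:

```python
# The (11)-vs-(14) contrast: the head-initial specifier in (11) leaves
# watching/films unordered relative to v/annoy/you (forcing atomization),
# while the head-final specifier in (14) yields a total order.

def transitive_closure(pairs):
    """Close a set of precedence pairs (a, b) = 'a precedes b' under transitivity."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def linearize(categories, selection, c_command):
    # Step (i): selection-based PF parameters, closed under transitivity.
    order = transitive_closure(selection)
    # Step (ii): asymmetric c-command fills in pairs that (i) leaves unordered.
    for a, b in c_command:
        if (a, b) not in order and (b, a) not in order:
            order.add((a, b))
    order = transitive_closure(order)
    unordered = {frozenset((a, b)) for a in categories for b in categories
                 if a != b and (a, b) not in order and (b, a) not in order}
    return order, unordered

# (11): head-initial specifier [PRO watching films] in spec vP (PRO set aside).
cats11 = ['T', 'watching', 'films', 'v', 'annoy', 'you']
sel11 = {('T', 'watching'), ('watching', 'films'),
         ('v', 'annoy'), ('annoy', 'you')}
cc11 = {('T', 'v'), ('T', 'annoy'), ('T', 'you')}
_, gaps11 = linearize(cats11, sel11, cc11)

# (14): the corresponding head-final (Turkish-style) configuration.
cats14 = ["Ahmet's", 'glass', 'breaking', 'D', 'me', 'sadden', 'v']
sel14 = {('breaking', 'D'), ('glass', 'breaking'),
         ('sadden', 'v'), ('me', 'sadden')}
cc14 = ({("Ahmet's", x) for x in cats14 if x != "Ahmet's"}
        | {('D', 'v'), ('D', 'sadden'), ('D', 'me')})
order14, gaps14 = linearize(cats14, sel14, cc14)
```

For (11), exactly the six pairs {watching, films} × {v, annoy, you} remain unordered, so the specifier must be atomized; for (14), no gaps remain and the total order Ahmet's > glass > breaking > D > me > sadden > v falls out.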


Sheehan further shows that CoL makes different predictions from BPS with respect to derived specifiers, accounting both for (i) the fact that they are often not subject to the CED and (ii) the fact that their extraposed complements are not either:

(15) a. From which types of clauses has extraction taken place?
b. ?Which types of clauses has extraction taken place from?
c. *Which types of clauses has extraction from taken place?

While a full discussion of this would take us too far afield, the basic difference between externally merged head-initial specifiers (11) and derived head-initial specifiers (15) is that with the latter it is possible to avoid atomization via scattered deletion. This gives rise to the discontinuous constituency labelled extraposition (see Sheehan 2010 for details). All in all, then, CoL serves both to resolve some of the problems facing BPS and to derive certain nuanced facts regarding exceptions to the CED and the connection to extraposition. Below we shall see that CoL also provides a potential account of certain cross-linguistic word-order asymmetries. We first consider these phenomena in some detail before providing an analysis of them in section 6.

3 Universal 21, the Head-Final Filter and the Final-over-Final Condition

Greenberg (1963: 70) notes that of the four possible word order combinations of Adj, Adv and N, (16d) alone is unattested in his representative sample of languages:

(16) a. [N [Adv-Adj]]
b. [N [Adj-Adv]]
c. [[Adv-Adj] N]
d. *[[Adj-Adv] N]

As well as holding across languages, this restriction also holds within languages. Thus Greenberg further notes that while there are N-Adj languages allowing both Adj-Adv and Adv-Adj orders, there are no Adj-N languages allowing both orders of Adv and Adj (in this prenominal position). This asymmetry is given as his Universal 21:

(17) Universal 21 (U21, Greenberg 1963: 70)
If some or all adverbs follow the adjective they modify, then the language is one in which the qualifying adjective follows the noun and the verb precedes the object as its dominant order.




English fairly uncontroversially adheres to this universal, generally disallowing the order Adj-Adv in prenominal position (examples from Sadler and Arnold 1994: 187, 190):

(18) a. *a [dressed in blue] man
b. *the [navigable by boat] rivers
c. *a [running smoothly] meeting
d. *a [long for such a late hour] journey
e. *a [skilful for a novice] surgeon

Where adverbials precede the adjective in this prenominal position, however, the result is fully grammatical:

(19) a. a [smartly dressed] man
b. the [easily navigable] rivers
c. a [smoothly running] meeting
d. a [most extremely long] meeting
e. a [highly skilful] surgeon

Recently, the significance of Greenberg's word order universals for theories of linearization has received renewed attention (see Cinque 2005, Abels and Neeleman 2012 on U20 and Kayne 2013 for other examples). Interestingly, though, the relevance of U21 in this regard has been largely overlooked; it has rather been considered an effect to be explained in its own right (see Abney 1987, Bošković 2005 in particular, and section 6.4 below) following Emonds' (1976) and Williams' (1980) rediscovery of it in slightly more general terms. Williams (1982: 161), in particular, observes that there is a more general restriction on prenominal modification and claims that it equates to "a constraint barring post-head material in prenominal modifiers" (his 'Head-Final Filter', HFF). Post-adjectival adverbial modifiers are banned, as in (18), but so too are complements, as in the following examples:

(20) a. *a [bored of French] student
b. *a [sick of waiting] patient
c. *an [afraid of his contemporaries] writer
d. *a [sporting a mackintosh] man

A plausibly related phenomenon is the class of adjectives which derive historically from prepositional phrases — asleep, aslant, ajar, atilt — and cannot surface prenominally. Larson and Marušič (2004: 270, fn 2) propose that a- is still a head in such examples, meaning they have the same status as (20). In all these cases, then, what seems to be at stake is a restriction on material following the projecting category in a specifier position within a nominal projection.


The HFF bears a striking surface resemblance to a more general restriction on disharmonic word orders, first pointed out by Holmberg (2000) and explored at length in Biberauer, Holmberg and Roberts (BHR) (2008, 2014), Biberauer, Newton and Sheehan (2009a, b), Biberauer, Sheehan and Newton (2010), Sheehan (2009, 2013a, b) and Sheehan, Biberauer, Holmberg and Roberts (forthcoming): the Final-over-Final Condition:

(21) The Final-over-Final [Condition] (FOFC) (informal statement – BHR 2014: 171)
A head-final phrase ɑP cannot dominate a head-initial phrase βP, where ɑ and β are heads in the same extended projection.

The evidence for (21) suggests that it holds in a range of grammatical contexts across a wide range of unrelated languages. Take, for example, the lack of V-O-Aux word orders in synchronic and diachronic Germanic varieties as well as Latin, Finnish, Saami and Basque, discussed at length by BHR. Taking Aux to be a surface instantiation of T, this order is ruled out as an instantiation of the FOFC-violating structure:

(22) *[TP [vP V DP] T]

This gap is all the more striking, as BHR point out, given that all other potential orderings between V, O and Aux are attested, and that languages such as Basque and Finnish permit considerable word order freedom, including the non-FOFC-violating disharmonic order, but still fail to permit (22). Other evidence for FOFC comes from the fact that VO languages very generally seem to lack final complementizers, as BHR also note (citing Hawkins 1990: 256–257, 2004, Dryer 1992: 102, 2009: 199–205, Kayne 2000: 320–321):

(23) *[CP … [vP V DP] C]

This follows because FOFC holds transitively up the clause: if vP is head-initial, this rules out a final T (as already discussed), and if TP is head-initial, then this in turn rules out a final complementizer. The prediction of (21) is therefore that once an extended projection is head-initial, it cannot be head-final at any higher point.
Likewise, Biberauer, Sheehan & Newton (2010) give evidence for a lack of the surface word order Pol-TP-C (where Pol is a polarity question marker) both amongst South Asian languages (citing Davison 2007) and also cross-linguistically (citing Dryer 2005a, b). They argue that this, again, is evidence for (21), assuming, following Laka (1994), Rizzi (2001) and Holmberg (2003), that Pol occupies a position between T and C:

(24) *[CP [PolP Pol TP] C]

As such, FOFC has been shown to be a pervasive word order asymmetry (see BHR 2014, Sheehan, Biberauer, Holmberg and Roberts forthcoming for further evidence



A labelling-based account of the Head-Final Filter 

 173

as well as a discussion of apparent counterexamples). Moreover, FOFC stands alone among the word order asymmetries discussed in the literature in that it involves the ordering of heads rather than phrases. As such, it provides important potential support for the view that some version of Antisymmetry (Kayne 1994, 2013) applies to heads as well as phrases (contra Richards 2004, Abels and Neeleman 2012). The superficial similarity between the HFF and FOFC is striking, but there are actually substantial challenges in attempting to subsume one under the other, not least the fact that the HFF has been argued to hold only in certain languages, unlike FOFC, which is taken to be universal (again, see Sheehan, Biberauer, Holmberg and Roberts forthcoming on apparent counterexamples to FOFC). The following section assesses the evidence in favour of the HFF, first in relation to English, then in a broader cross-linguistic context, before providing a unified account of the two word order asymmetries.

4 The empirical status of the HFF

4.1 English

In addition to (18) and (20) above, there are further contexts in English which have been argued to provide evidence for the HFF. Grosu & Horvath (2006), for example, argue at length that the obligatory extraposition seen with comparatives and degree modifiers in English and other languages is directly attributable to the HFF:

(25) a. *John is [more than Bill (is)] intelligent.
b. *John is [more than he is fit] intelligent.
c. *John is [-er than Bill (is)] tall.

(26) a. *John is [too to be honest] kind.
b. *John is [as as Mary] smart.

Note that this extraposition is not due to the realization of the degree modifier as a bound or free morpheme. The assimilation of these patterns to the HFF is possible, as Grosu & Horvath (2006) note, if, in all cases, more, -er, too and as are projecting heads, as has been independently proposed. Grosu, Horvath & Trugman (2007: 13) further claim that the ban on right-hand modifiers inside comparative DPs also has the same explanation:

(27) a. *John is a [more intelligent than Bill] man.
b. *John is a [more unusually than any of you] dressed student.


For this to be the case, the underlying structure of comparatives and degree modifiers must be basically as follows, as they note:

(28) a. [AdjP [DegP more [CP than Bill (is)]] intelligent]
b. [DP a [NP [AdjP [DegP more [CP than Bill (is)]] intelligent]] man]

The HFF then (descriptively) forces the complement of the Deg head (the category than and all it dominates) to be extraposed to the right periphery of AdjP or DP. Note that this pattern holds irrespective of the order between the degree modifier and the adjective. Enough differs from other degree modifiers in English in surfacing to the right of the adjective it modifies. In all cases, though, complements are obligatorily extraposed, arguably because of the HFF:

(29) a. *John is [DP [as smart as Pete] a guy]
b. John is as smart a guy as Pete.

(30) a. *John is [DP [too smart to argue with] a guy]
b. John is too smart a guy to argue with.

(31) a. *John is [DP a [tall enough to play basketball] guy]
b. John is a tall enough guy to play basketball.

Finally, consider the behaviour of tough- and other adjectives selecting a clausal complement, which are also subject to the HFF:

(32) a. *a [difficult for anyone to read] book
b. *an [easy to persuade someone to read] book

(33) a. *A [pretty for anyone to look at] flower
b. *An [unlikely to choose] film
c. *A [willing to help out] receptionist

The ungrammatical examples in (32), named ‘tough-nuts’ by Berman (1974), can be ruled out by the HFF under the assumption that they share a basic structure with clausal tough-constructions. Hicks (2009) argues for the following structure for clausal tough-constructions, whereby the non-thematic subject of a tough-construction is base-generated inside the null operator as the object of the most embedded verb.
Once this operator has moved to the edge of CP, the DP then becomes visible for raising to the matrix subject position, as per the following slightly simplified structure:

(34) [[This book]k is [AdjP difficult [CP [DPj Op tk] C [TP PRO to read tj ]]]]

If the tough-nut structure is parallel, then in instances of indirect adjectival modification, the nominal contained in the null operator (books) would raise to spec CP, as per Kayne (1994), with the reduced relative clause then free to front or remain in situ:

(35) a. [CP [books]k C [AdjP difficult [CP [DPj Op tk] C [TP PRO to read tj ]]]]
b. [DP D [FP [AdjP difficult [CP [DP Opj tk] C [TP PRO to read tj ]]m F [CP [books]k C tm ]]]

Where AdjP moves to the prenominal position (Spec FP), the fact that its CP complement is obligatorily extraposed would thus be a further effect of the HFF.4

There is thus considerable evidence that the HFF applies to several different kinds of prenominal modifiers in English (AdjPs, PPs, DegPs). But although the effect of the HFF is pervasive, it is not, apparently, absolute. Even in English, ‘tough-adjectives’ with non-finite clausal complements can often surface in the prenominal position (corpora examples from Leung & Van der Wurff 2012):

(36) a. an easy-to-understand book
b. a hard to refute argument
c. some difficult-to-reach places

4 Actually, as Fleisher (2008) notes, citing Berman (1974), there are some surprising differences between tough-constructions and tough-nuts, notably regarding the thematic status of overt for-arguments:
(i) a. This is a tough building for there to be a riot in.
b. July is an unusual month for it to snow (in).
c. *This building is tough for there to be a riot in.
d. *July is unusual for it to snow (in).
He concludes from this that whereas for-arguments are selected by the tough-adjective in tough-constructions, they are contained in CP in tough-nuts. In other respects, though, tough-nuts share many properties with tough-constructions. Fleisher (2011) further discusses a superficially similar construction, which he calls the nominal attributive-with-infinitive construction (nominal AIC):
(ii) a. Middlemarch is a long book to assign.
b. Bob is a short guy for the Lakers to draft.
On the surface, this might be taken to be a further example of the HFF in action. Fleisher argues at length, however, that this construction has a wholly distinct structure, resulting from a non-finite relative clause. Whereas only a limited class of adjectives can surface in tough-nut constructions (the same which can surface in clausal tough-constructions), many more adjectives can participate in nominal AICs. There are thus no paraphrases of (ii.a) equivalent to:
(iii) a. *It is long to assign Middlemarch.
b. *Middlemarch is long to assign.
Fleisher (2011) argues convincingly that the CP in nominal AICs is a non-finite relative clause rather than a complement of the adjective. This accounts for the fact that not all adjectives participating in this construction can select a clausal complement, as well as the fact that nominal AICs can only surface in predicative positions, like other nominals bearing non-finite relatives.


The usual explanation for these kinds of counterexamples is that the AdjPs in question are ‘complex lexical items’ or ‘atomic units’, hence the tendency for hyphenation (cf. Nanni 1980: 574, citing Roeper and Siegel 1978). This is arguably the case also with other apparently right-branching prenominal modifiers, which certainly have a lexical ‘frozen’ flavour and are also often written with hyphenation:

(37) a. his holier-than-thou attitude
b. the Final-over-Final Condition
c. his down-to-earth demeanour

Note that this is also possible in some restricted cases with comparatives, and, as O’Flynn (2008, 2009) notes, with a small group of adjectives which cannot appear in tough-constructions:

(38) a. Mary is a [taller than average] player
b. there are [more than six] players in our team
c. an eager-to-please boyfriend

Crucially, these structures share certain properties with compounds. For example, regular plural morphology is blocked inside complex prenominal modifiers, as Sadler and Arnold (1994: 189) note, just as it is inside compounds. In a postnominal position, however, such morphology is required:

(39) a. more than ten mile(*s) long walk
b. a walk more than ten mile*(s) long

(40) a. a bug(*s)-catcher
b. a catcher of bug*(s)

Another indication that these examples are frozen lexicalized structures stems from the fact that they cannot contain adverbial modifiers, as Nanni (1980: 575) notes:

(41) a. *an easy to quickly clean room
b. *a hard to find in the attic manuscript
c. *a simple to neatly sew pattern
Observe that overt experiencers are also banned in these lexicalizations (Nanni 1980: 575), as are parasitic gaps and multiple embeddings:

(42) a. *a difficult for anyone to read book
b. *a difficult to file without reading paper
c. *an easy to persuade someone to read book




The construction is also limited in its productivity, being highly marginal with most tough-adjectives which are not on the easy-hard scale:

(43) This is a(n) *unpleasant/*annoying/??amusing/?fun to read book

This idiosyncratic restriction as well as the ban on internal syntactic structure are the hallmarks of a lexical phenomenon. Leung & Van der Wurff (2012) observe, moreover, that examples like those in (36) are only attested in corpora since the 1920s, suggesting that they are part of a recent trend towards heavy prenominal modification. The alternative ‘tough-nut’ construction in (44), which complies with the HFF, is, however, attested from Old English onwards, and is fully productive (Leung & Van der Wurff 2012):5

(44) a. a tough nut to crack
b. a difficult person to please
c. a difficult place to reach

I therefore assume that these apparent counterexamples to the HFF in English are atomic lexical units and do not represent genuine counterexamples to the HFF. In fact, that such atomized chunks are not subject to the HFF follows immediately from the analysis put forth in section 5, whereby the HFF results from a linearization problem associated with right-branching specifiers. Crucially, in these terms, atomic units with no internal structure pose no such linearization problem.

4.2 Other languages subject to the HFF

The HFF can be seen to hold in (at least) German, Dutch, Swedish, Finnish, Hungarian, French, Spanish, Portuguese, Italian, Romanian6, Czech, Slovak, Sorbian, Serbo-Croatian, Slovene and Persian (cf. Abney 1987, Sadler and Arnold 1994 on English; Williams 1982, Haider 2004 on German; Zwart 1996, Hoekstra 1999 on Dutch; Platzack 1982, Delsing 1993 on Scandinavian; Grosu & Horvath 2006 on Hungarian; Bouchard 1998, 2002, Abeillé & Godard 2000 on French; Luján 1973 on Spanish; Giorgi 1988, González Escribano 2004: 1, fn 2 on Italian; Grosu & Horvath 2006 on Romanian; Siewierska & Uhlířová 2000 on Slavic; and Cinque 2010: 44–49 for a brief overview). In all such languages, prenominal adjectives cannot be followed by a complement CP/PP:7

(45) de [trotse (*op zijn vrouw)] man [Dutch]
the [proud (*of his wife)] man
intended ‘the man proud of his wife’ (Zwart 1996: 85, fn 3)

(46) ein [unzufriedener (*damit)] Syntaktiker [German]
an [unsatisfied (*it.with)] syntactician
intended ‘a syntactician unsatisfied with it’ (Haider 2004: 783)

(47) une [facile (*à remporter)] victoire [French]
a [easy (*to win)] victory
intended ‘a victory easy to win’ (Abeillé & Godard 2000: 344)

(48) uma [boa (*a matemática)] aluna [Portuguese]
a [good (*at maths)] student
intended ‘a student good at maths’

(49) o [interesantǎ (*pentru noi toţi)] propunere [Romanian]
an [interesting (*for us all)] proposal
intended ‘an interesting proposal for us all’ (based on Grosu & Horvath 2006: 28)

Although comparative DPs are not generally discussed in relation to the HFF, my investigations suggest that at least Dutch, Spanish, French, Portuguese and Hungarian show the same restriction in this domain too. The effect is therefore a pervasive property of a number of natural languages. Note that this ban applies even in languages which only allow restricted classes of prenominal adjectives, where its salience as a constraint would be less manifest, making it difficult to acquire. In Persian, adjectives usually follow the noun, which bears ezafe marking. One exception to this comes from superlatives, which precede the noun and do not require ezafe (Samiian 1994):

(50) kûechek-tarin mive
small-est fruit

5 Van Riemsdijk (2001) discusses other apparent counterexamples involving a superficially right-branching structure, but where the rightmost Adj is semantically the head of AdjP:
(i) a close to trivial matter
(ii) a far from trivial matter
These seem to be the adjectival equivalent of ‘measure’ nouns, which, contrary to appearances, are not the heads of the DPs in which they are contained:
(iii) A load of people are/*is waiting outside.
They do not, therefore, represent robust counterexamples to the HFF.
6 Romanian appears to permit violations of the HFF in comparative constructions, for unclear reasons.

7 In actual fact, as discussed in section 4.3, Serbo-Croatian and Slovene seem to be subject to the constraint in a weaker form.




Interestingly, in such contexts, Persian also disallows a complement to occur between the prenominal superlative adjective and the noun:

(51) *[vafadar-tarin be shohar-am] zan [Persian]8
loyal-est to husband-my woman

As such, there is suggestive evidence that the ban results from a synchronically active constraint, rather than an historical idiosyncrasy, as it holds even in hidden pockets of some languages.9 As expected, more strongly head-final languages trivially conform to the HFF, as left-branching prenominal modifiers are fully permitted. Kornfilt (1997: 96) shows that the following are well formed in Turkish, for example:10

(52) [Ben-im kadar yorgun] bir insan [Turkish]
I-gen as.much.as tired a person
‘A person as tired as me’

(53) [Koca-sın-a çok sadık] bir kadın
husband-gen-dat very loyal a woman
‘A woman loyal to her husband’

In Japanese, likewise, left-branching prenominal modifiers are fully acceptable, and indeed are the only available option:11

8 Thanks to Yalda Kazemi Najafabadi for all Persian judgments.
9 The languages under discussion use various strategies to comply with the HFF, including extraposition/fronting of the offending material. We return to these patterns and how they can be explained in section 6.
10 Note that in Turkish all adjectival modifiers of NP precede the indefinite article (cf. Tat 2010 for an account of this based on Kayne 1994).
11 In fact, as Larson & Takahashi (2007) show, prenominal relative clauses in head-final languages behave unlike post-nominal relative clauses and like prenominal AdjPs in requiring a strict ordering, with direct (individual-level) modifiers occurring closer to N than indirect (stage-level) modifiers:
(i) a. [Watashi-ga kinoo atta] [tabako-o suu] hito-wa Tanaka-san desu.
[1SG-NOM yesterday met] [tobacco-ACC inhale] person-TOP Tanaka-COP
‘The person who smokes who I met yesterday is Miss Tanaka.’
b. ?*[Tabako-o suu] [watashi-ga kinoo atta] hito-wa Tanaka-san desu.
(ii) a. The invisible visible stars (stage level > individual level)
b. #The visible invisible stars (individual level > stage level)
c. The stars which are visible which are invisible.
d. (?)The stars which are invisible which are visible.
In English, the strict ordering between stage-level and individual-level modifiers does not apply to post-nominal relative clauses. This follows if Japanese RCs can occupy the specifier positions of dedicated functional projections in the extended nominal field. In languages with right-branching relative clauses, this will be blocked by the HFF.


(54) chocolate daisuki josei [Japanese]12
chocolate love woman
‘a woman fond of chocolate’

(55) kuroi fuku o kiteiru josei [Japanese]
black clothes acc wearing woman
‘A black clothes wearing woman’

Thus far, the data from a range of Indo-European languages as well as Finnish, Hungarian, Turkish and Japanese reveal the HFF to be more than a language-specific idiosyncrasy. The question remains open, though, whether it is a universal constraint, or a pervasive trait only of certain languages. The existence of apparent counterexamples to the effect even within the European area in certain Balkan/Slavic languages might appear to suggest the latter. I argue in the following section, however, that these often only marginally acceptable counterexamples do not serve to undermine the potential universality of the constraint, as there is independent evidence that prenominal modifiers involve considerable hidden structure at least in some of these languages.

4.3 Apparent counterexamples to the HFF

It has long been noted that a number of Balkan languages appear to permit surface violations of the HFF. Cinque (2010: chapter 4) notes that there are apparent exceptions to the HFF in Russian, Bulgarian, Macedonian, Polish, Ukrainian and Greek. These languages all allow right-branching adjectival phrases to appear prenominally (cf. also Babby 1975, Grosu & Horvath 2006 and Pereltsvaig 2007 on Russian, Siewierska & Uhlířová 2000 on Slavic and Androutsopoulou 1995 on Greek). Grosu & Horvath (2006) further note exceptions from Romanian comparatives:

(56) a. [dovol’nyi vyborami] prezident [Russian]
satisfied elections.instr president
‘the president satisfied with the elections’ (Cinque 2010: 46, citing Bailyn 1994: 25)
b. [mnogo gordiyat săs svoeto dete] bašta [Bulgarian]
very proud.the with self.the child father
‘the father very proud of his child’ (Cinque 2010: 46, citing Tasseva-Kurktchieva 2005: 285)

12 Thanks to Makiko Mukai for the Japanese judgments.


c. i [perifani ja to jos tis] mitera [Greek]
the proud of the son her mother
‘the mother proud of her son’ (Cinque 2010: 46, citing Androutsopoulou 1995: 24)

In actual fact, it seems that Serbo-Croatian and Slovene also permit slightly marginal surface violations of the constraint, arguably more so, even, than Macedonian.13 There are three possible implications of such counterexamples. In the worst-case scenario, they might force one to abandon the HFF as a deep property of grammar and to posit it as a fairly superficial though (curiously) recurrent language-specific rule. Alternatively, it might be the case that the effect is parameterized to hold only in certain languages. Finally, the HFF might still reveal a deep property of grammar, with some other independent property serving to give rise to apparent surface violations in some languages. The first two interpretations of the counterexamples are the most common in the literature, but I argue tentatively for the final possibility here.

4.4 Russian and Polish

Polish permits surface violations of the HFF, though in many contexts such orders are slightly marginal, and wholesale extraposition is strongly preferred for the speaker I consulted:

(57) a. ??[ubrany w czern] mezczyzna [Polish]14
dressed in black man
b. mezczyzna [ubrany w czern]
man dressed in black

(58) a. ?[starszy od Johna] przyjaciel
older than John friend
‘an older friend than John’

13 A crucial difference between Serbo-Croatian/Slovene and Polish/Bulgarian, though, is that in the former HFF-violating orders are always highly marginal and significantly worse than other potential orders. From a sampling perspective, note that all of the languages which apparently fail to adhere to the HFF are either Slavic or from the Balkan Sprachbund, meaning that on a typological scale the scope of the counterexamples is quite limited (more so, it must be noted, than the languages adhering to the HFF).
14 Thanks to Malgorzata Krzek for Polish judgments.


b. przyjaciel [starszy od Johna]
friend older than John
‘a friend older than John’

In Polish, non-branching non-classificational adjectives must precede the noun (Rutkowski 2002, 2007, Rutkowski and Progovac 2005). Extraposition of AdjP therefore appears to be triggered only where AdjP is right-branching, in compliance with the HFF. These data appear to suggest that prenominal AdjPs in Polish can be right-branching, but that this is only a marginal possibility; in fact, the awkwardness serves to make the postnominal position possible, even preferred, suggesting that the HFF does hold to some degree. In Russian, too, such word orders are stylistically marked according to Grosu, Horvath & Trugman (2007), and wholesale extraposition is preferred. It is not clear, therefore, whether Russian and Polish are any different from Serbo-Croatian and Slovene with respect to the HFF. All four languages marginally permit surface violations of the constraint, but the latter two have additional word orders available via preposing and CP/PP extraposition which are not possible in Polish or Russian (an issue which we return to in section 6).

4.5 Greek

In Greek, likewise, the HFF-violating order is slightly marginal (as Grosu, Horvath & Trugman 2007 also note).15 Once again, the prenominal order alternates with wholesale extraposition. However, as is more generally the case with post-nominal adjectives, this is only possible where the adjectival phrase is also marked for definiteness (see Androutsopoulou 1995):

(59) a. (?)i [perifani ja to jos tis] mitera [Greek]
the proud of the son her mother
‘the mother proud of her son’ (Cinque 2010: 46, citing Androutsopoulou 1995: 24)
b. i mitera *(i) [perifani ja to jos tis] [Greek]
the mother the proud of the son her
‘the mother proud of her son’

As such, Greek, too, displays a weak sensitivity to the HFF, though the marginal acceptability of (59a) remains problematic.

15 Thanks also to Dimitris Michelidoukakis for discussion of the Greek data.




4.6 Bulgarian

In Bulgarian, the HFF-violating order appears to be fully acceptable, though CP complements in comparatives can still surface in an extraposed position. There is reason to believe, though, that complements of Adj might raise separately from Adj to a prenominal position. In Bulgarian, which has enclitic determiners, where an AdjP is fronted, any material preceding the adjective is obligatorily pied-piped along to the pre-determiner position, whereas any material following it is left behind (cf. Dimitrova-Vulchanova and Giusti 1998, Embick & Noyer 2001, Bošković 2005, Dost & Gribanova 2006, amongst others):

(60) a. mnogo xubavi-te knigi [Bulgarian]
very nice-the books
‘the very nice books’
b. *mnogo-te xubavi knigi

(61) a. kupena-ta ot Petko kniga
bought-the by Petko book
‘the book bought by Petko’
b. *kupena ot Petko-ta kniga
c. vernij-at na Vera muž
truthful-the to Vera husband
‘the husband truthful to Vera’ (Bošković 2005: 31, fn 39)
d. *veren na Vera-ta muž

Note that in (61), PP complements/modifiers of Adj also surface in a pre-nominal position in apparent violation of the HFF, though the enclitic definite article intervenes between them and the adjective. One way to analyse these word orders, based on Kayne’s (1994) analysis of relative clauses, is to posit two separate phrasal movements in such cases. The PP complement of the adjective first vacates AdjP, possibly moving to spec CP, and then the AdjP remnant moves to spec DP, where it serves as host to the enclitic D:16

(62) [DP The [CP [XP yellow]j C [IP [book] I [e]j ]]]

16 Giuliana Giusti reminds me that not all analyses of enclitic determiners posit movement. While Dimitrova-Vulchanova and Giusti (1998) give strong evidence against N-to-D movement, however, they fail to give compelling evidence against phrasal movement, so in the absence of evidence to the contrary, we tentatively pursue the idea that movement is nonetheless involved.


(63) a. [FP F [CP [PP na Vera]i C [IP [NP muž ] I [Adj vernij ti ]] b. [DP [Adj vernij ti ]j –at [CP [PP na Vera]i C [IP [NP truthful the to Vera

muž] husband

I tj]]

In the case of definite DPs, this gives rise to overt discontinuity of AdjP, but where indefinites are concerned D is covert and so the adjective and its complement will be string adjacent, giving the surface appearance of an HFF violation. This analysis seems empirically superior to either a straight head or phrasal-movement account as well as a non-syntactic account because of the facts in (60-61). Although there is no direct evidence for such an analysis in the other Slavic and Balkan languages which permit HFF-violations because (with the exception of Macedonian and Greek) they lack overt articles, it might nonetheless be the case that a similar derivation applies in these cases. For this reason it seems preferable to maintain the HFF as a universal in the belief that an independently motivated explanation for marginal violations in Russian, Polish and Greek will emerge, possibly also based on remnant movement. In the following section we therefore consider the possibility of unifying the HFF with FOFC, taking both to be universal.

5 Challenges facing a unified account

Biberauer, Holmberg & Roberts adopt a version of Antisymmetry and offer an account of FOFC whereby it results from restrictions on the distribution of a head-final movement trigger ^. Essentially, they propose that this head-final trigger (^), if present at all in a given extended projection, must be present on the lexical head. The trigger feature can then spread through the extended projection monotonically, stopping at any point but without skipping any intermediate head positions. The result is a system in which, within an extended projection, head-finality, if present, must begin at the bottom and spread upwards. While this proposal can deal quite well with the core FOFC gaps (*V-O-Aux, *Pol-TP-C) and can be extended to deal with others (*C-TP-V), it is not easy to see how it can explain the HFF.17
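BHR’s spreading condition on ^ lends itself to a simple decision procedure. The following is my own toy sketch, not part of the chapter: the function name and the encoding of an extended projection as a bottom-up list of booleans (True = the head bears ^, i.e. is head-final) are invented for illustration. A distribution is licit just in case no ^-marked head sits above a head-initial one:

```python
# Toy illustration (not from the chapter) of BHR's condition on the
# head-final trigger ^: within an extended projection, head-finality
# must start at the lexical head and spread upwards without gaps.

def licit_caret_distribution(final_flags):
    """final_flags: heads listed bottom-up; True = head bears ^ (final)."""
    first_initial = None
    for i, is_final in enumerate(final_flags):
        if not is_final and first_initial is None:
            first_initial = i          # lowest head-initial head
        if is_final and first_initial is not None:
            return False               # a final head above an initial one
    return True

# Two-head projections [V, Aux], bottom-up:
print(licit_caret_distribution([True, True]))    # harmonic O-V-Aux: licit
print(licit_caret_distribution([True, False]))   # Aux-[O-V], inverse FOFC: licit
print(licit_caret_distribution([False, False]))  # harmonic Aux-V-O: licit
print(licit_caret_distribution([False, True]))   # *[V-O]-Aux: FOFC-violating
```

On this encoding, *[V-O]-Aux comes out as the only excluded combination of the four, mirroring the FOFC gap discussed above.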

17 Moreover, FOFC is not restricted to instances of complementation: Sheehan (forthcoming) shows, for example, that [V-Adv]-Aux orders are just as rare as [V-O]-Aux orders.




As noted above, it nonetheless seems desirable to provide a unified account of the HFF and FOFC, given the superficial similarity between the two (something which BHR 2008 also note):

(64) HFF: *[NP [AdjP Adj β] N]
(65) Basic FOFC: *[γP [αP α β] γ]

The HFF, like FOFC, holds both cross-linguistically and in languages with variable word orders, making this similarity even greater. An important difference between the HFF and FOFC (in its simplest form), however, is that examples of the latter involve complementation between γ and αP, whereas in (64) AdjP is a modifier of NP. For the HFF to fall under the account of FOFC provided by BHR, it would have to be the case that N is in the extended projection of Adj. While it is fair to say that there is no general consensus as to the structural status of (the various kinds of) adjectives (Adjs), with many different possibilities being entertained in the literature, certain possibilities can nonetheless be ruled out quite uncontroversially. As is well known, the argument/adjunct distinction is far from clear-cut, both empirically and theoretically, but there nonetheless seem to be strong reasons to reject the idea that adjectival phrases are the complements of N. It also seems implausible that adjectives, rather than the noun, would be the lexical head at the base of the nominal extended projection. Without serious revision, then, it is unclear how BHR’s account can be extended to cover the HFF cases. Interestingly, the opposite possibility, that N is the complement of Adj, has been pursued, notably by Abney (1987). We return to this proposal and the problems it faces in section 6.4, but note immediately that under Abney’s proposal (also taken up by Bošković 2005), the HFF is not a FOFC effect, but rather a direct consequence of the fact that Adj can take only one complement (cf. Svenonius 1994 and below for critical discussion). Sheehan (2013a, b) offers an alternative account of FOFC based on CoL.
As she notes, FOFC is unusual in being a word order generalization which appears to hold both of base-generated and derived orders. While BHR’s account neatly captures the lack of base-generated V-O-Aux orders, it provides no account of why derived word orders would display the same effect. Consider, by way of illustration, the fact that in Basque and Finnish, two languages with variable VO/OV word orders, the ban on V-O-Aux is still observed (as noted by BHR). CoL can account for the fact that FOFC holds both of base-generated and derived word orders. The absence of V-O-Aux as a basic word order is immediately explained by the combination of the head-parameter and the revised LCA, because asymmetric c-command regulates the order of categories in disharmonic combinations. Consider the two disharmonic combinations, initial-over-final (inverse FOFC) and final-over-initial (FOFC-violating).


(66) a. [AuxP Aux [VP O V]] (head-initial AuxP dominating head-final VP: inverse FOFC)
b. *[AuxP [VP V O] Aux] (head-final AuxP dominating head-initial VP: FOFC-violating)

The inverse-FOFC order in (66a), which is widely attested, poses no linearization problem. V c-commands and c-selects O and is specified to follow it, yielding O>V. Aux c-commands and c-selects V and is specified to precede it, yielding Aux>V. The sum of these two precedence pairs fails to give any order between Aux and O, and so (ii) of the revised LCA applies, yielding Aux>O. The result is the unambiguous linear order Aux>O>V.

The FOFC-violating order in (66b), on the other hand, is different. V c-commands and selects O and is specified to precede it, yielding V>O. Aux c-commands and c-selects V and is specified to follow it, yielding V>Aux. The sum of these two precedence pairs again fails to give any order between Aux and O, and so again (ii) of the revised LCA applies, yielding Aux>O. The linear order of such a structure is therefore specified by the revised LCA as V>Aux>O rather than V>O>Aux. The apparent prediction is that where what is descriptively a head-initial phrase is selected by a head-final phrase, the result will be discontinuous linearization of the lower phrase. As such, the absence of V-O-Aux in base-generated orders falls out as an effect of the mapping to PF. In other words, FOFC is not a narrow syntactic constraint: nothing bans a head-final phrase from dominating a head-initial phrase in the syntax, but when linearized, such a structure will yield a discontinuous surface order (V-Aux-O rather than V-O-Aux). This is a welcome result if linear order is a property only of PF.

This explains why V-O-Aux is unattested as a basic word order, but it does not immediately explain why it is ruled out as a derived order, where movement has taken place. The basic order in Finnish is SVO, and there is good evidence that Aux-final order is derived via phrasal movement (Holmberg 2000, 2001).
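The precedence-pair reasoning just described can be rendered as a small procedure. The following is my own schematic sketch, not the chapter’s formalism: the spine encoding and function name are invented, precedence is closed under transitivity, and clause (ii) of the revised LCA is simplified to “the asymmetrically c-commanding element precedes”:

```python
# Toy sketch of CoL-style linearization: each head is ordered with
# respect to the head of its complement by its own precede/follow
# specification; transitive closure is taken; any pair still unordered
# is ordered by asymmetric c-command (simplified clause (ii)).
from itertools import permutations

def linearize(spine, spec):
    """spine: heads top-down (each c-commands those below), e.g.
    ['Aux', 'V', 'O']; spec[h]: 'precede' or 'follow' w.r.t. the head
    of h's complement (the next item down the spine)."""
    pairs = set()
    for hi, lo in zip(spine, spine[1:]):
        pairs.add((hi, lo) if spec[hi] == 'precede' else (lo, hi))
    # Precedence is transitive: close the set of pairs.
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for c, d in list(pairs):
                if b == c and (a, d) not in pairs:
                    pairs.add((a, d))
                    changed = True
    # Simplified clause (ii): unordered pairs ordered by c-command.
    for i, x in enumerate(spine):
        for y in spine[i + 1:]:
            if (x, y) not in pairs and (y, x) not in pairs:
                pairs.add((x, y))
    # Return the unique total order consistent with all the pairs.
    for order in permutations(spine):
        pos = {h: i for i, h in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in pairs):
            return list(order)

# (66a) inverse FOFC: head-initial Aux over head-final VP
print(linearize(['Aux', 'V', 'O'], {'Aux': 'precede', 'V': 'follow'}))
# → ['Aux', 'O', 'V']
# (66b) FOFC-violating: head-final Aux over head-initial VP
print(linearize(['Aux', 'V', 'O'], {'Aux': 'follow', 'V': 'precede'}))
# → ['V', 'Aux', 'O'], i.e. VP spelled out discontinuously
```

On this sketch, (66b) linearizes as V>Aux>O rather than V>O>Aux, reproducing the discontinuous spell-out of the head-initial VP discussed above.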
What also needs to be ruled out, then, is the linearization of a head-initial VP in a specifier position above the position of the auxiliary, which would give derived V-O-Aux. Interestingly, this is precisely the kind of structure which creates a more general linearization problem under the CoL (as discussed above), because the complement of the derived specifier (VP) cannot be ordered with respect to the clausal spine (Aux). Crucially, the word order combinations O-V-Aux and Aux-O-V, even if they are movement-derived, present no such linearization challenge. CoL thus predicts V-O-Aux to be ruled out as the surface order even with derived word orders (all else being equal). Consider the following simplified representation, where a head-initial VP has raised past Aux:

(67) [FP [VP V O] F [AuxP Aux [tVP]]]



A labelling-based account of the Head-Final Filter 


This structure will pose the same kind of linearization problem as the head-initial specifiers discussed in section 2. In some cases, the problem can be resolved via extraposition, with VP being spelled out discontinuously as V-Aux-O, but in other cases discontinuous spell-out seems to be ruled out and the result is ungrammaticality.18 The CoL account of FOFC can be extended to cover the HFF cases once certain independently motivated assumptions about adjectival modification are made. In the next and final section, I outline how this would work, considering also the advantages of this approach over the main alternative analyses of the HFF.

6 Towards a unified account

6.1 Kayne’s (1994: section 8.4) discussion

Kayne (1994: 97–101) proposes the following derivation for prenominal adjective phrases, based on his raising analysis of relative clauses, drawing on a long tradition of raising prenominal adjectives from a post-verbal position (following Chomsky 1957):

(68) [DP The [CP [AdjP yellow]j C [IP [book] I [e]j ]]]

In Kayne’s terms, after the head noun book has raised to spec IP to satisfy the EPP, the AP yellow raises to spec CP, giving the surface word order. The implication is that much adjectival modification involves a covert relative clause. As evidence for this derivation, Kayne cites the fact that prenominal adjectives, like RCs and reduced RCs, render it possible for a definite determiner to surface with indefinite nominals, which otherwise reject the:

(69) *The sweater of John’s is beautiful.
(70) The sweater of John’s that was lying on the sofa is beautiful.
(71) ?The yellow sweater of John’s is beautiful.
(72) ?The recently arrived sweater of John’s is beautiful.

In Kayne’s terms, (69) is ungrammatical because the definite determiner the simply cannot select an indefinite NP such as sweater of John’s. In (70), the presence of a

18 Sheehan (2013a) proposes that where a Tense/Mood/Aspect marker lacks features and so does not need to probe its complement, the VP can be atomized, leading to surface FOFC-violations with so-called final particles. In such cases, however, VP is predicted to resist subextraction, and this appears to hold, at least in some languages.


 Michelle Sheehan

relative clause attenuates this incompatibility, plausibly because in such cases D selects CP rather than NP:

(73) [The [CP [NP sweater of John’s]i that ti was lying on the sofa]] is beautiful.

If (71) and (72) also involve covert relatives, he proposes, then the fact that they are much more acceptable than (69) follows.

Kayne’s approach to adjectival modification offers no account of the HFF as it stands. It does, however, offer the basis of a FOFC-based explanation. If all prenominal AdjPs are derived relative clauses, then the lack of head-initial prenominal AdjPs is also a ban on derived head-initial specifiers, making the effect look much more akin to a FOFC violation, as construed by Sheehan (2013a, b). There is a serious problem, though, with assuming that all prenominal modifiers are base-generated as relative clauses, namely the well-known fact that not all adjectives can participate in indirect modification (i.e. function predicatively and surface in relative clauses; see Emonds 1976, Cinque 2010: chapter 4). I address this problem in the next section, before offering an account of the HFF in section 6.3.

6.2 Direct/indirect modification

In the discussion of adjectival phrases, a distinction is often made between ‘direct’ and ‘indirect’ modification. Cinque (2010), following Sproat and Shih (1988, 1990) and many others, makes a distinction along the following lines between the two kinds of modification:19

direct (attributive) modification
–– obeys the universal adjective hierarchy
–– permits only a non-intersective reading

indirect (predicative) modification
–– need not obey the universal adjective hierarchy
–– permits only an intersective reading20

19 I limit my discussion to modification within DP in this chapter. The possibility remains, however, that the same contrast exists in the clausal domain, giving rise to intersective vs. non-intersective readings. Moreover, as Emonds (1976) and Haider (2004) note, some version of the HFF is also observed in the clausal domain (see Haegeman et al. for recent discussion). I put this matter to one side here for reasons of space (see also Larson 1998).
20 There is disagreement in the literature as to whether direct modification gives rise to ambiguity or to only a non-intersective reading. This stems from the fact that many adjectives can participate in both kinds of modification. Cinque advocates the stronger position whereby there is a closer syntax/semantics mapping, and direct modification always gives rise to a non-intersective reading.




Sproat and Shih posit this distinction to deal with the two kinds of modification observed in Mandarin, but the distinction also exists in English and many other languages.21 Cinque (2010) pursues the idea that the two readings result from distinct syntactic configurations.22 Direct modification involves an AdjP being externally merged as the specifier of a dedicated functional head, whereas indirect modification involves a (reduced) relative clause construction.23 If the HFF reduces to a FOFC effect occasioned by movement of a head-initial modifier from a covert relative clause, it follows that only indirect modifiers should be subject to the constraint. Direct modifiers, being externally merged as specifiers, should be immune to FOFC.

Deciding which adjectives function intersectively and which do not is, unfortunately, far from straightforward. At the two extremes, it is fairly clear that former can only function as a subsective/direct modifier, whereas red can only be intersective, hence indirect:

(74) a. A red door = A door which is red
 b. A former colleague ≠ a colleague who is former

Clearly, former complies with the HFF in that it cannot surface with a post-head complement/modifier. This is, however, perhaps irrelevant to the HFF, as former and other clear direct modifiers do not readily accept any kind of modification:24

(75) a. *more former/alleged than John
 b. *a very former/alleged colleague
 c. *A more former/alleged colleague than John

21 Not all languages have both kinds of modification, though. Cinque (2010: chapter 3) cites Slave (Athapaskan), Lango, Hixkaryana and Tiriyó as languages lacking direct modification, citing Baker (2003: 207), Noonan (1992: 103) and Dixon (2004: 28f) respectively. He also discusses Yoruba and Gbaya Mbodómó (Niger-Congo) as languages lacking indirect modification, citing Ajíbóyè (2001: 6) and Boyd (1997: section 3.1.3) respectively.
22 Reichard (to appear) proposes that the intersective reading observed with relative clauses falls out from the phasal architecture.
23 Cinque (2010) further assumes that relative clauses are also externally merged as specifiers, but we adopt the more conservative view that they are complements of D, as per Kayne (1994).
24 Cinque (2010), citing Tallerman (1998), does mention a few potential examples, but these all plausibly involve idioms or parenthesis:
(i) I feel the most utter fool
(ii) The main point in principle is that…


A crucial question, then, is whether there is any evidence of direct modifiers being subject to the HFF. Some potentially relevant examples are discussed by Cinque (2010), who attributes the two potential meanings of old to the direct/indirect contrast:

(76) A friend who is old
(77) An old friend

While (76) can have only a pure intersective reading, where old denotes absolute age, (77) is ambiguous between this reading and another reading, whereby old denotes length of friendship (cf. Larson 1998, citing Siegel 1976 for discussion). Cinque (2010) also notes that adjectives like old can be followed by a than-clause when used comparatively and that, in such contexts, the two meanings of older are made more explicit:

(78) a. John is a friend older than Mary/the legal age.
 b. John is an older friend than Mary/#the legal age.

In the first example, only the absolute age reading is possible, as expected if this example involves a reduced relative clause. In the second example, however, only the length of friendship reading is possible. If the length of friendship reading involves direct modification, then (78b) provides evidence that direct modification is also subject to the HFF, as preposing of thanP is ungrammatical:

(79) *John is an older than Mary friend.

It is not clear, though, that the contrast here involves direct vs. indirect modification. According to Larson (1998), the distinction here does not concern intersectivity per se, but rather the variable which the adjectival phrase modifies. The noun friend contains an event variable which can also be intersectively modified by an adjective, giving rise to the length of friendship reading in (77). Recall also the following facts discussed by Bresnan (1973):

(80) a. I know a man taller than my mother
 b. #I know a taller man than my mother

Given that phrasal comparatives in English are covert clausal comparatives (Lechner 1999), one way to account for the infelicity of (80b) is to posit the following elided material:

(81) a. I know a man taller [than Opi my mother is ti tall]
 b. #I know a taller man [than Opi my mother is a ti tall man]

This explains why (80a-b) favour the readings they do, without the assumption that they involve direct/indirect modification.




In sum, it seems that the HFF applies only trivially to direct modifiers, as the latter cannot generally be modified. It is thus plausible that the HFF reduces to a restriction on the spelling out of head-initial phrases raised from a complement to a specifier position: i.e. to a FOFC effect.

6.3 The proposal

If all prenominal indirect modifiers result from phrasal movement from a reduced relative clause, then the HFF can be assimilated to the PF-account of FOFC proposed by Sheehan (2013a), as outlined above. In this analysis, FOFC is an effect of the linearization algorithm, which relies only on c-selection and c-command relations between categories, not between phrases, and which uses asymmetric c-command only as a last resort where selection-based relations are insufficient. For this reason, a difference emerges between right-branching and left-branching specifiers: only the latter can be linearized. Consider the following as a simple illustration (with some simplification, including the assumption that PP is atomic):

(82) a. [NP [AdjP [PP PP] Adj] N [AdjP [PP …]]]
 b. [NP [AdjP Adj [PP PP]] N [AdjP Adj [PP …]]]

In (82a), the category Adj, which does not enter into a selection relation with N, asymmetrically c-commands and so must precede the category N, and the atomic category PP must precede Adj (based either on selection or asymmetric c-command), giving the unambiguous order PP>Adj>N. For this reason, preposing PP serves to avoid an HFF violation. This can be observed in German, Dutch, Scandinavian, Finnish, Hungarian, Czech, Slovak, Sorbian, Serbo-Croatian and Slovene, in which it is possible to prepose a PP complement/modifier of Adj (cf. Haider 2004 on German; Zwart 1996, Hoekstra 1999 on Dutch; Platzack 1982, Delsing 1992 on Scandinavian; Grosu & Horvath 2006 on Hungarian; Siewierska & Uhlířová 2000 on Slavic):

(83) een op Marie verliefde jongen [Dutch, Hoekstra (1999: 180)]
 an of Marie in.love boy
 'A boy in love with Marie'

(84) ett sedan i går välkänt faktum [Swedish, Delsing (1992: 25)]
 a since yesterday well.known fact
 'a fact well-known since yesterday'

(85) na svého syna pyšný muž [Czech, Siewierska & Uhlířová (2000: 135)]
 of his son proud man


(86) A fizetésükkel elégedetlen munkások nem dolgoznak jól.
 the salary.their.with dissatisfied workers.NOM not work.3PL well
 'Workers dissatisfied with their pay don't work well.'
 [Hungarian, Grosu & Horvath (2006: 21)]

This strategy is also marginally available in Macedonian, a language which has been claimed not to be subject to the HFF:

(87) ?momche zavisno od svoite roditeli [Macedonian]25
 boy dependent on his parents

In (82b), on the other hand, the category Adj, which again does not enter into a selection relation with N, must still precede N as it asymmetrically c-commands it, and PP must follow Adj. For this reason, the c-command relations between PP and N must be inspected in order for them to be ordered. The problem here is that neither category c-commands the other. PP is partially dominated by N, but it is also totally dominated by Adj, which does not totally dominate N. For this reason it is not possible to linearize such a structure and the derivation crashes at PF.

As with other FOFC-violating structures, however, there are two further ways to avoid such a crash. One possibility is for AdjP to be spelled out in its base position, where both Adj and PP are asymmetrically c-commanded by N and hence must follow it. This compliance strategy is available in all of the languages investigated (English, German, Swedish, Dutch, Afrikaans, Slovene, Serbo-Croatian, Polish, Macedonian, Portuguese and French) in all HFF contexts:

(88) Une victoire [facile à remporter] [French] (Abeillé and Godard 2000: 339)
 A victory easy to win
 'An easy victory to win'

(89) uma aluna [boa (*a matemática)] [Portuguese]26
 a student good at maths
 intended: 'a student good at maths'

(90) a. A student bored of French
 b. A patient sick of waiting
 c. A chair in the corner

25 Thanks to Nino Nikolovski for the Macedonian judgments.
26 Actually, the adjectives boa/bom (good) and mau/má (bad) also marginally allow PP extraposition of the complement:
(i) Uma boa aluna a matemática
 A good student at maths
This is a very restricted phenomenon, however, as extraposition is not possible with the PP complements of orgulhosa/o (proud), chateada/o (annoyed), farta/o (tired), satisfeita/o (satisfied).




(91) a. a man blue in the face
 b. a payment due in thirty days
 c. a child restless in her seat
It is possible that in such cases, no movement to a prenominal position takes place, and what we have is just a reduced relative clause in its base position.

An alternative compliance strategy, which is more restricted in its application in ways that are not yet fully understood, is partial extraposition, whereby the offending post-adjectival material is extraposed. Some languages, such as Slovene and Macedonian, seem to permit this quite freely with both CP and PP complements/modifiers:

(92) a. zavesten otrok, da je vojna [Slovenian]
  aware.M child.M that is.3.SG war.F
  'a child aware that there is a war'
 b. odvisen mladenič od svojih staršev
  dependent.M young.man.M from REFL.POSS.ACC parents.PL.ACC
  'a youth dependent on his parents'

Other languages permit this basically only with degree modifiers (English, German, Swedish, Dutch, Afrikaans):

(93) a. A bigger problem [CP than Op we had first anticipated]
 b. A bigger fool [CP than Op John (…) ]
 c. too smart a guy [CP to argue with]
 d. a tall enough guy [CP to play basketball]

(94) Een meer intelligente student dan Wim [Dutch, Hoekstra (1999: 180)]
 A more intelligent student than Wim
 'a more intelligent student than Wim'

As discussed above, this kind of extraposition is also attested as a FOFC-compliance strategy, as it also avoids the linearization problem under discussion. The copy of CP in its base position is asymmetrically c-commanded by everything else and so can be straightforwardly linearized.

Note finally that the HFF applies to what are clearly fronted reduced relatives. As Cinque (2010) notes, the languages which permit surface HFF-violations also do so with reduced relatives:

(95) a. Sidjaščaja okolo pal'my devuška (očen' krasiva) [Russian]
  sitting near palm girl (very pretty)
  'the girl sitting near the palm (is very pretty)'
  (Cinque 2010: 46, citing Babby 1973: 358)


 b. ta prósfata sideroména me prosohi pukámisa [Greek]
  the recently ironed with care shirts
  'the shirts recently ironed with care'
  (Cinque 2010: 46, citing Melita Stavrou, p.c.)

Kayne (1994: 98–99) gives other reduced prenominal relative clauses, which, with the exception of incorporated prepositions, are subject to the HFF:

(96) a. The recently sent (*to me) book
 b. The much referred to hypothesis
 c. The little slept in bed

These facts make it all the more plausible that fronted AdjPs are reduced relatives, subject also to the same constraint: a ban on non-atomic right-branching specifiers. Where atomization is possible as an idiosyncratic process, apparent surface violations of the HFF are possible:

(97) a. his holier-than-thou attitude
 b. the Final-over-Final Condition
 c. his down-to-earth demeanour

In such cases, the fronted AP lacks internal structure and so no linearization problem arises. Similarly, if comparable orders are derived via two separate movements, as in Bulgarian, no linearization problem arises (taking the PP to be atomic):

(98) [DP [AdjP vernij ti ]j -at [CP [PP na Vera]i C [IP [NP muž] I tj ]]]
 truthful the to Vera husband

As such, the HFF falls out from the same linearization challenge which gives rise to FOFC, subject only to superficial counterexamples where (i) the complex AdjP is atomized or (ii) the AdjP raises separately from its complement to a prenominal position. Before concluding, we briefly compare this proposal to previous accounts, making clear the differences.
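The asymmetry between (82a) and (82b), on which the whole account rests, can also be sketched procedurally. The Python fragment below is a hypothetical encoding of my own (the `try_linearize` helper and the pair sets are illustrative, not part of the formal proposal): in (82a) every pair of categories is ordered, so a total order PP>Adj>N exists, while in (82b) PP and N stand in no c-command relation in either direction, so no total order can be found.

```python
from itertools import permutations

def try_linearize(items, precedence, incomparable):
    """Search for a total order consistent with the pairwise
    precedence requirements; pairs in `incomparable` (neither
    category c-commands the other) can never be ordered."""
    for perm in permutations(items):
        consistent = True
        for i in range(len(perm)):
            for j in range(i + 1, len(perm)):
                pair = (perm[i], perm[j])
                # fail if this pair is unorderable, or if the required
                # order is the reverse of this arrangement
                if pair in incomparable or pair[::-1] in precedence:
                    consistent = False
        if consistent:
            return list(perm)
    return None  # crash at PF: no linearization exists

# (82a): PP must precede Adj, and Adj must precede N; every pair of
# categories is ordered, so linearization succeeds.
print(try_linearize(["PP", "Adj", "N"],
                    {("PP", "Adj"), ("Adj", "N")}, set()))
# -> ['PP', 'Adj', 'N']

# (82b): Adj precedes both PP and N, but PP and N c-command each
# other in neither direction, so every candidate order is blocked.
print(try_linearize(["PP", "Adj", "N"],
                    {("Adj", "PP"), ("Adj", "N")},
                    {("PP", "N"), ("N", "PP")}))
# -> None
```

The `None` outcome models the PF crash; atomization and extraposition, as discussed above, amount to removing the unorderable pair before linearization applies.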

6.4 Previous accounts

In an early approach, Abney (1987) proposes an analysis of the HFF whereby prenominal adjectives are heads in the extended projection of N, as mentioned above:

(99) [DP D [AdjP Adj [NP N]]]





Bošković (2005) proposes that this structure is available only in languages with determiners. His basic proposal is that in such languages, the DP projection serves to make APs into arguments, whereas in languages lacking determiners no such possibility exists. As APs cannot function as arguments, it follows that, in determiner-less languages, only an NP-over-AP construction is possible (whereby the adjective is adjoined to NP). The attraction of his proposal concerns the other seemingly unrelated parametric effects which Bošković attributes to this NP-over-AP structure, notably the possibility of left branch extraction (LBE). From such a perspective, the prediction is that determiner-less languages should not be subject to the HFF as, in these languages, adjectival modifiers are phrasal. Compare the structure in (100) with that in (99):

(100) [NP [AdjP Adj [PP/CP …]] N ]

As also noted above, Abney's account of the HFF, if correct, renders the effect wholly distinct from FOFC, despite their surface similarities. The crucial fact about (99) is that the adjective takes the NP as its complement and so can take no other complements (assuming binary branching), hence the HFF. It seems, however, that both Abney's account of the HFF and Bošković's parameterization of it are empirically problematic.27

While Bošković's parametric account is highly elegant, it suffers from some obvious empirical problems. The first concerns the predicted correlations between the two properties. The prediction is that languages without determiners will not be subject to the HFF, whereas those with determiners will be. This predicts two classes of languages, when in fact all four possible combinations of the two properties seem to be attested:

Tab. 1: The Head-Final Filter (HFF) and NP vs. DP languages.

Class  Languages                                                  NP language  Obeys the HFF
A      Russian, Polish                                            Y            N
B      Czech, Serbo-Croatian, Slovene, Finnish                    Y            Y
C      English, German, Dutch, Swedish, Spanish, Portuguese,      N            Y
       Italian, French
D      Bulgarian, Macedonian, Greek, Romanian (in comparatives)   N            N

27 The debate as to the relative merits of parameterizing nominal denotation and thus the presence of D is well-rehearsed. Some conceptual problems with Abney’s general approach are discussed by Svenonius (1994) and Pereltsvaig (2007).


Classes A and C in Tab. 1 conform to expectations. Russian and Polish are NP languages and hence fail to adhere to the HFF, whereas English, German etc. are DP languages which obey the HFF. Classes B and D, however, are not expected to exist. The NP/DP parameter provides no explanation as to why Bulgarian, Macedonian and Greek should fail to adhere to the HFF, as they are all DP languages (as Pereltsvaig 2007 also notes). Some of these languages might, however, be only superficial counterexamples, as proposed above. More problematic is the fact that many NP languages are apparently at least weakly sensitive to the HFF. Bošković notes that Czech, Slovak, Sorbian, Bosnian/Croatian/Serbian and Slovene do freely permit phrasal prenominal modifiers, but only with left-branching complements (see also section 3.2.1, Cinque 2010: chapter 4, fn 9, citing Siewierska and Uhlířová 1998: 135f). The problem with this is that the same is also true of German and Swedish, which are DP languages and are thus predicted to pattern differently. In fact, the availability of preposed complements appears to be a compliance strategy for the HFF, as noted above.

As Svenonius (1994) and Hankamer & Mikkelsen (2005) point out, there are also serious problems with Abney's (1987) original analysis of the HFF which are retained in Bošković's account. Firstly, prenominal adjectives fail to block N-to-D movement in many languages which nonetheless adhere to the HFF, suggesting that they cannot be heads (cf. Longobardi 1994). Moreover, as noted by both Greenberg and Williams (see section 2), it is not the case that prenominal adjectives must be heads bearing no complements or modifiers, merely that they cannot bear right-branching complements/modifiers (as Hankamer & Mikkelsen 2005: 96 note). In essence, Greenberg's Universal 21, which clearly notes that word order is the crucial factor here, indicates that Abney's account cannot be the whole story (see also Cinque 2010: chapter 4 for further problems).

7 Conclusions

It has been argued that the HFF and FOFC both derive from a difficulty associated with linearizing right-branching specifiers. This difficulty, it has been proposed, arises in the context of a Copy Theory of Labelling whereby projection always leads to segmentation and c-command is calculated between categories. Essentially, right-branching specifiers cannot be linearized without atomization or scattered deletion, giving rise to CED effects and extraposition respectively. In the context of prenominal modifiers, additional compliance strategies are observed: complements of Adj can be fronted or the whole AdjP can be spelled out in base position to avoid the aforementioned linearization challenge. Head-final specifiers behave differently, not being subject to CED effects in some languages and failing to display restrictions such as FOFC and the HFF. This can be attributed to the simple fact that




if A>B and B>C then A>C, making atomization/extraposition unnecessary. Under such an approach, pervasive word order asymmetries are ascribed to a revised version of the LCA which maps hierarchical structure to linear order based on syntactic c-command and c-selection relations. Crucially, though, neither FOFC nor the HFF hold at the level of the narrow syntax; rather, they emerge at PF as arbitrary side effects of a system with a 'precedence preference'. This is a welcome result if the operations of the narrow syntax are not sensitive to word order.

Acknowledgments: This is a standalone version of chapter 7 of Sheehan, Biberauer, Holmberg & Roberts (forthcoming), refocused on the topic of labelling. Thanks to the European Research Council Advanced Grant No. 269752, "Rethinking Comparative Syntax" (ReCoS) for funding this work. Thanks also to Joe Emonds and Edwin Williams for discussing these issues with me. All errors are, of course, my own.

References

Abeillé, Anne and Danièle Godard. 2000. French word order and lexical weight. In Robert Borsley (ed.), Syntactic categories, 1–27. New York: Academic Press.
Abels, Klaus. 2010. Anti-locality, snowballing movement, and their relation to a theory of word order. Paper presented at GIST 1, Ghent, Belgium.
Abels, Klaus and Ad Neeleman. 2012. Linear asymmetries and the LCA. Syntax 15. 25–74.
Abney, Steven P. 1987. The English noun phrase in its sentential aspect. MIT Ph.D. thesis.
Ajíbóyè, Ọládiípò. 2001. The internal structure of Yorùbá DP. Paper presented at ACAL 32, UC Berkeley.
Androutsopoulou, Antonia. 1995. The licensing of adjectival modification. In J. Camacho, L. Choueiri and M. Watanabe (eds.), Proceedings of the Fourteenth West Coast Conference on Formal Linguistics, 17–31. Stanford: CSLI Publications.
Babby, Leonard Harvey. 1975. A transformational grammar of Russian adjectives. The Hague: Mouton.
Bailyn, John. 1994. The syntax and semantics of Russian long and short adjectives: An X'-theoretic account. In J. Toman (ed.), Proceedings of the Annual Workshop on Formal Approaches to Slavic Linguistics, 1–30. Ann Arbor: Michigan Slavic Publications.
Baker, Mark. 2003. Verbal adjectives as adjectives without phi-features. In J. Otsu (ed.), Proceedings of the Fourth Tokyo Conference on Psycholinguistics, 1–22. Keio University.
Berman, A. 1974. Adjectives and adjective complement constructions in English. Harvard University Ph.D. dissertation.
Bernstein, Judith B. 1995. Adjectives and their complements. Paper presented at LSA.
Bhatt, Rajesh and Roumyana Pancheva. 2004. Late merger of degree clauses. Linguistic Inquiry 35. 1–45.
Biberauer, Theresa, Anders Holmberg and Ian Roberts. 2008. Structure and linearization in disharmonic word orders. In Charles B. Chang and Hannah J. Haynie (eds.), Proceedings of the 26th West Coast Conference on Formal Linguistics, 96–104. Somerville, MA: Cascadilla Proceedings Project.
Biberauer, Theresa, Anders Holmberg and Ian Roberts. 2014. A syntactic universal and its consequences. Linguistic Inquiry 45(2). 169–225.


Biberauer, Theresa, Glenda Newton and Michelle Sheehan. 2009a. Limiting synchronic and diachronic variation and change: The final-over-final constraint. Language and Linguistics 10. 701–743.
Biberauer, Theresa, Glenda Newton and Michelle Sheehan. 2009b. On impossible changes and impossible borrowings. Toronto Working Papers in Linguistics 31. 1–17.
Biberauer, Theresa and Michelle Sheehan. 2012. Disharmony, antisymmetry, and the final-over-final constraint. In M. Uribe-Etxebarria and V. Valmala (eds.), Ways of structure building, 206–244. Oxford: Oxford University Press.
Biberauer, Theresa and Michelle Sheehan. 2013. Theoretical approaches to disharmonic word order. In T. Biberauer & M. Sheehan (eds.), Theoretical approaches to disharmonic word order, 1–46. Oxford: Oxford University Press.
Biberauer, Theresa, Michelle Sheehan and Glenda Newton. 2010. Impossible changes and impossible borrowings. In Anne Breitbarth, Chris Lucas, Sheila Watts and David Willis (eds.), Continuity and change in grammar, 35–60. Amsterdam: John Benjamins.
Bošković, Željko. 2005. On the locality of left branch extraction and the structure of NP. Studia Linguistica 59. 1–45.
Bouchard, Denis. 1998. The distribution and interpretation of adjectives in French: A consequence of Bare Phrase Structure. Probus 10. 139–183.
Bouchard, Denis. 2002. Adjectives, number, and interfaces: Why languages vary. Amsterdam: Elsevier.
Boyd, Virginia Lee. 1997. A phonology and grammar of Mbodómó. University of Texas at Arlington M.A. thesis.
Bresnan, Joan. 1973. Syntax of the comparative clause construction in English. Linguistic Inquiry 4. 275–343.
Chomsky, Noam. 1957. Syntactic structures. Gravenhage: Mouton.
Chomsky, Noam. 1995. Bare phrase structure. In G. Webelhuth (ed.), Government and binding theory and the minimalist program, 383–439. Oxford: Blackwell.
Chomsky, Noam. 2000. Minimalist inquiries. In Roger Martin, David Michaels and Juan Uriagereka (eds.), Step by step: Essays on minimalist syntax in honor of Howard Lasnik, 89–155. Cambridge, MA: MIT Press.
Chomsky, Noam. 2004. Beyond explanatory adequacy. In Adriana Belletti (ed.), Structures and beyond: The cartography of syntactic structures, Volume 3, 104–131. New York: Oxford University Press.
Chomsky, Noam. 2013. Problems of projection. Lingua 130. 33–49.
Cinque, Guglielmo. 1999. Adverbs and functional heads: A cross-linguistic perspective. New York; Oxford: Oxford University Press.
Cinque, Guglielmo. 2005. Deriving Greenberg's Universal 20 and its exceptions. Linguistic Inquiry 36(3). 315–332.
Cinque, Guglielmo. 2010. The syntax of adjectives: A comparative study. Cambridge, MA: MIT Press.
Collins, Chris. 2002. Eliminating labels. In Samuel David Epstein and T. Daniel Seely (eds.), Derivation and explanation in the minimalist program, 42–64. Malden, MA: Blackwell.
Davison, Alice. 2007. Word order, parameters, and the extended COMP projection. In Josef Bayer, Tanmoy Bhattacharya, and M.T. Hany Babu (eds.), Linguistic theory and South Asian languages, 175–198. Amsterdam: John Benjamins.
De Clercq, Karen, Liliane Haegeman, and Terje Lohndal. 2011. Clause internal scrambling in English and the distribution of PP and DP-adjuncts. Paper presented at the LAGB conference, University of Manchester, 7–10 September 2011.




Delsing, Lars. 1993. On attributive adjectives in Scandinavian and other languages. Studia Linguistica 47. 105–125.
Dimitrova-Vulchanova, Mila and Giuliana Giusti. 1998. Fragments of Balkan nominal structure. In A. Alexiadou and C. Wilder (eds.), Possessors, predicates and movement in the determiner phrase, 333–360. Amsterdam: Benjamins.
Dixon, Robert M. W. 2004. Adjective classes in typological perspective. In Robert M. W. Dixon and A. Aikhenvald (eds.), Adjective classes: A cross-linguistic typology, 1–49. Oxford: Oxford University Press.
Dost, Ascander and Vera Gribanova. 2006. Definiteness marking in the Bulgarian DP. In Donald Baumer, David Montero, and Michael Scanlon (eds.), Proceedings of the 25th West Coast Conference on Formal Linguistics, 132–140. Somerville, MA: Cascadilla Proceedings Project.
Dryer, Matthew S. 1992. The Greenbergian word order correlations. Language 68. 81–138.
Dryer, Matthew S. 2005a. Position of polar question particles. In Martin Haspelmath, Matthew S. Dryer, David Gill, and Bernard Comrie (eds.), The world atlas of language structures, 374–377. Oxford: Oxford University Press. [Reissued as Dryer, Matthew S. 2011a. Position of polar question particles. In M. Dryer, M. Haspelmath, D. Gill and B. Comrie (eds.), The world atlas of language structures online. Munich: Max Planck digital library. http://wals.info/chapter/92 (accessed 5 September 2011).]
Dryer, Matthew S. 2005b. Order of adverbial subordinator and clause. In M. Haspelmath, M. Dryer, D. Gill and B. Comrie (eds.), The world atlas of language structures, 382–385. Oxford: Oxford University Press. [Reissued as Dryer, Matthew S. 2011b. Order of adverbial subordinator and clause. In M. Dryer, M. Haspelmath, D. Gill and B. Comrie (eds.), The world atlas of language structures online. Munich: Max Planck digital library. http://wals.info/chapter/94 (accessed 5 September 2011).]
Dryer, Matthew S. 2009. The branching direction theory revisited. In Sergio Scalise, Elisabetta Magni and Antonietta Bisetto (eds.), Universals of language today, 185–207. Berlin: Springer.
Embick, David and Rolf Noyer. 2001. Movement operations after syntax. Linguistic Inquiry 32. 555–598.
Emonds, Joseph Embley. 1976. A transformational approach to English syntax: Root, structure-preserving, and local transformations. New York: Academic Press.
Epstein, Sam, Hisatsugu Kitahara and Daniel Seely. 2012. Structure building that can't be. In Myriam Uribe-Etxebarria and Vidal Valmala (eds.), Ways of structure building, 253–270. Oxford: Oxford University Press.
Fleisher, Nicholas. 2008. A crack at a hard nut: Attributive-adjective modality and infinitival relatives. In Charles B. Chang and Hannah J. Haynie (eds.), Proceedings of the 26th West Coast Conference on Formal Linguistics, 163–171. Somerville, MA: Cascadilla Proceedings Project.
Fleisher, Nicholas. 2011. Attributive adjectives, infinitival relatives, and the semantics of inappropriateness. Journal of Linguistics 47. 341–380.
Fortuny, Jordi. 2008. The emergence of order in syntax. Amsterdam/Philadelphia: John Benjamins.
Fox, Danny and Jon Nissenbaum. 1999. Extraposition and scope: A case for overt QR. In WCCFL 18: Proceedings of the 18th West Coast Conference on Formal Linguistics.
Fox, Danny and David Pesetsky. 2005. Cyclic linearization of syntactic structure. Theoretical Linguistics 31. 1–45.
Giorgi, Alessandra. 1988. La struttura interna dei sintagmi nominali. In L. Renzi (ed.), Grande grammatica italiana di consultazione, 273–314. Bologna: Il Mulino.
González Escribano, José Luis. 2004. Head-final effects and the nature of modification. Journal of Linguistics 40. 1–43.

200 

 Michelle Sheehan

González Escribano, José Luis. 2005. Discontinuous APs in English. Linguistics. 563–610.
Greenberg, Joseph. 1963. Some universals of grammar with particular reference to the order of meaningful elements. In Joseph Greenberg (ed.), Universals of language, 58–90. Cambridge, MA: MIT Press.
Grosu, Alexander and Julia Horvath. 2006. Reply to Bhatt and Pancheva’s “Late Merger of degree clauses”: The irrelevance of (non)conservativity. Linguistic Inquiry 37. 457–483.
Grosu, Alexander, Julia Horvath and Helen Trugman. 2007. DegPs as adjuncts and the head final filter. Bucharest Working Papers in Linguistics VIII.
Haider, Hubert. 2004. Pre- and postverbal adverbials in OV and VO. Lingua 114. 779–807.
Hankamer, Jorge and Line Mikkelsen. 2005. When movement must be blocked: A reply to Embick and Noyer. Linguistic Inquiry 36. 85–125.
Haspelmath, Martin. 1998. How young is Standard Average European? Language Sciences 20. 271–287.
Hawkins, John. 1990. A parsing theory of word order universals. Linguistic Inquiry 21. 223–261.
Hawkins, John. 2004. Efficiency and complexity in grammars. Oxford: Oxford University Press.
Hicks, Glyn. 2009. Tough-constructions and their derivation. Linguistic Inquiry 40. 535–566.
Hoekstra, Teun. 1999. Parallels between nominal and verbal projections. In David Adger et al. (eds.), Specifiers. Oxford: Oxford University Press.
Holmberg, Anders. 2000. Deriving OV order in Finnish. In P. Svenonius (ed.), The derivation of OV and VO, 123–152. Amsterdam: John Benjamins.
Holmberg, Anders. 2003. Yes/no questions and the relation between tense and polarity in English and Finnish. Linguistic Variation Yearbook 3. 45–70.
Hornstein, Norbert. 2009. A theory of syntax: Minimal operations and universal grammar. Cambridge: Cambridge University Press.
Huang, C.-T. J. 1982. Logical relations in Chinese and the theory of grammar. MIT Ph.D. dissertation.
Jackendoff, Ray. 1977. X-bar syntax: A study of phrase structure. Cambridge, MA: MIT Press.
Kayne, Richard S. 1994. The antisymmetry of syntax. Cambridge, MA: MIT Press.
Kayne, Richard S. 2000. A note on prepositions, complementizers and word order universals. In R. Kayne, Parameters and universals, 314–326. New York: Oxford University Press.
Kayne, Richard S. 2013. Why are there no directionality parameters? In T. Biberauer and M. Sheehan (eds.), Theoretical approaches to disharmonic word order, 219–244. Oxford: Oxford University Press.
Kennedy, Christopher. 1999. Projecting the adjective: The syntax and semantics of gradability and comparison. New York/London: Garland.
Kornfilt, Jaklin. 1997. Turkish. London: Routledge.
Kural, M. 1997. Postverbal constituents in Turkish and the linear correspondence axiom. Linguistic Inquiry 28 (3). 498–519.
Laka, Itziar. 1994. On the syntax of negation. New York: Garland.
Larson, Richard. 1998. Events and modification in nominals. In D. Strolovitch and A. Lawson (eds.), Proceedings from Semantics and Linguistic Theory (SALT) VIII, 1–27. Ithaca, NY: Cornell University.
Larson, Richard S. and Franc Marušič. 2004. On indefinite pronoun structures with APs: Reply to Kishimoto. Linguistic Inquiry 35. 268–287.
Larson, Richard and Naoko Takahashi. 2007. Order and interpretation in prenominal relative clauses. In M. Kelepir and B. Öztürk (eds.), Proceedings of the Workshop on Altaic Formal Linguistics II, 101–120. Cambridge, MA: MIT Working Papers in Linguistics.



A labelling-based account of the Head-Final Filter 

 201

Lechner, Winfried. 1999. Comparatives and DP-structure. University of Massachusetts Ph.D. dissertation.
Leung, Cheong (Alex) and William van der Wurff. 2012. Language function and language change – a principle and a case study. Newcastle University unpublished manuscript.
Longobardi, Giuseppe. 1994. Reference and proper names. Linguistic Inquiry 25. 609–665.
Luján, Marta. 1973. Pre- and postnominal adjectives in Spanish. Kritikon Litterarum 2. 398–408.
Moro, Andrea. 2000. Dynamic antisymmetry. Cambridge, MA/London: MIT Press.
Nanni, Deborah L. 1980. On the surface syntax of constructions with easy-type adjectives. Language 56. 568–581.
Noonan, Michael. 1992. A grammar of Lango. Berlin: Mouton de Gruyter.
Nunes, Jairo. 2004. Linearization of chains and sideward movement. Cambridge, MA: MIT Press.
Nunes, Jairo and Juan Uriagereka. 2000. Cyclicity and extraction domains. Syntax 3 (1). 20–43.
O’Flynn, Kathleen. 2008. Tough nuts to crack. University of California manuscript.
O’Flynn, Kathleen. 2009. Prenominal adjective-complement constructions in English. University of California manuscript.
Oseki, Yohei. 2014. Eliminating pair-merge. To appear in Proceedings of the 32nd West Coast Conference on Formal Linguistics.
Pereltsvaig, Asya. 2007. The universality of DP: A view from Russian. Studia Linguistica 61. 59–94.
Platzack, Christer. 1982. Transitive adjectives in Old and Modern Swedish. In A. Ahlqvist (ed.), Papers from the 5th International Conference on Historical Linguistics, 273–282. Amsterdam: Benjamins.
Reichard, Ulrich. To appear. Inference and grammar: Intersectivity, subsectivity and phases. Paper presented at the Irish Network in Formal Linguistics Conference 2011.
Richards, M. 2004. Object shift and scrambling in North and West Germanic: A case study in symmetrical syntax. Cambridge University Ph.D. dissertation.
van Riemsdijk, H. 2001. A far from simple matter: Syntactic reflexes of syntax-pragmatics misalignments. In I. Kenesei and R. M. Harnish (eds.), Semantics, pragmatics and discourse. Perspectives and connections. A Festschrift for Ferenc Kiefer, 21–41. Amsterdam: John Benjamins.
Rizzi, Luigi. 1997. The fine structure of the left periphery. In L. Haegeman (ed.), Elements of grammar, 281–337. Dordrecht: Kluwer.
Rizzi, Luigi. 2001. On the position ‘Int(errogative)’ in the left periphery of the clause. In G. Cinque and G. Salvi (eds.), Current studies in Italian syntax, 287–337. Amsterdam: Elsevier.
Roeper, Thomas and Muffy Siegel. 1978. A lexical transformation for verbal compounds. Linguistic Inquiry 9. 199–260.
Rutkowski, Paweł. 2002. Noun/pronoun asymmetries: Evidence in support of the DP hypothesis in Polish. Jezikoslovlje 3. 159–170.
Rutkowski, Paweł. 2007. The syntactic properties and diachronic development of postnominal adjectives in Polish. Paper presented at Formal Approaches to Slavic Linguistics: The Toronto Meeting 2006, Ann Arbor.
Rutkowski, Paweł and Ljiljana Progovac. 2005. Classification projection in Polish and Serbian: The position and shape of classifying adjectives. In M. Tasseva-Kurktchieva, S. Franks, and F. Gladney (eds.), Formal Approaches to Slavic Linguistics 13: The Columbia Meeting 2004, 289–299. Ann Arbor: Michigan Slavic Publications.
Sadler, Louisa and Douglas J. Arnold. 1994. Prenominal adjectives and the phrasal/lexical distinction. Journal of Linguistics 30. 187–226.
Samiian, Vida. 1994. The ezafe construction: Some implications for the theory of X-bar syntax. In M. Marashi (ed.), Persian studies in North America. Bethesda, MD: Iranbooks.


Schütze, Carson. 1995. PP attachment and agreement. MIT Working Papers in Linguistics 26 (Papers on Language Processing and Acquisition). 95–152.
Seely, Daniel. 2006. Merge, derivational c-command, and subcategorization. In C. Boeckx (ed.), Minimalist essays, 182–217. Amsterdam: John Benjamins.
Sheehan, Michelle. 2009. Labelling, multiple spell-out and the final-over-final constraint. In V. Moscati and E. Servidio (eds.), Proceedings XXXV Incontro di Grammatica Generativa, 231–243. Siena: STiL – Studies in Linguistics.
Sheehan, Michelle. 2010. Extraposition and antisymmetry. In Jeroen Van Craenenbroeck (ed.), Linguistic Variation Yearbook, 203–225. Amsterdam: John Benjamins.
Sheehan, Michelle. 2013a. Explaining the final-over-final constraint. In Theresa Biberauer and Michelle Sheehan (eds.), Theoretical approaches to disharmonic word orders, 407–444. Oxford: Oxford University Press.
Sheehan, Michelle. 2013b. Some implications of a copy theory of labeling. Syntax 16 (4). 362–396.
Sheehan, Michelle. 2013c. The resuscitation of CED. In S. Kan, C. Moore-Cantwell & R. Staubs (eds.), Proceedings of the 40th Annual Meeting of the North East Linguistic Society (NELS 40), 135–150. Amherst, MA: GLSA.
Sheehan, Michelle, Theresa Biberauer, Anders Holmberg and Ian Roberts. Forthcoming. The Final-over-Final Condition. Cambridge, MA: MIT Press.
Siegel, Muffy E. A. 1976. Capturing the adjective. Amherst, MA: University of Massachusetts Ph.D. dissertation.
Siewierska, Anna and Ludmila Uhlířová. 1998. An overview of word order in Slavic languages. In A. Siewierska (ed.), Constituent order in the languages of Europe, 105–149. Berlin: Mouton de Gruyter.
Sproat, Richard and Shih Chinlin. 1988. Prenominal adjectival ordering in English and Mandarin. In Jim Blevins and J. Carter (eds.), Proceedings of NELS 18, 465–489. Amherst: GSLA.
Sproat, Richard and Shih Chinlin. 1990. The cross-linguistic distribution of adjectival ordering restrictions. In C. Georgopoulos and R. Ishihara (eds.), Interdisciplinary approaches to language: Essays in honor of S-Y. Kuroda, 565–593. Dordrecht: Kluwer.
Svenonius, Peter. 1994. On the structural location of the attributive adjective. In E. Duncan, D. Farkas and P. Spaelti (eds.), Proceedings of the Twelfth West Coast Conference on Formal Linguistics (WCCFL), 439–454.
Tallerman, Maggie. 1998. Understanding syntax. London: Arnold.
Tasseva-Kurktchieva, Mila. 2005. The possessor that came home. In J.-Y. Kim, Y. A. Lander and B. H. Partee (eds.), Possessives and beyond: Semantics and syntax (University of Massachusetts Occasional Papers in Linguistics 29), 279–293. Amherst: GSLA Publications.
Tat, Deniz. 2010. APs as reduced relatives: The case of bir in (some) varieties of Turkic. Proceedings of WAFL 7. 301–315.
Uriagereka, Juan. 1999. Multiple spell-out. In Samuel David Epstein and Norbert Hornstein (eds.), Working minimalism, 251–282. Cambridge, MA: MIT Press.
Williams, Edwin. 1982. Another argument that passive is transformational. Linguistic Inquiry 13. 160–163.
Wunderlich, Dieter. 2001. Two comparatives. In István Kenesei and Robert M. Harnish (eds.), Perspectives on semantics, pragmatics and discourse. A festschrift for Ferenc Kiefer. Amsterdam: John Benjamins.
Zwart, Jan-Wouter. 1996. On the status and position of PPs inside APs in Dutch. University of Groningen unpublished manuscript.
Zwart, Jan-Wouter. 2011. Structure and order: Asymmetric merge. In Cedric Boeckx (ed.), The Oxford handbook of linguistic minimalism, 96–118. Oxford: Oxford University Press.

Artemis Alexiadou and Terje Lohndal

The structural configurations of root categorization

Abstract: This paper discusses the syntax of roots by way of looking at four different views discussed in the literature: (i) roots merged as complements, (ii) roots merged as modifiers, (iii) roots merged either as complements or as modifiers, and (iv) roots merged in a unique way. A lot of time is devoted to (i) and to demonstrating empirical challenges from both the nominal and the verbal domain. With the exception of (iii), the other views entail that roots are introduced into the structure in a uniform way, which is argued to be advantageous. The four different views are all discussed in detail, though the paper argues that it is hard to find solid evidence that solves the question of how roots are introduced into (morpho-)syntactic representations.

1 Introduction

Researchers within Distributed Morphology claim that the units of word formation consist of roots, which combine with elements of the functional vocabulary to form larger units. These roots are acategorial and need to be categorized by functional elements (Marantz 1997, Alexiadou 2009, Arad 2003, 2005, Embick and Marantz 2008, Embick 2010, De Belder 2011, 2013 and others).1 An example is provided in (1), where the root is either categorized as a verb (1a) or as a noun (1b).

(1) a. [v v √ROOT ]
b. [n n √ROOT ]

Essentially, word formation is syntactic and there are atomic, non-decomposable elements called roots (see also Pesetsky 1995, Borer 2005a, b, 2013). An important question concerns the phrase structural configuration in which a root is inserted. Are roots complements, modifiers, both, or do they enter into a unique relationship unlike all other constituents? In this paper, we will discuss and evaluate these options.

1 The idea that roots are the basic units of word formation is also adopted by exoskeletal approaches (e.g. Borer 2005a, b, 2013, Grimstad, Lohndal & Åfarli 2014, Lohndal 2014), though Borer explicitly rejects the categorization assumption adopted in Distributed Morphology.

DOI 10.1515/9781501502118-009


The structure of the paper is as follows. In section 2, we will outline the major issues concerning where and how a root is merged into a syntactic structure. Section 3 is a lengthy discussion of the view that roots can serve as complements, including an extensive critique of that idea. Section 4 presents the proposal that roots are adjoined to their categorizing head, whereas section 5 introduces a compromise suggestion: Roots can either serve as complements or be adjoined. In section 6, we present a view according to which roots are special in the sense that they have a distinguished phrase-structural position. Section 7 offers a discussion of the various alternatives that we have presented and indicates which alternative we favor. Lastly, we summarize the paper in section 8.

2 The locus of root-merger

A common assumption within the Minimalist Program (Chomsky 1995) is that the syntactic operation Merge combines two syntactic objects α and β into the unordered set {α, β}.2 Typically one of the objects projects and provides the label for the phrasal element (Chomsky 1995).

(2) a. {α, {α, β}}
b. {β, {α, β}}

As this example demonstrates, either α or β projects; there are no phrase structure constraints on which element can project. The element that projects will become the head of the given phrase, but both headedness and phrasal status are configurational properties: α or β becomes a head or a phrase by virtue of their syntactic positions (Chomsky 1995). Let us briefly consider one implementation of this general idea. Chomsky (1995: 245) argues that “we understand a terminal element LI to be an item selected from the numeration, with no parts (other than features) relevant to CHL. A category Xmin is a terminal element, with no categorial parts. We restrict the term head to terminal elements”. The following definitions of maximal and minimal projections are provided.

(3) a. Maximal projection: A maximal projection is a syntactic object that does not project any further.
b. Minimal projection: A minimal projection is a lexical item selected from the numeration.

2 See Zwart (2009a, 2009b, 2011) for an alternative in which the set is ordered.


This implies “[…] that an item can be both an X0 and an XP” (Chomsky 1995: 249), of which Chomsky mentions clitics as an example. As mentioned in the introduction, frameworks that adopt roots, such as Distributed Morphology (henceforth, DM), assume that roots are categorized syntactically. Examples of the three major lexical categories are provided in (4).

(4) a. [vP v √ROOT ]
b. [nP n √ROOT ]
c. [aP a √ROOT ]

Importantly, in this theory, roots are category neutral. They enter the syntactic derivation without a category and are only categorized by combining with category-defining functional heads/labels. When roots merge with a categorizer, the categorizer seems to project.3 This accounts for the fact that during the rest of the derivation, the root is categorized as a particular category. Put differently, the configuration in (5a) is rather common whereas (5b) does not seem to obtain.

(5) a. [v v √ROOT ]
b. [√ROOT v √ROOT ]

An implicit assumption in (5) is that the categorizer is always a head. That raises the question: What blocks the structure in (5b) and should it be blocked? Despite the wide adoption of roots, the phrase-structure domains of categorization still have not been properly defined. For discussion, see Harley (2005b), Embick (2010), De Belder and van Craenenbroeck (2015), Anagnostopoulou & Samioti (2014), and to some extent Acquaviva (2009). Various views have been pursued with respect to the question of how roots are merged in morphosyntactic representations. We identify the following alternatives discussed in the literature:

(6) a. Roots are merged as complements of v (e.g., Bobaljik and Harley 2013, Harley 2014 and literature cited there).
b. Roots are merged as v modifiers (Marantz 2013).
c. Some roots are merged as v’s modifier while others as v’s complement (Embick 2004, 2010, Alexiadou & Anagnostopoulou 2013).

3 Naturally, one can assume that some roots must not project, see Bauke & Roeper (this volume) for discussion. We will not consider this option here, and refer the reader to the above contribution for detailed discussion.


d. Roots are inserted post-syntactically, thus they cannot take any complements or modify v (de Belder and van Craenenbroeck 2015).

In the following sections, we will discuss these four alternatives in the order in which they appear in (6).

3 Roots as complements

In a lot of analyses (see e.g., Embick 2010 and Harley 2014 for discussion), it is often assumed that roots can take complements. A verbal structure would for example have the following structure.

(7) [vP v [√P √ROOT DP ]]

This is a way of re-casting Kratzer’s (1996) claim that the internal argument is much closer to the verb than the external argument. More recently, Harley (2014) presents several arguments in favor of this claim. We will discuss them based on Alexiadou (2014), which in turn was based on an earlier incarnation of the present paper. The first concerns one-replacement (3.1), the second verb-object idioms (3.2), and the third root suppletion in Hiaki (3.3). Then we will turn to nominalizations and argue that the relevant data provide no argument in favor of roots being able to take complements (3.4). Then we summarize this long section (3.5).

3.1 One-replacement

As has been known since Jackendoff (1977), arguments that are selected cannot be stranded under one-replacement whereas nominal adjuncts allow for stranding.

(8) a. *the student of chemistry and the one of physics
b. the student with long hair and the one with short hair

Jackendoff’s analysis was that one targets N’, an analysis that cannot be implemented in Bare Phrase Structure (Speas 1990, Chomsky 1995, Lohndal 2012). Harley (2005b) illustrates how this phenomenon can be captured without generating problems for Bare Phrase Structure, given that Bare Phrase Structure does not allow for rules to target bar-level projections.


Harley makes the following assumptions: i) verbal study and nominal student both share the same root stud-, ii) the argument is a sister of the root itself, iii) nominalization involves the addition of n (-ent), iv) adjunct PPs adjoin to nP, not to √P. This allows her to characterize one-replacement as follows: one is an nP anaphor, not a √P anaphor. Harley gives the following structures for student of chemistry and student with long hair. The root head moves to the n head.

(9) the student of chemistry
[DP the [nP n(-ent) [√P √STUD (of) [DP chemistry ]]]]

(10) the student with long hair
[DP the [nP [nP n(-ent) [√P √STUD ]] [PP with [DP long hair ]]]]

Harley sketches an alternative analysis briefly outlined in note 22, following a suggestion by a reviewer. We will elaborate on this suggestion here. If we assume, as is done in much of the recent literature (see Alexiadou, Haegeman and Stavrou 2007 for discussion), that the nominal phrase contains a rich array of functional projections, then one-replacement can target a layer higher than nP. Specifically, we can assume, following Borer (2005a) and Alexiadou & Gengel (2011), that one targets the projection CLP, the Classifier Phrase. In Borer’s (2005a) framework, the following two functional projections between the DP and the nP are assumed: (i) the quantity phrase (#P in her system; similar but not equal to NumberP) and (ii) the Classifier Phrase (CLP in her system). All nouns enter the derivation as mass, and become count in the syntax, via CLP.


(11) [DP D [#P [CLP [nP ]]]]
(#P = quantity, hosts numerals/quantifiers; CLP = division/classification/unit)

In English, CLP is realized either by the plural marker or the numeral one. For Borer, one is a portmanteau divider/counter, while all other cardinal numerals are solely counters. These two functional projections have different functions. The #P denotes quantity, and the CLP introduces division, i.e. the function of dividing something. Moreover, the classifier has an individuating function. CLP is the input to quantity, i.e. if something is divided, it can be counted. This opens up the possibility that one-replacement targets CLP, and that the adjunct is located above nP, presumably adjoined to CLP. Harley (2014) highlights in her footnote the fact that the selectional restrictions are preserved. However, this has nothing to do with one-replacement.4 Student is a deverbal nominal, hence nP embeds a verbal structure, and within this structure the introduction of the internal argument takes place pretty much as described above, see Alexiadou (2009), and Borer (2013):

(12) [nP [vP theme [ Root ]]]

3.2 The verb and the direct object

Kratzer (1996) argues that the verb takes a complement but that the subject is severed from the verb both syntactically and semantically (see also Schein 1993). She relies on an argument from Marantz (1984), showing that verb-object combinations often receive special/idiomatic meanings, whereas this happens rarely

4 An anonymous reviewer points out that one-replacement must be allowed to cut inside a recursive AP as well as substitute for an entire AP:
(i) a. John has a big blue car and Bill has a small one = small blue car
b. John has a big blue car and Bill has a red one = a red car
We think that such data provide evidence for the view that one-replacement targets something higher than the nP. Assuming that adjectives are inserted as adjuncts/specifiers of projections between DP and nP, see Cinque (1994), Alexiadou, Haegeman & Stavrou (2007) for further references, would be a possible way to explore the size of one-replacement.




with subject-verb combinations composing with an object (see Nunberg, Sag and Wasow 1994, Horvath and Siloni 2002, Harley and Stone 2013 for much discussion).

(13) a. throw a baseball
b. throw support behind a candidate
c. throw a boxing match (i.e., take a dive)
d. throw a fit

(14) a. take a book from the shelf
b. take a bus to New York
c. take a nap
d. take an aspirin for a cold
e. take a letter in shorthand

(15) a. kill a cockroach
b. kill a conversation
c. kill an evening watching TV
d. kill a bottle (i.e., empty it)
e. kill an audience (i.e., wow them)

Kratzer argues that the external argument should be severed from the verb’s lexical representation. This will ensure that the external argument cannot yield a special interpretation. That is, instead of (16a) we have (16b).

(16) a. λy.λx.λe [buying(e) & Theme(x)(e) & Agent(y)(e)]
b. λx.λe [buying(e) & Theme(x)(e)]

Since Kratzer’s paper, there has been a lot of work on the syntax of external arguments (see, e.g., Hale and Keyser 1993, 2002, Harley 1995, Marantz 1997, Borer 2005a, b, Alexiadou, Anagnostopoulou, and Schäfer 2006, 2015, Folli and Harley 2005, 2007, Jeong 2007, Pylkkänen 2008, Ramchand 2008, Schäfer 2008, 2012, Merchant 2013, Lohndal 2014). However, Kratzer’s argument only goes through if the specification of the verb’s meaning only refers to the internal argument, and furthermore, if idiomatic dependencies like these can be captured by defining the meaning of the verb. Kratzer discusses the first premise but not the second. She seems to assume that idiomatic dependencies must be specified over objects in the lexicon, that is, over the verb and its Theme. For reasons of space, we won’t discuss these issues here but refer among others to Marantz (1997), Lohndal (2012, 2014), and Anagnostopoulou and Samioti (2014) who all argue that idiomatic dependencies can be defined over outputs of syntax. In addition, the literature contains various arguments for severing the Theme as well.
Lohndal (2014) surveys the following arguments.

(17) a. The semantics of reciprocals such as each other (Schein 2003)
b. Adjectival passives (Borer 2005b)
c. Focus and Neo-Davidsonian representations (Herburger 2000)
d. Measuring-out Themes (Tenny 1987, 1994; see Borer 2005b)
e. Variable adicities (Borer 2005a, b)

Borer (2005) presents a syntax which completely severs all arguments (see also Lin 2001, Marantz 2005, Bowers 2010, Lohndal 2014). Generalizing the labels somewhat, we get the following structure (Lohndal 2014; see also Alexiadou, Anagnostopoulou and Schäfer 2015).

(18) [Voice ext arg [Voice Voice [F int arg [F F [v √ROOT v ]]]]]

Other alternatives are possible as well (see e.g., Bowers 2010). On this theory, roots are not able to take complements. Rather, the root is merged at the bottom of the structure, and all arguments are merged after the root has been merged. Thus the root does not contribute to building the structure; the structure is built independently of the root. But by inserting the root at the bottom, the root is inserted in a verbal structure as opposed to a nominal or adjectival structure, and the F projection takes the v as its complement, not an n or a.

3.3 Suppletion in Hiaki

Bobaljik and Harley (2013) and Harley (2014) present an argument in favor of roots being able to take complements. The evidence relies on suppletion and is based on data from the Uto-Aztecan language Hiaki. Suppletion phenomena constitute an area of controversy within Distributed Morphology. The reason is that it is not clear whether suppletion applies to roots




as well as the functional vocabulary; see Embick & Halle (2005). Importantly, however, suppletion is subject to locality effects. The Hiaki data present a puzzle for locality and seem, at first sight, to suggest that suppletion applies both to roots and functional material. Bobaljik (2012) argues that insertion of suppletive vocabulary items can only be sensitive to features within the same maximal projection, not across a maximal projection boundary.

(19) Locality: α may condition β in (a), not (b):
a. α …]X0 … β
b. * α …]XP … β

Data from Hiaki threaten this generalization. Let us consider the relevant examples. The number of a subject DP can trigger suppletion in a certain class of intransitive verbs (data from Bobaljik and Harley 2013).

(20) a. Aapo vuite.
3.sg run.sg
‘S/he is running.’
b. Vempo tenne.
3.pl run.pl
‘They are running.’

(21) a. Aapo weye.
3.sg walk.sg
‘S/he is walking.’
b. Vempo kate.
3.pl walk.pl
‘They are walking.’

With transitive suppletive verbs, it is the object that triggers suppletion.

(22) a. Aapo/Vempo uka koowi-ta me’a-k.
3.sg/pl the.sg pig-acc.sg kill.sg-prf
‘He/They killed the pig.’
b. Aapo/Vempo ume kowi-m sua-k.
3.sg/pl the.pl pig-pl kill.pl-prf
‘He/They killed the pigs.’

If this is true subject-verb agreement, this would be a problem for Bobaljik’s generalization given that the external argument is merged in a separate projection, as seen above.


Bobaljik and Harley argue that a plural object DP is base-generated as a sister to the verb root. If the root of the selecting verb is a suppletive root, the two forms will compete for insertion. The object is local enough to condition suppletion.

(23) [√P [DP+pl ume toto’im ‘the.pl chickens’ ] √KILL ]
√KILL → sua ‘kill.pl.obj’, *mea ‘kill.sg.obj’

If intransitive verbs are unergative, they would constitute a counter-example to (19).

(24) [VoiceP [DP+pl ume toto’im ‘the.pl chickens’ ] [Voice’ [√P √tenne ‘run.pl.subj’ ] [Voice Ø ]]]

Bobaljik and Harley argue that the intransitive verbs in question indeed are unaccusative. That would pose no problem for (19) as long as roots can take complements as in (23). We assume that their arguments in favor of the unaccusative status are sound. (19) allows for a specifier within the same projection of a head to condition suppletion. Bobaljik and Harley argue that the definition should be strengthened in order to block specifiers from conditioning suppletion. They suggest (25).

(25) Locality: α may condition β in (a), not (b):
a. α …]X0 … β
b. * α …]Xn … β, where n > 0.

They offer the following motivation (p. 11):

Excluding specifiers from the locality domain of the root also renders moot the possibility that head movement may extend locality domains. Hypothetically, if the root were to undergo head movement to Voice0 in [(24)], then – in its derived position – the root would no longer be separated from the external argument by a maximal projection. If not plugged,


this could be construed as a loophole that threatens to unravel our account of internal-external argument asymmetries, at least where head movement is involved.

Bobaljik and Harley are essentially postulating a sisterhood condition on suppletion. The root and its conditioning DP have to be sisters at a given point in the derivation. However, there are other technical ways of getting around this conclusion. The derivational theory in Lohndal (2014) achieves this result by way of head movement. Lohndal adopts the structure in (26) from Borer (2005a, b), repeated here for expository convenience.

(26) [Voice ext arg [Voice Voice [F int arg [F F [v √ROOT v ]]]]]

However, he argues that this structure does not exist as a representational object at any point in the derivation. Rather, a head merges with a phrase, and the resulting phrase can then merge with yet a different head. What is not allowed is for a phrase (e.g., what is depicted as specifiers in (26)) to merge with another phrase. Put differently, at the point in the derivation where the internal argument is to be merged with the FP (consisting of F and its sister vP), a constraint blocking XP-YP merger demands that the complement (YP) of the head is spelled out so that the head can merge again with a new phrase (XP). Put simply, a head can only merge with a non-head; two phrases can never merge.5

5 See Narita (2014) for a related approach which also bans XP-YP merger.


Let us illustrate how this system would apply to the present case. In (27), a copy of the root moves into F (and then into Voice). Various technical implementations are possible; here we assume that the root and F create a complex head.6

(27) a. [F F v(P) ] → spell-out v(P), move √ROOT to F
b. [F F √ROOT ]
c. [F DP [F F √ROOT ]]

This assumes that the root is “close enough” in (27c) for the DP to condition suppletion. Since F and √ROOT constitute a complex head, suppletion is possible.7 In this system, Bobaljik’s (2012) generalization can be maintained, even though the root does not take a complement.

3.4 No complements of roots appear in a nominal environment

In the area of nominalizations (see Alexiadou 2010a, b for general discussion), it has been argued that the Theme can be the complement of the root, viz. Alexiadou (2001) and Embick (2009, 2010). If it is true that roots cannot appear with

6 The structure and derivation are motivated at length in Lohndal (2014), drawing on a range of different data and theoretical arguments. It is impossible to do justice to these arguments in the present paper; see Lohndal’s book.
7 There is also the issue of adjacency, in terms of overt string adjacency (Radkevich 2010). (27b) would fulfill that since F is empty. See Embick (2010) and Bobaljik (2012) and the literature cited there for extensive discussion of the importance of adjacency.



The structural configurations of root categorization 


complements, these analyses have to be revised. We argue that the Theme is introduced as the specifier of a functional projection, in line with e.g. Lin (2001), Borer (2005a, b, 2013), Marantz (2005). Grimshaw (1990) argues in detail that de-verbal nouns do not form a homogeneous class.8 They are claimed to be ambiguous between a reading that supports argument structure (AS nominals (ASNs)) and a result/referential (R)-reading that does not. (28a) instantiates the AS-interpretation of the nominal, while (28b) instantiates the R one.

(28) a. the examination of the patients took a long time   AS
     b. the examination was on the table   R

Tab. 1 summarizes the criteria Grimshaw introduced to distinguish between the two types of nominals in English (Alexiadou 2009, Borer 2013):

Tab. 1: R-nominals versus Argument Supporting Nominals.

     R Nominals                                    ASNs
 1.  Non-θ-assigner, no obligatory arguments       θ-assigners, obligatory arguments
 2.  No event reading                              Event reading
 3.  No agent-oriented modifiers                   Agent-oriented modifiers
 4.  Subjects are possessives                      Subjects are arguments
 5.  by phrases are non-arguments                  by phrases are arguments
 6.  No implicit argument control                  Implicit argument control
 7.  No aspectual modifiers                        Aspectual modifiers
 8.  Modifiers like frequent, constant             Modifiers like frequent, constant
     only with plural                              appear with singular
 9.  May be indefinite                             Must be definite
10.  May be plural                                 Must be singular

Alexiadou & Grimshaw (2008) discuss the following observations made in the rich literature on nominalizations: (1) Only nouns which are related to corresponding verbs have argument structure. This means that being associated with an event structure/argument structure is not a property of nouns per se.

8 Note that Grimshaw actually distinguishes between three classes of nominals: (i) complex event nominals that license AS, (ii) event nominals that do not license AS but still have an eventive interpretation and (iii) result nominals that do not license AS and lack an eventive interpretation.


(2) Nouns which are identical in form to verbs do not generally behave like ASNs, i.e. they are rigidly different from verbs (recall, offer, report, see (29)).9

(29) *The frequent report of looting

(3) –ing nominals are always ASNs.10
(4) –(a)tion and –ment nominals are frequently ambiguous between ASN and non-ASN readings.

On the basis of the above, they propose the following generalization:11

(30) Only nouns derived from verbs can have argument structure.

9 Newmeyer (2009) reports several zero derived nominals that appear to contradict this claim:
(i) Maria’s metamorphosis of the house
See Borer (2013) for arguments that most of his cases do not behave as ASNs.
10 An anonymous reviewer raises the very interesting question of how the formation of idioms relates to this discussion. For instance, the breaking of bread with our friends is ok, while *the playing of ball with friends is not. A detailed discussion of this problem will take us too far afield, but we would like to point out that we take an idiomatic interpretation to be excluded when Voice is present in the structure, see Anagnostopoulou & Samioti (2014) for arguments from the domain of participles. Alexiadou & Iordachioaia (2015) discuss cases like the ones mentioned by the reviewer and argue that this suggests that the structures that allow for an idiomatic interpretation exclude Voice.
11 We take (30) to be understood as referring to event participants. Importantly, (30) raises the question of what happens with de-adjectival nominalizations, which seem to license arguments in the absence of a verbal source. Roy (2010) shows, however, that only predicative (intersective) adjectives derive (suffix-based) nominalizations:
(i) a. the poor child
       i. the pitiful child (non-intersective)
       ii. the moneyless child (intersective)
    b. This child is poor.
       i. #This child is pitiful.
       ii. This child is moneyless.
    c. the poverty of the child ≠ the pitifulness of the child
In (ic), Roy takes of the child to be the argument of the nominalization, since the adjectival property is predicated of the child:
(ii) the poverty of the child => the child is poor
We follow Roy (2010) and assume that on top of the aP, the nominalization also includes a PredP (see also Bowers 1993) which hosts the argument. Similar issues arise for of phrases with relational/body part nouns, e.g. the father of John, and picture nouns, e.g. the picture of John. We assume that these nouns are not ASNs of the type defined in Grimshaw, i.e. they lack an event structure. We assume that in the case of picture nouns, the possessor is introduced in Spec,nP; thus the interpretation of the possessor is rather free. In the case of relational/body part nouns the possessor is introduced as a complement of the noun, see Alexiadou, Haegeman & Stavrou (2007) for extensive discussion.




Crucially, then, this means that only those nouns that have a verbal source can appear together with their arguments. Having shown that the arguments of verbs are introduced by functional layers, this leads to the suggestion that the nominal structure embeds a verbal structure that contains the verb’s arguments. We assume, as we did in the previous sections, that the root is introduced at the bottom of the structure, adjoined to a categorizer, e.g., as in (31). This creates a verbal environment, which can then undergo nominalization, see (32b) (the structures are simplified for expository convenience). (32a), the nominal structure that lacks a verbal categorizer, corresponds to the representation of R-nominals in tab. 1.

(31) [FP DPtheme [F F [v v √ROOT]]]

We assume that the various affixes such as -ation and -ing realize n in (32):

(32) a. [FP DPtheme [F F [n n √ROOT]]]   non-ASNs
     b. [n [(Voice) [FP DPtheme [F F [v v √ROOT]]]]]   ASNs

There is evidence that ASNs may contain F/v, see Alexiadou, Iordachioia, Cano, Martin & Schäfer (2014), whose discussion is briefly summarized below. A first piece of evidence in favor of this is the presence of overt verbalizers in English, such as -ify, -ize etc., to which the nominalizing morphology attaches, e.g. modify-modification, hospitalize-hospitalization. Second, data involving re-prefixation also point to the presence of v. Wechsler (1990) and Marantz (2007) argue that the prefix re- only has a restitutive interpretation and it only attaches to accomplishments, meaning that it needs a state component to modify (von Stechow 1996), i.e. it requires a result state of the type present in (32b).12

(33) a. the re-verification of the diagnosis
     b. a re-justification of former notations

Thirdly, modification by gradual also points to the presence of v (Borer 2013):

(34) a. Kim’s (gradual) formulation of several procedures {twice/in two weeks}
     b. Pat’s (gradual) formation of many committees {twice/in two minutes}
     c. Robin’s (gradual) dissolution of these chemicals {twice/in two hours}

12 As an anonymous reviewer points out, the data involving re-prefixation are more complex, and this is acknowledged in recent work by Lechner, Spathas, Alexiadou & Anagnostopoulou (2015). For reasons of space we cannot enter a detailed discussion of this complexity. Still, our point is that whatever holds in the verbal domain with respect to re-prefixation also holds in the nominal domain.


Piñón (2000) argues that gradually (and thus, presumably, also gradual with deverbal nouns) involves a change: whatever happens gradually should in fact happen and not just hold. This modifier is sensitive to the type of change characteristic of accomplishments, which involves the presence of v/F in our system. Naturally, the structure of ASNs may be richer, in that it can contain further functional layers associated with verbal clauses. For instance, certain ASNs may contain Voice, Borer (2013), Alexiadou (2001, 2009), Alexiadou, Iordachioia, Cano, Martin & Schäfer (2014). Evidence for this comes from the following observations:

a) they allow by-phrases:

(35) the destroying of the city by the enemy

b) As has been discussed in the literature, ing-of gerunds (36b) pattern with the verbal passive in (36a) in excluding a self-action interpretation, the standard diagnostic for VoiceP in Kratzer (1996, 2003), building on Baker, Johnson & Roberts (1989). By contrast, derived nominals in (36c) allow a self-action interpretation indicating the lack of VoiceP.13

(36) a. The children were being registered.
        i. *Th = Ag: The children registered themselves
        ii. Th ≠ Ag: The children were registered by someone
     b. The report mentioned the painfully slow registering of the children.
        Th ≠ Ag / *Th = Ag
     c. The report mentioned the painfully slow registration of the children.
        Th ≠ Ag / Th = Ag

13 We agree with an anonymous reviewer that the derived nominal in (36c) allows both for a self-action interpretation and an interpretation in which an external argument was responsible for the registration. The reviewer points out that a self-action interpretation has been identified for the verbal passive as well. To the extent that this holds, we might be able to attribute this to the ambiguity between the verbal and adjectival passive in English. The latter behaves as derived nominals do; see Alexiadou, Anagnostopoulou & Schäfer (2015) for discussion of differences between verbal and adjectival passives and further references, see Alexiadou, Iordachioaia, Cano, Martin & Schäfer (2014) for discussion of external arguments in nominalizations.




3.5 Summary

In this section, we have reviewed four different arguments in favor of roots being able to take complements. Three of them involve the verbal domain, one involves the nominal domain. In all four cases, we have argued that the arguments are not conclusive and that the data are open to alternative implementations that do not require roots to take complements.

4 Roots as adjuncts to the categorizing head

Let us return to (5) from section 2, here repeated as (37).

(37) a. [v v √ROOT]
     b. [√ROOT v √ROOT]

We want to ensure that the grammar does not generate a structure where the root serves as the label of the categorized constituent. Technically, the only way to ensure that we never get (37b) is to propose that roots always merge as modifiers of/adjuncts to their categorizing head. Let us briefly address how this could work technically. Chomsky (2004) distinguishes between Set Merge and Pair Merge. The former is ordinary merge, which creates an unordered set, whereas the latter is a type of merge that creates an ordered pair. Chomsky claims that adjunction corresponds to Pair Merge.

(38) a. {α, β}
     b. ⟨α, β⟩

In cases of adjunction, the adjoined phrase γP combines with another phrase δP. Normally, δP projects the label (though see Donati 2006). In that sense, roots interact with categorizers in a similar way that adjuncts interact with the non-adjoined part of the structure. This would give us the following structure (using a fairly traditional depiction for adjunction structures).14

14 Marantz (2013) puts forth several arguments in favor of the view that in English, roots always merge as modifiers of v. For reasons of space, we cannot review his arguments here.
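The formal difference between the two operations can be made concrete with a toy sketch (our own illustration, not Chomsky’s notation): Set Merge returns an unordered set, so the order of its arguments is immaterial, while Pair Merge returns an ordered pair and is therefore inherently asymmetric.

```python
# Toy contrast between Set Merge (unordered set) and Pair Merge (ordered pair).
def set_merge(a, b):
    return frozenset({a, b})       # {α, β}: no order, no designated adjunct

def pair_merge(adjunct, host):
    return (adjunct, host)         # ⟨α, β⟩: asymmetric; the host projects

# Set Merge is symmetric: swapping the arguments yields the same object.
print(set_merge("ROOT", "v") == set_merge("v", "ROOT"))    # True
# Pair Merge is not: adjoining the root to v differs from the reverse.
print(pair_merge("ROOT", "v") == pair_merge("v", "ROOT"))  # False
```

The asymmetry of the ordered pair is what lets the categorizer, rather than the adjoined root, determine the label.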


(39) [v √ROOT v]

On this view, roots strictly speaking do not have to be categorized. This would predict that there are cases where roots can survive a derivation without being categorized. De Belder (in press) argues that this is exactly the case, based on primary compounds in Dutch.

5 Roots as complements or as adjuncts

A different view holds that roots are merged with categorizing heads either as their complements or as modifiers. Perhaps the most prominent representative of this position is Embick (2004), see also Alexiadou & Anagnostopoulou (2013), Alexiadou, Anagnostopoulou & Schäfer (2015) for discussion. Embick, similarly to what is put forth in the work of Levin and Rappaport Hovav (2010), takes it that roots come in two guises: there are manner roots and state/result roots. These two types are integrated differently into the structural representation of the event. In Levin and Rappaport Hovav (2010), manner roots are integrated as modifiers of an event, while result roots are integrated as arguments of primitive predicates. In Embick’s work, manner roots are merged as modifiers of categorizing heads whereas state/result roots are merged as complements of these heads, as shown in (40) below:

(40) a. modifiers of v, direct Merge: [v √ v]   (e.g. hammer)
     b. complements of v: [v v √]   (e.g. flatten)

Embick suggests that direct merge has semantic consequences. It specifies the means component of the complex predicate. Implicitly, the type of merge is sensitive to the manner vs. result/state classification of roots. Manner roots merge as modifiers of v, state roots merge as complements of v. The structure in (40a) can feed secondary resultative predication. In that case, the element that appears in the complement of v cannot be a bare root, see (41) below.

(41) [vP DP [v′ [v √ROOT v] aP]]   (e.g. hammer flat)

Embick argues that v’s complement cannot be a bare Root when v has a Root merged with it, as in (41), because the Root in the complement position would be “uncategorized”. That is, in (41) the complement of v is an aP.15 Direct merge applies to manner/instrument roots, and roots that can be so coerced, cf. Rossdeutscher (2011) and Marantz (2013).16 The above proposal implies that roots belong to ontological classes, which in turn influence the structural positions roots can occupy. However, as is evident from the structures in (40) above, in the absence of secondary resultative predication, direct Merge is practically indistinguishable from complement Merge. Furthermore, in view of the fact that several result roots can be coerced into manner interpretations, it is not clear what the argument in favor of (40b) is. Embick (2009) in fact introduces the notion of proxy state to describe the behavior of predicates such as break, which merge as modifiers of v but require an empty result state as their complement, which can sometimes be overt, e.g., break open. From this perspective, then, structure (40b) seems to be restricted to a small subset of roots.

6 Roots as special

There is also a view in which roots are privileged because of the position which they occupy. There are two main implementations of this idea in the literature.

15 Embick argues that the little v has a special feature which he labels FIENT for fientive, which is a type of become-operator (Embick 2004: 366).
16 The discussion in Embick raises the more general question, also raised by an anonymous reviewer, of whether or not roots need to be categorized to begin with, see our footnote 2. It might very well be that some roots need not be categorized, leading perhaps to the formation of root compounds (see e.g. Bauke & Roeper this volume, and Borer 2013). See also Alexiadou & Iordachioaia (2015) for some discussion of compound formation in DM.


One implementation is due to Adger (2013). He argues that Self Merge (Guimarães 2000, Kayne 2010) is a fundamental operation which comes for “free” if one removes a stipulation in the standard definitions of Merge. A root is an entity which is able to undergo Self Merge: √DOG would then yield {√DOG}. Adger also argues for a labeling algorithm which does not include roots in its domain. A crucial consequence of this is that a root cannot merge with any other object distinct from that root, which is to say that roots can never take complements. Once a root has undergone Self Merge, it can be labeled, but the label is inserted after the structure has been built, an important claim which Adger (2013) is devoted to defending.

A somewhat different implementation is due to De Belder and van Craenenbroeck (2015). They are concerned with a range of properties related to roots, most of which we will not have the space to discuss here. But one important question in their paper is whether roots are to be defined lexically or structurally. One way of illuminating this issue is to ask the following question: Can a functional vocabulary item (henceforth, FVI) be used as a root? De Belder and van Craenenbroeck say the following:

    Suppose we want to use an FVI as a root. In an early insertion model this state of affairs is unformulable. The mere presence of grammatical features on a VI will cause the projection headed by this VI to be recognized as functional rather than lexical. As a result, FVIs can never head lexical projections. In the Late Insertion model, however, there is no a priori ban on merging a particular type of VI in a root terminal node.

If roots are defined structurally, and given that the structural position is devoid of features, it is immaterial whether the VI realizing this position post-syntactically bears any grammatical features. Therefore, it is possible to use FVIs as a testing ground in order to distinguish between a lexical and a structural definition of a root. De Belder and van Craenenbroeck provide the following examples from Dutch.

(42) Ik heb het waarom van de zaak nooit begrepen.
     I have the why of the case never understood
     ‘I have never understood the motivation behind the case.’

(43) In een krantenartikel komt het wat/hoe/wie/waar altijd voor het waarom.
     in a newspaper.article comes the what/how/who/where always before the why
     ‘In a newspaper the what/how/who/where always precedes the why.’

(44) De studenten jij-en onderling.
     the students you-infinitive amongst.one.another
     ‘The students are on a first-name basis with each other.’



(45) Martha is mijn tweede ik.
     Martha is my second I
     ‘Martha is my best friend.’

(46) Niets te maar-en!
     nothing to but-infinitive
     ‘Don’t object!’

(47) Paard is een het-woord.
     horse is a the.neuter.def-word
     ‘Paard takes a neuter article.’

These examples all illustrate the use of an FVI in what is a root position. A possible counter-argument is to argue that these examples are exceptions: what is inserted in the root position is not an FVI, but a root which happens to be homophonous with an FVI (De Belder and van Craenenbroeck 2015). However, this argument does not work. Consider the data in (48).

(48) a. het getik van de klok
        the GE-tick of the clock
        ‘the ticking of the clock’
     b. het gefluit van de vogeltjes
        the GE-whistle of the birds
        ‘the whistling of the birds’

In Dutch, there is a derivational word formation process which forms nouns referring to a pluractional event by means of ge-prefixation. This type of word formation productively allows FVIs to occur in root position. A range of examples is provided in (49).17

(49) a. Ik hoef al dat ge-maar niet.
        I need all that GE-but not
        ‘I don’t like those constant objections.’
     b. Ik hoef al dat ge-alhoewel niet.
        I need all that GE-although not
        ‘I don’t like those constant considerations.’
     c. Ik hoef al dat ge-of niet.
        I need all that GE-or not
        ‘I don’t like those constant alternatives.’

17 It may be objected that De Belder and van Craenenbroeck (2015) do not fully justify that ge- actually attaches to the root, and not an element that is already categorized.


     d. Ik hoef al dat ge-hé niet.
        I need all that GE-PRT not
        ‘I don’t like the constant need for confirmation.’
     e. Ik hoef al dat ge-waarom niet.
        I need all that GE-why not
        ‘I don’t like the constant need for justification.’
     f. Ik hoef al dat ge-nooit niet.
        I need all that GE-never not
        ‘I don’t like the constant unwillingness.’
     g. Ik hoef al dat ge-ik niet.
        I need all that GE-I not
        ‘I don’t like all this egocentricity.’

It is not very explanatory to assume that FVIs are systematically ambiguous between a root and a functional reading. De Belder and van Craenenbroeck instead take the data to show that FVIs can be used in root positions. Whether or not an item is a root cannot be due to inherent characteristics or properties of the item as such; rather, it depends on the structural position in which the element is merged. Let us look at the technical implementation in De Belder and van Craenenbroeck (2015). They assume that Merge is inherently asymmetric (Jaspers 1998, Langendoen 2003, Cormack & Smith 2005, Di Sciullo & Isac 2008, Zwart 2009a, 2009b, 2011, Franco 2011, Osborne, Putnam & Gross 2011) and that Pair Merge rather than Set Merge is the default (contrary to Chomsky 2004 and many others). They define Merge as follows.

(50) Unary Merge
     Merge selects a single subset from a resource (e.g. {α}), includes it in the derivation under construction (δ), and yields an ordered pair (e.g. ⟨{α}, δ⟩, assuming {α} projects).

They propose to take the definition in (50) as literally as possible. When an element {α} is the first one to be taken from the resource by Unary Merge, it is included into an empty derivation, i.e., the object under construction is the empty set Ø (see also Zwart 2011). The output of this instance of Merge is no different from any other: it yields an ordered pair, in this case ⟨{α}, Ø⟩. De Belder and van Craenenbroeck argue for late insertion of all vocabulary items.18 This implies that the resource from which Merge draws contains only grammatical features.
Moreover, roots play no role in the syntactic derivation and

18 See Embick (2000) for arguments in favor of early insertion of roots.


they are defined structurally. As a result, there are no features in the resource that refer to or anticipate the merger of a root. For the example they use as an illustration, the books, this means that the resource is a set containing a definiteness feature and a plural feature, i.e. R = {[+def], [+pl]}. Based on this resource, the derivation proceeds as follows. Unary Merge first selects the singleton containing the plural feature from R and merges it with the empty set. Given that the latter is featureless, it is the plural feature that projects.

(51) [{[+pl]} {[+pl]} Ø]

In the next step, the definiteness feature is targeted by Merge. It too projects its own structure, yielding (52).

(52) [{[+def]} {[+def]} [{[+pl]} {[+pl]} Ø]]

This concludes the syntactic derivation. The structure is handed over to PF, where Vocabulary Insertion (see Harley & Noyer 1999 for discussion) applies. When confronted with the structure in (52), the following VIs are candidates for insertion:

(53) a. /ðə/ ↔ [+def]
     b. /s/ ↔ [+pl]
     c. /buk/ ↔ Ø

The phonological exponents on the left-hand side of the equivalences in (53) are inserted into the terminal nodes of the structure. The derivation converges as the books. On this view, Ø can never project. This is equivalent to saying that roots cannot project (see Bauke & Roeper this volume) and consequently, they are also acategorial. In De Belder and van Craenenbroeck’s theory, it is always the set that merges with Ø that projects and determines the category. Ø will always appear lower than any functional material, which is an aspect that the analysis shares with Adger (2013). This yields the following.

(54) One Derivational Workspace One Root (ODWOR)
     In every derivational workspace there is exactly one root, and for every root there is exactly one derivational workspace.


As (54) illustrates, this will yield different cycles than what most of the literature argues for (viz. phases of some sort). One aspect of the De Belder and van Craenenbroeck analysis is that they do not account for the following two often-assumed claims: i) that roots have to be categorized, and ii) that the categorizer always seems to project. The first claim simply cannot be accounted for in their system, whereas the second property can be accounted for since it is the functional material above the Ø that determines the labels of the ensuing structure.
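The derivation in (51)–(53) is mechanical enough to be simulated. The sketch below is our own toy illustration (the function names and the encoding are ours, and linearization/affix placement is ignored), not part of De Belder and van Craenenbroeck’s proposal: it builds the ordered-pair structure by Unary Merge from the resource R = {[+def], [+pl]} and then realizes the terminals by Vocabulary Insertion.

```python
# Toy simulation of Unary Merge (50) and Vocabulary Insertion (53).
# A derivation is an ordered pair (projecting element, derivation so far);
# None stands in for the empty set Ø, i.e. the structural root position.
def unary_merge(resource_order, derivation=None):
    for feature in resource_order:
        derivation = (feature, derivation)  # the newly merged feature projects
    return derivation

# Vocabulary items as in (53); the featureless Ø terminal spells out "book".
VOCABULARY = {"+def": "the", "+pl": "s", None: "book"}

def vocabulary_insertion(structure):
    """Collect terminals top-down and realize each with its vocabulary item."""
    if not isinstance(structure, tuple):
        return [VOCABULARY[structure]]
    feature, rest = structure
    return [VOCABULARY[feature]] + vocabulary_insertion(rest)

# R = {[+def], [+pl]}: [+pl] merges with Ø first (51), then [+def] (52).
tree = unary_merge(["+pl", "+def"])
print(tree)                        # ('+def', ('+pl', None))
print(vocabulary_insertion(tree))  # ['the', 's', 'book']
```

The exponents surface in top-down order; deriving the surface form the books would additionally require morphological merger of /s/ with the noun, which the sketch does not model.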

7 Discussion

In this paper, we have presented the four different views on how roots are introduced into syntactic structures and categorized. (55) repeats (6) for expository convenience.

(55) a. Roots are merged as complements of v (e.g., Bobaljik and Harley 2013, Harley 2014 and literature cited there).
     b. Roots are merged as v modifiers (Marantz 2013).
     c. Some roots are merged as v’s modifier while others as v’s complement (Embick 2004, 2010, Alexiadou & Anagnostopoulou 2013).
     d. Roots are inserted post-syntactically, thus they cannot take any complements or modify v (De Belder and van Craenenbroeck 2015).

We have pointed to arguments against and in favor of each of the alternatives. Based on Occam’s razor, a unified way of introducing roots is to be preferred. (55b) is the only approach which derives endocentricity in the traditional sense, given that it entails that it is the label that always projects and serves as input to further applications of Merge. However, Embick’s view also has certain advantages. For example, it accounts for why you cannot say (56b) while it is fine to say (56a).19

(56) a. colored dark
     b. *blackened dark

The explanation is that the roots in (56a) can function as v modifiers whereas those in (56b) cannot. Note that the views in (55a), (55b) and (55d) do not have a straightforward way of accounting for this restriction since all instances of roots merging

19 An anonymous reviewer points out that strings such as mine was blackened darker than yours are attested. Let us point out that this is a comparative construction, which in our opinion enables a state/result root to be coerced into a manner reading, i.e. colored, as mentioned above in the context of (41).




with a categorizer are identical. However, this view does face the challenges from section 3, where it was argued that roots cannot take complements at all. What (55c) argues, though, is that there has to be a distinction between roots being merged as modifiers and roots being merged in some other way. The latter may be as complements, although section 3 and the work cited there suggest otherwise. Thus we appear to be at an impasse, which future work will hopefully help resolve.

8 Summary

In this paper, we have discussed the possible structural configurations of categorization; that is, how roots come to be categorized in the syntax. We argued that there is a lot of evidence against roots taking complements, although the work discussed in section 5 provides another perspective. There are also problems with the other views presented. The view of roots as modifiers has the advantage that it derives the correct headedness of the structure, although it does not derive that roots often or always need to be categorized. Another view, that roots are introduced in a special way, is able to derive a lot of properties often attributed to roots, but it faces two main problems: i) it presents a special mechanism for introducing roots, i.e., it makes roots syntactically unique; ii) the view does not derive that roots often or always need to be categorized. We conclude that at present, there is not enough empirical evidence distinguishing the four views adequately and that more work needs to be done to tease them apart in order to provide a conclusive answer to what the structural configuration of categorization is.

Acknowledgments: We are grateful to Hans Petter Helland, the audience at the Workshop Roots and Labels in Marburg in March 2014, the participants in the Research Seminar at the University of Stuttgart, the research group EXOGRAM in Trondheim, an anonymous reviewer and the editors of this volume for their comments and suggestions. Alexiadou’s research was supported by a DFG grant to the project B6 of the Collaborative Research Center 732 Incremental Specification in Context at the University of Stuttgart.

References

Acquaviva, Paolo. 2009. Roots and lexicality in Distributed Morphology. In Alexandra Galani, Daniel Redinger & Norman Yeo (eds.), York-Essex Morphology Meeting 5: Special Issue of York Working Papers in Linguistics, 1–21. York: University of York, Department of Language and Linguistic Science.
Adger, David. 2013. The syntax of substance. Cambridge, MA: MIT Press.


Alexiadou, Artemis. 2001. Functional structure in nominals. Amsterdam: John Benjamins.
Alexiadou, Artemis. 2009. On the role of syntactic locality in morphological processes: the case of (Greek) nominals. In Anastasia Giannakidou & Monika Rathert (eds.), Quantification, definiteness, and nominalization, 253–280. Oxford: Oxford University Press.
Alexiadou, Artemis. 2010a. Nominalizations: A probe into the architecture of grammar. Part I: The nominalization puzzle. Language and Linguistics Compass 4. 496–511.
Alexiadou, Artemis. 2010b. Nominalizations: A probe into the architecture of grammar. Part II: The aspectual properties of nominalizations, and the lexicon vs. syntax debate. Language and Linguistics Compass 4. 512–523.
Alexiadou, Artemis. 2014. Roots don’t take complements. Theoretical Linguistics 40. 287–297.
Alexiadou, Artemis & Elena Anagnostopoulou. 2013. Manner vs. result complementarity in verbal alternations: A view from the clear-alternation. In Stefan Keine & Shayne Sloggett (eds.), Proceedings of the Forty-Second Annual Meeting of the North East Linguistic Society, 39–52. University of Massachusetts, Amherst: GLSA.
Alexiadou, Artemis, Elena Anagnostopoulou & Florian Schäfer. 2006. The properties of anticausatives cross-linguistically. In Mara Frascarelli (ed.), Phases of interpretation, 187–212. Berlin: Mouton de Gruyter.
Alexiadou, Artemis, Elena Anagnostopoulou & Florian Schäfer. 2015. External arguments in transitivity alternations. A layering approach. Oxford: Oxford University Press.
Alexiadou, Artemis & Gianina Iordachioaia. 2015. Idiomaticity and compositionality in deverbal compounds. Paper presented at BCGL 8, Brussels. http://www.crissp.be/events/bcgl8/bcgl8-program/.
Alexiadou, Artemis & Kirsten Gengel. 2011. Classifiers as morphosyntactic licensors of NP ellipsis: English vs. Romance. In Suzi Lima, Kevin Mullin & Brian Smith (eds.), Proceedings of the Thirty-Ninth Annual Meeting of the North East Linguistic Society, 15–28. University of Massachusetts, Amherst: GLSA.
Alexiadou, Artemis & Jane Grimshaw. 2008. Verbs, nouns and affixation. In Florian Schäfer (ed.), SinSpec 1: Working Papers of the SFB 732, 1–16. Stuttgart: Universität Stuttgart.
Alexiadou, Artemis, Liliane Haegeman & Melita Stavrou. 2007. Noun phrase in the generative perspective. Berlin: Mouton de Gruyter.
Alexiadou, Artemis, Gianina Iordachioia, Mariangeles Cano, Fabienne Martin & Florian Schäfer. 2013. The realization of external arguments in nominalizations. Journal of Comparative Germanic Linguistics 16. 73–95.
Anagnostopoulou, Elena & Yota Samioti. 2014. Domains within words and their meanings: A case study. In Artemis Alexiadou, Hagit Borer & Florian Schäfer (eds.), The syntax of roots and the roots of syntax, 81–111. Oxford: Oxford University Press.
Arad, Maya. 2003. Locality constraints on the interpretation of roots: The case of Hebrew denominal verbs. Natural Language and Linguistic Theory 21. 737–778.
Arad, Maya. 2005. Roots and patterns: Hebrew morphosyntax. Dordrecht: Springer.
Bobaljik, Jonathan D. 2012. Universals in Comparative Morphology: Suppletion, superlatives, and the structure of words. Cambridge, MA: The MIT Press.
Bobaljik, Jonathan D. & Heidi Harley. 2013. Suppletion is local: Evidence from Hiaki. University of Connecticut and University of Arizona manuscript.
Borer, Hagit. 2005a. Structuring sense: In name only. Oxford: Oxford University Press.
Borer, Hagit. 2005b. Structuring sense: The normal course of events. Oxford: Oxford University Press.
Borer, Hagit. 2013. Structuring sense: Taking form. Oxford: Oxford University Press.
Bowers, John. 1993. The syntax of predication. Linguistic Inquiry 24. 591–656.



The structural configurations of root categorization 


Bowers, John. 2010. Arguments as relations. Cambridge, MA: MIT Press.
Cinque, Guglielmo. 1994. On the evidence for partial N-Movement in the Romance DP. In Guglielmo Cinque, Jan Koster, Jean-Yves Pollock, Luigi Rizzi & Raffaella Zanuttini (eds.), Paths towards universal grammar, 85–110. Washington, D.C.: Georgetown University Press.
Chomsky, Noam. 1995. The minimalist program. Cambridge, MA: MIT Press.
Chomsky, Noam. 2004. Beyond explanatory adequacy. In Adriana Belletti (ed.), Structures and beyond: The cartography of syntactic structures, 104–131. Oxford: Oxford University Press.
De Belder, Marijke. 2011. Roots and affixes: Eliminating lexical categories from syntax. University of Utrecht doctoral dissertation.
De Belder, Marijke. 2013. Collective mass affixes: when derivation restricts functional structure. Lingua 126. 32–50.
De Belder, Marijke. In press. The root and nothing but the root: Primary compounds in Dutch. Syntax.
De Belder, Marijke & Jeroen van Craenenbroeck. 2015. How to merge a root? Linguistic Inquiry 46.
Di Sciullo, Anna Maria & Daniela Isac. 2008. The asymmetry of merge. Biolinguistics 2. 260–290.
Donati, Caterina. 2006. On wh head movement. In Lisa Cheng & Norbert Corver (eds.), Wh moving on, 21–46. Cambridge, MA: MIT Press.
Embick, David. 2000. Features, syntax, and categories in the Latin perfect. Linguistic Inquiry 31. 185–230.
Embick, David. 2004. On the structure of resultative participles in English. Linguistic Inquiry 35. 355–392.
Embick, David. 2009. Roots, states, and stative passives. Paper presented at the Roots workshop, University of Stuttgart, June 2009.
Embick, David. 2010. Localism vs. globalism in morphology and phonology. Cambridge, MA: MIT Press.
Embick, David & Morris Halle. 2005. On the status of stems in morphological theory. In Twan Geerts, Ivo van Ginneken & Haike Jacobs (eds.), Romance languages and linguistic theory 2003: Selected papers from Going Romance 2003, Nijmegen, 20–22 November, 37–62. Amsterdam: John Benjamins.
Embick, David & Alec Marantz. 2008. Architecture and blocking. Linguistic Inquiry 39. 1–53.
Folli, Raffaella & Heidi Harley. 2005. Flavors of v: Consuming results in Italian and English. In Paula Kempchinsky & Roumyana Slabakova (eds.), Aspectual inquiries, 95–120. Dordrecht: Springer.
Folli, Raffaella & Heidi Harley. 2007. Causation, obligation, and argument structure: On the nature of little v. Linguistic Inquiry 38. 197–238.
Franco, Ludovico. 2011. The strict asymmetry of Merge. Università Ca’ Foscari, Venice, manuscript.
Grimshaw, Jane. 1990. Argument structure. Cambridge, MA: MIT Press.
Grimstad, Maren Berg, Terje Lohndal & Tor Anders Åfarli. 2014. Language mixing and exoskeletal theory: A case study of word-internal mixing in American Norwegian. Nordlyd 41. 213–237.
Guimarães, Maximiliano. 2000. In defense of vacuous projections in bare phrase structure. University of Maryland Working Papers in Linguistics 9. 90–115.
Hale, Kenneth & Samuel Jay Keyser. 1993. On argument structure and the lexical expression of syntactic relations. In Kenneth Hale & Samuel Jay Keyser (eds.), The view from building 20: Essays in linguistics in honor of Sylvain Bromberger, 53–109. Cambridge, MA: MIT Press.
Hale, Kenneth & Samuel Jay Keyser. 2002. Prolegomenon to a theory of argument structure. Cambridge, MA: MIT Press.


 Artemis Alexiadou and Terje Lohndal

Halle, Morris & Alec Marantz. 1993. Distributed morphology and the pieces of inflection. In Kenneth Hale & Samuel Jay Keyser (eds.), The view from building 20: Essays in linguistics in honor of Sylvain Bromberger, 111–176. Cambridge, MA: MIT Press.
Harley, Heidi. 1995. Subjects, events and licensing. MIT doctoral dissertation.
Harley, Heidi & Rolf Noyer. 1999. Distributed morphology: State-of-the-art article. Glot International 4. 3–9.
Harley, Heidi. 2005a. How do verbs take their names? Denominal verbs, manner incorporation and the ontology of roots in English. In Nomi Erteschik-Shir & Tova Rapoport (eds.), The syntax of aspect, 42–64. Oxford: Oxford University Press.
Harley, Heidi. 2005b. Bare phrase structure, a-categorial roots, one-replacement and unaccusativity. In Yaroslav Gorbachov & Andrew Nevins (eds.), Harvard Working Papers in Linguistics 11, 59–78. Cambridge: Harvard University, Department of Linguistics.
Harley, Heidi. 2014. On the identity of roots. Theoretical Linguistics 40. 225–276.
Harley, Heidi & Rolf Noyer. 2000. Licensing in the non-lexicalist lexicon. In Bert Peeters (ed.), The lexicon/encyclopedia interface, 349–374. Amsterdam: Elsevier.
Harley, Heidi & Megan Schildmier Stone. 2013. The ‘no agent idioms’ hypothesis. In Raffaella Folli, Christina Sevdali & Robert Truswell (eds.), Syntax and its limits, 251–274. Oxford: Oxford University Press.
Herburger, Elena. 2000. What counts: Focus and quantification. Cambridge, MA: MIT Press.
Horvath, Julia & Tal Siloni. 2002. Against the little-v hypothesis. Rivista di Grammatica Generativa 27. 107–122.
Jackendoff, Ray. 1977. X-bar syntax: A study of phrase structure. Cambridge, MA: MIT Press.
Jaspers, Danny. 1998. Categories and recursion. Journal of Applied Linguistics 12. 81–112.
Jeong, Youngmi. 2007. Applicatives: Structure and interpretation from a minimalist perspective. Amsterdam: John Benjamins.
Kayne, Richard S. 2010. Antisymmetry and the lexicon. In Anna Maria Di Sciullo & Cedric Boeckx (eds.), The biolinguistic enterprise, 329–353. Oxford: Oxford University Press.
Kratzer, Angelika. 1996. Severing the external argument from its verb. In Johan Rooryck & Laurie Zaring (eds.), Phrase structure and the lexicon, 109–137. Dordrecht: Kluwer.
Kratzer, Angelika. 2003. The event argument and the semantics of verbs. University of Massachusetts, Amherst, manuscript.
Langendoen, Terence D. 2003. Merge. In Andrew Carnie, Heidi Harley & Mary Willie (eds.), Formal approaches to function in grammar: In honor of Eloise Jelinek, 307–318. Amsterdam: John Benjamins.
Lechner, Winfried, Giorgos Spathas, Artemis Alexiadou & Elena Anagnostopoulou. 2015. On deriving the typology of repetition and restitution. Paper presented at GLOW 38, Paris. https://sites.google.com/site/2015glow/home/programme.
Lin, Tzong-Hong. 2001. Light verb syntax and the theory of phrase structure. University of California, Irvine, doctoral dissertation.
Lohndal, Terje. 2012. Without specifiers: Phrase structure and events. University of Maryland doctoral dissertation.
Lohndal, Terje. 2014. Phrase structure and argument structure: A case study of the syntax-semantics interface. Oxford: Oxford University Press.
Marantz, Alec. 1984. On the nature of grammatical relations. Cambridge, MA: MIT Press.
Marantz, Alec. 1997. No escape from syntax: Don’t try morphological analysis in the privacy of your own lexicon. In A. Dimitriadis, L. Siegel, C. Surek-Clark & A. Williams (eds.), University of Pennsylvania Working Papers in Linguistics, 201–225. Philadelphia: University of Pennsylvania.




Marantz, Alec. 2005. Objects out of the lexicon: Objects as events. MIT manuscript.
Marantz, Alec. 2013. Locating the verbal root. Talk given at the 25th Scandinavian Conference of Linguistics, University of Iceland, Reykjavik.
Merchant, Jason. 2013. Voice and ellipsis. Linguistic Inquiry 44. 77–108.
Newmeyer, Frederick J. 2008. Current challenges to the lexicalist hypothesis: An overview and a critique. In Will Lewis, Simin Karimi, Heidi Harley & Scott Farrar (eds.), Time and again: Papers in honor of D. Terence Langendoen, 91–117. Amsterdam: John Benjamins.
Nunberg, Geoffrey, Ivan A. Sag & Thomas Wasow. 1994. Idioms. Language 70. 491–538.
Osborne, Timothy, Michael Putnam & Thomas M. Gross. 2011. Bare phrase structure, label-less trees, and specifier-less syntax. Is Minimalism becoming a dependency grammar? The Linguistic Review 28. 315–364.
Pesetsky, David. 1995. Zero syntax. Cambridge, MA: MIT Press.
Pustejovsky, James. 1991. The syntax of event structure. Cognition 41. 47–82.
Pylkkänen, Liina. 2008. Introducing arguments. Cambridge, MA: MIT Press.
Radkevich, Nina. 2010. On location: The structure of case and adpositions. University of Connecticut doctoral dissertation.
Ramchand, Gillian. 2008. Verb meaning and the lexicon: A first phase syntax. Cambridge: Cambridge University Press.
Rappaport Hovav, Malka & Beth Levin. 2010. Reflections on manner/result complementarity. In Edith Doron, Malka Rappaport Hovav & Ivy Sichel (eds.), Syntax, lexical semantics, and event structure, 21–38. Oxford: Oxford University Press.
Rossdeutscher, Antje. 2011. Particle verbs and prefix verbs in German: Linking theory versus word-syntax. Leuvense Bijdragen 97. 1–53.
Roy, Isabelle. 2010. Deadjectival nominalizations and the structure of the adjective. In Artemis Alexiadou & Monika Rathert (eds.), The syntax of nominalizations across languages and frameworks, 129–158. Berlin: Mouton de Gruyter.
Schäfer, Florian. 2008. The syntax of (anti)causatives: External arguments in change-of-state contexts. Amsterdam: John Benjamins.
Schäfer, Florian. 2012. Two types of external argument licensing: The case of causers. Studia Linguistica 66. 128–180.
Schein, Barry. 1993. Plurals and events. Cambridge, MA: MIT Press.
Schein, Barry. 2003. Adverbial, descriptive reciprocals. Philosophical Perspectives 17. 333–367.
Speas, Margaret J. 1990. Phrase structure in natural language. Dordrecht: Kluwer.
von Stechow, Arnim. 1996. The different readings of wieder ‘again’: A structural account. Journal of Semantics 13. 87–138.
Tenny, Carol. 1987. Grammaticalizing aspect and affectedness. MIT doctoral dissertation.
Tenny, Carol. 1994. Aspectual roles and the syntax-semantics interface. Dordrecht: Kluwer.
Wechsler, Stephen. 1990. Accomplishments and the prefix re-. In Juli Carter & Rose-Marie Dechaine (eds.), Proceedings of the North Eastern Linguistic Society XIX, 419–434.
Zwart, Jan-Wouter. 2009a. Uncharted territory? Towards a non-cartographic account of Germanic syntax. In Artemis Alexiadou, Jorge Hankamer, Thomas McFadden, Justin Nuger & Florian Schäfer (eds.), Advances in comparative Germanic syntax, 59–84. Amsterdam: John Benjamins.
Zwart, Jan-Wouter. 2009b. Prospects for top-down derivation. Catalan Journal of Linguistics 8. 161–187.
Zwart, Jan-Wouter. 2011. Structure and order: asymmetric merge. In Cedric Boeckx (ed.), The Oxford handbook of linguistic minimalism, 96–118. Oxford: Oxford University Press.

Leah S. Bauke and Tom Roeper

How Unlabelled Nodes work: Morphological derivations and the subcomponents of UG operations

Abstract: This paper investigates what role labeling plays at the morphosyntax interface. It is argued that temporarily unlabelled nodes exist in the grammar beyond the level of roots, which are unlabelled by definition, and that well-known properties of small clause constructions, verb-particle constructions, incorporation structures and nominalizations can be linked to the existence of unlabelled nodes in the lexicon and in the initial stages of the syntactic derivation. In this sense the paper is to be understood as a continuation of the ideas already implicit in the abstract clitic hypothesis of Keyser & Roeper (1992) and the theory of dynamic antisymmetry of Moro (2000).

1 Introduction

A major challenge in scientific work lies in determining which ingredients are notational choices and which ones represent core ideas.1 Put differently, a fundamental question for any scientific inquiry is:

A. Are the ingredients of formal mechanisms a single psychological “unit” or are they independently psychologically real?

In recent work, Chomsky (2013) has proposed that we separate the devices of Merge, Linearization, and Labelling, which originally were unified in Phrase-Structure-Rules (PSRs). A non-linearized array could constitute an interface with a cognitive representation where Linearization occurs only for the purpose of Externalization.

1 Chomsky (pc) has informally remarked that with progress “technical solutions should turn into leading ideas”. DOI 10.1515/9781501502118-010


That is, linear order is needed for overt expression in time. If true, an important question to be asked is:

B. Can we find further independent derivations where Linearization or Labelling has not yet applied?

We argue that if we can discover a domain where Unlabelled Nodes are in principle needed, then it provides a fundamental justification for the claim that:

C. Nodes without Labels can be real in UG.

Indeed, where grammaticality can only be explained if a Node is temporarily Unlabelled, then we have a classic form of linguistic evidence for psychological reality, in this case, for Unlabelled nodes. This approach has already been argued for in Moro (2000) for Small Clauses. If correct, then it should arise elsewhere as well. We will argue that morphological derivations require Unlabelled Nodes as well. Our approach demonstrates that Unlabelled Nodes are not “defective” or “Last Resort” phenomena, but a necessary and natural concept.

1.1 Labelling in acquisition, small clauses, and morphology

Initial evidence in favor of this approach comes from acquisition (Roeper 2014), where it is argued (following Lebeaux 2000) that children begin with Adjoin-alpha. These initial adjunctions are Unlabelled and a) therefore allow broader interpretations, and b) ultimately require Labels, and therefore revisions are forced on the acquisition path. We can see this, for instance, from the classic fact that “no” (and “Nein” in German) initially at the 2-word stage allows both corrective readings (beans, no peas) and propositional readings equal to not ((they are) not beans). No will ultimately receive one Label as a negative conjunction, and a different Label as a Determiner in a DP. Ultimately, we think, the requirements on a full theory of Labelling will be very sensitive to particular syntactic environments in particular languages. The strongest and most extensively discussed current arguments revolve around how Labelling for Small Clauses works: the claim is that the node over one Unlabelled branch of a small clause raises in order to acquire a Label (see Moro 2000, Chomsky 2013 and Bauke 2014). The argument is that movement is motivated by the necessity to satisfy the UG requirement that Nodes are Labelled.




The technical approach uses anti-symmetry (Kayne 1994 and subsequent literature) to force the movement of one element of a symmetrical pair to achieve anti-symmetry by acquiring a Label at a higher level. We seek to broaden the question: Is it the presence of symmetry or the absence of a Label that is the primary motivation for movement? (Narita & Fukui (to appear) offer considerations that go in the same direction.) From our approach to morphology, we suggest that it is the absence of a Label that motivates derivational operations, both for morphology and small clauses.

1.2 The morphology/syntax interface

This argument entails the view that both morphology and syntax obey the same principles, particularly at the interface between them. The mechanism we build claims that a) Nodes can be Unlabelled, b) movement operations occur, and c) the semantics can be under-determined, and that all three are resolved through the concept of Unlabelled Nodes. This is reflected in a variety of morphological phenomena. In particular, compound incorporation (dog-loving, upcoming), discussed in section 2, and productive pre-verbal affixation (re-outflow), discussed in section 3, are underivable without the critical assumption of Unlabelled Nodes at one stage in the derivation. Our approach extends the claim in Distributed Morphology that elements can remain as Roots prior to Labelling (cf. sections 4 and 5) and discusses some further consequences in section 6, which is followed by a very detailed illustration of the technical implementation of the proposal presented here in section 7. Section 8 concludes the paper.

2 Incorporation

In Keyser and Roeper (1992) the Abstract Clitic Position was introduced in part to provide a launching site for incorporation. The term “abstract” captured the surprising fact that the position seemed to range over traditional lexical categories – and other features – and it revealed sharp complementary distribution. It is the “abstractness” notion which adumbrates the need for an Unlabelled Node. Witness how different categories are mutually exclusive (cf. 1–3 below):

(1) play [N, A, P, dative, middle]
a. John played ball – N
b. John played smart – A




c. the game played out – P
a’. *John played ball out
b’. *John played smart ball (smart as a predicate of play)
c’. *John played smart out

(2)

a. John acted a fool
b. John acted out
c. John acted dumb
a’. *John acted dumb a fool
b’. *John acted out a fool
c’. *John acted dumb out

(3)

Dative excluded:
a. *John played me dumb.
b. *John played his teacher a fool.
c. *The game played us out.

(4) Middle excluded
a. *ball plays [invisible middle morpheme] easily
b. *dumb acts easily

As we can see in (1), play can select a nominal, adjectival or prepositional category. It is however not possible to combine any two of these, as the examples in (1a’–1c’) illustrate. The same holds for act in (2), and in (3) and (4) we can see additionally that datives and middles are excluded in these constructions as well. In (4) the origin of dumb in the Abstract Clitic Position conflicts with an invisible middle marker (an overt reflexive in some languages), causing ungrammaticality. It seems natural then to assert that the abstract clitic position must exist but lacks a Label in any syntactically traditional way (cf. the original argumentation in Keyser & Roeper 1992 and Bauke 2014 for a summary). Fundamental evidence for the notion that Unlabelled Nodes are legitimate occurs if they are part of linguistic operations. We argue that incorporation both:
a) targets an Unlabelled position and
b) moves to an Unlabelled position.

2.1 Unlabelled positions

First we note that there are well-known morphological relations where Unlabelled Nodes appear to be a part of derivations. For instance, -ed adjective


formation takes a variety of elements that do not exist as independent words with categorical labels:

(5)

a. lion-hearted/*hearted
b. unseen problems/*the seen problems
c. well-made suggestions/*the made suggestions
d. deodorant =/= *odorant/*deodor (p.c. Janet Randall)

Note that the traditional notion of circumfixes in German essentially illustrates this relation as well (ge-…-t, as in gesichert, getanzt, gespielt, etc.), where if the form were created in two steps, one step does not occur independently (viz. *gesicher, *getanz, *gespiel, etc.). In each instance in (5), the form:

(6)
a. *lion-heart/*hearted
b. *unsee/seen problems

does not really exist and acquires its adjectival category when -ed is added to a combination of roots that is combined without a category commitment (as is commonly argued within Distributed Morphology, cf. e.g. Halle & Marantz 1993, Harley & Noyer 2000, Harley 2005, and many others). These examples fall together with traditional notions of compound formation where the verbal elements do not occur by themselves, but nevertheless allow recursive generation:

(7)
a. coffee-maker
b. coffee-maker-maker
c. ice-breaker-lover

where there notoriously exist no verbs by themselves (8a) or with just a noun incorporated (8b):

(8)
a. *he is a maker/*it is a breaker
b. *to coffee-make/*to coffee-love/*to truck-drive

nor is there a verb at a more extreme level:

(9) *to coffee-maker love/*to coffee-maker make/*to truck-driver drive

As with standard cases of cyclic movement, here also an intermediate step in the derivation appears never to arise by itself. This is true even though the internal structure may be complex and seem to carry labels (the coffee-maker), as in: *to [N coffee-maker] love. Notice also that the complex form retains a strict verb-object reading no matter how far embedded it is. This strongly suggests that a lexical Phase has occurred that is a mirror of the VP phase.


We argue that the further step must be taken because something is defective in the derivation until that point: i.e. an [N] or [A] Label is absent until added by an affix, [N coffee-maker-love + -er] or [A coffee-make + -ing] in English. Some languages, like Mohawk (Postal 1979), allow a productive V label ([to dress-make]) that is unavailable in English. The English derivation is only completed once the incorporated material is raised to attach to -er, which carries a Nominal Label, thereby making it an expressible element. We will illustrate the exact structural configurations in what follows.

2.2 Landing site

The first question to be addressed is: where does incorporation go? It is typically adjoined to a verb, which could be in a SPEC position, but notably has no clear categorical content, as is illustrated in (10), which is in part a repetition of the examples in (1–3) and (7–8):

(10)
a. upcoming, bystander, ongoing (note: *stander)
b. dumb-acting
c. dog-lover

In each of these instances it is the element in the Abstract Clitic position (also called First Sister in Roeper and Siegel (1978)) which moves (see further discussion and revision in Bauke (2014)). Various approaches that refer to argument structure have failed to address the fact that only the immediately dominated First Sisters will move and that any element that involves a double-object (cf. 11a), particle + object (cf. 11b), or object + locative argument structure (cf. 11c) is blocked, because it entails another higher node involving a Small Clause:

(11)
a. promise [a boy a cookie] => *boy-promiser, *cookie-promiser
b. throw [away a ball] => *away-thrower
c. put [a bowl on the shelf] => *bowl-putter, *shelf-putter

Also in this context, -ed adjectival compounds are instructive because they allow recursive compounds, but only via what repeatedly appears in First Sister position. First note the contrasts in (12):

(12) make a boat well => boat-maker, *well-maker

but when the object is removed by passive, well is in the abstract clitic (i.e. First Sister) position, and incorporation is fine:

(13) made well => (the boat) is well-made


Notice also that this restriction shows up with greater subtlety with repeated PPs:

(14)
a. made by hand => hand-made
b. made in a factory => factory-made
c. made by hand in a factory => *hand-factory-made, ?factory-hand-made

Note further that the landing site appears to be the same and is never given a Label, but is rather treated as a kind of “tucking in” operation which does not disturb the structure being built:

(15)

[N, A [V dog [V love]] -ing]

And recursion to new “tucked in” positions can occur:

(16) the dog-loving-loving people (= they love loving dogs)

The incorporation clearly blocks a projected object (*potato-eating of food), and therefore the incorporated element entails the THEME argument. Nonetheless additional interpretations have been proposed (cf. e.g. Williams 1994). This would suggest that in an Unlabelled state it can be ambiguous between an incorporated THEME-object, an incorporated MANNER-adverb, or the object of an implied Preposition. So,

(17) fish-frying

can refer either to what is fried or to the manner of frying. In the following examples, we can again see both options in the contrast between object and implied or deleted Preposition:

(18)
a. room-sweeping [object]
b. broom-sweeping [Manner: with a broom]
c. stage-director = directs the stage/directs on stage

Thus, we take the semantic flexibility of the incorporated element to be a direct reflection of its Unlabelled status, although we cannot articulate this notion with semantic precision here.


We now need an explanation for why prepositions are not incorporated (cf. 19b) if whole compounds (cf. 19a) can be incorporated:

(19)
a. lion-tamer-lover
b. *with-broom sweeper

One answer to this question is identical to what we have provided before: a PP, not just a clitic P head, will provide a label for this category, and therefore it is disallowed (compare (20) and (15)):2

(20)

[tree diagram: an N node formed by the affixes -er, øN or -s attaching to an Unlabelled node U, which dominates the verbal head V and the further Unlabelled positions U1 and U2]

These arguments show that if the Unlabelled Node is a primitive in the system, open to movement and recursion, it can also provide explanations for clear and well-known gaps in what the system produces at the morphological-derivational level, gaps which could not be systematically accounted for beforehand (notice that these are treated as accidental lexical gaps in most accounts).

2 One reviewer remarks that the projection of U2 is somewhat surprising in this context. One would expect that U2 is just as invisible to labeling as U1 seems to be. It is to be noted, though, that the structure in (20) is not the result of simple X’-projection à la Grimshaw (1990). These are rather cases of tucking-in that are needed for the incorporation structures in (18). However, tucking-in does not require a label. As we articulate below, our system will allow a sequence of Unlabelled Nodes. In principle one might expect from a direct object construction that you can create repeated incorporation:
(i) give the boy candy
(ii) candy giver
(iii) *boy giver
(iv) *candy boy giver

The ungrammaticality of (iv) is not a result of an incorporation constraint but must be a semantic constraint, which is evidenced by the fact that boy giver is also ungrammatical – which matches the general observation that datives do not incorporate (cf. also the discussion of (1–3) and (6–7) above).
(v) I envy John
(vi) *John envier
The same can be observed in German:
(vii) Ich helfe dem Mann ‘I help the man’
(viii) *der Mannhelfer




2.3 Blocked Plurals

Another consequence of the absence of a Label, making a limitation to roots even more plausible, is that plurals are disallowed. Consider (21) in this context:

(21) *peas-eater

Even when a plural is entailed by implication, i.e. a pea-eater eats more than one pea, plural marking is blocked. Elsewhere (Roeper 1995) it has been observed that nominalizations which acquire a plural block object control:

(22)
a. Object argument is controlled:
John needs discussion => we should discuss John
John enjoys defeat = likes being defeated
b. Arguments uncontrolled with plural:
John enjoys defeats = either subject or object relation entailed
John needs discussions => he wants to participate in them

So here again the open argument slot suggests that the verbal properties are still viable and not yet blocked by an N label. If the incorporation position must be Unlabelled, and plural carries an N label, then we have an explanation for the absence of plural, despite the apparent semantic anomaly of it being implied.3 It has long been unclear how one should categorize bare nominalizations which resist articles:

(23) John knows that preparation is important =/= the preparation/a preparation

so that although the nominalization carries the subject position, just as an infinitive does:

(24) to prepare is important

the article is not allowed. Once again, if the effect of the plural is to impose an N-head, as those who argue for a NumberP also assume (cf. e.g. Alexiadou 2008; Borer 2005 and many others), then we have an explanation for why neither an N nor a V head is appropriate. Notice that it follows as well that the observation that plurals seem incompatible with typical nominalizations is also explained:

(25) *the destructions of the city.

3 Recent interesting work by Schwarzschild (2011) on “plural mass nouns” treats this as a semantic puzzle, which we believe has a formal morphological explanation.


The relevant form is (26), where destruction has not yet acquired its final label and therefore, in brief, its argument structure remains open (see Roeper 1995, 1999 for detailed discussion).

(26) [the [V destruction [PP of the city]]]

Typically the plural blocks the direct object and allows only an external argument reading, as this contrast illustrates:

(27)
a. the conquest of the Indians (object)
b. the conquests of the Indians (agent)4

2.4 Intermediate summary

We argue that Unprobed Merge without feature-satisfaction naturally exists in the lexicon if the nodes carry no features to be probed or satisfied. So the following automatically holds:
a. Inherently Unlabelled positions exist.
b. Unlabelled landing sites exist.
c. Unlabelled rules can be recursive.
We consider these operations to be an argument for internal rather than external Merge, from a rather different angle than what is standardly discussed in the literature (cf. e.g. most recently Cecchetto & Donati 2015, but see Epstein, Kitahara & Seely 2016 for a very illuminating approach that seems to go in the same direction). All of these operations are subordinate to the ultimate requirement that at the final step Labelling must be present. This approach is a specific instantiation of Chomsky’s (2013) suggestion that Labelling exists to satisfy Interface interpretation requirements, and it is parallel to claims by Borer (2013) that pre-syntactic operations can be complex. These arguments all suggest that the original intuition behind the Non-Labelled Abstract Clitic was pre-figuring the concept of Unlabelled.

4 Note that Bauke and Roeper (2012) show many cases where these are allowed, which suggests that some further level of derivation is possible in some cases, as one might expect for plural events, e.g.:
(i) police-shootings




3 German particles

It has long been a question exactly how the complex verbal predicates in Germanic should be represented. V2 operations typically leave behind a part of the predicate (cf. 28a), while particle and verb are joined, with the particle as a “prefix”, in the infinitival form (cf. 28b):

(28)
a. er steht auf
   he stands up
   ‘he stands up’
b. er kann aufstehen
   he can up.stand
   ‘he can stand up’

Again, the question that emerges here is: does the remainder have a Label? The verb that moves up must have a Label in order to satisfy the higher verbal projection into which it moves. However, it would seem to be unnecessary for the remainder to have a Label. And yet, it is notable that the extraposed version in English carries a full Maximal Projection:5

(29)
a. *John zipped all the way up the bag.
b. John zipped the bag all the way up.
c. *John could zip all the way up the bag.
d. John could zip the bag all the way up.

Chomsky (pc) suggested that the particle moves to find a maximal projection. An updated version of that suggestion would be that the particle moves to a Labelled particle node which is at the end of the VP.

5 Notice that we cannot simply deduce from this a Labelling requirement of the form: move to an XP position. This is evident from the fact that such positions can be further modified by adjectives and adverbs, which entails the presence of a Label. It also shows that even Unlabelled Nodes cannot be syntactically projected without an explicit syntactic context. They are not freely adjoined. Consider the following examples in this context, which alternate between a restitutive and a repetitive reading, clearly showing that what we are looking at here is an output requirement on labels:
(i) John turned the TV off again.
(ii) John turned off the TV again.
That is, modifying off with again in final position generates a restitutive meaning, unavailable in (ii). So in (ii) John turns off the television twice, whereas in (i) John restores the TV just to a state that it has been in before, i.e. off.


Parallel examples in German show that basically the same considerations apply:

(30)
a. Er reißt die Tasche ganz zu. (‘He zips the bag all the way shut.’)
b. *Er reißt ganz zu die Tasche.
c. Er kann die Tasche ganz zureißen.
d. *Er kann ganz zureißen die Tasche.

However, the alternation between participial and verbal uses of reißen (cf. the discussion of 28) complicates matters somewhat, and the observed pattern might be subject to deeper principles that regulate the alternation of verbal and participial forms:

(31)
a. Er reißt die Tasche ganz zu.
b. Er hat die Tasche ganz zu gerissen.

So, we can see here that extensions of the account of Unlabelled Nodes provided in section 2 are possible to other languages and constructions as well.

4 Sublexical merge and interfaces

Now if we have sublexical internal merge (i.e. incorporation) without Probe/Goal relations, we have a symmetrical system obedient to basic linguistic principles. Why should one then have movement at all? It was argued already in Roeper (1999) that leftward incorporation of particles serves the purpose of finding an N label:

(32) flow out => outflow

In fact, English allows non-moved forms:

(33) breakout/knockout/strikeout

It is striking that these post-verbal particles preserve an AGENT role, which would be the external argument of the verb. In contrast, movement of an element into that position blocks the AGENT reading, which cannot be projected into a higher -er position:

(34) *outflower/*outbreaker/*incomer

This kind of semantic effect for a derivation is a novel one and its place in linguistic theory must still be more carefully conceived. We return to it below. Now if a template of this kind exists then it should pass the premier test for productivity, which is that it should allow recursion. The first piece of evidence that the incorporation node is indeed Unlabelled is that it tolerates exactly the kind of affixation which Labelled nodes ([N husband]) prevent in (35):

(35) *rehusband/*relife/*retime



How Unlabelled Nodes work 


As (35) illustrates, these forms are strictly ungrammatical, although the meanings could be very natural. We will argue that there is a stage in the derivation where out- is Unlabelled, [U out [V flow]], and therefore verbal re- can be added, [re [U out [V flow]]], prior to nominalization, [N [re [U out [V flow]]]], to produce:

(36) (the) re-outflow

And it is easy to produce recursion:

(37) (the) re-re-outflow of funds

It is now important to see that re2- in [re2- [re1-outflow]] also does not attach to a nominalized re1- form, which shows that the Nominalization must come after an initial Unlabelled re- is added. Similar examples (redownturn) can be found on the web, again allowing further prefixation:

(38) a. re-downturn, pre-downturn
     b. pre-redownturn of the stock market

This shows that the position into which out- moves cannot automatically mark it for [+N] (contra Roeper 1999). So what then is the status of outflow before re- attaches, if re- cannot attach to a noun? It must not be a noun, yet there is no verb *to outflow. It is this paradox that the notion of Unlabelled is ideally suited for. But we now have a new problem: why is it not possible to have:

(39) *re-outlaw/*re-income/*repayee

The notion of outlaw defines a person, not for instance a law, and therefore it has drifted in a particular way (cf. Bauke 2014 for similar arguments along this line). Likewise income refers only to money, not the compositional possibility of any kind of income (*the income of mail).6

6 One reviewer remarks here that it might be sufficient to rely on the semantics of the verb rather than on a morphological requirement on re- prefixation. By treating re- prefixation on a par with modification by again, which leads to a restitutive reading, the ungrammaticality of the forms in (39) can be explained simply by assuming that these do not constitute an event. We agree with the semantic assessment but would like to point out that the standard approach of treating re- prefixation as an alternative to modification with again overlooks the fact that there are several re- forms which do not have a restitutive reading. Notice that to repurpose something does not mean to give it the same purpose again! Whereas to replant a tree can mean that the tree is planted in the same hole again or that the tree is uprooted and then planted in a different location. So this clearly shows that re- does not only have a restitutive but also a repetitive interpretation, and we are arguing here that both versions of re- allow recursion.


The generalization here is: any drifted or idiomatic form resists further derivational affixes:

(40) a. the frighteningness of his demeanor
     b. the challengingness of his proposal
     c. the bellowingness of his voice
     d. the lovingness of his voice
     e. the laggingness of his contingent

in contrast to:

(41) a. *the walkingness of his stick
     b. *livingness of these wages
     c. *the smellingness of these salts
     d. *the sellingness of his arguments

In each of these instances in (41), some lexical drift has occurred and the compositional +ing reading is no longer maintained, blocking -ness. Thus

(42) a living wage

does not and cannot mean that the wage is alive but that it provides a financial basis for a typical form of living.

5 Idioms and the cognitive interface

One approach to this sharp restriction might be to say that all idiomatic expressions must be phases. They therefore receive an interpretation, and phases do not tolerate Unlabelled Nodes. We could expand this claim to say: we associate the assigning of any meaning to the further requirement of a grammatical label. This says, in effect, that Labelling should be seen as an interface requirement, which therefore justifies Chomsky's claim that Labelling occurs to satisfy interface requirements.7 In effect, then, meaning without Labels is either a root or an expressive. We assume that expressives have a direct emotional, hence non-compositional interpretation and syntactically are treated much like interjections (cf. Chomsky 2013).

7 We concede, though, that some minimal interpretation without a label seems conceivable. For instance run as a root does not have a label but seems to have some independent semantic force (but cf. Borer 2013 for a very elaborate and different view).




If there is a projection into the realm of cognitive representations, those representations have the requirement of assigning reference, property, or location to all expressions. It follows that: All entries into cognitive composition must have an interpretable Label. That is, again, in order to undergo further combination under a natural notion of compositionality, they must meet the interface requirement of having a Label. We elaborate the representation of idioms below.

6 Further consequences

Notice that we can, of course, also find the same situation that defies cognitive composition illustrated above in other contexts where verbs have specialized meanings. Just consider (43) in this context:

(43) *uninhabit, *untouch, …

The verbal forms are all ungrammatical with reversative/restitutive un-, despite the very obvious counterexamples that come from a different group of verbs that allow this meaning, illustrated in (44):

(44) undo, unravel, unleash, …

Notice though that both can occur as (participial forms or) negative adjectives:

(45) uninhabited, untouched, undone, unraveled, unleashed, …

We will not discuss this any further at this point, but simply suggest that our analysis can also be extended to these forms, somewhat along the lines of the argumentation for the forms in (5) and (6) already.

7 Technical implementation of the meaning interface

In what follows we will now suggest an approach that allows us to capture the observed relation between meaning assignment on the one hand and grammatical labeling on the other. Let us start by sharpening the question already discussed above. It is a standard assumption in morphology that (at least in languages like German and English) category-changing morphology is to the right, where suffixes bring about the category change.


(46) a. blindA + nessN -> blindnessN
     b. singV + erN -> singerN

It is not clear whether the affix itself carries the Label or a higher node is involved, as we argue below. There are only very few exceptions in English (defrost, decipher, encrypt, …), which are all of Latinate origin and where the category of the base may be reanalyzed, so that what looks like left-adjoining category-changing morphology can be reanalyzed as standard cases. Nevertheless, left-adjunction of course exists in abundance:

(47) write - rewrite, perform - outperform, sleep - oversleep, …

What is striking about all the forms in (47), however, is that in all these cases the category of the base is maintained and only the meaning of the base is changed productively by prefixation. This follows naturally from the observation that prefixes cannot change category. What we can further observe here is that not only nominal (*rehusband, *reincome) but also adverbial and adjectival bases cannot be combined with the verbal prefix re-:

(48) *resick, *requickly, …

Whereas the reverse scenario, i.e. nominal prefixes on adjectival, adverbial and nominal bases, is unproblematic:

(49) supersick, megaincome, ultraquick(ly), …

We shall now address these technical matters in terms of a set of challenges which restate more technically the observations above. The first challenge is to capture the fact that prefixes that are the result of leftward movement have the potential of changing the category (discussed with extensive further examples in Roeper 1999):

(50) a. flow outV -> outflowN
     b. break outV -> outbreakN
     c. come outV -> outcomeN

Once again, there are no verbs: *to outflow, *to outbreak, *to outcome, etc. This cannot be attributed to the properties of out though, because we can observe out being used productively on verbs in a number of places (some of them already mentioned above):

(51) a. outrun
     b. outperform
     c. outkennedy Kennedy




Moreover, this is not a property of out alone; we can observe it with a number of other prefixes as well:

(52) a. intake, inlet, instep, …
     b. upgrade, upsurge, upraise, …
     c. downpour, downtone, downfall, …
     d. overflow, oversize, overreach, …

The second challenge is that the leftward movement operation of the particle blocks an AGENT-reading both implicitly and explicitly. In all of the following the agentive -er suffix is blocked:

(53) *outbreaker, *overflower, *incomer, *outcomer, *downpourer, *outlooker, *upstarter, …

Notice that the leftward movement operation is not obligatory. The particle can also remain on the right and nominalization is still a possibility, as is illustrated in the following pairs:

(54) a. breakout/outbreak
     b. setup/upset
     c. start-up/upstart
     d. hangover/overhang
     e. passover/overpass
     f. turn-down/downturn
     g. lookout/outlook
     h. payback/backpay

Notice also that when the particle remains on the right, agency or at least active event participation is still possible, exemplified here for the particles out (cf. 55), in (cf. 56), up (cf. 57) and over (cf. 58):

(55) lookout, knockout, walkout, lockout, cookout, workout, burnout, fadeout, dropout, blowout, handout, strikeout, carryout, takeout, …
(56) break-in, sit-in, walk-in, …
(57) break-up, lockup, workup, …
(58) stopover, pushover, holdover, sleepover, …

A number of things can be observed with these examples. First of all, it is worth noticing again that there are no verbs *to outcome, *to outcook and *to outknock, *to outhand, … For some of these, the nominal pairs of course exist (cf. examples


in (54) above), but there are a number of forms where the nominal forms also diverge from the pattern. Notice, for instance, that for workup there is no nominal form *the upwork, and similarly for *uplock, *upbreak etc. Additionally, it is important to notice the distinction between sleepover and oversleep, where the person interpretation is not possible and an Event interpretation emerges instead. So the challenge that arises from these data clearly is to find a coherent explanation for why the AGENT-reading is blocked under leftward movement. The third challenge emerges from the following data:

(59) a. re-outflow of capital
     b. reoutbreak of ebola
     c. redownfall of womankind
     d. redownturn on Wall St.
     e. pre-redownturn, …

All of these forms are (somewhat surprisingly) well-formed. Under the assumption that outflow is a nominal form, where leftward movement has taken place, where the AGENT-reading is blocked and where all the characteristics discussed above hold, it would be expected that re-outflow is ungrammatical, for the same reasons that *rehusband and *reincome are. Our argument is that out is not labelled initially and that re- attachment is still possible before labeling has taken place. We will sharpen this intuition below. The fourth challenge is closely related to the third and can be regarded as a follow-up. We therefore formulate it in the form of a single question here: If reoutbreak and reoutflow are indeed grammatical and attested, why then are *a reoutlet and *a repermit ungrammatical (and notice that a repermission is again fine)? Let us now see how we can account for all four puzzles. The basic model we use is a very simple one in which structure building proceeds via simple merge (Chomsky 2013, Kidwai & Mathson 2006). Merge as such is a recursive process that produces unordered concatenation (Hornstein 2009).
We assume that interpretation is by lexical phase (a concept to be specified below) and we will focus on where exactly labeling takes place, i.e. whether it is part of the recursive process of concatenation or whether it is only relevant at the interfaces for interpretation. We assume the following verbal syntax template for the structures to be discussed:

(60) preverb - verb - postverb
     AGENT - verb - complement

So for example for the verb break out we assume: verb, AG, particle, (THEME), all unordered.


This allows us to derive break out as well as outbreak and the distinctive properties of the two forms via a simple syntactic process of projection, as we will see shortly. Productive morphology and simple merger allow us to derive the following patterns for Agent - Verb - Theme and NP - V - NP configurations (cf. also Carlson and Roeper 1981):

(61) a. out: John outran Bill [compare: *Fred ran Bill]
             Clinton outkennedyed Kennedy [compare: *Clinton kennedyed Kennedy]
     b. re: Fred rethought the issue [compare: *Fred thought the issue]
     c. over: John overplanned his vacation [compare: *John overplanned to go]
     d. -able: the house is makeable [compare: *the house is makeable big]
     e. comp: mother-helping [compare: *motherhelping to clean up]
     f. mis: as above
     g. pre: as above
     h. under: as above

So we can see that clitics/particles in preverbal position have immediate impact on the grammaticality of the overall verbal structure. They impose a default transitive template. Note also that this is a recursive process in leftward building structures, as can be seen from the following examples (62), but not (63):

(62) re-rewrite, over-overinterpret, re-overinterpret, …
(63) *strangenessness, *followup up

Apparent rightward recursion is then best understood as raising to a series of higher leftward nodes (cf. 64), each of which satisfies the change-of-category requirement for derivation that is ultimately on the "right" (coffee-maker-maker):

(64) a. coffee-maker-maker
     b. from [make+er [make coffee] +er]
     very (far)
     c. How far did you throw the ball…

Here we find that precisely the labeled maximal projection can be subject to movement as in (68b), but not the unlabelled particle as in (68a), although one could imagine that the wh-word how naturally asks for a Degree Phrase as in (68c). The topic deserves closer analysis. Here we just note that in present-day German and older stages of English, forms exist within the lexicon which are of limited productivity in German (only wo-) and were productive in English, such as:

(69) a. wherein, wherefore, whereas, whereto, wherefrom
     b. womit, wofür, worüber, wodurch, wovon, wozu, wohin...

Note that in German the forms in (69b) are all separable. On the other hand, the forms in (70) are all ungrammatical:

(70) a. *wiemit, *wievon, *wieher
     b. *warummit, *warumfür
     c. *werfür, *wermit




We can argue that these forms are in principle possible in the lexicon precisely because root merger does not require lexical labels. In turn, we conclude that it is exactly this absence of Labels which stands at the origin of why we cannot form questions like:

(71) *how away did you drive the car?

The claim that core forms of movement are excluded is an important dimension of defining the representational basis for successful movement and a theory of Labelling. This perspective has existed since the origins of generative grammar, where structural descriptions defined operations in terms of labelled rewriting rules. So let us now return to the question of why exactly *how out is impossible. We have seen one unlabeled merge partner in constructions like break outV, which begins with a verb break that merges with an unlabeled particle to form the verb break out. It can undergo further derivation and become a noun. However the unlabeled particle out cannot link to an unlabeled how. We are asserting that a functional element can also be unlabeled. We build upon the arguments of Cecchetto and Donati (2010, 2015) that wh-words can be ambiguous between Q and D, which at some point in a derivation allows the ambiguity between headless relatives and indirect questions (cf. Clauss 2015 for discussion):

(72) a. I read what you read = relative
     b. I know what you read = embedded wh-question

We interpret the ambiguity to mean that at the Merge point what is unlabelled (cf. Cecchetto & Donati 2010 for detailed exposition). That suggestion fits the notion that the wh-word carries either D or Q, but has not resolved which will project. The resolution of labeling occurs in the process of embedding the phrase in a CP or a DP environment. Therefore we can make the natural and strong claim in (73):

(73) Two Unlabelled Nodes cannot merge

This perspective echoes the claims of Narita (2014) that phase formation requires a labeled head linking to a maximal projection.
Again, this can be explained by the unavailability of movement unless the entity is labelled. Hence, for our examples here the particles cannot be A'- or A-moved unless a label is assigned. We suspect that something similar is going on in idioms, on the basis of the following data:
(74) a. *ball was played by John
     b. Baseball was played by John


     c. Chess was played by John
     d. *love was made by John
     e. headway was made by John

Play baseball, play chess, make headway all have an idiomatic interpretation, which is probably associated with an intermediate labelling step that forces the idiomatic reading and rules out compositional interpretations, as in play ball, make love, etc., which can all go unlabelled but which, in consequence, are barred from leftward movement.8 The next question that emerges is: what then is different about how far and how up? In how far, the Label of Degree P on far allows the phrase to acquire a Degree Label. When it merges with a higher verb, the labeling algorithm chooses either D or Q, which is carried by what, even though it does not project a label until it is merged under agree with the higher verb. Up cannot merge with how, however, because up cannot provide a phrasal label. We do not provide a full definition of a labeling algorithm, but instead we are beginning to project the outline of such an operation by observing what constraints it must obey. We think that building such an algorithm step by step, in compliance with empirical observations, is a good approach. Notice finally that mixed forms in which one particle is moved to the left and the other stays in postverbal position are not licit, as is illustrated by the following contrasts:

(75) a. re-overturn
     b. *re-turnover
     c. re-upend
     d. *re-endup
     e. re-overwrite
     f. *rewrite over

So the crucial question here is how the lexical content projects onto labeled lexical entries. We will further develop here the ACP of Keyser and Roeper (1992) introduced above (cf. Bauke 2012/2014 for recent discussion and a revised implementation). Thus we argue that the particles are base generated in a clitic position to the right

8 A further differentiation is necessary for Unlabelled Nodes. Those that attach at the root can remain Unlabelled and serve as expressives or interjections in the sense of Chomsky (2013): gee, well, etc. In addition, in acquisition, the first step appears to be an Unlabelled adjunction, which is then either left as a root-attached expressive, or functions in the systems we outline here (see Roeper 2014).


and we additionally assume that the subcategorization reflection of -er as Agent is a default on the left, as is schematically outlined below:

(76) Subject Verb Complement
     -er (Agent) Particle (re/out/…)

So in a sense, the lexical entry is unordered at two levels:

(77) N -er {flow out/re} -> out/re moves into the Agent -er position

and re- and out- start in the same clitic position. This provides a straightforward explanation for why the following are incompatible:

(78) *regiveup/*rethinkout/*remissout/*overdrinkup, …

That is because the re- cannot originate in the right-hand clitic position if it is already occupied by out-. What we are saying then is that the unordered merger of out and flow needs to be linearized in a first step into outflow as follows:

(79) [U out [U flow]]

This provides an ordering, but it does not provide a label yet. In fact, before labeling applies, a second linearization procedure upon the merger of re- is possible9:

(80) [U re [U out [U flow]]]

9 Notice that we thus take linearization to be independent of labeling. A linear order can be established by mapping c-command relations onto linear precedence and this mapping must include labelled and unlabelled nodes. So the ordering of the particles is not so much a linearization issue as it is one of the modification of syntactic structure.


Again, linearization provides the order of the particles on the left, but it does not provide a label. It could in fact therefore continue, generating the aforementioned recursive structures exemplified again in (81):

(81) re-re-outflow

Labeling is a separate operation in a second step, which then labels the whole structure as N; this label then occupies a new node above the whole complex derived so far:

(82) [N [U re [U out [U flow]]]]



When we now turn to AGENT-nominalizations the following picture emerges:

(83) {-er, sing} is linearized as singer
     {-er, reoutflow} cannot be linearized as *re-outflow-er

(84) [N [V sing] [N -er]]

For the alternative ordering, where -er and all particles remain on the right, this means:

(85) (-erAG) {break out} -> linearize

which however also leads to ungrammaticality (*breakouter). We will now seek to extend our system to capture all of the facts about blocked AGENT and blocked -er for both types of particle incorporation, i.e. for the forms outbreak/*outbreaker and breakout/*breakouter. We recall that:
a) outbreak blocks the AGENT projection altogether - no Agent is implied. It follows that *outbreaker is blocked.
b) Breakout does not block the AGENT projection from the verb, and AGENT is implied.
c) Nevertheless *breakouter is blocked, so that the implied AGENT cannot be expressed.
The judgements here are sharp and therefore a simple system should capture all the facts.


To capture all the facts we begin with the assumption of a VP-internal subject that therefore carries the AGENT projection. So the following picture emerges:

(86) [+N [VP [+AG [+er]] V]]

We argue that out- moves into the AG position, which blocks the AGENT, but does not require the higher N label to be present; therefore it remains Unlabelled and allows re- or re-re- to be further added (recall re-re-outflow). Now we begin from a different position to derive the fact that the particle blocks the expression of the AGENT role even if it is present on the verb and projected by the VP as above. We note that it is blocked in all cases:

(87) a. *outbreaker
     b. *walker up
     c. *walk upper

Now let us argue that in order to linearize -er, it must also occupy the clitic position, which is not clear from cases like singer alone. But this claim immediately explains *breakouter, as can be seen from a comparison of the structures in (88) and (89):

(88) [N [vP [AG -er] [V break] [U out]]]

(89) [N [vP [AG -er] [V break] [U out er]]]


Therefore the -er is blocked again. So *outbreaker and *breakouter are blocked in quite separate ways, which accords with the fact that breakout retains Agency. Evidence that it is the Unlabelled nature of the clitic position that plays a critical role can be derived from some further facts. Under this view, one might expect that if the particle is moved to an extraposed position, then the -er ought to be free to occur. We find that there is a difference linked to the Unlabelled status. Consider the following contrasts:

(90) a. *the picker of the paper up
     b. *thrower of money away

If we assume that we moved the particle to the final position, it is still linked to the ACP position. The movement of the particle does not change its unique meaning and therefore its link to the particle position must be maintained by a trace. If it is expressed by a full maximal projection, then it has an independent Label, and then no trace is required. Predictably the -er is much better in (91a) than in (91b):

(91) a. ?the lifter of the poor all the way up out of poverty is a noble man
     b. **lifter of the poor up is a noble man

(91a) is not perfect but vastly better than (91b). These facts, in turn, are echoed in the potential for passives. Unlabelled Nodes are not eligible for passive or topicalization formation, and therefore the same contrast arises, as we discussed above for particles (compare (91) and (92)):

(92) a. ?all the way up out of poverty they were lifted by his unusual generosity
     b. *up they were lifted by his generosity.

We have now extended our system, making pivotal use of Unlabelled Nodes to account for all of the facts. A modern perspective helps: if the verb raises, then the particle must still be in the ACP position. How about the complex particle all the way up? We must assume that it has moved out of the ACP to an MP even if nothing is moved over (he got all the way up).10

10 A different way of phrasing this is that complex particles are maximal projections and the verb is moved away from them, as illustrated in the example above. The same can also be observed in German: er hat die Tür nicht ganz zugemacht.


So let us summarize: our suggestion is that labeling is 'delayed' in these verb-particle cases:

(93) [ flow out ]

Merger of the two roots flow and out is free. Categorization takes place in the next step of the derivation and provides for the verbal interpretation:

(94) [v v [ flow out ]]

This only labels the higher node. The lower node remains unlabeled (and presumably also unlinearized). Remerger of flow in higher functional projections, i.e. T, is still possible, which then leads to fixed categorization (and linearization). Alternatively, out can be remerged outside the verbal projection:

(95) [ out [v v [ flow out ]]]

Nothing requires imminent labeling. Thus further attachment of re- is possible:

(96) [ re [ out [v v [ flow out ]]]]

Now the situation is similar to the scenario above (linearization of outflow) insofar as, again, unlabelled nodes emerge in the course of the derivation as a consequence of free (re)merge. This is unproblematic as long as labeling upon


spell-out and linearization is fixed. This is done when a further categorizing node is added:

(97) [n n [ re [ out [v v [ flow out ]]]]]

One immediate consequence of unordered, unlabeled root merge is that we can now integrate idiom formation. As discussed above, the criteria for lexical insertion are:
a) specific meaning
b) label
If roots are merged but remain unlabeled, no compositional meaning can be computed. If roots are linked to morphology (particles, affixes, categorizers, and possibly others), compositional meaning is available (cf. Bauke 2012/2014; Bauke & Roeper 2012). Idioms arise when there is no syntactic label derived from the roots, as can be seen from the following frequently discussed examples. Notice though that at this point we remain non-committed on the question of how far this parallel can be pushed. It might turn out that a distinction between different types of idioms, e.g. those that resemble compounds and phrasal idioms, which presumably contain more functional structure, is necessary (cf. Bauke 2015 for further explanation):

(98) turncoat = traitor, income, incoming
a.

turncoat: [N [α turn] [β coat]]
b. income: [N [α in] [β come]]
c. incoming: [A [α in [V come]] [prog -ing]]

In sublexical phase-theory terms, labels here determine interpretive phases. In a sense, then, this is moving away from the discussion of whether every merger




instantiates a phase or whether phases are designated points in the derivation. Phases are rather stages in the derivation at which Labeling is absolutely required. This does not exclude basic merge of roots; it just means that the projection and interpretation of this word will be non-compositional (cf. Bauke 2014 for further exposition of this in the context of nominal root compounding and Roeper & Bauke 2012 for examples of gerund nominalization).

8 In conclusion

There is a long-standing claim that morphology subcategorizes the category to which it attaches (re- => on verbs, -ness => on adjectives, -er => on verbs). It predicts the ungrammaticality of *rehusband but leaves a form like re-outflow unexplained. We explain the exception by claiming that during the derivation prefixes like out- and re- can be Unlabelled. The interaction of prefixation, movement, and Labelling can explain a wide range of facts, including the absence of *outbreaker and *breakouter. The non-existence of forms like *a re-outlaw would seem to contradict these claims. We propose a link between idiom formation and obligatory categorization to explain the exception to the exception. Three important theoretical observations emerge from the preceding investigation:
a. Extremely variable grammaticality in productive morphology can be explained by increasing the abstraction of syntactic principles in the lexicon.
b. The interplay between Labeling and interpretative phases captures the domain of idioms.
c. The system captures recursion, the hallmark of syntax.
We have shown that the concept of the Unlabelled Node is both natural and necessary in the lexicon, that is, in derivational morphology. Its presence there not only reinforces the role of syntactic principles in the lexicon, but provides perhaps the first domain where the UG operations needed in less obvious ways in syntax have a robust presence in the lexicon. The notion that operations of movement define the lexicon, syntax, and LF formation in similar ways should be seen as strong evidence on their behalf.

References

Alexiadou, A., Stavrou, M. & Haegeman, L. 2008. Noun phrase in the generative perspective. Berlin: De Gruyter.
Bauke, L. 2014. Symmetry breaking in syntax and the lexicon. Amsterdam: John Benjamins.


Bauke, L. 2015. Content matching in idioms and compounds: A comparative analysis. Paper presented at BCGL8 Grammar and Idioms, Brussels.
Borer, H. 2005. Structuring sense. Vol. 1+2. Oxford: Oxford University Press.
Borer, H. 2013. Taking form. Oxford: Oxford University Press.
Carlson, G. & Roeper, T. 1981. Morphology and subcategorization: Case and the unmarked complex verb. In T. Hoekstra, H. van der Hulst & M. Moortgat (eds.), Lexical grammar, 123–164. Dordrecht: Foris.
Chomsky, N. 2013. Problems of projection. Lingua 130. 33–49.
Cecchetto, C. & Donati, C. 2010. On labeling: Principle C and head movement. Syntax 13(3). 241–278.
Cecchetto, C. & Donati, C. 2015. Relabeling. Cambridge, MA: MIT Press.
Clauss, M. 2015. Free relatives in acquisition: A case of over-generalization. GALA presentation.
Epstein, S., Kitahara, H. & Seely, D. 2016. Phase cancellation by external pair-merge of heads. The Linguistic Review 33(1). 87–102.
Grimshaw, J. 1990. Argument structure. Cambridge, MA: MIT Press.
Halle, M. & Marantz, A. 1993. Distributed Morphology and the pieces of inflection. In K. Hale & S. J. Keyser (eds.), The view from building 20: Essays in linguistics in honor of Sylvain Bromberger, 111–176. Cambridge, MA: MIT Press.
Harley, H. & Noyer, R. 2000. Licensing in the non-lexicalist lexicon. In B. Peeters (ed.), The lexicon/encyclopaedia interface, 349–374. Amsterdam: Elsevier Press.
Harley, H. 2005. Bare phrase structure, acategorial roots, one-replacement and unaccusativity. In S. Gorbachov & A. Nevins (eds.), Harvard Working Papers in Linguistics Vol. 11. Cambridge, MA: Harvard Linguistics Department.
Hornstein, N. 2009. A theory of syntax. Cambridge: Cambridge University Press.
Kayne, R. 1994. The antisymmetry of syntax. Cambridge, MA: MIT Press.
Keyser, S. & Roeper, T. 1992. Re: The abstract clitic hypothesis. Linguistic Inquiry 23. 89–125.
Kidwai, A. & Mathson, R. 2006. Relating to relatives: The cyclicity of SIMPL. Jawaharlal Nehru University manuscript.
Marantz, A. 2007.
Phases and words. NYU manuscript. Moro, A. 2000. Dynamic antisymmetry. Cambridge MA: MIT Press. Narita, H. 2014. Endocentric structuring of projection free syntax. Amsterdam: John Benjamins Narita, H. & Fukui, N. to appear. Symmetry driven syntax. Postal, P. 1979. Some syntactic rules in Mohawk. New York: Garland. Pustejovsky, J. 1995. The generative lexicon. Cambridge MA: MIT Press. Roeper, T. 1995. Lexical specification and lexical insertion. In M. Everaert and J. Grimshaw (eds.), Inherent binding and the syntax/lexicon interface: Distinguishing DP, NP, and N. Amsterdam: Benjamins. Roeper, T. 1999. Leftward movement in morphology. In V. Lin, C. Krause, B. Bruening and K. Arregi (eds), Morphology and syntax cycle two. MITWPL 34. 35–67. Roeper, T. 2014. Strict interface principles and the acquisition engine: from unlabeled to labeled and minimal modular contact. Language Sciences. 1–14. Roeper, T. & Siegel, M. 1978. A lexical transformation for verbal compounds. LI 9. 199–260. Schwarzschild, R. 2011. Stubborn distributivity, multiparticipant nouns and the count/mass distinction. In Brian Smith, Suzi Lima, Kevin Mullin (eds.), Proceedings of NELS39, 661–678. Williams, E. 1994. Thematic structure in syntax. Cambridge MA: MIT Press.

Andreas Blümel

Exocentric root declaratives: Evidence from V2

Abstract: In this paper I argue that V2-structures in (mostly) Germanic provide evidence for the necessity to avoid labeling in root(-like) contexts, i.e. they instantiate systematic and obligatory failures of the two strategies to label XP-YP-structures suggested in Chomsky 2013, 2014: in declarative contexts, the prefield must be occupied to prevent specification of the category of α = {XP, CPV2}. I show how this approach not only makes sense conceptually, but elegantly solves riddles of prefield-occupation ("Vorfeldbesetzung").

1 Introduction

A recent effort in syntactic theorizing is to rid the grammar of endocentricity and stipulative X̄-theoretic residues. According to Chomsky (2013, 2014), the cases that do speak in favor of headedness can be made to follow from Minimal Search, i.e. efficient detection of the structurally closest head, ultimately a Third Factor principle. Within that approach two strategies exist to endow structures of the form XP-YP with a label, i.e. structures where detection of the closest head fails (unlike in the trivial case Head-XP): (a) symmetry-breaking movement turns one member into a discontinuous element such that the in-situ element is detected; (b) both X and Y share a prominent feature and have undergone agree, so that the common feature provides the label. The aim of this paper is to defend the view that at least a third strategy exists and that grammars exploit it systematically. Central to this paper are two related claims:

(1) a. Declarative root clauses must remain labelless,
    b. and prefield-occupation in V2-languages is one strategy to ensure this.

I couch both these claims in Chomsky’s recent theory of labeling and thereby show how an interplay between unconstrained Merge,1 Third Factors2 and a reasonably explicit hypothesis about the Conceptual-Intentional interface – i.e. (1) – delivers a solution to one part of the syntactic puzzle known as V2.

1 Chomsky 2004 et seq.; Boeckx 2014.
2 Chomsky 2005.

DOI 10.1515/9781501502118-011


That said, let me point out the proviso that in this paper I will not be concerned with the question of V-in-C, i.e. the fact that the finite verb ends up in the so-called left sentence bracket or C in V2 clauses. I here rely on the idea that the finite verb V-to-C-moves to assume the finiteness features on C (cf. Roberts 2004 and Eide 2007 inter alia). Intuitively, it is this formal movement which endows clauses with illocutionary force in languages that have this property. A full account of V2 has to address this other half of the conundrum, and I hope to find a satisfactory solution to this problem in the future.

Claim (1) is not novel if stated in different terms:3 Emonds (2004, 2012) proposes that root clauses in general are categorially unspecified, comprising 'a-categorial shells.' In the current analysis I side with him in taking (1) to be a significant fact which holds the key to many of the cross-linguistic root properties we know of. However, I depart from his proposal in that I reject the X̄-theoretic machinery, including its stipulations, and in that I take (1) to be an interface condition, which languages meet in different ways. In other words, the way (1) manifests itself across languages is not in Emonds' static universal category-neutral projection but by means of different syntactic and derivational mechanisms (cf. below for specific suggestions). Let me point out that in addition to what one might consider a mere difference in 'style' of analysis, Emonds' implementation of root phenomena in terms of a Discourse Projection arguably violates Full Interpretation.4 Be that as it may, I take the fact that (1) has independently been suggested to be promising: (1) – or something close to it – might turn out to be a genuine interface property.
The empirical focus of this paper is narrowly confined to three salient properties of V2 contained in the following generalization:

(2) Prefield-Occupation (central puzzle): In German declarative root(-like) contexts, at least and at most one XP must occupy the position before the finite verb (V2 – the position of the left sentence bracket).

3 Its specific restriction to declarative clauses might be. Emonds seems to defend a stronger thesis which extends labellessness to all clause types. Since other clause types too exhibit root/non-root asymmetries, (1.a) might turn out to be too weak and Emonds' stronger view might have to be invoked. As for V2, much depends on the nature of V-in-C, which is confined to root contexts (excluding Icelandic and Yiddish).
4 Chomsky (1986a:88–99): Full Interpretation: "every element of PF and LF, taken to be the interface of syntax (in the broad sense) with systems of language use, must receive an appropriate interpretation – must be licensed [...]. None can simply be disregarded." Emonds' discourse shells provide landing sites for movement in V2-clauses, but their head is not associated with a particular meaning. The problem is exacerbated if no movement to the discourse shell takes place, as in simple English declarative root clauses: the discourse shell receives no interpretation at either PF or LF.




A declarative root clause with an XP in the prefield is shown in (3.a). Omitting the phrase leads to ungrammaticality, as (3.b) shows.5 Unless indicated otherwise, all the examples of this paper are from German.

(3) a. Der Jens hat der Maria ein Buch geschenkt.
       the Jens has the Mary a book given
       'Jens gave Mary the book. (as a present)' (assertion)
    b. *Hat der Jens der Maria ein Buch geschenkt.

The second prominent feature of (2) is the lack of specificity of the prefield-XP, i.e. its promiscuity: a phrase of any category can do the job of occupying the prefield:

(4) a. [DP Maria] hat tDP den Mann gestern gesehen.
       Mary has the man yesterday seen
       'Mary saw the man yesterday.'
    b. [AdvP gestern] hat Maria den Mann tAdvP gesehen.
       yesterday has Mary the man seen
    c. [VP den Mann gesehen] hat Maria gestern tVP.
       the man seen has Mary yesterday
    d. [CP+fin dass die Sonne scheint] hat Maria tCP gesagt.
       that the sun shines has Mary said
       'That the sun shines, Mary said.'
    e. [CP-fin die Scheibe einzuschlagen] hat Maria tCP beschlossen.
       the window to-crush has Mary decided
       'Mary decided to crush the window.'

5 The structure is fine as a yes/no-question, which is accompanied by a rising intonation towards the end of the sentence. It also has a narrative inversion parsing, for which, however, there is evidence for an empty operator in the prefield, cf. Zwart 1997:217ff. Thus, insofar as that kind of analysis is on track, narrative inversion is V2. Imperatives and a variant of conditionals also feature verb-initial structures. Finally, topic drop deserves to be mentioned, which is superficially verb-initial. In these structures there is strong evidence for a silent syntactic object in the prefield (such as island sensitivity).
Brandner (2011) rightly points out that V2-clauses like (3.a) can have a yes/no-interpretation if accompanied by a rising intonation. (Intuitively, the reading differs from a regular yes/no-question in verb-initial structures in that an incredulity appears to be conveyed. The reading might be the yes/no-counterpart of an echo wh-question.) In this sense, prefield-occupation does not unambiguously determine semantic interpretation. For the purposes of this paper, I will abstract away from this complication.


    f. [PP über den Wolken] muss die Freiheit tPP wohl grenzenlos sein.
       above the clouds must the freedom PTCL limitless be
       'Freedom must be limitless above the clouds.'
    g. [AP schön] ist Maria tAP.
       beautiful is Mary
       'Mary is beautiful.'

Uniqueness may function as a term for the final significant feature of V2: no more than one phrase/XP may occupy the prefield (5).

(5) *Der Jens der Maria hat ein Buch geschenkt.
     the Jens the Mary has a book given

So the three explananda of prefield-occupation are:

(6) a. "promiscuity" of the prefield: a phrase of any category
    b. obligatoriness: at least one phrase (in declarative root contexts)
    c. uniqueness: at most one phrase

I set out to show that a labeling-based account not only accounts for these explananda, but also naturally links them to the root property of V2 and is thus preferable to alternatives. This paper is structured as follows. In section 2 I re-establish the root character of V2 in German. Section 3 briefly sketches the standard analysis of V2 and identifies some of its shortcomings. In section 4 I present my new, labeling-based analysis. In section 5 I address some implications and questions the present analysis gives rise to. In particular, I address the question which alternative derivational mechanisms non-V2 languages might employ to cleanse root declaratives of labels. Finally, I summarize the paper in section 6.

2 V2 – A Root(-like) phenomenon

2.1 Asymmetries between V2-clauses and V-final clauses

According to a widespread generalization, V2 is invariably a root phenomenon: a V2-clause is either the topmost main clause or has distinctively root-like properties. This hunch has been around at least since Joe Emonds' work on root transformations and den Besten 1977, and has extensively been argued for by Marga Reis. In the following, I present a small subset of contrasts observed between V2-clauses and uncontroversially subordinated, properly selected clauses introduced by the complementizer dass 'that.' Most of the observations stem from Reis' work (Reis 1997).




To begin, once the finite verb is in the C-position, the clause cannot be embedded (7), contrasting with V-final variants (8).6 This holds for all clause types (declarative, wh-question, yes/no-question):

(7) a. *Maria hat {vergessen/geleugnet...} Hans bringt die Monika nach Hause.
        Mary has {forgotten/denied...} Hans brings the Monika to home
        'Mary {forgot/denied...} that Hans brings Monika home.'
    b. *Maria hat sich gefragt, wen bringt der Hans nach Hause.
        Mary has REFL asked who brings the Hans to home
        'Mary wondered whom Hans will bring home.'
    c. *Maria hat sich gefragt, bringt der Hans die Monika nach Hause.
        Mary has REFL asked brings the Hans the Monika to home
        'Mary wondered if Hans will bring Monika home.'

(8) a. Maria hat {vergessen/geleugnet...} dass Hans die Monika nach Hause bringt.
       Mary has {forgotten/denied...} that Hans the Monika to home brings
       'Mary {forgot/denied...} that Hans brought Monika home.'
    b. Maria hat sich gefragt, wen der Hans nach Hause bringt.
       Mary has REFL asked whom the Hans to home brings
       'Mary wondered whom Hans brought home.'

6 Viola Schmitt (p.c.) brought to my attention that it is possible to coordinate what looks like a conjunction of dass- and V2-clauses (i), thus suggesting, prima facie, that dass-clauses and embedded V2 are on a par:
i. Peter hat gemeint, Peter sei dumm und dass Maria schlau sei.
There seems to be an ordering restriction on the conjuncts, cf. (ii), which looks as if it raises the issue how my account deals with coordination of labeled and unlabeled categories:
ii. ??Peter hat gemeint, dass Peter dumm sei und Maria sei schlau.
It is not clear to me, however, that what is coordinated in (i) is two embedded clauses. It is conceivable, for instance, that (i) conjoins two main clauses, the second of which is largely elliptical, along the lines of the analysis of right dislocation in Ott & de Vries 2015. The dass-clause is then the counterpart of right-dislocated constituents in their analysis. This approach accounts straightaway for the deviance of (ii) in that a V2-clause cannot be fronted within the second elided main clause or occupy its prefield – a precondition of ellipsis.


    c. Maria hat sich gefragt, ob der Hans die Monika nach Hause bringt.
       Mary has REFL asked if the Hans the Monika to home brings
       'Mary wondered if Hans brings Monika home.'

Moreover, the correlate pronoun es is impossible when associated with a V2 embedded clause (9.b), but perfectly fine when associated with a V-final embedded clause (9.a):

(9) a. Peter hat es {geärgert/gesagt/vergessen...} dass die Sonne scheint.
       Peter has it {annoyed/said/forgotten...} that the sun shines
    b. *Peter hat es {geärgert/gesagt/vergessen...} die Sonne scheint.
        Peter has it {annoyed/said/forgotten...} the sun shines

V2-clauses cannot appear in subject position (10), again, unlike verb-final finite clauses:

(10) {Dass die Sonne scheint /*Die Sonne scheint} hat die Leute beeindruckt.
     {that the sun shines / the sun shines} has the people impressed

Let me end this section by noting that Dutch V2-clauses cannot be embedded under bridge verbs,7 which is remarkable, given that the syntactic properties of the two languages are otherwise very similar:

(11) *Piet zei Frits moest gisteren huilen. (Dutch)
      Peter says Fritz must yesterday cry
      'Peter says Fritz had to cry yesterday.'

7 Fittingly, neither weil-V2 nor N+V2 exists in Dutch, as Ad Neeleman (p.c.) informs me.




2.2 Seemingly embedded contexts of V2-clauses

The purpose of this section is to review phenomena in which V2-clauses are embedded, and by the same token to highlight asymmetries to their verb-final counterpart. German seems to be one of the few among the V2-languages where the idea that embedded V2-clauses are directly selected is even an issue. Next to empirical arguments which I will review in this section (Reis 1997), I hope to show by means of this paper that strong conceptual points speak in favor of treating V2-clauses as invariably root-like, whatever the formal analysis of "embedding" turns out to be. As is well-known, it is possible to embed V2 declarative clauses in the context of so-called bridge verbs:

(12) Maria hat {gemeint/gesagt/geäussert...} Hans bringt die Monika nach Hause.
     Mary has {meant/said/uttered...} Hans brings the Monika to home
     'Mary said that Hans brings Monika home.'

Monika Monika

nach to

However, there are still asymmetries between dass-clauses and V2-clauses preceding the finite verb in C: pronouns in V2-clauses, which occupy this position cannot be bound by quantifiers in the main clause (14.b) – again, in contrast to the dass-counterpart (14.c) (Reis 1997:139): (14) a.  Jederi möchte gern glauben, eri sei unheimlich  everyone wants happily believe he be uncannily  beliebt.  beloved  ‘Everyonei wants to believe that hei is very much liked by others.’ b. *Eri sei unheimlich beliebt, möchte jederi gern glauben.  hei be uncannily beloved wants everyonei happily believe

270 

 Andreas Blümel

c.  Dass eri unheimlich  that he uncannily  glauben.  believe

beliebt beloved

sei, be

möchte wants

jederi everyone

gern happily

The contrast is reinforced by embedded V2-clauses preceding the main clause finite verb in C, which feature weil ‘because’ and obwohl ‘despite,’ both of which are impossible (15.b/16.b) in contrast to the option of having embedded V-final clauses in this position (15.a/16.a):8 (15) a.  Weil Fritz so gerne Eis ißt, gehen wir jetzt in die  because Fritz so gladly ice eats go we now in the  Stadt.  city  ‘We go into the city now, because Fritz enjoys eating ice cream so much.’ b. *Weil Fritz ißt so gerne Eis, gehen wir jetzt in die Stadt. (16) a.  Obwohl Fritz so gerne Eis ißt, gehen wir jetzt in die Stadt. b. *Obwohl Fritz ißt so gerne Eis, gehen wir jetzt in die Stadt. From the observation of variable binding asymmetries in (14) Reis concludes that in cases like (13.b) the embedded V2-clause does not occupy SPEC-CP of the host clause, but involves a V1 parenthetical. The embedded V2-clause is thus not properly contained in the prefield at all (cf. also Webelhuth 1992:89, fn. 59). The observation is, again, corroborated by extraction asymmetries – between verb final dass-clauses which allow long-distance A-movement (17.a) and V2-clauses which do not (17.b) (cf. Reis 2002; long-distance ­extraction out of dass-clauses is subject to well-known dialectal and idiolectal variation): (17) a.  Was glaubst du dass er lesen  what believe you that he read  ‘What do you think he should read?’ b. *Was glaubst du, er sollte lesen?

sollte? should

8 V2-clauses with denn 'because' cannot occupy the prefield either; they have no verb-final counterpart:
iii. Wir gehen jetzt in die Stadt, denn Fritz ißt so gerne Eis.
iv. *Denn Fritz ißt so gerne Eis, gehen wir jetzt in die Stadt.
Cf. Wegener 1993, 2000 and Steinbach & Antomo 2010 for a survey and semantic differences between these clause types.
9 Thanks to Gereon Müller for querying me about this point. The ban on weil/obwohl-V2-clauses in the prefield in my view supports the idea that embedded V2-clauses do not occupy SPEC-CP.


The moral of the observations above is that none of these asymmetries are expected if the V2-clause is embedded in the same fashion as dass-clauses, i.e. directly merged with and selected by the related verb. V2-clauses which function as realizations of arguments of related verbs10 are "relatively disintegrated subordinate clauses." It comes as no surprise, then, that they are islands for movement. The table below summarizes many of the asymmetries noted by Reis. The upshot is that embedded V2-clauses in German exhibit distinctive root properties and are crucially not selected by the embedding verb.

                                            dass-clauses      V2-clauses
Topicalization                              +                 −
middle field position                       + (marginal)      −
Postfield                                   possible          necessary
Correlates                                  +                 −
Extraction                                  +                 −
next to V, N, A, also with P                +                 −
with N in different functions               −                 − (only explicative)
und zwar-structure ('namely'-structure)     +                 −
sluicing remnants in fragment answers       +                 −

Verb-final/V2-asymmetries

Having (re-)established the root character of V2 and its external distribution, I would like to briefly touch on and critique the standard syntactic analysis of the phenomenon, insofar as its internal syntax is concerned. I will return to the distributional properties of V2-clauses in section 5.

3 The standard analysis of V2

The structure of a German main clause like (18) is standardly represented as in (19), where the prefield is identified with SPEC-CP.

(18) Der Jens hat die Maria geküsst.
     the Jens has the Mary kissed

10 "V2-Sätze in argumentrealisierender Funktion" ('V2-clauses in argument-realizing function').


(19) (tree diagram, rendered here as a labeled bracketing)
     [CP [NP Der Jens]_i [C′ [C+T hat] [TP t_NP [T′ [vP t_NP [v′ [VP [NP die Maria] [V geküsst]] v]] t_T]]]]

In previous analyses, an EPP or Edge Feature EF on C was postulated, which triggers movement of an arbitrary category to SPEC-CP (Fanselow 2002, 2004, Roberts 2004). I believe that there are numerous conceptual problems with this analysis, but here I would like to highlight one particular issue: Aside from re­stating the problem of why SPEC-CP must be occupied, the analysis has nothing to say about what distinguishes root-CPs from embedded, i.e. properly selected ones. The difference would then probably have to be stipulated in terms of presence or absence of the EPP/EF on C, which is clearly non-explanatory. In other words, the analysis fails to even address the concern of Emonds, den Besten and Reis to subsume V2 under root phenomena and to try to find an account for it. This crucial link is what the following analysis captures.

4 The new analysis 4.1 Labeling My analysis rests on Chomsky’s recent conception of deriving endocentricity of the language-independent Third Factor principle Minimal Search as laid out




in Problems of Projection (POP; cf. Chomsky 2013 and references therein). The combinatorial device to form hierarchical structures is Simplest Merge,11 a binary operation that takes two syntactic objects – say, X and Y – to yield an unordered set which comprises X and Y:

(20) Merge(X, Y) = {X, Y}

Let me elaborate. The set delivered by Merge in (20) is strictly symmetrical, i.e. no element is more prominent than the other. There is thus no mechanism of projection involved, which means that headedness as conceived in X̄-theory must come about by different means – at least, insofar as it holds. Part of the background assumption is the Minimalist goal to ground properties of the grammar other than Merge in language-independent principles, i.e. principles of efficient computation and characteristics of the interfaces. Let us see how the economy principle is operationalized in POP, to then proceed to the interface-related question what labels are needed for. Given (20), we have three logical options regarding the question how the two Merge-members are constituted: Head-Phrase, Head-Head, Phrase-Phrase. I will discuss each in turn.

According to Chomsky (2000/POP), Minimal Search is the guiding principle for operations such as agree and labeling. Regarding labeling, any set formed in the fashion of (20) is inspected for the closest head. Whenever a lexical item and a phrasal syntactic object are merged, a labeling algorithm LA, which abides by the principle of computational efficiency Minimal Search, detects the former to be the head of the resulting set (21). (22) exemplifies how a PP comes about:

(21) {X, YP} → {X̲, YP}

(22) A prepositional phrase: {P, NP} → {P̲, NP} = PP

Thus the X̄-theoretic notion of prominence and projection of a head is recast in terms of availability of the structurally closest lexical item within a given set. The next logical option is Merger of two simplex elements, i.e. the first step in the derivation.
According to much recent work by Alec Marantz, this

11 I confine myself to Set Merge here. Pair Merge, the operation that delivers ordered sets, was suggested in Chomsky 2004 as the operation underlying adjunction phenomena. It raises independent problems and questions with respect to projection and labeling. See below for a speculative remark.


prototypically12 involves a category-neutral root and a categorizing element like little v or n, i.e. lexical categories are contextually specified, not substantially. By assumption,13 acategorial roots do not bear grammatical features which can project (or serve as labels) to yield a root phrase. In other words, root phrases do not exist. Categorizers, by contrast, bear grammatical features and can function as labels. LA thus invariably identifies the latter as the label of Head-Head-structures (23); the non-projectionist conception of a vP in (24) serves as an example:

(23) {X, Y} → {X̲, Y}, iff Y a categoryless √root and X a categorizer (n, v, a)

(24) Assembling a vP: {v, √root} → {v̲, √root} = vP

The final conceivable combination is Merger of two phrasal units (25):

(25) {XP, YP}

Here we have to distinguish between two subcases for which POP offers a solution. The first falls under the rubric of "symmetry breaking movement" pioneered by Andrea Moro.14 The central idea is that such symmetric structures cannot receive a label unless one element moves, forming a discontinuous object. The relevant syntactic object visible for the labeling algorithm is the movement chain, where a chain is understood to be the set of all occurrences. As an effect, the unraised element is visible for label detection while the lower copy of the raised element is not (26); what I would like to call "the bottom of the EPP" might serve as an example (27). Here an external argument (DP) merges with a transitive vP; raising of the former makes labeling of {DP, vP} possible:

(26) {XP, YP} → YP ... {X̲P, ⟨YP⟩} (symmetry-breaking movement)

(27) The bottom of the EPP: {DP, vP} → {DP, {T, {⟨DP⟩, vP}}}, where {⟨DP⟩, v̲P} = vP

12 This exposition is a simplification, which might hold for syntactic processes only. It is perfectly conceivable that the initial step in the derivation also involves Merger of two roots, which might actually be the right pairing for morphological processes, such as exocentric root compounds and the like. Notice also that, given unconstrained Merge, Merger of two functional heads is a possible combinatorial option. Here I have nothing to say about the question if such a configuration materializes in natural language or whether it is uninterpretable by the interfaces.
13 Cf. Irwin 2012, a.o., for the idea that lexical roots do not project.
14 To be accurate, Chomsky (1995) tentatively touches on the idea. Cf. inter alia Moro 2000, 2007; Ott 2012; Blümel 2012 and Chomsky 2013 for explorations.




The other strategy POP suggests to label XP-YP-structures is contingent on the operation agree15 and can be considered an instance of feature sharing between X(P) and Y(P): whenever a probe on a head H entertains agree with a goal, raising of the latter to the sister of HP yields a structure which is labelable by the probing feature and the corresponding feature on the goal (28). I call the derivation exemplifying this labeling strategy "the top of the EPP" (29):

(28) {XP, YP} → Label({XP, YP}) = F, where X and Y bear F and agree wrt F

(29) The top of the EPP:
     a. {DP, vP}
     b. {Tϕ, {DP, vP}}                      Merger of T
     c. {Tϕ, {DP, vP}}                      agree(T, DP)
     d. {DP, {Tϕ, {DP, vP}}} = {DP, TP}     Move DP
     e. {DPϕ͟, TPϕ͟} = ϕP16                   ϕ is shared and agreed upon
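The label detection cases (21), (23) and (26) can be rendered as a small toy procedure. The following Python sketch is my own illustrative rendering, not part of POP itself; all names (Head, merge, label) are invented, and the feature-sharing strategy (28) is omitted for brevity:

```python
# Toy rendering of the labeling algorithm LA: Minimal Search for the
# structurally closest head inside a set formed by Simplest Merge.
# Illustrative sketch only; the feature-sharing strategy (28) is omitted.

class Head:
    def __init__(self, cat, is_root=False):
        self.cat, self.is_root = cat, is_root   # roots are acategorial: no label

def merge(x, y):
    return (x, y)                               # (20) Merge(X, Y) = {X, Y}

def label(so, invisible=frozenset()):
    """Return the category labeling `so`, or None if no head is detectable."""
    if isinstance(so, Head):
        return so.cat
    x, y = so
    xh, yh = isinstance(x, Head), isinstance(y, Head)
    if xh and yh:                               # (23) Head-Head: only a categorizer labels
        labelers = [h for h in (x, y) if not h.is_root]
        return labelers[0].cat if len(labelers) == 1 else None
    if xh != yh:                                # (21) Head-XP: the lone head labels
        h = x if xh else y
        return None if h.is_root else h.cat
    # (25) XP-YP: labelable only if one member is a lower copy of a moved
    # phrase and hence invisible to Minimal Search (26)
    visible = [m for m in (x, y) if id(m) not in invisible]
    return label(visible[0], invisible) if len(visible) == 1 else None

# (22)/(24): a PP built on top of an nP (categorizer n plus a root)
np = merge(Head("n"), Head(None, is_root=True))
pp = merge(Head("P"), np)
assert label(np) == "n" and label(pp) == "P"

# (26)/(27) "bottom of the EPP": {DP, vP} is labelless until DP raises,
# whereupon its lower copy is invisible and v is detected
dp = merge(Head("D"), np)
vp = merge(Head("v"), Head(None, is_root=True))
assert label(merge(dp, vp)) is None
assert label(merge(dp, vp), invisible=frozenset({id(dp)})) == "v"
```

The design choice of marking lower copies as "invisible" is a stand-in for the chain-based conception of occurrences described above.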

This exposition of the technical side of POP completes the list of configurations which the labeling algorithm endows with a syntactic category. Let me finally say that the labeling algorithm as conceived in POP applies at the phase level, i.e. at least v and C. At this juncture we are in a position to address the question what labels are needed for in the first place. Consider the following quote: “Each SO [syntactic object] generated enters into further computations. Some information about the SO is relevant to these computations. In the best case, a single designated element should contain all the relevant information: the label. [...] the label selects and is selected in EM [External Merge ...].” (Chomsky 2008:141, supplements and emphasis A.B.)

This is much in the spirit of Collins (2002): At the very least, labels are needed for the Conceptual-Intentional systems to ensure that the sister of a selecting head is of the right category. For instance, a verb embedding a sentential question requires its complement to be labeled by the semantic feature Q.17 It appears natural to ask: Do labelless categories exist? Given the system laid out above, this

15 In spirit, this strategy and in particular its contingency on agree is reminiscent of the Activity Condition. I fail to see, however, what it follows from or why exactly it works the way it is intended to work.
16 Technically, Chomsky (2014) suggests that the two ϕ-sets form a pair ⟨ϕ, ϕ⟩, similar to members of movement chains under the contextual conception of chains. On other parallelisms, cf. Boeckx 2008:34 ff.
17 Insofar as the goal to reduce c-selection to s-selection and Case is fully feasible, cf. Pesetsky 1982, Chomsky 1995.


is quite expected if none of the conditions for label detection (as illustrated in (21)–(29)) are fulfilled and if no selection is involved. Since at least root clauses as well as adjunct clauses are not selected, they appear to be excellent candidates to be labelless categories. This, of course, leads me to V2.

4.2 Core properties of V2 derived

Let me reiterate the property of V2-structures this paper focusses on: a phrase of any category must occupy the position before the finite verb in declarative root-like contexts; all the examples are ungrammatical declaratives without the fronted XP:

(30) a. [α [DP Maria] hat tDP den Mann gestern gesehen]
        Mary has the man yesterday seen
     b. [α [AdvP gestern] hat Maria den Mann tAdvP gesehen]
        yesterday has Mary the man seen
     c. [α [VP den Mann gesehen] hat Maria gestern tVP]
        the man seen has Mary yesterday
     d. [α [CP dass die Sonne scheint] hat Maria tCP gesagt]
        that the sun shines has Mary said
     e. [α [PP über den Wolken] muss die Freiheit tPP wohl grenzenlos sein]
        above the clouds must the freedom PTCL limitless be
     f. [α [AP schön] ist Maria tAP]
        beautiful is Mary

In the absence of the notion specifier, how does {XP, CPV2} = α receive a label? Notice that α is an instance of XP-YP. Quite plausibly, these structures elude both labeling strategies mentioned above: symmetry-breaking movement is obviously no option, i.e. the prefield category shows no sign of being a trace: it is phonologically realized and there is no antecedent. There appears to be no feature which V2-C and the prefield-XP have in common,18 and thus the shared-feature+agree idea is not plausible either. The central idea of this paper is that V2-structures provide crucial evidence for the following hypothesis:

(31) Root exocentricity: Declarative root clauses must not receive a label.

18 I here abstract away from subject-initial V2-clauses.




α must remain labelless, and prefield-occupation ensures this.19 The question posed before receives a principled answer: that all the label detection options (21)–(29) fail is desired; it is part of the system and must not be otherwise: Merger of a phrase of any category guarantees that labeling of the declarative root clause – by V2-C20 – is obviated. Let us take a look at what (31) buys us. First of all, the obligatoriness of prefield-occupation is immediately implied by the hypothesis. Secondly, the promiscuity of prefield-occupation follows as well: it does not matter what category the system uses to subdue labeling, as long as (31) is met. In this sense, promiscuity is expected, and it would be rather puzzling if the system used a specific category to clog the labeling algorithm. Notice also that the source from which the XP is merged is irrelevant – not only internal Merge but external Merge too is a strategy to prevent labeling of root clauses. This is precisely what Frey (2006) suggests by way of examples like (32):

(32) Kein Wunder spricht Peter so gut Französisch.
     no wonder speaks Peter so well French
     'No surprise Peter speaks French so well.'
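The labeling-failure logic behind (31) can be mimicked in a self-contained toy computation – my own hypothetical illustration, not the author's formalism, with phrases encoded as tuples and bare heads as strings:

```python
# Toy illustration of (31): merging ANY phrase with the V2-CP yields a
# head-less {XP, YP} set, which Minimal Search cannot label.
# Hypothetical encoding: a phrase is a tuple, a bare head is a string.

def label(so):
    """Closest-head detection: a bare head labels; head-less {XP, YP} gets no label."""
    x, y = so
    if isinstance(x, str):
        return x
    if isinstance(y, str):
        return y
    return None

cp_v2 = ("C", ("T", ("v", "root")))             # V2-clause with the finite verb in C

# Promiscuity (6.a): a prefield phrase of any category blocks the label ...
for xp in [("D", "Maria"), ("Adv", "gestern"), ("P", ("D", "Wolken"))]:
    assert label((xp, cp_v2)) is None           # declarative root stays labelless

# ... whereas without one, C would be detected and label the root clause
assert label(cp_v2) == "C"
```

The loop shows why the category of the prefield-XP is immaterial: every choice produces the same unlabelable XP-YP configuration.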

The evaluative DP kein Wunder in the prefield has no plausible clause-internal base position. Under the current view, this is exactly what we expect, given that the formal function of prefield-occupation is label suppression. No label can be identified within {DP, CPV2}, no matter how DP is introduced. Conceptually, it appears reasonable to say that root clauses do not need a label, because they are not selected and because syntactic categories arguably serve the ongoing derivation; hence, if the derivation terminates, labels are superfluous. The stronger claim I defend here, that root clauses must not receive a label, can be justified by economy: if labeling is not needed, avoid it by any means necessary, V2 being one such means.21 The account also solves another V2-riddle: we must insert an it-type expletive (33) to obtain a V2-clause, which further supports the assumption that prefield-occupation serves irreducibly formal purposes22 (cf. Fanselow & Lenertova 2011, Frey 2005):

(33) a. *(Es) haben hier viele Leute übernachtet.
        EXPL have here many people spend-the-night
        'Many people have spent the night here.'

19 Aside from Joe Emonds’ work, cf. already Bloomfield 1933:194ff for the claim that sentences in English are exocentric, and Cecchetto & Donati (2014) for recent related ideas about root clauses.
20 Crucially, this means that V2-C by virtue of its lexical properties functions as a label. On the possibility that C in English might not have this property, cf. section 5.2.
21 Maybe this avoidance of redundancy is a reflection of the optimal design of language?
22 There is a plethora of works dealing with information-structural effects of V2 in German, none of which in my view show that prefield-occupation results from these functions.

278  Andreas Blümel

b. weil (*es) hier viele Leute übernachtet haben
   because EXPL here many people spend-the-night have

In current terms, the contrast (33.a/33.b) receives an explanation: only {DP, CPV2} remains labelless, due to the presence of DP=es in (33.a), while in (33.b) no such requirement holds, as German plainly does not have anything like the EPP of English. Thus the so-called Vorfeld-es (‘prefield-it’) emerges as an element that, while bearing no meaning of its own, fulfills a crucial interpretive function: facilitating the expression of a declarative when no other prefield element does. Let me address illicit verb-third (V3) at this point, (34)/(35):23

(34) *[DP Den Mann] [DP Maria] hat gestern gesehen.
       the-ACC man    Mary      has yesterday seen

(35) *[Nu för tiden] [DP Gusten] äter aldrig köttfärs.
       now for time-the Gusten eats never minced.meat
     ‘Nowadays, Gusten never eats minced meat.’ (Swedish, Petersson 2009:104)

The tentative accounts I would like to suggest for excluding V3 are related to labeling, but not logically dependent on it. One strategy to rule out examples like (34)/(35) could run as follows: LA inspects {XP, CPV2}=α for a label at the point of its creation.24 No label is detected and the operation transfer removes α immediately, terminating the derivation.25 With an empty workspace, there is no prefield left to occupy doubly. Alternatively, merging an additional XP has no effect on the outcome (Chomsky 2001) and is hence prevented. I will not pursue these questions further here.

23 Cf. Ott 2014 for an analysis of Left Dislocation in German which retains the V2-rule. Whichever analysis of grammatical V3-phenomena one chooses, it must explain the ungrammaticality of (34)/(35). Alternatives to weakening the ‘single-constituent’-generalization are readily available: a) the first constituent is not integrated into the putative host clause (cf. Meinunger 2006 for examples and convincing arguments), or b) the fronted units are really one XP, mistakenly analyzed as many; cf. Sabel 2015 for an approach along these lines.
24 In POP, label detection “waits” until the next phase head. However, at the root level there is no higher phase head to trigger label detection – and it is quite implausible and problematic to postulate one. On the technical side, a phase head would arguably be a label, which would infinitely reiterate the need to keep the root label-free. On the aesthetic side, it would be an artificial device to let labeling uniformly be triggered by phase heads. On the conceptual side, such a merely transfer-inducing head without further motivation and empirical evidence violates Full Interpretation.
25 Transfer of material other than a phase head’s complement has repeatedly been suggested and seems to be conceptually necessary at the main-clause level (cf. Chomsky 2004:108; Obata 2010). Ott (2010) suggests that free relatives involve transfer of the complement of SPEC-CP, i.e. {C, TP}, such that the remaining wh-phrase determines the label. Boeckx (2014) generalizes the idea, suggesting that transfer is an operation that applies freely altogether.



Exocentric root declaratives: Evidence from V2  279

4.3 A remark on sentence or force types

Why is it that root declaratives must lack a label? Intuitively, labels, such as those provided by declarative complementizers, “signal” that the derivation continues and that the formed syntactic object seeks to entertain further grammatical relations. This would be impossible to fulfill at the root level – in a sense, a label on a syntactic object is a promise which is impossible to keep at the main-clause level. Hence labels must be suppressed. Relatedly, the language system seeks to distinguish sentence types. Maybe the root declarative is a kind of default or unmarked clause type. The unmarked clause type is then canonically associated with an unlabeled structure {XP, CP}, while marked clause types (subordinate complement clauses generally, interrogatives, imperatives, etc.) mostly employ labeled formats.26 Yes/no-questions and the other verb-initial sentences in German could then be derived by transferring C together with its complement, banning V2. These sentence types are labeled root clauses and hence give rise to interpretations other than declaratives. Alternatively, such sentences are V2-clauses with an empty polarity operator in the prefield – again, an unlabeled structure.

4.4 Interim conclusion

Let me briefly summarize the hypothesis of this paper and then proceed to important ramifications the account raises. I have proposed that declarative root clauses need to remain unlabeled. This is ensured by Merger of an arbitrary XP in those languages that have V2-C – the type of C I restrict myself to here. The root-CP cannot be {C, TP}, as this is illicitly labeled C, but raising of an arbitrary XP to its sister position to yield {XP, CP} makes labeling impossible, as desired. This derives the obligatoriness of prefield-occupation. Thus we need not resort to movement-inducing features like the EPP on C (Fanselow 2002, 2004; Roberts 2004); instead, Merge applies freely and is restricted by Third Factors: Minimal Search for a label and the interface condition (31). The approach offers

26 Notice, incidentally, an interesting but rather speculative conclusion which these considerations could lead to regarding the beginning and the end of the derivation. Above I described the POP-suggestion that Head-Head structures always involve a category-neutral root. Here I propose that the terminal step in the derivation must involve a category-free syntactic object. In essence, then, both onset and end of the derivation involve an unlabeled object. Why should this be so? One way to interpret this relates to how these structures are interpreted at the Conceptual-Intentional interface. Possibly these are the only purely Conceptual-Intentional and, in a sense, extragrammatical structures. I will not pursue these speculations here.


a solution to the elusive promiscuity of the fronted category: any XP will do to prevent labeling of the root. In fact, selectivity would be unexpected. Finally, the uniqueness property of V2 could be derived by transfer-induced removal of the structure, bringing the derivation to a halt and banning V3.

5 Implications and questions

5.1 Embedded V2

5.1.1 Embedded “C”+V2

An obvious question the above analysis of V2 raises is: How do embedded V2-clauses come about in the bulk of the Scandinavian languages, where they are introduced by what looks like a complementizer, for example att in Swedish (36)?

(36) Gusten sa att Fantomen har inte tio tigrars styrka.
     Gusten said that Phantom-the has not ten tigers strength
     ‘Gusten said that the Phantom doesn’t have the strength of ten tigers’ (Swedish)

I will call this element at, collectively referring to the complementizer-like element which introduces embedded V2-clauses and which can be found in the Scandinavian languages as well as Frisian.27 What I would like to suggest is that the current approach to V2 supports a variant of, and a generalization over, a hypothesis recently advanced by Petersson (2009)28 for Swedish: at is, in fact, lexically ambiguous between a true complementizer and a nominal element N/D which either embeds a proposition (Manzini 2010 and Roussou 2010 on Italian and Greek respectively) or which is associated with it.

(37) at-Hypothesis
     It is only ever N/D-at that is associated with embedded V2-clauses, never C-at.

Let me hasten to say that by using is associated with, I chose a deliberately vague wording in (37). If V2-clauses are unlabeled and if labeling is a requirement for selection – i.e. Merger with a verb or at –, then V2-clauses, as unlabeled αs, are not selected at all, neither by a verb nor by at. As I will argue later, the remaining way of introducing embedded V2 into the derivation is by adjunction, i.e. pair Merge. At this point

27 As before, I exempt Icelandic and Yiddish from the picture.
28 He suggests that embedded V2 in Swedish involves a nominal embedding a (split-)CP.




I remain agnostic about the exact details of the formal analysis of embedded V2 in Scandinavian/Frisian, a task I hope to take up in future work. Suffice it to say that for current purposes there is evidence for (37), and (37) allows me to maintain (31). The evidence for a lexical ambiguity of at is indirect; and yet the striking asymmetries between at with V2-clauses and at with non-V2-clauses fall into place if we accept the thesis. That is, there is a way in which (37) appears to make the right cut in order to make sense of syntactic, prosodic and morphological contrasts between these two types of embedded clauses. Aside from the well-known asymmetries reported in the literature, I will here add another one less commonly mentioned, regarding a typological gap in complementizer agreement. As I will show, this gap is entirely expected, given (37). Let me first point out that (37) predicts that extraction from embedded V2-clauses is a violation of the Complex NP Constraint (or represents an adjunct island violation, if (37) is equivalent to an adjunction analysis of embedded V2). The examples (38.a/38.b) confirm the prediction. (38.c) shows the possibility of extracting from a non-V2 embedded clause, headed by a genuine complementizer homophonous to N/D (for parallel facts from Faroese, Icelandic, as well as residual V2 in English, cf. Vikner 1995:114–116):

(38) a. *Hvilken film sagde hun at i skolen havde børnene allerede set?
         which movie said she AT in school-the had children-the already seen
        ‘Which movie did she say that the children have seen in school?’ (Danish)
     b. *Hvilken film sagde hun at børnene havde allerede set?
         which movie said she AT children-the had already seen
     c. Hvilken film sagde hun at børnene allerede havde set?
        which movie said she AT children-the already had seen

(39) a. Den boken vet jag att Gusten inte har läst den boken.
        that book-the know I ATT Gusten not has read that book-the
        ‘That book, I know that Gusten hasn’t read’ (Swedish)
     b. ??/*Den boken vet jag att Gusten har inte läst.
          that book-the know I ATT Gusten has not read
        ‘That book, I know Gusten hasn’t read.’


Under (37), the extraction difference is due to different underlying structures, as in (40):

(40) a. Hvilken film sagde hun [CP at børnene allerede havde set t]
     b. *Hvilken film sagde hun [N/D at [α i skolen havde børnene allerede set t]]

Finally, a lexical-ambiguity analysis of at delivers the following prediction: genuine C may manifest C-agreement (41/42) (Richards 2007, Chomsky 2008):29

(41) a. ob-st du noch Minga kumm-st
        whether-2SG you to Munich come-2SG
        ‘... whether you come to Munich’ (Bavarian)
     b. ob-ts ees/ihr noch Minga kumm-ts
        whether-2PL you-PL to Munich come-2PL
        ‘... whether you (pl) come to Munich’
     c. (I frog me,) warum-st des ned moch-st.
        (I ask myself,) why-2SG this not make-2SG
        ‘I wonder why you don’t do it.’

(42) Ich denk de-s doow Marie ontmoet-s.
     I think that-2SG you-SG Marie meet-2SG
     ‘I think that you will meet Marie.’ (Limburgian)

For N/D-at, by contrast, agreement with the grammatical subject of the embedded clause is unexpected. If embedded V2 is associated with N/D but never with C, this raises the expectation that there is no language which simultaneously exhibits embedded V2 and C-agreement. In other words, C-agreement phenomena exhibit strict complementarity with V2. What I would like to call “Zwart’s puzzle” is a straightforward confirmation of this prediction. There seems to be a typological gap30 which, however, receives a simple solution under (37). That is, the constraint (43) is an illusion and reduces to the fact that embedded V2 is associated with obligatorily non-agreeing N/D, never with potentially agreeing C:31

(43) *... V [C-Agr [V2-C ... ]]

29 The examples are from Bayer (1984), Haegeman & van Koppen (2012).
30 Jan Wouter Zwart (p.c.), among others. He remarks: “I’ve always thought that this is a significant absence.”
31 Intended are those V2-languages where V-in-C is uncontroversial, i.e. the prediction does not scope over V2-in-T-languages. The prediction developed here has nothing to say about the question why it is that what I dub N/D does not exhibit agreement with a subordinate clause. Head-final languages like Turkish or Japanese have phenomena in which ϕ-bearing D-heads might very well exhibit agreement with subjects internal to clauses that appear to be selected by D. Much of what I say here depends on the exact nature of the N/D-element, pending further investigation. The fact that at is not inflected in any event indicates that it does not bear ϕ-features.


Even in languages which in principle allow both embedded V2 and C-agreement, the two cannot show up in the same clause (Zwart 1993:291). Frisian serves as an example:32

(44) a. Heit sei dat-st do soks net leauwe moast
        dad said that-2SG you such not believe must-2SG
        ‘Dad said that you should not believe such things.’ (Frisian)
     b. Heit sei dat/*dat-st do moast soks net leauwe
        dad said that/that-2SG you must-2SG such not believe
        ‘Dad said that you should not believe such things.’

Independent language-internal evidence for an asymmetry between dat+V2 and dat+non-V2 in Frisian comes from Ger de Haan (cited in de Vries 2012:180). He observes that bound-variable readings are available in verb-final structures (45.b/45.c) but not in V2 subordinate clauses (45.a). De Vries concludes that the observations indicate “that the alleged complement clause takes high scope” vis-à-vis the universal quantifier:

(45) a. *Eltsenien_i hie sein dat hy_i wist it net.
         everyone has said that he knew it not
     b. Eltsenien_i hie sein dat hy_i it net wist.
        everyone has said that he it not knew
     c. Eltsenien_i hie sein hy_i it net wist.
        everyone has said he it not knew (Frisian)

Notice that a CP-recursion analysis (Iatridou & Kroch 1992, Vikner 1995, Holmberg & Platzack 1995) has nothing to say about why (43) holds.33

5.1.2 Embedded V2 in German again

Reis (1997) suggests that V2-clauses embedded under bridge verbs in German are “relatively disintegrated subordinate clauses” in what looks like the host clause (while functioning as the argument of the relating predicate, Reis 1997):

(46) Maria hat gemeint Hans bringt die Monika nach Hause.
     Mary has meant Hans brings the Monika to home
     ‘Mary said Hans brings Monika home.’

32 Frisian dat represents a prima facie counterexample to the generalization in Leu 2015 that it is really V2 and the morpheme d-/th- that are in complementary distribution. But see his fn. 18.
33 Yiddish exhibits embedded V2 (Diesing 1990). Presumably, it – and similar languages – will at least partially be analyzable as an instance of V-to-T (Diesing 1990 for Yiddish embedded subject- and topic-initial V2-clauses).


She suggests that they are adjoined to their host clauses (Reis 1997:138), while the association between the complement position of the embedding verb and the argument-fulfilling V2-clause is left open:

(47) Maria hat [TP [TP e gemeint] [α Hans bringt die Monika nach Hause]]

If so, this means for the current analysis that unlabeled structures can participate in the ongoing derivation,34 presumably by pair merge (Chomsky 2004).

(48) What this means is that adjoined unlabeled structures partake in the computation and remain unlabeled when a structure like (47) is transferred to the interfaces.

Thus adjoined structures are exempt from requiring a label.35 This converges with recent ideas about adjuncts more generally, cf. Hornstein & Nunes (2008). Notice that beyond this encouraging result, the current conception sheds light on how the absence of a label comes about, while this is merely (if plausibly) postulated in Hornstein and Nunes’ work. The question of how such adjuncts are licensed is an independent matter. Apparently German features such a licensing property while Dutch lacks it. Also, Scandinavian utilizes N/D to license embedded V2, but not the German strategy. Frisian appears to employ both.

5.2 Some open questions Here I would like to address some open questions and briefly and tentatively touch on ways to answering some of them. It goes without saying that a fuller, cross-linguistic account of the hypothesis advanced here exceeds the limits of this paper, but I hope to show how current syntactic theorizing has a sufficiently rich vocabulary to meeting (31) in different ways. V2-clauses in German have many subtypes, such as weil-V2 and V2-relatives all of which are usually subsumed under “root phenomenon” (Wegener 1993, Wegener 2000, Gärtner 2000, Holler 2005, Steinbach & Antomo 2010, Reis 2013).

34 In accordance with assumptions in POP. Notice that this analysis appears to be compatible only with the “effect on outcome”-based solution to avoid multiple prefield-occupation alluded to above, because transfer of α would render α inaccessible to pair merge. 35 Although tempting, I refrain from going into the idea that adjuncts are defined as unlabeled structures that participate in the derivation (Hornstein & Nunes 2008), simply by set merge (cf. Oseki 2014 for a recent attempt to abolish pair merge altogether).




Their exact analysis is pending, given the conclusion reached here. Regarding weil-V2 and the like, however, it strikes me as noteworthy that they share properties of V2-clauses proper, as if the element introducing them were invisible.

5.2.1 Root phenomena cross-linguistically

Regarding the hypothesis (31), parametrization of the labellessness of declarative roots strikes me as quite counterintuitive, and I take it to be a non-starter: (31) is an interface condition on the syntactic format of declarative clauses. As such, I take it to be universal and invariant. If so, it must be the case that particular grammars exhibit variation when it comes to meeting this condition. What are other, language-particular strategies to avoid labeling of root clauses – for example, in English? Here we have to distinguish at least two cases. English might not have a uniform way of avoiding labels in root contexts. As in German, we might have to separate subject-initial declaratives from everything else. A declarative with topicalization or residual V2 might employ the strategy I have suggested here for V2-languages:

(49) a. [α [DP This book ] [ϕP John likes ]]
     b. [α [VP Like this book ] [ϕP John does ]]
     c. [α [PP To Bill ] [ϕP John gave the book ]]

(50) [α [PP Under no circumstances ] [would John leave ]]

Regarding subject-initial declaratives, several options come to mind. A fundamental question is whether or not they lack a label like their non-subject-initial counterparts. The fact that subjects have somewhat of an exposed or “privileged” status leads me to suspect otherwise (and they do so not only in English but in German as well36). Consider the idea that subject-initial declaratives do receive a label after all, namely ϕ:

(51) [ϕP John likes this book ]

An idea proposed in Chomsky 2014 in a different but related context is that declarative null-C might derivationally be a non-label/phase. He suggests that in embedded clauses, null-C inherits all its features to its proxy, thereby virtually disappearing (and voiding that-trace effects37). In root contexts, we could

36 I am grateful to an anonymous reviewer for pressing me on this point.
37 Cf. also Deal (to appear).


employ a similar rationale: null-C inherits all its features to T and thereby ceases to provide a label. After inheritance, the structure looks roughly as follows:

(52) [α C [β John likes this book ]]

At this point, two scenarios are conceivable. The LA searches α vacuously and proceeds to β, where ϕ is detected on both D and T – the root is labeled ϕ, as needed for subject-initial declaratives.38 Alternatively, the disappearance of C renders α labelless – as needed for root declaratives generally. That is, the labeling algorithm looks into α, detection of a label is impossible since C is invisible, and search into β is not an option for Minimal Search. In the case of V2 subject-initial sentences, a similar solution suggests itself if the feature-based assumption is adopted: instead of being inherited to T, C retains its ϕ-set (Legate 2011). There are numerous open issues. One is, of course, the nature of VSO-languages, in which V is arguably in C. What is their way of meeting (31)? Moreover, is there a relation between label avoidance at the root level and the wealth of root transformations of other languages (cf. Jiménez-Fernández & Miyagawa 2014 for a recent take on the latter)? This appears plausible, in my view, and I would like to go into these matters in future work.

6 Summary

Let me summarize this paper. Labeling V2-clauses as CP was motivated by assuming that endocentricity uniformly holds for all categories and by the X-bar-theoretic notion that all phrases have or provide for specifiers. In a Merge-based system, these are questionable stipulations – such properties should be epiphenomena at best. In addition, a movement trigger was needed to explain why an XP must occupy the prefield (EPP, Occurrence or Edge Features). Once we give up these ideas, base endocentricity on a Third Factor (Minimal Search), and consider V2-C, the motivation for movement or base generation becomes clear, no diacritic is needed, and the elusive promiscuity of the prefield-XP is explained. In addition, the idiosyncrasies of declarative root(-like) clauses receive an explanation in terms of labeling: XP-Merger preempts labeling of root declaratives. Needless to say, the current analysis provides support for a sparse conception of syntax, without specifiers and based on the simplest conception of Merge, in line with much recent research.39

38 The option considered is very much against the spirit of Minimal Search.
39 Chomsky 2004 et seq.; Ott 2012; Boeckx 2014; Seely, Epstein & Kitahara 2014.




Acknowledgments: I am especially grateful to Dennis Ott for suggestions that entered into this paper. Regarding the core idea, he was independently thinking along the same lines. Thanks also to Mailin Antomo, Huba Bartos, Michael Brody, Marcel den Dikken, Joe Emonds, Ángel Gallego, Hans-Martin Gärtner (thanks for the detailed comments!), Remus Gergel, Chris Götze, Erich Groat, Anke Holler, Gereon Müller, Ad Neeleman, Hagen Pitsch, Omer Preminger, Marc Richards, Luigi Rizzi, Volker Struckmeier, George Walkden, Hedde Zeijlstra and an anonymous reviewer for feedback, discussion and encouragement, as well as the audiences of many conferences and workshops. All errors are my own.

References

Bayer, J. (1984). COMP in Bavarian syntax. The Linguistic Review, 3. 209–274.
Bloomfield, L. (1933). Language. New York: Henry Holt and Co.
Blümel, A. (2012). Successive cyclic movement as recursive symmetry-breaking. In N. Arnett & R. Bennett (eds.), Proceedings of the 30th West Coast Conference on Formal Linguistics, 87–97.
Boeckx, C. (2008). Bare syntax. Oxford: Oxford University Press.
Boeckx, C. (2014). Elementary syntactic structures. Cambridge: Cambridge University Press.
Brandner, E. (2011). Verb movement in German exclamatives: From syntactic underspecification to illocutionary force. In S. Lima, K. Mullin & B. Smith (eds.), NELS 39: Proceedings of the 39th annual meeting of the North East Linguistic Society, Vol. 1, 135–148. Amherst: University of Massachusetts, Department of Linguistics.
Chomsky, N. (1986). Knowledge of language. New York: Praeger.
Chomsky, N. (1995). Bare phrase structure. In G. Webelhuth (ed.), Government and binding theory and the minimalist program. Oxford: Blackwell.
Chomsky, N. (1995). The minimalist program. Cambridge, MA: MIT Press.
Chomsky, N. (2000). Minimalist inquiries: The framework. In R. Martin, D. Michaels & J. Uriagereka (eds.), Step by step: Essays on minimalist syntax in honor of Howard Lasnik, 89–155. Cambridge, MA: MIT Press.
Chomsky, N. (2001). Derivation by phase. In M. Kenstowicz (ed.), Ken Hale: A life in language. Cambridge, MA: MIT Press.
Chomsky, N. (2004). Beyond explanatory adequacy. In A. Belletti (ed.), Structures and beyond, 104–131. Oxford: Oxford University Press.
Chomsky, N. (2005). Three factors in language design. Linguistic Inquiry, 36(1). 1–22.
Chomsky, N. (2008). On phases. In R. Freidin, C. Otero & M.-L. Zubizarreta (eds.), Foundational issues in linguistic theory, 133–166. Cambridge, MA: MIT Press.
Chomsky, N. (2013). Problems of projection. Lingua, 130. 33–49.
Chomsky, N. (2014). Problems of projection: Extensions. Unpublished manuscript.
Collins, C. (2002). Eliminating labels. In S. D. Epstein & T. D. Seely (eds.), Derivation and explanation in the minimalist program, 42–64. Oxford: Blackwell.
de Vries, M. (2012). Parenthetical main clauses – or not? On appositives and quasi-relatives. In L. Aelbrecht, L. Haegeman & R. Nye (eds.), Main clause phenomena: New horizons, 177–201. Amsterdam: John Benjamins.


Deal, A. R. (to appear). Cyclicity and connectivity in Nez Perce relative clauses. Linguistic Inquiry.
den Besten, H. (1983 [1977]). On the interaction of root transformations and lexical deletive rules. In W. Abraham (ed.), On the formal syntax of the Westgermania. Amsterdam: John Benjamins.
Diesing, M. (1990). Verb movement and the subject position in Yiddish. Natural Language and Linguistic Theory, 41–79.
Donati, C., & Cecchetto, C. (2014). Deciding between the external and internal definition of label.
Eide, K. M. (2007). Finiteness and inflection: The syntax your morphology can afford. Unpublished manuscript.
Emonds, J. (2004). Unspecified categories as the key to root constructions. In D. Adger, C. de Cat & G. Tsoulas (eds.), Peripheries: Syntactic edges and their effects, 75–120. Dordrecht: Kluwer.
Emonds, J. E. (2012). Augmented structure preservation and the tensed S constraint. In L. Aelbrecht, L. Haegeman & R. Nye (eds.), Main clause phenomena: New horizons, 21–46. Amsterdam: John Benjamins.
Fanselow, G. (2002). Quirky subjects and other specifiers. In I. Kaufmann & B. Stiebels (eds.), More than words, 227–250. Berlin: Akademie Verlag.
Fanselow, G. (2004). Cyclic phonology–syntax interaction: Movement to first position in German. In S. Ishihara, M. Schmitz & A. Schwarz (eds.), Interdisciplinary studies on information structure I, 1–42.
Fanselow, G., & Lenertova, D. (2011). Left peripheral focus: Mismatches between syntax and information structure. Natural Language and Linguistic Theory, 169–209.
Jiménez-Fernández, Á. L., & Miyagawa, S. (2014). A feature-inheritance approach to root phenomena and parametric variation. Lingua, 145. 276–302.
Frey, W. (2005). Zur Syntax der linken Peripherie im Deutschen. In F. J. d’Avis (ed.). Göteborg.
Frey, W. (2006). Contrast and movement to the German prefield. In V. Molnár & S. Winkler (eds.), The architecture of focus, 235–264. Berlin: Mouton de Gruyter.
Gärtner, H.-M. (2000). Are there V2 relative clauses in German? Journal of Comparative Germanic Linguistics, 97–141.
Haegeman, L., & van Koppen, M. (2012). Complementizer agreement and the relation between C0 and T0. Linguistic Inquiry, 43. 441–454.
Holler, A. (2005). On non-canonical clause linkage. In S. Müller (ed.), Proceedings of the 12th International Conference on Head-Driven Phrase Structure Grammar, 157–177. Department of Informatics, University of Lisbon.
Holmberg, A., & Platzack, C. (1995). The role of inflection in Scandinavian syntax. New York/Oxford: Oxford University Press.
Hornstein, N., & Nunes, J. (2008). Adjunction, labeling, and bare phrase structure. Biolinguistics, 2. 57–86.
Iatridou, S., & Kroch, A. S. (1992). The licensing of CP-recursion and its relevance to the Germanic verb-second phenomenon. Working Papers in Scandinavian Syntax, 50. 1–24.
Irwin, P. (2012). Unaccusativity at the interfaces. NYU Ph.D. dissertation.
Legate, J. (2011). Under-inheritance. Talk given at NELS 42.
Leu, T. (2015). Generalized x-to-C in Germanic. Studia Linguistica. 1–32.
Manzini, R. (2010). The structure and interpretation of (Romance) complementizers. In E. P. Panagiotidis (ed.), The complementizer phase, 167–199. Oxford: Oxford University Press.
Meinunger, A. (2006). Interface restrictions on verb second. The Linguistic Review. 127–160.
Moro, A. (2000). Dynamic antisymmetry. Cambridge, MA: MIT Press.
Moro, A. (2007). Some notes on unstable structures. Unpublished manuscript.
Obata, M. (2010). Root, successive-cyclic and feature-splitting internal merge: Implications for feature-inheritance and transfer. University of Michigan Ph.D. dissertation.




Oseki, Y. (2014). Eliminating pair-merge. Unpublished manuscript.
Ott, D. (2011). A note on free relative clauses in the theory of phases. Linguistic Inquiry, 42. 183–192.
Ott, D. (2011). Local instability: The syntax of split topics. Berlin/New York: De Gruyter.
Ott, D. (2014). An ellipsis approach to contrastive left-dislocation. Linguistic Inquiry, 45. 269–303.
Ott, D., & de Vries, M. (2015). Right-dislocation as deletion. Natural Language and Linguistic Theory, 1–50.
Pesetsky, D. (1982). Paths and categories. MIT Ph.D. dissertation.
Petersson, D. (2009). Embedded V2 does not exist in Swedish. Working Papers in Scandinavian Syntax, 84. 101–149.
Reis, M. (1997). Zum syntaktischen Status unselbständiger Verbzweit-Sätze. In C. Dürscheid, K. Ramers & M. Schwarz (eds.), Sprache im Fokus, 121–144. Tübingen: Niemeyer.
Reis, M. (2002). Wh-movement and integrated parenthetical constructions. In J. W. Zwart & W. Abraham (eds.), Proceedings of the 15th Germanic Syntax Workshop, 3–40. Amsterdam: John Benjamins.
Reis, M. (2013). “Weil-V2”-Sätze und (k)ein Ende? Anmerkungen zur Analyse von Antomo & Steinbach (2010). Zeitschrift für Sprachwissenschaft, 221–262.
Richards, M. (2007). On feature inheritance: An argument from the phase impenetrability condition. Linguistic Inquiry, 38. 563–572.
Roberts, I. (2004). The C-system in Brythonic Celtic languages, V2, and the EPP. In L. Rizzi (ed.), The cartography of syntactic structures, Vol. 2, 297–327. New York/Oxford: Oxford University Press.
Roussou, A. (2010). Selecting complementizers. Lingua, 120. 582–603.
Sabel, J. (2015). Variationen von V2. Unpublished manuscript.
Seely, T. D., Epstein, S. D., & Kitahara, H. (2014). Labeling by minimal search: Implications for successive cyclic A-movement and the conception of the postulate ‘phase’. Linguistic Inquiry, 45.
Steinbach, M., & Antomo, M. (2010). Desintegration und Interpretation: Weil-V2-Sätze an der Schnittstelle zwischen Syntax, Semantik und Pragmatik. Zeitschrift für Sprachwissenschaft, 29. 1–37.
Vikner, S. (1995). Verb movement and expletive subjects in the Germanic languages. New York/Oxford: Oxford University Press.
Webelhuth, G. (1992). Principles and parameters of syntactic saturation. Oxford: Oxford University Press.
Wegener, H. (1993). Weil – das hat schon seinen Grund. Zur Verbstellung in Kausalsätzen mit weil im gegenwärtigen Deutsch. Deutsche Sprache, 21. 289–305.
Wegener, H. (2000). Da, denn und weil – der Kampf der Konjunktionen. Zur Grammatikalisierung im kausalen Bereich. In R. Thieroff (ed.), Deutsche Grammatik in Theorie und Praxis, 69–81. Tübingen: Niemeyer.
Zwart, J.-W. (1993). Dutch syntax: A minimalist approach. Rijksuniversiteit Groningen Ph.D. dissertation.
Zwart, J.-W. (1997). Morphosyntax of verb movement: A minimalist approach to the syntax of Dutch. Dordrecht: Kluwer.

Index

A-movement 4, 8, 17, 33, 34, 252
Acyclic incorporation 10, 92, 95, 111, 112
Agree 9, 10, 29, 48, 52, 53, 64–66, 71, 79, 101, 109, 263, 272, 274–276
agreement 2, 5, 65, 66, 69, 70–72, 74–77, 79, 80, 83–85, 101, 142, 211, 281, 282
cartography 8–10, 70, 76, 78, 87
categorization 12, 13, 203, 205, 227, 258–261
clitics 107, 109, 110, 205, 251
comparatives 173, 174, 176, 180, 183, 190
complement clauses 127, 128, 132, 134, 137, 153, 279
complementizer agreement 281
Condition on Extraction Domain 163, 167–170, 196
Connectivity 127, 128, 136–140, 144, 147–152, 154
criterial freezing 8, 17, 35, 39
derivations 63–65, 100, 111, 119, 120
Distinctness Condition 11, 156, 157
Distributed Morphology 62, 79, 81, 87, 203, 205, 210, 235
exocentric 276
expletive 41, 48, 277
external argument 206, 209, 211–213, 218, 242, 244, 274
Final-over-Final Condition 161, 163, 170, 172, 173, 176, 184–189, 191, 193–197
freezing 35–39, 42, 91–93, 101
Germanic 172, 242, 263
halting 17, 30, 35, 41, 69–72, 74–78, 87
Head-Final Filter 161, 163, 170–175, 177–185, 187–197
Headless XP-movement 10, 91–93, 103, 112
Hiaki 12, 206, 210, 211
islands 122, 123, 153, 169, 270

DOI 10.1515/9781501502118-012

label 18, 23, 26, 28, 40, 49, 51, 53, 55, 70, 80, 99, 162, 205, 272
labeling 17, 18, 28, 29, 55, 70, 91, 96, 98–103, 112, 117, 155, 163, 282
labeling algorithm 53–55, 59, 70, 71
LCA 163, 165–167, 169, 185, 186, 197
left dislocation 127, 128, 130, 139, 154
leftward movement 248–250, 253
lexical insertion 80, 85–87, 248
Merge 17, 24, 25, 28, 37, 48, 50–53, 72, 96, 97, 162, 204, 224, 272, 273
Merge(X,Y) = {X,Y} 48, 49, 51, 63, 64, 66, 272
minimalism 4, 9, 20, 24, 29, 30, 48
minimalist program 72, 117, 161, 204
minimal search 17, 28–34, 40, 42, 70, 71, 86, 263, 272, 273, 279
movement 33, 36, 69–74, 76, 77, 91, 92, 99, 107, 224, 264
multiple movement 91, 92, 94, 107, 112
nanosyntax 77, 79, 86, 87
nominalization 206, 207, 214–217, 241, 245, 249, 256
null phase head 92, 98, 112
one-replacement 206–208
optional movement 69, 78, 87
order preservation 91, 92, 94, 95, 107, 108, 110–112
pair merge 52, 219, 224, 280, 284
phase 17, 34, 56, 57, 59, 64, 91, 92, 97–102, 122, 246, 264
prefield 263–266, 271, 276, 277
projection 23–25, 27, 35, 162–164, 204, 211, 212, 215, 222, 273
reconstruction 138, 139, 147–152
root 5–7, 62, 81, 203–206, 212, 214, 219–222, 225, 226, 246, 274
root clause 122, 123, 130, 136, 155, 156, 264, 266, 272, 276, 277, 279, 285, 286


satellite hypothesis 127, 128, 131, 133, 135, 138, 139, 141, 144, 147, 148
self merge 222
sentential subjects 127–130, 155, 156
set merge 219, 224
simplest merge 18, 26, 27, 29–31, 34, 37, 42, 49, 60, 62, 66, 273
small clauses 234, 235
spell-out 49, 56–58, 117–119, 122, 123
strong minimalist thesis 22, 26, 28, 35, 50
superlative 178, 179
suppletion 12, 206, 210–214
topic drop 136, 144–146, 265
tough construction 174–177

transfer 10, 49, 50, 53, 56, 91, 98, 117–124
  – strong transfer 117, 119–124
  – weak transfer 117, 119, 120, 124
transformation 43, 47–49, 52
triggers for movement 69
unary merge 224, 225
Universal 21 161, 163, 170, 196
unlabelled nodes 234–236, 246, 253, 258, 259
V2 263, 264, 266, 267, 269–271, 276, 277, 279–286
word order variation 78