Symmetry, Shared Labels and Movement in Syntax 9783110522518, 9783110520125

What is the trigger for displacement phenomena in natural language syntax? And how can constraints on syntactic movement


English Pages 180 [176] Year 2017


Table of contents :
Contents
Abbreviations
1. Preface
2. Introduction
2.1 Successive-cyclic Movement as intermediate labeling Indeterminacies
2.2 X̄-Immobility and Criterial Freezing
2.3 ATB involves Forked chains
3. Minimalist Reflections
3.1 Conceptual Underpinnings
3.2 On Movement as Internal Merge
4. Propagating Symmetry
4.1 The Syntax of Successive-Cyclic Ā-Movement
4.2 The Evidence
4.2.1 The successive Cyclicity Hypothesis within the Theory of Phases
4.3 Previous Analyses
4.3.1 Shortest Step Approaches
4.3.2 Moving-Element-driven Approaches
4.4 A Novel Approach to Successive-cyclic Ā-Movement
4.4.1 From PS-Grammar to a Labelless Bare Phrase Structure
4.5 Some Consequences and Extensions
4.5.1 On the Timing of Labeling and Bavarian wh-Questions
4.6 Summary
5. Shared Labels and Criterial Freezing
5.1 Introduction
5.2 Shared Labels and Full Interpretation
5.3 X̄-Immobility (XIM)
5.3.1 Introducing XIM
5.4 Criterial Freezing
5.4.1 Criterial Freezing and Ā-movement
5.4.2 Criterial Freezing and A-movement
5.5 Summary
6. In Defense of Forked chains
6.1 Introduction
6.1.1 A Word of Motivation
6.1.2 Organization of this Chapter
6.2 Properties of Coordination and ATB
6.2.1 General Properties of Coordination
6.2.2 General Properties of ATB
6.3 Previous Analyses
6.3.1 Asymmetric Analyses
6.3.2 Symmetric Analyses
6.3.3 A Hybrid Analysis – Ha 2007
6.4 The Current Analysis
6.4.1 Coordinative Core
6.4.2 ATB from the Coordinative Core
6.4.3 Splitting up CC
6.5 Open Issues and Remaining Questions
6.6 Summary
7. Summary and Outlook
Bibliography
Index

Andreas Blümel Symmetry, Shared Labels and Movement in Syntax

studia grammatica 81

Edited by Manfred Bierwisch, Hans-Martin Gärtner and Manfred Krifka, with the collaboration of Regine Eckardt (Konstanz) and Paul Kiparsky (Stanford)


ISBN 978-3-11-052012-5
e-ISBN (PDF) 978-3-11-052251-8
e-ISBN (EPUB) 978-3-11-052018-7

Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2017 Walter de Gruyter GmbH, Berlin/Boston
Printing and binding: CPI books GmbH, Leck
♾ Printed on acid-free paper
Printed in Germany
www.degruyter.com

I dedicate this work to my grandmother, Gerta Sterzik.


Abbreviations

1          first person
2          second person
3          third person
ACC        accusative
ASP        aspect
BEI        Mandarin marker in passive-like constructions
CLF        classifier
COP        copula
DE         Mandarin marker introducing nominal attributes
DAT        dative
EXPL       expletive
FOC        focus
FUT        future
GER        genitive
HAB        habitual
IMPF       imperfect
INF        infinitive
M          masculine
NOM        nominative
NS         non-subject
PL         plural
PRF        perfect
PRS        present
PST        past
PTCL       sentence particle
Q          question particle/marker
REFL       reflexive
REL        relative
SBJV       subjunctive
SBJV-PTCL  subjunctive particle
SG         singular
SUBJ       subject
SUO        Mandarin functional particle

1 Preface

This book is a revised version of my dissertation, which I defended in Frankfurt/Main in April 2014. First and foremost, I would like to thank my advisor Günther Grewendorf for being a demanding teacher, for his rigorous example as well as his support and encouragement. I am grateful to the other committee members Cecilia Poletto, Helmut Weiß, Gert Webelhuth and Markus Bader. I also thank the “FFM-syntax-crew”: Marc Richards, Erich Groat, Leah Bauke and Andreas Pankau for frequent and thought-provoking discussions on numerous issues. All the people listed have contributed to the amazing environment that made it possible to pursue theoretical syntax at FFM. At Göttingen, my current affiliation, I would like to express my gratitude to Anke Holler, Chris Götze, Hagen Pitsch, Götz Keydana and especially Joost Kremers for giving me crucial hints regarding ways to go on. I have met too many people at conferences and workshops to thank them individually here. I would, however, like to single out Dennis Ott and Cedric Boeckx, whose interest in my work has encouraged me and kept me going. I thank Manfred Bierwisch and Hans-Martin Gärtner for their support in publishing this book. An anonymous reviewer deserves many thanks for detailed and critical feedback on the manuscript. My parents supported me continuously and patiently. Finally, Mingya has accompanied, supported and loved me throughout. All mistakes and shortcomings in this book are, of course, my own.

DOI 10.1515/9783110522518-001

2 Introduction

This book is concerned with recent ideas and discussions about a number of related problems in the theory of syntax. Over the last twenty years, much effort has gone into recasting numerous properties of grammar from the era of Principles and Parameters (Chomsky (1981b)) in terms of fewer, more principled and more elegant operations and mechanisms, in line with the goals of Minimalism (Chomsky (1995b)). One of the problems in these efforts concerns the nature of the set-forming operation responsible for building hierarchical structures, commonly called Merge. This operation lies at the heart of features of both X̄-Theory and Movement. For quite some time now, practitioners of syntax have worked with more or less Minimalist machinery, and it might have seemed as if Bare Phrase Structure theory (Chomsky (1995a)) were little more than a notational variant of its theoretical precursors, or in any event did not deliver new empirical results. I believe recent developments in the theory of syntax have shown quite clearly that this is not the case. In particular, endocentricity, the notion that every phrase has one most prominent element determining its syntactic category, has come under close scrutiny (Collins (2002), Chomsky (2013)). These investigations have given rise to surprising and very promising results which, however, cannot be reformulated in the terms of the preceding theories. In particular, endocentricity and certain instances of syntactic displacement are related in interesting ways, and this relation is the topic of this book.

There is something of a consensus that Merge is an operation that takes two syntactic objects α and β and yields a set γ. This set, in turn, enters into further computation with other syntactic objects, which can be simplex or complex. For that purpose, the character of γ is important: the most prominent element in γ determines its category label and, as a consequence, its distribution. We can thus ask: Is Merge an asymmetric operation along the lines of (1) or a symmetric one as in (2)?

(1) Merge(α, β) = {α, {α, β}}

(2) Merge(α, β) = {α, β}
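The difference between (1) and (2) can be made concrete with a minimal sketch in Python, assuming that syntactic objects are modeled as strings (atoms) and immutable sets. The function names `merge_asymmetric` and `merge_symmetric` are illustrative only and are not taken from the literature.

```python
def merge_asymmetric(alpha, beta):
    """Option (1): {alpha, {alpha, beta}}. The first argument is singled out as the label."""
    return frozenset({alpha, frozenset({alpha, beta})})


def merge_symmetric(alpha, beta):
    """Option (2): {alpha, beta}. Plain set formation, no label is recorded."""
    return frozenset({alpha, beta})


# Under (2) the order of the arguments is irrelevant ...
same_under_2 = merge_symmetric("a", "b") == merge_symmetric("b", "a")
# ... whereas under (1) it decides which element projects.
same_under_1 = merge_asymmetric("a", "b") == merge_asymmetric("b", "a")
```

In this toy model the output of (1) encodes a choice between α and β that the output of (2) leaves open; that choice is what the labeling discussion below is about.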

Depending on the answer to that question, a number of subquestions arise. To decide between these alternatives, let us recall the questions asked by Chomsky at the beginning of the Minimalist Program:

(1) what are the general conditions that the human language faculty should be expected to satisfy and (2) to what extent is the language faculty determined by these conditions, without special structure that lies beyond them? [(1). . . ] has two aspects: what conditions are imposed on the language faculty by virtue of (A) its place within the array of cognitive systems of the mind/brain, and (B) general considerations of conceptual naturalness that have some independent plausibility, namely, simplicity, economy, symmetry, nonredundancy, and the like?

DOI 10.1515/9783110522518-002

For roughly a decade, option (1) was widely considered a simple and natural way to capture the endocentricity property of X̄-theory by means of the undoubtedly simpler Bare Phrase Structure theory. In (1) the syntactic category label is transparent: α, but not β, projects and thus γ is αP. But this maneuver presupposes an operation PROJECT whose conceptual motivation is unclear. Matters differ for (2), and I will devote the following pages to defending a structure-building component which relies on (2) alone. Questions regarding (1) arose right from the start: Why pick α as the label instead of β? Why not pick the union of (the heads of) α and β as a label? Or the intersection, i.e. a feature which α and β have in common? Aside from such concrete questions, which are ultimately empirical, Chomsky’s general considerations of conceptual naturalness include symmetry and economy, properties that option (1) does not exhibit.

Recent reconsiderations of how “headedness of phrases” comes about are numerous, and I think it is fair to say that Chomsky’s contributions in On Phases and Problems of Projection represent a breakthrough on the matter, not only in that the symmetric option (2) receives empirical support, but also in that endocentricity, where it holds, derives from the economy principle Minimal Search. A ramification of this idea is that whenever Merge assembles two phrases to yield {XP, YP}, the dominance relations needed to formulate the notion specifier are of little help: Neither is XP a specifier of Y, nor is YP a specifier of X. The same is not true for (1) (assuming α and β are each phrasal): α projects and hence dominates β, which is, as a consequence, the specifier. In this sense, the very notion specifier, together with the concepts that rely on it, is not part of a system that embraces (2).

In this work, I explore empirical repercussions of the bare mechanism schematized in (2). In particular, I adopt recent ideas about labeling in Chomsky (2008, 2013) and show three things. First, I demonstrate that successive-cyclic Ā-movement can be elegantly derived from the idea that symmetric structures of the form {XP, YP} cannot receive a label unless manipulated by movement of one member. Secondly, I suggest a simple principle, partly motivated independently, which dictates that [-Max]/[-Min]-categories may not undergo movement. In connection with the dynamic labeling system of Chomsky (2013), I then take up the discussion from chapter 4 of how movement ever stops, given the powerful “symmetry breaking” mechanism, and show how Criterial Freezing (Rizzi (2006)) can be deduced by adopting the suggestion in Chomsky (2013) that a subset of {XP, YP}-structures can be labeled by a prominent feature both X and Y share: If {XP, YP}=α gets labeled by F, the intersection of X and Y, the terms of α, XP and YP, lose their [+Max]-status and are thus immobile. This result entails that Merge is symmetrical rather than asymmetrical (in line with Boeckx (2008a), Chomsky (2008, 2013), pace Kayne (1994) et seq., Narita (2010)). I explore consequences of this idea empirically with Hebrew “nested interrogatives” (Preminger (2006)) and show that a relatively simple analysis of this phenomenon can be obtained. Finally, taking the “shared” label idea seriously, I ask what can be gained if we extend it to movement chains and look at “shared launching sites” in across-the-board movement in coordination. I argue that there is considerable (partly novel) empirical support for such “forked chains,” and make a specific suggestion as to how ATB movement can be understood, using mainly just Merge, phases, and Third Factor principles like Minimal Search (Chomsky (2008, 2013); Martin and Uriagereka (2011)). Before delving into empirical matters in chapter 4, I will elaborate briefly on Minimalist principles and goals in chapter 3.

2.1 Successive-cyclic Movement as intermediate labeling Indeterminacies

Practically any syntax textbook features successive-cyclic Ā-movement as exemplified in (3-a), the topic of chapter 4:

(3) a. [Which book]_i do you [vP t_i think/believe [CP t_i C=that Mary [vP t_i wrote t_i ]]]
    b. *You think/believe which book (that) Mary wrote?

The phenomenon has for quite some time represented a deep problem for syntactic theory in that a plausible trigger for the intermediate movement steps is missing. The verbs believe and think embed declarative subordinate clauses, which rules out a semantic trigger for movement. The intermediate movement steps standardly assumed in such long-distance dependencies seem to occur for merely formal reasons. I will side with this intuition and argue for a very simple idea:¹ Intermediate steps of successive-cyclic Ā-movement (3-a) and their obligatory nature (3-b) derive from the need to endow the embedded clause with a syntactic category (a label). I adopt the basic assumptions about labeling in Chomsky (2008, 2013). If a structure has the format {LI, SO}=α, where LI is a lexical item and SO a syntactic object (a phrase), LI is detected as the label of α by a general labeling algorithm. The prevalence of endocentricity is thus an epiphenomenon of Minimal Search and of the fact that Merge mostly takes an LI and an SO as arguments. Problematic cases are {LI, LI} and {SO, SO}. In this book I am primarily concerned with the latter. I show that successive-cyclic movement derives from the need to endow structures of the format {XP, YP}=β with a label. Once XP moves from within phasal YP to YP’s sister position, a structure is created whose label cannot be detected, forcing further movement of XP. As a consequence of this movement, β’s label can be detected by the labeling algorithm, because XP’s trace is invisible to the labeling process: {⟨XP⟩, YP} is labeled Y at the point of movement (the next phase level). This process iterates, “propagating” the problematic symmetry upwards in each cycle. Successive-cyclic movement thus results from the interaction of two factors: phases enforce locality of movement and cyclicity, while the need to endow structures with a category by Minimal Search functions as a movement trigger. Of course, XP=DP and YP=CP/vP in most cases. I argue that analyses of the phenomenon based on shortest chain links or on feature-driven movement are dispensable, given these assumptions about Merge, phases and labeling. In particular, the latter analyses redundantly need labeling as conventionally understood, including its common stipulations (such as the one that the target of movement projects) and additional assumptions.

1 I should stress that Chomsky (2013) independently arrived at the same idea that I present in this section, part of which is published in Blümel (2012). Originally, my own approach was based on ideas from a talk by Chomsky (2010) in Stuttgart, which touched on the ways the EPP and the subject-in-situ generalization by Alexiadou and Anagnostopoulou (2001, 2007) could be derived by labeling. These ideas naturally suggested an extension to successive-cyclicity.

As for the problem of what stops the movement (witness the contrast (4-a)/(4-b)), I discuss two ideas:

(4) a. John wonders [what C Mary will eat]
    b. *What does John wonder Mary will eat?

The first has been independently suggested in the literature by Collins (2002) and Boeckx (2008a), who observe that probing features are candidates for providing the asymmetry needed to derive endocentricity. Boeckx’ principle is stated in (5):

(5) Probe-Label Correspondence Axiom: The label of {α, β} is whichever of α or β probes the other, where the Probe = Lexical Item whose uF gets valued.

The effect of (5) for (4-a) would then run as follows: subordinate C bears a Q-probe which agrees with the DP what. Raising the latter to the sister of the subordinate CP then yields {DP, CP}=γ. By (5), γ is labeled C. I compare this way of solving the halting problem with an alternative by Chomsky (2013). His idea is that γ is labeled by a feature common to (the heads of) DP and CP, namely the question marker q (cf. Cable (2007)). I argue that the latter conception, while non-standard, has a number of interesting consequences and is to be preferred.
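The labeling logic summarized in this section can be illustrated with a small Python sketch. It is a toy model under explicit assumptions, not the algorithm of Chomsky (2013) itself: lexical items carry a set of features, roots are assumed to be unable to label, lower copies are wrapped in a hypothetical `Trace` class, and only the moved occurrence of DP is represented. All class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LI:
    """A lexical item; `root` marks roots, assumed here to be unable to label."""
    name: str
    features: frozenset = frozenset()
    root: bool = False


@dataclass(frozen=True)
class Trace:
    """A lower copy of a moved phrase, assumed invisible to labeling."""
    of: object


def visible(so):
    """Members of a set-formed object that the labeling search can see."""
    return [m for m in so if not isinstance(m, Trace)]


def head(so):
    """Closest head found by search, or None if the search is ambiguous."""
    if isinstance(so, LI):
        return so
    if isinstance(so, Trace):
        return None
    members = visible(so)
    if len(members) == 1:
        return head(members[0])
    heads = [m for m in members if isinstance(m, LI) and not m.root]
    return heads[0] if len(heads) == 1 else None


def label(so):
    """Label of a syntactic object, or None if no label can be detected."""
    if isinstance(so, LI):
        return so.name
    if isinstance(so, Trace):
        return None
    members = visible(so)
    if len(members) == 1:            # {<XP>, YP}: the trace is ignored
        return label(members[0])
    if len(members) != 2:
        return None
    found = head(so)
    if found is not None:            # {LI, SO}: the lexical item labels
        return found.name
    if any(isinstance(m, LI) and not m.root for m in members):
        return None                  # {LI, LI}: ambiguous, not treated here
    hx, hy = head(members[0]), head(members[1])
    if hx is None or hy is None:
        return None
    shared = hx.features & hy.features
    return min(shared) if shared else None   # {XP, YP}: shared feature, if any


# Hypothetical toy structures for (3-a) and (4-a).
D_wh = LI("D", frozenset({"Q"}))
DP = frozenset({D_wh, LI("book", root=True)})
TP = frozenset({LI("T"), frozenset({LI("v"), LI("write", root=True)})})
CP_decl = frozenset({LI("C"), TP})                     # declarative C, no Q
CP_q = frozenset({LI("C", frozenset({"Q"})), TP})      # interrogative C

intermediate = label(frozenset({DP, CP_decl}))         # no label: DP must move on
after_moving = label(frozenset({Trace(DP), CP_decl}))  # labeled by C
criterial = label(frozenset({DP, CP_q}))               # labeled by shared Q
```

On these assumptions the intermediate landing site receives no label while DP occupies it, receives the label C once only a trace is left, and receives the shared label Q in the interrogative case, which mirrors the contrast between (3) and (4).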

2.2 X̄-Immobility and Criterial Freezing

In chapter 5 of the book I first propound a conceptual argument in favor of Chomsky’s (2013) idea that it is not the entire lexical item heading an XP which provides the label for XP, but a feature on that lexical item. For example, if a little v-head merges with a complement, say, a root, to yield {v, root}=α, the structure is labeled v, i.e. it is labeled by the entire lexical item, according to traditional assumptions. However, features not needed and indeed harmful for the representation of α then enter into α’s label, such as, for instance, uninterpretable φ-features. These features are not interpretable at the semantic interface. Contrary to the common assumption within Bare Phrase Structure theory (Chomsky (1995a)) that the entire lexical item provides the label of the phrase, I argue that a more parsimonious theory involves a representation in which uninterpretable features are not part of the label of the structure, meeting the desiderata of Full Interpretation. I argue that a labeling procedure based on Minimal Search is selective enough to pick a designated feature on the lexical item and thus to satisfy no more than the demands of the Conceptual-Intentional system. I then suggest a simple principle, called “X̄-Immobility,” which restricts Internal Merge to [+Max] and [+Min]-categories. I show that this principle derives the ban on movement of intermediate “projections,” which is independently well-established. Based on this, I show that the principle derives Criterial Freezing in a straightforward way: If indeed an {XP, YP} structure can be labeled by a prominent feature borne by both X and Y, it follows that XP and YP are rendered [-Max]-categories at that point in the derivation. Hence, they are immobile by XIM. Criterial Freezing effects in both A- and Ā-movement are deducible this way. Finally, I turn attention to Hebrew “nested interrogatives” (Preminger (2006)), exemplified in (6):

(6) Yosi yada [CP [et ma] Dan šaxax [CP [le-mi] Rina natna ⟨le-mi⟩ ⟨et ma⟩]]
    Yosi knew ACC what Dan forgot DAT-who Rina gave
    ‘Yosi knew what the thing was such that Dan forgot to whom Rina gave it.’

I argue that the construction in fact provides evidence for the idea that in {XP, YP}-structures where a prominent feature on X and Y is shared, “stability” obtains only if Y agrees with X (or vice versa). In particular, I argue that in nested interrogatives, the most deeply embedded C bears a q-probe, thus qualifying as an embedded interrogative selected by interrogative-selecting predicates. However, q is inherited by a functional head F below CP (Richards (2007), Chomsky (2008)), so that the latter becomes a probe by inheritance, undergoing Agree with the closest wh-goal in the domain of C. This goal raises to SPEC-FP and becomes criterially frozen in the fashion described before. A second wh-element raising to SPEC-CP results in {wh, CP}. Despite the sharing of the feature Q on both elements, the configuration is unstable due to the lack of agreement between the members, forcing successive-cyclic Ā-movement. This is the way the nested interrogative pattern of Hebrew comes about. I show that the analysis is arguably simpler than the one endorsed by Preminger and addresses problems that remain open in his work.
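How X̄-Immobility interacts with shared labels can be sketched in a few lines of Python. The sketch rests on one loudly flagged assumption, namely a relational reading of [±Max]: a category counts as [-Max] in a given context if the label of its mother is either its own label or a feature of its head. The names `Node`, `projects_into` and `can_move` are hypothetical, and the agreement condition discussed for Hebrew is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A syntactic object reduced to what the sketch needs."""
    label: str                               # the object's own label
    head_features: frozenset = frozenset()   # features borne by its head
    is_lexical_item: bool = False            # [+Min] if True


def projects_into(daughter, mother_label):
    """Assumed test for [-Max]: the mother's label comes from this daughter."""
    if mother_label is None:                 # unlabeled mother: nothing projects
        return False
    return mother_label == daughter.label or mother_label in daughter.head_features


def can_move(daughter, mother_label):
    """XIM as summarized above: only [+Max] or [+Min] categories may move."""
    is_max = not projects_into(daughter, mother_label)
    is_min = daughter.is_lexical_item
    return is_max or is_min


wh_dp = Node("D", frozenset({"Q"}))
c_bar = Node("C", frozenset({"Q"}))
little_v = Node("v", frozenset(), is_lexical_item=True)

mobile_in_unlabeled = can_move(wh_dp, None)   # intermediate {DP, CP}, no label yet
frozen_when_shared = can_move(wh_dp, "Q")     # {DP, CP} labeled by shared Q
intermediate_proj = can_move(c_bar, "C")      # C-bar inside CP
head_movement = can_move(little_v, "v")       # a projecting head is still [+Min]
```

Under this reading, a wh-phrase is mobile as long as its mother is unlabeled or labeled by something else, and becomes immobile once the shared feature Q labels the structure, which is the Criterial Freezing effect described in the text.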

2.3 ATB involves Forked chains

In chapter 6 I suggest a novel approach to across-the-board (ATB) movement:

(7) Who did John meet and Mary kiss?

I supplement the analysis with new arguments for the existence of “forked chains,” which numerous recent works take issue with or seek alternatives for (Munn (1993), Zhang (2010, chapter 9), Nunes (2001), Citko (2011)). The analysis does not employ special devices such as empty operators (Munn (1993)) or null pronominals (Zhang (2010)) in the second conjunct, sideward movement (Nunes (2001)) or Parallel Merge (Citko (2011)). I show that “forked chains” (Ross (1967), Williams (1977)) are nothing special or unusual within a minimalist framework, once the mechanism of chain formation is properly formulated. The empirical arguments in favor of forked chains come from German, namely the existence of wh-copying across-the-board (8) and the novel observation of ATB-remnant movement (9):

(8) Wen hat Maria gemeint [wen Peter gesehen hat] und [wen Jens betrogen hat]
    who has Mary meant who Peter seen has and who Jens cheated.on has
    ‘Who did Mary say that Peter saw and that Jens cheated on?’

(9) [VP X Gelesen] hat [Maria [X das Buch] t_VP ] und [Peter [X den Artikel ] t_VP ].
    read has Mary the book and Peter the article

I show in detail that such facts confront analyses which deny the existence of forked chains with serious problems. I develop an account of forked-chain constructions and argue that for remnant-ATB certain provisos must be made with respect to the “identity condition” on movement chains, in particular that the condition must be slightly relaxed to accommodate (9). The suggested solution is based on the conception of trace invisibility (Chomsky (2000)), rendering the VPs in the conjuncts sufficiently non-distinct to count as copies in a movement chain. The general analysis of ATB I suggest rests on the assumption that coordination of the familiar asymmetric format [CoordP XP [Coord′ Coord [ YP ]]] (cf., inter alia, Kayne (1994)) derives by movement of either XP or YP from a symmetric core, which I call the “coordinative core” CC. CC comes about by Merging XP with YP to yield {XP, YP}, both of which are of the same syntactic category (Chomsky (2013)). Upon Merger of the conjunct with CC, the symmetry between XP and YP needs to be broken to render CC labelable. ATB applies to CC, whose members are subject to a phase-based parallelism constraint (Kasai (2004)) which restates the Coordinate Structure Constraint (Ross (1967)). This parallelism constraint requires that the element to undergo ATB (call it ZP) must be in parallel structural positions within XP and YP. The actual ATB-step, then, involves movement of a single instance of ZP, optionally either out of XP or out of YP. Crucially, this movement is accompanied by chain formation formulated in terms of Minimal Search (Martin and Uriagereka (2011)). If Minimal Search targets {XP, YP}, ZP is ambiguously detected and identified as a chain member. Thus ATB and forked chains are a by-product of Minimal Search for a member accessible in both XP and YP. The fuller theory of ATB involves raising of either XP or YP to the sister-of-Conjunct position. I discuss two scenarios of how this movement process relates to ATB.
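The idea that Minimal Search into {XP, YP} detects ZP ambiguously can be illustrated with a breadth-first search over nested sets. This Python sketch is only an approximation under stated assumptions: syntactic objects are strings or immutable sets, depth of embedding stands in for the phase-based parallelism constraint, and the function name `minimal_search` and the toy conjuncts are hypothetical.

```python
from collections import deque


def minimal_search(root, target):
    """Return the sets that immediately contain `target` at the shallowest depth.

    More than one hit means the search is ambiguous between several
    occurrences, which is the configuration of a forked chain.
    """
    frontier = deque([(root, 0)])
    hits, hit_depth = [], None
    while frontier:
        node, depth = frontier.popleft()
        if hit_depth is not None and depth >= hit_depth:
            break                      # anything found from here on is deeper
        if not isinstance(node, frozenset):
            continue                   # atoms have no members to inspect
        for member in node:
            if member == target:
                hits.append(node)
                hit_depth = depth + 1
            else:
                frontier.append((member, depth + 1))
    return hits


# Toy version of (7): "who" sits in parallel positions in both conjuncts.
XP = frozenset({"John", frozenset({"meet", "who"})})
YP = frozenset({"Mary", frozenset({"kiss", "who"})})
parallel_hits = minimal_search(frozenset({XP, YP}), "who")

# Non-parallel variant: "who" is embedded more deeply in the second conjunct.
YP_deep = frozenset({"Mary", frozenset({"say", frozenset({"kiss", "who"})})})
nonparallel_hits = minimal_search(frozenset({XP, YP_deep}), "who")
```

With parallel conjuncts the search returns two equally close occurrences, one per conjunct; when the occurrences are not equally deep, only the closer one is found, so no forked chain arises in this toy model.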

3 Minimalist Reflections

3.1 Conceptual Underpinnings

In a talk Chomsky gave some years ago,¹ he expressed regret over having introduced the term Minimalism, stressing that it basically means the normal pursuit of science. I agree. In a wider sense, I believe Minimalism is very much in the tradition of earlier works and changes in Generative Grammar: empirical analyses that employ non-redundant technology and unify disparate phenomena and/or principles have been methodological hallmarks of the research paradigm since its inception. While this Occam’s razor heuristic is easily conceived in principle, it often appears to need to be flanked by meta-principles which guide the search for the best analysis. Within the Minimalist Program, such meta-principles have a particular ontological status, and it is this narrower sense of “Minimalism” which I believe has primarily engendered confusion and commotion (and still does). This small chapter is devoted to laying out some of these guiding principles and background assumptions. As such, it will also serve as the backdrop for the analyses of the subsequent chapters. I will nevertheless continue to use the term, intending it to refer primarily to linguistic work by Chomsky and others over the last twenty years.

The minimalist program shifted some of the goals of linguistic theory.² Before delving into this shift, let us first recall the key questions of generative grammar, as laid out in e.g. Chomsky (1981a, 3):

(1) a. What constitutes knowledge of language?
    b. How is knowledge of language acquired?
    c. How is knowledge of language put to use?

The answer to the first question is a description of the grammar of a speaker of a language (say, French), i.e. “a theory concerned with the state of the mind/brain of the person who knows a particular language.” An answer to the second question was some version of the Principles and Parameters (P&P) approach, a characterization of Universal Grammar, the innate language faculty which is the biological foundation for the acquisition of natural language grammars. Accordingly, “UG is a theory of the ‘initial state’ of the language faculty, prior to any linguistic experience” (Chomsky (1981a, 4)). The P&P approach as conceived in the early 1980s comprises a number of modules of the grammar, each of which was attributed to UG. Part of the theory is the so-called T-Model, which encompasses four distinct levels of representation: Deep Structure (DS), Surface Structure (SS), Logical Form (LF) and Phonological Form (PF):

(2)      DS
         |
         SS
        /  \
      PF    LF

1 Cf. Chomsky (2010).
2 Cf. Boeckx and Hornstein (2010) for exposition of different periods.

DOI 10.1515/9783110522518-003

Any sentence has a specific structural make-up at each level of representation. A generic rule applies between the levels DS and SS as well as between SS and LF. This rule is called “Move α” and amounts to the instruction “move anything anywhere.” General as the rule is, its application is restricted by a number of principles and parameters listed below. The interaction of Move α with Principles and Parameters ensures that the grammar generates all and only grammatical sentences and excludes ungrammatical ones.

– Principles applying at different levels of representation:
  – X̄-Theory (DS; directionality of the head is parametrized)
  – θ-Theory (DS)
  – Case Filter (SS)
  – Extended Projection Principle EPP (SS)
  – Binding Theory (SS?, LF?³)
  – Empty Category Principle ECP (LF)
– Moreover: Theory of Barriers, Projection Principle (applies at all levels), Government Module, Control Module

3 There was some debate about the point in the derivation at which the Binding Theory applies.

The minimalist program in part takes P&P as its starting point and asks why the faculty of language has the properties it does. There are many possible P&P-models, but which one is the right one? Minimalism seeks to address that question. In doing so, one goal is to show to what extent properties of the grammar can be attributed to demands or conditions of the systems that interface with syntax, in particular the sensory-motor system and the conceptual-intentional system, roughly sound and meaning. These systems interface with the syntactic levels of representation formerly known as PF and LF. In addition, one of the hypotheses Chomsky proposed as part of the program is that the “computational system (‘syntax’) central to human language is an ‘optimal’ solution to the central task of language: relating sound and meaning.” (Boeckx and Hornstein (2010)) Showing that this is the case requires a number of things: with respect to syntax, it means operationalizing “optimality,” non-redundancy, etc. by means of specific syntactic theories which allow for empirical predictions. The postulation of syntactic mechanisms must be well-motivated and is subject to strict criteria: Can we motivate the device in terms of interface conditions? Or is the device the computationally most efficient one?⁴ With respect to the interfaces, it means finding out what their specific properties and conditions are. In this context, it is worthwhile to consider the following quote (Chomsky (2005)):

We can regard an explanation of properties of language as principled insofar as it can be reduced to properties of the interface systems and general considerations of computational efficiency and the like. Needless to say, these “external” conditions are only partially understood: we have to learn about the conditions that set the problem in the course of trying to solve it. The research task is interactive: to clarify the nature of the interfaces and optimal computational principles through investigation of how language partially satisfies the conditions they impose, not an unfamiliar feature of rational inquiry.

If interface conditions as well as conditions of efficient computation were given a priori, the life of minimalists would be much easier. But language is what it is and not what we would like it to be. What Chomsky describes is an oscillating research process: minimalist investigation is theory formation with an eye on plausible interface conditions. If successful, a part of this process involves illuminating and clarifying interface conditions, which in turn informs further syntactic theorizing. Also, much work in recent years has gone into investigating so-called Third Factor principles, i.e. principles of efficient syntactic computation. Chomsky (2005, 6) characterizes the three factors of language design as follows:

(3) Genetic endowment [UG, . . . ]

(4) Experience, which leads to variation, within a fairly narrow range, as in the case of other subsystems of the human capacity and the organism generally.

(5) Principles not specific to the faculty of language.

The third factor falls into several subtypes: (a) principles of data analysis that might be used in language acquisition and other domains; (b) principles of structural architecture and developmental constraints that enter into canalization, organic form, and action over a wide range, including principles of efficient computation, which would be expected to be of particular significance for computational systems such as language. It is the second of these subcategories that should be of particular significance in determining the nature of attainable languages.

4 The notion of computationally efficient is not absolute but must be interpreted relative to a given model of syntax. It is thus not surprising to find a certain choice of mechanism in one of Chomsky’s writings and a different one in another one.

Third factors are language-external and language-independent and thus not a part of Universal Grammar, but general economy principles that apply to physical phenomena more generally. We will see later how such an abstract principle enters syntactic theory with concrete empirical predictions. In the course of developing syntactic theory in such a minimalist fashion, properties formerly attributed to Universal Grammar (UG) change their status from being axioms of the theory to being epiphenomena. These include X̄-Theory, Movement, the Empty Category Principle (ECP), the Extended Projection Principle (EPP), Binding Theory, the Control module, etc. Compare this to developments in physics within the last century: Gravity described in terms of the general theory of relativity is not a force – treated as an axiom within a Newtonian approach – but a side-effect of curved spacetime: objects only appear to be attracted to other objects in accordance with their mass, but according to the theory of relativity they move along a spacetime bent by matter. Part of the excitement and fascination of doing science this way is the shift in perspective when looking at known phenomena rather than discovering new ones – although this perspective does not, of course, preclude the discovery of new ones, quite to the contrary. Just as physicists reinterpret physical phenomena in the light of new theories which give rise to profound insights, Minimalism seeks to uncover fundamental principles of grammar which underlie properties of the P&P-era. Just as, ideally, force becomes a dispensable notion in physics, concepts employed within earlier theories of Generative Grammar turn out to be not only superfluous, but actually informulable (as repeatedly stressed by Chomsky).
Other parts within the architecture of the grammar that have either been abolished wholesale or reinterpreted in different ways are the levels of representation DS and SS: the latter has been replaced by the operation Spell-Out, which delivers pieces of structure to LF and PF, while the former is, if anything, now understood under the rubric of the vP-phase. I will have more to say about that in the next chapter. This book deals primarily with the syntactic side of the language faculty and thus has only occasionally something to say about interactions between the computational system and the interfaces. Before I engage in specific empirical analyses, I would like to comment on what I take to be the simplest form of the hierarchical structure building operation dubbed Merge. Part of the motivation for doing so is that varying views of the operation are to be found in the literature, in particular when it comes to explicating what movement amounts to. I will show that criteria along the lines sketched above can be consulted to decide between alternatives.

3.2 On Movement as Internal Merge

The purpose of this section is to offer reflections on what has been called simplest Merge (SM) and its potential to unify structure building (base-generation of old) with syntactic displacement. I will describe its properties, highlight parallelisms between its external and internal application, and compare it to two other conceptions of Movement in the literature: first, what I would like to call “traceless Movement” (TLM), represented by work by e.g. Starke (2011) and tentatively adumbrated by Müller (2011, Chapter 3.7);⁵ and second, multi-dominance (MD), represented by work by numerous authors (e.g. Abels (2012)). To the best of my knowledge, no one has claimed that external Merge, i.e. the extraction of lexical items from the lexicon, leaves a trace in the lexicon. Likewise, no one has suggested that external Merge of α from the lexicon deletes α from the lexicon (such that α is subsequently unavailable). The same is not true for internal Merge, i.e. movement. Here such claims are commonly implicitly adopted. This asymmetry has persisted and characterized long periods of generative grammar.⁶ I believe that much recent work and, in particular, approaches to movement like the ones alluded to above continue this tradition, and the reasons, in my view, stem in part from an insufficiently pronounced description of elementary properties of Merge (SM), despite much effort. I will conclude that if one wants to maximize parallels between internal and external Merge, the “copy theory” is virtually unavoidable. Taking the parallels between internal and external Merge to be a cardinal point, I will compare SM with the other two versions of Merge listed above. I will show that these two conceptions of Merge are forced to introduce an asymmetry between external Merge (base-generation) and internal Merge (movement), and thus fall short of a true unification of the two, while SM remains as the only version that achieves unification.
It seems to me that a lot of attention has been devoted to movement as internal Merge, but close attention to a crucial property of external Merge with an eye on consequences for its internal counterpart is missing. As I will show, it is external Merge which can serve as a model for what movement should be like, assuming that assimilation of the two is a desirable trait.

5 The authors start from completely different motivations and with very different applications. Müller points to important consequences this idea has for operator-variable relations in formal semantics.
6 Chomsky has repeatedly called this a mistake of his own.

A TLM conception assumes that internal Merge literally does not leave anything, neither trace nor copy, in the launching site of a displaced element. Upon movement of XP, XP’s original position is just empty, as indicated by a dash in (6-b):

(6) TLM
    a. [ Y XP ]
    b. [ XP [ . . . [ Y – ]]]

Suppose we retain the notion of a lexical array or a lexical subarray.⁷ Suppose further we want to maximize parallelisms between external and internal Merge. It seems to me that the TLM conception could then arguably be a quite sensible idea: Assuming that upon selection of a lexical item from the lexical subarray, nothing is left in there,⁸ an analogous result appears quite appropriate. However, if external and internal Merge differ only with respect to the source of the syntactic object that undergoes Merge, this conception begs the question what happens to elements extracted directly from the lexicon: Do such elements too “disappear” (after all, Merge applies to them)? The common and indeed necessary view seems to be that elements from the lexicon remain there in the course of a derivation. Under these simple assumptions and the principle of maximizing parallels between internal and external Merge, TLM encounters the problem mentioned. A solution to the problem above for TLM is, of course, to say that internal Merge is in crucial respects different from external Merge: Assuming that a lexical subarray can indeed be dispensed with (Epstein et al. (2014) have suggested recently that such an analysis is empirically viable), external Merge leaves, while internal Merge does not leave, a copy. This begs the question how and why such an asymmetry should exist. The unification is lost then. Let me now turn to a discussion of Multi-dominance (MD). MD takes movement to be literal re-Merger of a syntactic object α, i.e. α undergoes an initial Merger (external Merge) and subsequently this token α undergoes re-Merger (movement) elsewhere in the structure (typically higher). The effect of re-Merger is that α simultaneously occupies two positions, definable by context. For example, in (7), α simultaneously occupies the sister position of β and that of γ:

7 Cf. Chomsky (2000).
8 I abstract away from the idea that the operation that extracts elements from the lexical subarray is somehow different from Merge or that the subarray comprises lexical items with a numerical index that is reduced at each instance of selection.

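(7) [Tree diagram: a structure in which α is the sister of β is mapped (→) onto a structure in which the same token α is simultaneously the sister of β and the sister of γ.]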

In this section I will highlight certain conceptual shortcomings of MD (cf. Larson (2016) for independent points of criticism); in chapter 6 I offer a discussion of empirical problems for an MD-view of ATB-movement. Considerations similar to those raised for the TLM conception apply to MD: In all MD approaches I am aware of, an asymmetry between external and internal Merge is – often tacitly – assumed: While internal Merge of a syntactic object α yields a result where α occupies two positions at once, external Merge must crucially involve copying the extracted lexical item into the current derivation. That is, we never have the analogue of MD as the effect of external Merge. Such a situation would be one in which a lexical item simultaneously occupies two positions: It is part of the lexicon and at the same time occupies a position in the current derivation. For example, it means that a verb like loves introduced into the derivation is simultaneously part of the lexicon and part of the derivation, i.e. it is “multiply dominated” in a structure of the current derivation and by the lexicon – clearly, an absurd result. Unless we introduce an asymmetry between external and internal Merge – which requires justification – it is hard to avoid that conclusion. As far as I can see, we are left with the choice to either introduce such an asymmetry and concede that external Merge involves copying while internal Merge does not, or external Merge becomes incoherent. This concludes my discussion of MD. After this exposition of SM, I would like to show how a simple and novel analysis of successive-cyclic Ā-movement can be developed, crucially based on SM.
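The parallel between external and internal Merge argued for in this chapter can be illustrated with a small Python sketch. It is only an illustration under simplifying assumptions: lexical items are plain strings, syntactic objects are two-membered frozensets, and the names merge, internal_merge, contains and occurrences are hypothetical helpers introduced here, not notions from the literature. Python compares these values by equality, so the sketch is neutral on how occurrences are individuated; it merely shows that neither application of Merge removes anything from its source.

```python
from typing import FrozenSet, Union

# A syntactic object is a lexical item (a string) or a set of two objects.
SyntacticObject = Union[str, FrozenSet]


def merge(a, b):
    """Simplest Merge: form the unordered set {a, b} and do nothing else."""
    return frozenset({a, b})


def contains(so, target):
    """True if target is so itself or occurs somewhere inside so."""
    if so == target:
        return True
    return isinstance(so, frozenset) and any(contains(m, target) for m in so)


def occurrences(so, target):
    """Count the positions in which target occurs within so."""
    if so == target:
        return 1
    if isinstance(so, frozenset):
        return sum(occurrences(m, target) for m in so)
    return 0


def internal_merge(root, target):
    """Merge applied to root and something properly contained in root."""
    if root == target or not contains(root, target):
        raise ValueError("target must be properly contained in root")
    # Nothing is removed from root: the lower occurrence stays in place.
    return merge(target, root)


lexicon = {"what", "likes", "Mary"}
vp = merge("likes", "what")             # external Merge
clause = merge("Mary", vp)              # external Merge
moved = internal_merge(clause, "what")  # internal Merge (movement)
```

After these steps, "what" is still a member of lexicon and occurs twice in moved, while clause is left intact as a member of moved. A TLM-style operation would instead have to delete the lower occurrence, which is exactly the extra step that has no counterpart in external Merge.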

4 Propagating Symmetry

4.1 The Syntax of Successive-Cyclic Ā-Movement

The central problem in the syntactic analysis of successive-cyclic Ā-movement has been to identify the trigger of intermediate movement steps, i.e. steps which create gaps as in (1):

(1) What did John say e that Mary likes t?

As evidence and plausibility are lacking that movement to these positions comes about by feature specifications of the subordinate C-head – the embedded clause is not interrogative –, the phenomenon appears to defy feature-driven approaches to movement. A semantic movement trigger which attributes the incentive to move what to scopal properties of the matrix C head faces a problem of global computation: one analytical option is that what “knows” where to move from the beginning of the derivation. This approach involves potentially massive look-ahead, because the final landing site can be infinitely far away, structurally speaking. Alternatively, the wh-element is forced to move when the terminal C head is merged, which begs the question why movement is partitioned the way (1) suggests, i.e. why e is created. Bošković (2011, 328) aptly characterizes approaches to movement within Minimalism, saying that the guiding principle has been Last Resort: “movement must happen for a [formal] reason.” Movement thus salvages a formal inadequacy, and he goes on to classify existing approaches as follows: (a) the inadequacy is always in the target (Attract); (b) always in the moving element (Greed); (c) in the target or in the moving element (Lasnik (1995)’s Enlightened Self-Interest). The current study aligns with the idea that movement is a Last Resort operation. But the trigger for movement is neither in the target nor in the moving element alone. In this chapter I propose a novel idea of long-distance Ā-dependencies, based on a recent idea (Chomsky (2013), Ott (2011a)) that XP-YP structures, i.e. the merger of two complex phrases, yield an unlabelable, in a sense “too symmetric”, structure. Movement serves as a means to repair this defect. So although movement happens for a formal reason, it is not triggered by any of the inadequacies listed in (a)-(c).
Rather, Merge (both internal and external) creates structures, some of which cannot receive a label unless these structures are manipulated (by movement). Endocentric projection is implicit in all previous analyses, all of which employ additional means to trigger or partition movement. I show that such add-ons are not needed and that in order to derive successive-cyclic movement, phases and symmetry-breaking movement for the purposes of (a derivative of) projection are sufficient. The following passage in Chomsky (2015)’s independently developed approach sums up the current view:

[The Labeling Algorithm] is trivial for {H, XP} structures, H a head. In this case, [the Labeling Algorithm] selects H and the usual operations apply. The interesting cases are {XP, YP}, neither a head, in which case [the Labeling Algorithm] finds {X,Y}, the respective heads of XP, YP, and there is no label unless they agree. In that case, the label is the pair of the agreeing elements. An element raised by [Internal Merge] to create this structure is in what Rizzi calls a ‘criterial position.’ It follows that [Internal Merge] is successive-cyclic, driven by labeling failures, continuing until a criterial position is reached.

DOI 10.1515/9783110522518-004
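The labeling procedure described in the quoted passage can be made concrete in a short Python sketch. It rests on several simplifying assumptions that go beyond the passage: heads carry a flat set of feature names, “agreement” is modeled as sharing a feature, complex phrases whose internal structure is irrelevant are abbreviated by their head, and the case of two heads is left unlabeled because the passage does not address it. The class names Head, XP and Pair and the functions head_of and label are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Head:
    name: str
    features: frozenset = frozenset()


@dataclass(frozen=True)
class XP:
    """A complex phrase abbreviated by its head (internal structure omitted)."""
    head: Head


@dataclass(frozen=True)
class Pair:
    """The result of Merge applied to two syntactic objects."""
    a: object
    b: object


def head_of(so):
    """Return the head that search finds in so, or None if there is none."""
    if isinstance(so, Head):
        return so
    if isinstance(so, XP):
        return so.head
    a_is_head, b_is_head = isinstance(so.a, Head), isinstance(so.b, Head)
    if a_is_head != b_is_head:           # {H, XP}: H is found
        return so.a if a_is_head else so.b
    return None                          # {XP, YP} or {H, H}: no unique head


def label(so):
    """Return a label for so, or None if labeling fails."""
    if isinstance(so, (Head, XP)):
        return head_of(so).name
    a_is_head, b_is_head = isinstance(so.a, Head), isinstance(so.b, Head)
    if a_is_head != b_is_head:           # {H, XP}: the head labels
        return head_of(so).name
    if a_is_head and b_is_head:          # {H, H}: not covered by the passage
        return None
    x, y = head_of(so.a), head_of(so.b)  # {XP, YP}: compare the two heads
    if x is None or y is None:
        return None
    shared = x.features & y.features
    if shared:
        f = sorted(shared)[0]
        return f"<{f}, {f}>"             # label is the pair of agreeing elements
    return None                          # too symmetric: no label


wh_phrase = XP(Head("which", frozenset({"Q"})))
tp = XP(Head("T"))
declarative_cp = Pair(Head("that"), tp)
interrogative_cp = Pair(Head("C", frozenset({"Q"})), tp)
intermediate = Pair(wh_phrase, declarative_cp)  # wh-phrase at an embedded edge
criterial = Pair(wh_phrase, interrogative_cp)   # wh-phrase at its final site
```

Under these assumptions, label(intermediate) returns None, so the wh-phrase cannot stay at the edge of the declarative clause, whereas label(criterial) returns "<Q, Q>", the shared label of a criterial position.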

The chapter is organized as follows: in section 4.2 I recapitulate familiar pieces of evidence in favor of successive cyclicity, and in section 4.3 I present and discuss previous approaches to the phenomenon. In section 4.4 my proposal is laid out, and in section 4.5 I discuss theoretical ramifications and in particular address the question of the timing of labeling. Section 4.6 summarizes the chapter.

4.2 The Evidence

In this section I will survey the pieces of evidence for successive-cyclic Ā-movement, i.e. the idea that long-distance dependencies result from establishing several local Ā-dependencies as schematized in (2), and not from one non-local movement process, e.g. movement in one fell swoop (3):

(2) [CP what_i do you think [CP t_i that Mary did t_i ]]

(3) [CP what_i do you think that Mary did t_i ]

The idea that long distance dependencies come about in such a piecemeal fashion goes back to Chomsky (1973).¹

1 Although widely accepted today, the idea that long Ā-dependencies partition this way is not entirely uncontroversial. Based on an analysis by Rackowski and Richards (2005), den Dikken (2009) in particular has recently cast doubt on the validity of this conclusion, discussing, among others, some of the data presented here and proposing alternatives to movement via intermediate SPEC-CP positions. He concedes successive cyclicity only via SPEC-v*P-positions, SPEC-CP being an inherently terminal position, never a stopover/escape hatch. Neither the empirical evidence he advances to argue against successive cyclicity nor the conceptual motivation behind his own alternative appears convincing to me. In fact, I fail to see a conceptual motivation.

Initially related to Subjacency, successive cyclicity has mostly been viewed as being a necessary effect of locality restrictions on movement:² long-distance dependencies as conceived of in (3) violate these well-motivated conditions. An analysis in terms of successive cyclicity reconciles this contradiction: long-distance movement consists of several local subtransformations, each of which obeys locality restrictions. Initially, the hypothesis was motivated mainly conceptually. Over time, a considerable body of crosslinguistic evidence has confirmed and consolidated it.³

4.2.1 The successive Cyclicity Hypothesis within the Theory of Phases

In this subsection, I will present some of the pieces of evidence that have accumulated over the years in support of successive cyclicity in Ā-movement. An important question is: what are the exact touch-down points in successive-cyclic movement, i.e. what are intermediate landing sites? For about three decades the idea has been around that next to SPEC-CP, long-distance movement targets the verb phrase (Chomsky (1986)). In the wake of phase theory (Chomsky (2001) et seq), the question of whether or not DPs (Svenonius (2004), Heck (2004)) or PPs (Abels (2003)) count as intermediate landing positions has become pressing. In the following outline I will confine myself to evidence in favor of successive cyclicity via the nodes CP and vP. Before delving into the pieces of evidence, I will first touch on the conceptual motivation for phases and then describe their mechanics.⁴ Consider the explanation in Chomsky (2004, 124):

Ideally, phases should have a natural characterization in terms of [Interface Conditions]: they should be semantically and phonologically coherent and independent. At SEM, vP and CP (but not TP) are propositional constructions: vP has full argument structure, and CP is the minimal construction that includes Tense and event structure and (at the matrix, at least) force.

2 Conditions on transformations have undergone various incarnations and reformulations since subjacency, such as bounding nodes (Chomsky (1981b)), barriers (Chomsky (1986)), shortest move (Chomsky (1993, 1995b)), phases (Chomsky (2001) et seq), etc.
3 Apart from the facts presented here, evidence comes from, among others, stylistic inversion in French (Kayne and Pollock (1978)), tonal downstep in Kikuyu (Clements et al. (1983)), resumption in Hebrew (Borer (1984)), selection of subject pronouns in Ewe (Collins (1993)), wh-agreement in Chamorro (Chung (1994)), and, more recently, question particles in German (Bayer (2010)).
4 The conceptual motivations advanced over the years in favor of phase theory were varied. I refer the inclined reader to Chomsky’s technical work since 2000 as well as inter alia the contributions in Gallego (2012), Ott (2009b) and Richards (2004, 2007). For critical discussion, cf. Boeckx and Grohmann (2007), Grewendorf and Kremers (2009) and Ott (2013), among others.

Phases are “propositional” units in the qualified sense of having Tense and force (CP) and full argument structure (vP). Regarding the CP/TP-level, Chomsky assumes that Tense is derivatively on T, not inherently: T may be specified for Tense only in the presence of C. This dependency between C and T is then fleshed out in terms of the notion Feature Inheritance. Moreover, it is C which is responsible for force, which arguably entails that root clauses cannot be TPs or in fact anything other than CPs. Regarding the vP, this unit assumes functions traditionally ascribed to Deep Structure within the GB-era, namely the assignment of thematic roles. In this way, the hypothesis that vP is a cyclic node recasts the traditional conception that θ-role assignment derivationally precedes operations that feed surface structural representations.⁵ Phases serve as formal instructions for specific interpretations at the Conceptual-Intentional systems. Within phase theory, the idea is that a given phase head triggers the operation Transfer. The term Transfer refers to the idea that pieces of structure created by Merge are periodically handed over to the semantic and the phonological component⁶ to be interpreted there. Syntactic material inside a transferred unit cannot undergo further syntactic operations. This is captured in the Phase Impenetrability Condition, of which Chomsky proposes two versions. I will dub the first and stricter version plainly PIC and describe it first. I will call the second, more relaxed version PICweak and describe it subsequently. Consider (4):

(4) Phase Impenetrability Condition (PIC, Chomsky (2000, 108)):
    In phase α with head H, the domain of H is not accessible to operations outside α, only H and its edge are accessible to such operations.

Suppose K is a head which triggers some operation (e.g. agree). Then the material that is accessible for K is distributed as illustrated in (5), where the non-accessible portion is the complement of H:

(5) [ K [α β H [ . . . γ . . . ] ]]
    accessible: β H
    inaccessible: [ . . . γ . . . ]

5 Notice that the match is close but not perfect in that e.g. the transitive v-phase assigns accusative Case – the Case filter applies – as a reflex of the operation agree within that cycle already and thus “at Deep Structure” as it were.
6 By using the word “phonological” interface I have in mind the sensory-motor system more generally, including modes of language externalization other than phonological ones, such as signs as used in sign languages.

Transfer applies at every phasal node, which means that Transfer is cyclic and so is the necessity to “rescue” a long-distance moved phrase by displacement out of the transferred – and hence opaque – portion. The landing site of such an escape hatch movement is a specifier of the phase head, part of the so-called phase “edge,” which comprises the phase head itself as well as any specifier of the phase head. As noted, movement of an XP must proceed via phase edges, lest XP be caught in the transferred domain, i.e. movement to the edge ensures that XP remains accessible for further operations in the next cycle. By this logic, the specific touch-down points in successive-cyclic Ā-movement correspond to the phase heads, as illustrated in (6):

(6) Who did Mary [vP v+say [CP that Bill [vP [v+saw ]]]]
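The logic of the PIC in (4) and of escape-hatch movement can be sketched in Python as follows. This is a deliberately flat model and not a claim about the actual derivation: a phase is reduced to a head, a list of edge elements and a list of elements in its domain, and the class Phase with its methods is a hypothetical construct introduced for illustration. Movement to the edge is modeled as adding an occurrence at the edge while leaving the lower one in the domain, in line with the copy theory assumed in chapter 3.

```python
from dataclasses import dataclass, field


@dataclass
class Phase:
    head: str
    edge: list = field(default_factory=list)    # specifiers of the phase head
    domain: list = field(default_factory=list)  # complement of the phase head

    def accessible(self):
        """Material visible to operations outside the phase, as in (4)."""
        return self.edge + [self.head]

    def is_accessible(self, item):
        return item in self.accessible()

    def move_to_edge(self, item):
        """Escape-hatch movement: create an occurrence of item at the edge."""
        if item not in self.domain:
            raise ValueError("item must originate in the domain")
        self.edge.insert(0, item)  # the lower copy stays in the domain


lower_vp = Phase(head="v", edge=["Bill"], domain=["saw", "who"])
stuck = lower_vp.is_accessible("who")    # False: caught in the domain
lower_vp.move_to_edge("who")
rescued = lower_vp.is_accessible("who")  # True: visible in the next cycle
```

Repeating the same step at every phase (vP, CP, vP, ...) yields the sequence of touch-down points indicated in (6).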

The transferred domain, or “domain” for short, is the complement of the phase head. To make things more perspicuous, take for instance the derivation of the verbal phrase under the assumption that the functional head v is a phase head:

(7) a. Merge(V, DPObj):    [VP V DPObj ]
    b. Merge(VP, v):       [v’ v [VP V DPObj ]]
    c. Transfer VP:        [v’ v [VP V DPObj ]]
    d. Merge(v’, DPSubj):  [vP DPSubj [v’ v [VP V DPObj ]]]
    e. Merge(T, vP):       [T’ T [vP DPSubj [v’ v [VP V DPObj ]]]]

The VP and all material in it are inaccessible to subsequent syntactic operations at step (7-c) in the derivation, when the complement of the phase head undergoes Transfer. So for example, the PIC predicts that movement of the DPObj is impossible after step (7-c) in the derivation.⁷ Movement of the object to the edge of the vP is thus a prerequisite for any long-distance dependency. Let us now consider the weaker version of the PIC. Empirically, this version was motivated by the fact that we appear to find instances of syntactic operations which stretch over little-v heads. In particular, Icelandic nominative object constructions exhibit agreement between the ϕ-probe on T and the object (taken from Richards (2012, 198) and corrected):

(8) Henni leiddust þeir.
    her.dat bored.3pl they.nom
    ‘She had found them boring.’

Thus what we appear to find, structurally, is something like (9), where T agrees with the internal argument:

(9) [TP T ϕ [vP v [VP V DP ]]]

If v triggers Transfer of its complement, the prediction is that the internal argument DP becomes inaccessible to syntactic operations outside the vP – agree between T and the DP is unexpected. Thus, a weaker version that delays Transfer of the domain of v appears to be empirically required. Such a delay is what the PICweak provides for:

7 This description in fact raises the issue of how movement of the object can take place at all. It seems as if its movement has to precede Merger of the external argument or Transfer be delayed until after the external argument is merged.

(10) Phase Impenetrability Condition-weak (PICweak, Chomsky (2001, 14)):
     Given a structure [ZP Z . . . [HP α [H YP]]], with H and Z the heads of phases: The domain of H is not accessible to operations at ZP, only H and its edge are accessible to such operations.
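The difference between (4) and (10) is one of timing, which the following Python sketch tries to make explicit. It is a toy model under stated assumptions: the clausal spine is given as a list of heads from highest to lowest, the set of phase heads is stipulated, and the function domain_accessible with its parameters is hypothetical. Under the strict version, no probe outside the phase can reach into the domain of H; under the weak version, the domain of H is closed off only for operations at the next higher phase head Z and above.

```python
def domain_accessible(probe, phase_head, spine, phase_heads, version):
    """Can probe reach into the domain of phase_head?

    spine lists the heads of the structure from highest to lowest;
    version is "strong" for (4) or "weak" for (10).
    """
    p, h = spine.index(probe), spine.index(phase_head)
    if p >= h:
        raise ValueError("probe must be higher than the phase head")
    if version == "strong":
        # (4): the domain is opaque to every operation outside the phase.
        return False
    if version == "weak":
        # (10): the domain is opaque only once a higher phase head is present,
        # i.e. if a phase head sits at the probe's position or between the
        # probe and phase_head.
        return not any(z in phase_heads for z in spine[p:h])
    raise ValueError("version must be 'strong' or 'weak'")


spine = ["C", "T", "v", "V"]
phase_heads = {"C", "v"}

t_strong = domain_accessible("T", "v", spine, phase_heads, "strong")
t_weak = domain_accessible("T", "v", spine, phase_heads, "weak")
c_weak = domain_accessible("C", "v", spine, phase_heads, "weak")
```

Here t_strong is False, t_weak is True and c_weak is False: only the weak version lets the ϕ-probe on T reach the internal argument inside VP, as required for (8)/(9), while C is barred from doing so under both versions.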

With this second version in place, Transfer of the VP applies at the point the second phase head, i.e. C, is merged. The ϕ-probe on T can thus access the internal argument in (9) and agree can be established. The question of the relationship between these two versions of the PIC has engendered quite some discussion. Epstein et al. (2016) give what is in my view quite an attractive unification of the two, effectively giving an answer to the question why it is two versions that we have (and not, say, three or any number of PICs, cf. Richards (2012)). With this phase-wise derivation model, the traditional T-Model of the GB-era receives a reconception along the lines schematized in (11):

(11) [Diagram: starting from the Numeration/Lexicon, the derivation passes through successive applications of Spell-Out, each of which hands a chunk of structure to LF and to PF.]

Thus, it is not the global output of the derivation which is handed over to the interfaces, but small chunks, each of which has, by assumption, a specific coherent and independent interpretation on both sides. With these remarks as a background, let us survey the empirical indications for this choice of cyclic nodes.

4.2.1.1 The Evidence for successive Cyclicity in SPEC-CP

The first piece of evidence is semantic in nature, namely binding options made possible by non-local Ā-movement of a wh-phrase. Consider the baseline sentences in (12-a)/(13-a), indicating that the anaphors inside the object-DPs are subject to Principle A of the Binding Theory, i.e. they must be locally bound and cannot be bound by the higher subject. Once the wh-phrases move to the matrix clause as in (12-b)/(13-b), however, binding by the higher subject becomes available, i.e. the sentences turn ambiguous:

(12) Semantic Evidence: Intermediate Binding-options (Barss (1986, 2001))
     a. Who said that John_i thinks that Bill_j bought pictures of himself_*i/j ?
     b. [Which pictures of himself_i/j ] does John_i think that Bill_j bought?

(13) a. The men_i believed that the women_j had placed (these) [portraits of [themselves/each other]_*i/j ] in a scrapbook.
     b. I wonder [which portraits of [themselves/each other]_i/j ]_k the men_i believed that the women_j had placed t_k in a scrapbook.

If we assume that Ā-movement can reconstruct into the intermediate SPEC-CP position, the possibility of these readings could be accounted for as follows: Assume that movement is internal Merge, as conceived above in chapter 3. Then narrow syntax delivers identical copies in the base position as well as in the intermediate SPEC-CP position:

(14) [Which pictures of himself] does John think [which pictures of himself] that Bill bought [which pictures of himself]

(15) a. I wonder [which portraits of [themselves/each other]] the men believed [which portraits of [themselves/each other]] that the women had placed [which portraits of [themselves/each other]] in a scrapbook.
     b. I wonder [which portraits of [themselves/each other]] the men believed [which portraits of [themselves/each other]] that the women_j had placed [which portraits of [themselves/each other]] in a scrapbook.

With these syntactic representations, copies are accessible at the interface and explain the added binding options. Moreover, Interface Conditions on the interpretation of copies are needed to restrain their interpretation such that not all of them receive an interpretation, cf. Chomsky (1993) for one such approach. It is important to point out, however, that while these occurrences are accessible for the interfaces, they are invisible to the computational system, as I will detail below. The second piece of evidence comes from word order. Subject-auxiliary inversion (SAI) is occasionally taken to be a reflex of local wh-movement in standard English as well as in many dialects such as Belfast English. However, (16-a)/(16-b) show that in the latter dialect SAI also occurs in non-local wh-movement, a phenomenon which is absent in standard English:

(16) Subject-auxiliary inversion (Belfast English)
     a. Who did John hope [did we see]?
     b. What did Mary claim [did they steal]? (Henry (1995))

If wh-movement touches down in the intermediate SPEC-CP position, locally triggering SAI, (16-a)/(16-b) could receive an account: the difference between standard English and Belfast English might then be due to a phonological (head-movement, cf. Chomsky (2001)) parameter that either prohibits or demands SAI in the presence of a phonologically deleted copy of the wh-word in SPEC-CP. An argument somewhat similar in spirit comes from Grewendorf (2002, 220). It concerns distributional patterns of the left periphery in subordinate clauses in German. He argues in favor of successive cyclicity on the basis of verb-second (V2) complement clauses:

(17) a. Peter glaubt [Maria hat den Trainer von Unterhaching gesehen]
        Peter believes Mary has the coach of Unterhaching seen
        ‘Peter believes Mary has seen the coach of Unterhaching.’
     b. *Peter glaubt [hat Maria den Trainer von Unterhaching gesehen]
        Peter believes has Mary the coach of Unterhaching seen

SPEC-CP needs to be occupied in embedded V2-contexts, as the ungrammaticality of (17-b) shows; once a constituent occupies the position immediately preceding the finite verb, the sentence becomes fine (17-a). (17-b) forms a minimal pair with (19), where long-distance movement takes place. The sentence is fine despite the lack of a pronounced constituent before the finite verb in the embedded clause. And indeed, placing a constituent before the finite verb yields an ungrammatical result (20-a)/(20-b), showing that long-distance movement cannot “skip” the intermediate SPEC-CP-position, which can only be occupied by a single constituent.⁸

(19) Wen glaubt Peter [hat Maria gesehen]?
     who believes Peter has Mary seen

(20) a. *Wen glaubt Peter gestern hat Maria gesehen?
        who believes Peter yesterday has Mary seen
     b. *Wen möchtest Du wissen, wer gesehen hat?
        who would.like you know who seen has

8 At first glance, German Left Dislocation (18) appears to be a challenge to this generalization:

(18) [Den Jungen], [den] habe ich gesehen.
     the boy him have I seen

I refer the reader to Ott (2014) for a biclausal analysis of this construction that retains the V2-rule.

Based on such facts, Grewendorf concludes that (19) is really a V2-clause in disguise, namely the effect of an unpronounced wh-copy in subordinate SPEC-CP, triggering V2.⁹ Also, intermediate copies of simplex wh-words in colloquial German provide fairly transparent traces of successive cyclicity. Assuming the copy theory of movement, so called wh-copying (in German and other Germanic languages, cf. among others, Felser (2004)) provides suggestive clues about the path of longdistance wh-movement: 9 This line of reasoning presupposes that glaubt Peter in (19) is a matrix clause and not an integrated parenthetical within a V2 root clause, which has been under debate for quite some time (cf. among others Reis (1995), Bayer (2006), Kiziak (2007)). The following facts support Grewendorf’s stance. Consider (21): (21)

Wen glaubt Peter habe Maria gesehen? who believes Peter has.sbjv Mary seen

The verb is in subjunctive mood, which is unavailable in matrix contexts: (22)

*Wen habe Maria gesehen? who has.sbjv Mary seen

The argument along the lines of Reis, Bayer, etc. is confronted with the problem that either (21) is predicted to be bad or (22) predicted to be good, both contrary to fact. Along the same lines, we can construct an argument from variable binding effects and reconstruction of Principle C. (I am grateful to a reviewer for providing me with the hint at the variable binding effects which deliver sharper results than the ones from Principle C – and I concur with his/her judgments.) (23)

a.

b.

(24)

*Welchen Bericht über ihni , und das hat jeder Angestelltei bestätigt, hat nur which report about him and that has every employer confirmed has only Maria verfassen können. Mary written could Welchen Bericht über ihni hat jeder Angestelltei gemeint, hat nur Maria which report about him has every employer thought has only Mary verfassen können. written could

a.

Welchen Bericht über Hansi , und eri hat dies bestätigt, hat Maria verfasst? which report about Hans and he has this confirmed has Mary written b. ??Welchen Bericht über Hansi hat eri gemeint hat Maria verfasst? which report about Hans has he thought has Mary written

The examples show a contrast with respect to reconstruction of the restriction of the wh-expression. A bound variable construal is available in (23-b), suggesting proper integration of what looks like the host clause hat jeder Angestellte gemeint. By contrast, a parenthetical apposition as in (23-a) disallows that interpretation and suggests structural orphanage of the parenthetical clause. Likewise, in (24-b) a Condition C effect can be observed while there is none in (24-a). These facts suggest that part of the wh-expression is reconstructed in the subordinate SPEC-CP position in (23-b)/(24-b), while no such reconstruction obtains in (23-a)/(24-a), where the string und er hat dies bestätigt is uncontroversially parenthetical.

(25)
Wen denkst Du wen Maria gesehen hat?
who think you who Mary seen has
‘Who do you think Mary has seen?’

Assuming the copy theory of movement, special pronunciation rules are required to explain the multiple phonological occurrences of the wh-words (cf. Nunes (2004) for one approach). A somewhat similar phenomenon comes from West Ulster English, which allows for Q-stranding in intermediate SPEC-CP positions:

(26)
Quantifier float (West Ulster English, McCloskey (2000))
a. What all do you think that he’ll say that we should buy?
b. What do you think all that he’ll say that we should buy?
c. What do you think that he’ll say all that we should buy?
d. What do you think that he’ll say that we should buy all?

Let me stress that under the analysis suggested in this book, such facts cannot be accounted for in terms of structurally leaving the quantifier in intermediate positions. Anticipating the analysis to follow, the problem is that once a quantified phrase occupies an intermediate SPEC-CP position and subextraction of what takes place, α either cannot be labeled at all, or is labeled by the stranded quantifier, both of which are problems: (27)

[α [ all t_what ] [ C=that . . . ]].

One solution to this problem, which is in principle compatible with the assumptions in this book, is to employ the Distributed Deletion theory advanced by Fanselow and Çavar (2002). Under that conception, it could be that the entire QP undergoes successive-cyclic Ā-movement all the way to its terminal position. If movement leaves copies, the standard case is that the lower copies are phonologically deleted in their entirety. According to Fanselow and Çavar (2002), it is conceivable that deletion may selectively target material both in the upper and in the lower copy: schematically, suppose deletion targets X but not Y in an upper copy [YP Y X], then X is pronounced in the lower copy while Y undergoes deletion there. So it could be that (26) receives a treatment as in (28).

(28)
a. [ [what all] . . . [what all] [ that . . . [what all] [ that . . . ]]]
b. [ [what all] . . . [what all] [ that . . . [[what all] [ that . . . ]]]
c. [ [what all] . . . [what all] [ that . . . [what all] [ that . . . ]]]
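The division of labor between copies that Distributed Deletion allows can be made concrete with a small sketch. It is only an illustration under simplifying assumptions, not Fanselow and Çavar's formalization: the function name `distributed_deletion`, the flat word lists and the numbering of copies (0 for the highest copy, 3 for the base position) are all hypothetical. The one property it encodes is that each word of the moved phrase is pronounced in exactly one copy of the chain and deleted in all others.

```python
def distributed_deletion(words, n_copies, pronounce_at):
    """Return, for each copy of a chain (highest copy first), the words
    that survive deletion in that copy.

    words        -- the words of the moved phrase, e.g. ["what", "all"]
    n_copies     -- number of copies in the movement chain
    pronounce_at -- maps each word to the index of the single copy in
                    which it is pronounced (an assumption of this sketch)
    """
    spellout = []
    for copy_index in range(n_copies):
        # A word is kept in this copy only if this is its designated
        # pronunciation site; it is deleted in every other copy.
        spellout.append([w for w in words if pronounce_at[w] == copy_index])
    return spellout


qp = ["what", "all"]
# Four copies: matrix SPEC-CP, two intermediate SPEC-CPs, base position.
# Moving the pronunciation site of "all" down the chain mimics (26-a)-(26-d).
for all_position in range(4):
    print(distributed_deletion(qp, 4, {"what": 0, "all": all_position}))
```

With `all` pronounced in copy 1, for instance, the result is `[['what'], ['all'], [], []]`, which corresponds to the stranding pattern in (26-b). The sketch says nothing about why a given position licenses pronunciation, which is exactly the question raised by the pragmatic constraint discussed next.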

Fanselow and Çavar (2002, 15) advance a pragmatic constraint that requires two different grammatical functions in two different positions, effectively forcing their pronunciation in different positions: “Suppose that XP bears a feature f1 that requires that XP be overtly realized in position A, and an additional feature f2 that forces XP into position B. Then XP is split up in languages like Croatian or German.” A similar take for (28) would have to show that the stranding positions have independent grammatical characteristics in West Ulster English, allowing the specific profile that we see.

Let me finally address morphological evidence from Irish. The complementizer go is similar to the English one that, (29-a). Relative clause contexts such as (29-b) show that the Irish complementizer system is sensitive to the absence or presence of an Ā-dependency: in the presence of this dependency, a shows up; the superscripted L indicates lenition: (29)

Morphological Evidence: Complementizer Alternations (in Irish, McCloskey (2002))
a. Creidim gu-r inis sé bréag.
   I-believe GO-pst tell he lie
   ‘I believe that he told a lie.’
b. [an ghirseach]_i a ghoid na síogaí t_i
   the girl aL stole the fairies
   ‘the girl that the fairies stole away’

In (30) the Ā-type complementizer a shows up in each subordinate clause. This is accounted for if the head of the relative, eleven years, stops over in every intermediate SPEC-CP position, reiterating the local pattern in (29-b):

(30)
Aon bhliaim déag is dóigh liom a deireadh m’athair a
one year ten aL-COP[prs] I-think aL say[pst-hab] my father aL
bhí sé nuair . . .
was he when
‘It’s eleven years old that I think my father used to say that he was when ...’

After this survey of data, which speak for the idea that long-distance displacement involves landing in intermediate SPEC-CP positions, let me now turn to the idea that movement is even further partitioned, namely by requiring intermediate landing in SPEC-vP as implicated by phase theory.

4.2.1.2 The Evidence for successive Cyclicity in SPEC-vP

In the literature, observations and arguments in support of successive cyclicity via SPEC-CP have been much more prominent than those for movement via SPEC-vP as sketched in (31):

(31)
[CP what_i does Mary [vP t_i v [ read t_i ]]]

The reason for this asymmetry is presumably partly historical: the former concept preceded the latter by more than a decade, so that there was simply more time for evidence to accumulate. It should be said that interest in the latter has arisen foremost in the wake of phase theory by Chomsky (2001, 2008),¹⁰ who suggested that finite CP and transitive vP are special nodes,¹¹ so-called phases, requiring cyclicity of movement. What are the empirical arguments for cyclicity via SPEC-vP? The evidence in favor of movement via SPEC-vP which I present here is fourfold: First, it is semantic in nature and concerns scopal properties of moved elements. Secondly, it is syntactic in nature and involves stranding of a subpart of a constituent along the movement path. Thirdly, it is lexical in nature and concerns elements in the edge of the vP, whose presence or absence is conditioned by (long-distance) wh-movement. Finally, it is again syntactic or distributional in nature and concerns cooccurrence restrictions vis-à-vis other phrases.

Reconstruction Effects

This section is devoted to a brief description of the semantic effects which have been advanced to support the idea that successive-cyclic movement proceeds via SPEC-vP. Fox (1998, 157 ff) gives examples to show that syntactic¹² reconstruction of material within complex wh-phrases must be possible into a position between the subject and the object. This is abstractly represented in (32-a), where QP is a universally quantified subject, followed by an object pronoun within the same clause; X and Y designate potential reconstruction sites. (32-b) represents a configuration with the positions of the subject and the object reversed:

(32)
a. [ Which . . . pronoun_i . . . r-expression_j ] . . . [TP QP_i X [VP pronoun_j Y . . . ]]
b. [ Which . . . pronoun_i . . . r-expression_j ] . . . [TP pronoun_j X [VP QP_i Y . . . ]]

10 A historical precursor of vP-phases is bounding node theory in Chomsky (1986), which at the time did not engender as much empirical research on the issue of successive cyclicity via VP as the phase-theoretic recast, to the best of my knowledge.

11 Whether or not intransitive vPs count as phases is disputed, cf. Legate (2003) for arguments in favor of intransitive vP-phases.

12 Fox assumes that the possibility of Condition C effects in scope reconstruction indicates that syntactic reconstruction must be at work and not semantic reconstruction.

According to Fox’s logic, reconstruction into both X and Y in (32-b) is predicted to be bad, because of the concomitant Condition C violation induced by the coreferent pronoun. By the same token, schema (32-a) forces reconstruction in X if the bound variable reading is available and the sentence is still grammatical; crucially, reconstruction must not target Y, a position c-commanded by the pronoun, lest a Condition C violation obtain. Pertinent examples from Fox are given in (33-a)–(34-b). I supplement them with the ones in (35-a) and (35-b), which lack an ECM-verb of the kind found in (33-a)/(33-b), and with judgments from informants:

(33)
a. [Which (of the) paper(s) that he_i wrote for Ms. Brown_j ] did every student_i get her_j to grade?
b. *[Which (of the) paper(s) that he_i wrote for Ms. Brown_j ] did she_j get every student_i to revise?

(34)
a. [Which (of the) paper(s) that he_i gave Mary_j ] did every student_i ask her_j to read carefully?
b. *[Which (of the) paper(s) that he_i gave Mary_j ] did she_j ask every student_i to revise?

(35)
a. ?[Which paper that he_i gave Mary_j ] did every student_i persuade her_j to read carefully?
b. *[Which paper that he_i gave Mary_j ] did she_j persuade every student_i to revise?

The bad examples (33-b)/(34-b)/(35-b) correspond to schema (32-b) and show that reconstruction for scope by every student is not possible. The reason is that the reconstruction site, which corresponds to position Y in (32-b), is c-commanded by a pronoun coreferential with the R-expression within the complex wh-phrase – a violation of Condition C of the Binding Theory. As the grammatical cases (33-a), (34-a) and (35-a) show, there is a reading in which the pronoun he is bound by every student. At the same time, reconstruction cannot target a position below her as this would induce a Condition C violation. Consequently, there must be a position below the subject and above her into which reconstruction takes place. This position corresponds to X in (32-a). Now if reconstruction is conceived as the semantic interpretation of (parts of) a copy left by movement (cf. Chomsky (1993)), it follows that movement must target X before proceeding to its final landing site. What is X? The observations above do not force us to conclude that X is the specifier of v, but could likewise be interpreted as evidence for movement via SPEC-VP.¹³ However, evidence from other domains points to the conclusion that SPEC-vP is involved, as we will see later. This concludes the recapitulation of semantic evidence in favor of successive cyclicity in SPEC-vP. Let me now turn to an argument from stranding.

Grammatical Markers Left by Movement

In this paragraph I describe lexical or morphological evidence in favor of the idea that Ā-extraction involves intermediate touch-downs at the edge of vP. Mandarin Chinese provides the lexical clues and Dinka provides morphological traces for this hypothesis. I describe each phenomenon in turn. Mandarin Chinese features a functional particle suo, which, according to Jiang (2008), provides evidence for successive cyclicity via SPEC-vP. The particle appears in relative clause structures and so-called long passives. First, (36) shows that suo is restricted to object relatives and cannot appear when a subject (37-a) or an adjunct (37-b) is relativized: (36)

Lisi (suo) ai de ren
Lisi suo love de man
‘the man that Lisi loves’

(37)
a. [[e] (*suo) mai naxie shu] de na ge ren
   suo buy those book de that clf man
   ‘the man who bought those books’
b. [Lisi (*/?suo) piping Zhangsan [e]] de yuanyin/fangfa/shijian
   Lisi suo criticize Zhangsan de reason/method/time
   ‘the reason/method/time that Lisi criticized Zhangsan’

The following contrast shows that suo can optionally occur in “long passives” (as indicated by the presence of a pronounced subject) as in (38-a), while it is impossible in short passives (38-b):

13 Here I skip the possibility that both SPEC-VP and SPEC-vP are involved, cf. Epstein and Seely (2002), Müller (2010), among others. The latter in particular has been a proponent of the idea that every phrase is a phase. Those conceptions are at odds with the remarks in section 4.2.1 about the vP phase corresponding to θ-role assignment and recasting Deep Structure.

(38)
a. zhe-xie shiqing bu neng bei tamen (suo) liaojie.
   these thing not can bei they suo understand
   ‘These things cannot be understood by them.’
b. zhexie shiqing bu neng bei (*suo) liaojie.
   these thing not can bei suo understand
   ‘These things cannot be understood.’

Moreover, suo may optionally appear in the context of unaccusative verbs: (39)

[na tiao he zhong (suo) chen [e]] de chuan bu-ji-qi-shu.
that clf river middle suo sink de boat countless
‘Boats that sank in that river are countless.’

Interestingly and in contrast to (37-a), once a subject moves long distance as in (40), the structure turns grammatical: (40)

zhe jiu shi Zhangsan suo xuancheng [[e] mai le shi dong fangzi]
this exactly is Zhangsan suo claim buy asp ten clf house
de na ge ren
de that clf man
‘This is the man who Zhangsan claimed bought ten houses.’

Furthermore, suo can appear multiple times as in (41): (41)

zhe shi [ [wo (suo) yiwei ] Zhangsan (suo) xihuan [e] ] de
this is I suo mistakenly-thought Zhangsan suo like de
na ge ren.
that clf man
‘This is the man that I mistakenly thought that Zhangsan likes.’

Distributionally, there is evidence that suo is located in the vP-area and not higher, because manner adverbials must precede but not follow the particle:

(42)
wo jiao-jin-nao-zhi suo zai (*jiao-jin-nao-zhi) jiejue de
I twist-finish-brain-juice suo progress twist-finish-brain-juice solve de
wen-ti
problem
‘the problem that I am solving by racking my brain’

Criticizing previous analyses of the phenomenon, Jiang (2008) suggests “SUO can be seen to signal the ‘activator’ of the EPP feature of the phase head of vP which triggers successive-cyclic A-bar movement through the vP phase, required if the relativization targets an element inside the complement domain of v.” She thus takes suo as a lexical indication for movement via SPEC-vP. When no movement via the edge of vP takes place, as in local subject or adjunct relativization, suo is not licensed. By contrast, in object relativization, suo signals the presence of an EPP feature responsible for successive-cyclic movement via the edge of vP. (Jiang assumes that the optionality of suo in object relativization reflects the possibility of a ∅-exponent of the EPP-feature.) Admittedly, the arguments for successive cyclicity via SPEC-vP from Jiang (2008) are not as compelling as one would hope. I will now turn to what in my view are stronger reasons.

Morphological evidence for intermediate touch-downs in the vP-edge, which loosely resembles the distributional behavior of the functional particle in Mandarin Chinese, comes from the Nilo-Saharan language Dinka. van Urk and Richards (2015) show that both local and non-local extraction of plural wh-elements is obligatorily accompanied by stranding of a plural morpheme ké in the left edge of the vP, marking the path of successive cyclicity. First, consider the contrast between (43-a) and (43-b), which shows that no mark is left by fronting of a singular wh-element, while movement of a plural wh-element goes hand in hand with stranding ké before the verb: (43)

a. Yeŋà cíi Bôl tíŋ.
   who prf.ns Bol.gen see
   ‘Who did Bol see?’
b. Yèyîŋa cíi Bôl ké tíŋ.
   who.pl prf.ns Bol.gen pl see
   ‘Who all did Bol see?’

Omission of the plural morpheme ké in the context of plural wh-extraction is out as (44) shows:

(44)
*Yèyîŋa cíi Bôl tíŋ.
who.pl prf.ns Bol.gen see

What is interesting is that in long-distance movement too the plural marker must show up on the edge of every vP crossed:

(45)
a. Yeŋà yé tàak cíi Bôl tíŋ.
   who impf.2sg think prf.ns Bol.gen see
   ‘Who do you think Bol saw?’
b. Yèyîŋa yé ké tàak cíi Bôl ké tíŋ.
   who.pl impf.2sg pl think prf.ns Bol.gen pl see
   ‘Who all do you think Bol saw?’

What the three examples in (46) show is that the presence of ké is obligatory in all the vPs passed by wh-movement:

(46)
a. *Yèyîŋa yé tàak cíi Bôl tíŋ.
   who.pl impf.2sg think prf.ns Bol.gen see
b. *Yèyîŋa yé tàak cíi Bôl ké tíŋ.
   who.pl impf.2sg think prf.ns Bol.gen pl see
c. *Yèyîŋa yé ké tàak cíi Bôl tíŋ.
   who.pl impf.2sg pl think prf.ns Bol.gen see

It is tempting to interpret these findings in the following way: the presence of ké in (43-b) and (45-b) indicates the presence of a copy of the fronted plural wh-phrase in the specifier of every vP, and, in fact, once a copy of a plural wh-phrase occupies the specifier of vP, the plural marker must show up. This is the reason, then, why the examples (44) and (46) are ungrammatical. Of course, this line of reasoning presupposes successive cyclicity via SPEC-vP.

Cooccurrence Restrictions in Dinka

Further interesting syntactic clues for successive cyclicity via the edge of the verb phrase come again from Dinka. The structure of the relatively simple argument by van Urk and Richards (2015) runs as follows (essentially analogous to the one by Grewendorf (2002) based on German Vorfeld-phenomena): First, they show that the edge of both CP and vP in declaratives needs to be occupied by an XP in Dinka. Secondly, they show that in long-distance wh-movement, these very positions cannot be lexically occupied. The conclusion is that long-distance wh-movement proceeds via these positions, blocking additional material. I will here present the evidence van Urk and Richards (2015) advance in favor of the hypothesis that long-distance movement proceeds over the edge of v. van Urk and Richards (2015) observe that the position before the verb cluster must be occupied in Dinka, cf. (47-a). (47-b) shows that the preverbal position cannot be left unoccupied.

(47)
a. ɣɛ̀n cí mîir tíŋ.
   I prf giraffe see
   ‘I see a giraffe.’
b. *ɣɛ̀n cí tíŋ mîir.
   I prf see giraffe

Hence, once either the indirect (48-a) or the direct object (48-b) precedes the verb, the structures become grammatical; in these cases the respective other object follows the verb:

(48)
a. ɣɛ̀n cí Ayén yiɛ́n kitàp.
   I prf Ayen give book
   ‘I gave Ayen a book.’
b. ɣɛ̀n cí kitàp yiɛ́n Ayén.
   I prf book give Ayen

The examples in (49-a) and (49-b) show that it is not possible for both objects to follow the verb, leaving the preverbal position with no lexical material: (49)

a. *ɣɛ̀n cí yiɛ́n kitàp Ayén.
   I prf give book Ayen
b. *ɣɛ̀n cí yiɛ́n Ayén kitàp.
   I prf give Ayen book

Now compare this state of affairs with the constituent questions in (50-a) and (50-c). In these cases, the preverbal position need not be occupied and, as (50-b) and (50-d) show, must not be occupied by lexical material, in contrast to the declarative context in (48-a)–(49-b):

(50)
a. Yeŋà cíi mòc yiɛ́n kitàp?
   who prf.ns man give book
   ‘Who did the man give the book to?’
b. *Yeŋà cíi mòc kitàp yiɛ́n?
   who prf.ns man book give
c. Yeŋó cíi mòc yiɛ́n Ayén?
   what prf.ns man give Ayen
   ‘What did the man give Ayen?’
d. *Yeŋó cíi mòc Ayén yiɛ́n?
   what prf.ns man Ayen give

van Urk and Richards (2015) interpret these facts as follows: the requirement for the preverbal position to be occupied by phrasal material is satisfied in the context of wh-movement. How so? They identify the preverbal position as SPEC-vP. By hypothesis, wh-movement proceeds through landing in SPEC-vP before moving on to SPEC-CP: (51)

[CP yeŋà cíi mòc [vP ⟨yeŋà⟩ yiɛ́n ⟨yeŋà⟩ kitàp ]]

Hence, the preverbal position is occupied in examples (50-a) and (50-c), albeit inaudibly. The fact that additional lexical material is in complementary distribution with wh-movement (cf. (50-b) and (50-d)) confirms the overall conclusion and completes the picture.¹⁴ This concludes my overview. Having presented empirical arguments for successive cyclicity in long-distance movement via the positions SPEC-vP and SPEC-CP, let me now turn to the technical ways in which this derivational pattern has been implemented.

14 Admittedly, the picture is more intricate and I refer the reader to van Urk and Richards’s (2015) work for details. However, the details do not shatter the basic conclusion of successive cyclicity via SPEC-vP.

4.3 Previous Analyses

As mentioned in the initial section of this chapter, successive-cyclic movement steps do not appear to systematically correlate with any semantic effect, and it is thus plausible to analyze them as being purely formal. Spurious wh-features on intermediate C-heads which trigger wh-movement have been proposed in the literature (cf. Stepanov and Stateva (2006)), but such devices lack plausibility. One finds extrinsic principles in the literature (Müller and Heck (2000)), but again, one wonders why the very specific conditions ought to hold. Essentially, they stipulate the right outcome rather than explain successive cyclicity. In this section I will review what appear to me to be quite successful attempts at characterizing intermediate movement steps in successive-cyclic movement as in a sense “accidental,” or in any event, untriggered. At the end of each review I address shortcomings of the proposals.

4.3.1 Shortest Step Approaches

Following Takahashi (1994), Boeckx (2003) develops a theory of successive-cyclic movement that relies on the Principle of Unambiguous Chains (PUC). A chain is “unambiguous if it contains at most one strong position” (p. 13; by convention, I mark such positions with *). Accordingly, a chain with more than one strong occurrence is ruled out. A strong position is defined in terms of feature strength, a strong feature being one which needs to be checked overtly. It follows that chains – under the contextual notion of chains in Chomsky (2000) – comprise at most one occurrence of a feature that needs to be checked overtly. This theory derives contrasts such as (52-a)/(52-b): (52)

a. John was arrested → CH(John)={T*, V}
b. *John seems is clever → CH(John)={T*_seems, T*_is, Adj}
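The way the PUC separates the two chains in (52) can be spelled out as a simple count. The following is a minimal sketch, assuming that a chain can be represented as a list of positions each flagged as strong or not; the function name `is_unambiguous` and this encoding are illustrative choices, not part of Boeckx's proposal.

```python
def is_unambiguous(chain):
    """A chain satisfies the PUC if it contains at most one strong position.

    chain -- list of (position_name, is_strong) pairs
    """
    strong_positions = [name for name, is_strong in chain if is_strong]
    return len(strong_positions) <= 1


# CH(John) for (52-a): one strong position (T*), so the chain is licit.
chain_52a = [("T", True), ("V", False)]
# CH(John) for (52-b): two strong positions (T*_seems, T*_is), so it is ruled out.
chain_52b = [("T_seems", True), ("T_is", True), ("Adj", False)]

print(is_unambiguous(chain_52a))  # True
print(is_unambiguous(chain_52b))  # False
```

Note that the check is stated over whole chains, not over individual links, which anticipates the point made below that only the formation of the chain as a whole requires a feature-checking motivation.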

Within this theory successive-cyclic steps are taken not in order to check features on intermediate sites but simply to meet the condition that movement must proceed in local steps. In other words, the only probe-goal relation that is established is between the (strong features of the) head of the final landing site and the in-situ goal, not between some intermediate head and the goal. The successive-cyclic character of movement is a by-product of the requirement that chain links must be as short as possible. Indeed, intermediate EPP-checking is excluded on principle, as it violates the PUC. Movement takes place only once the final landing site is introduced into the tree. If it were not for the “local steps” character of derivations, successive-cyclic movement could just as well take place in one fell swoop, because the crucial, movement-driving relationship is between the non-local probe and the goal. As Boeckx puts it: “the relevant operation underlying movement is Form Chain. [. . . ] Last Resort is relevant only to the formation of a chain, not links of a chain. [. . . and the] formation of a chain must have feature-checking motivation, but formation of chain links need not.” A consequence is that movement may in principle violate the Extension Condition. Consider why: in (53-a) a wh-element is in-situ while a C-head is merged. The derivation proceeds until a probing C-head is introduced (53-b). It is only now that the wh-element may move, because Agree/Match is a prerequisite of movement, and the intermediate C-element does not entertain an Agree-relation with the wh-word. In (53-c) the wh-element will thus move at the point when C[Q] merges and is forced to do so via the intermediate C, because movement must proceed locally. However, it is this step (53-c) which violates the Extension Condition as movement is not to the root.

(53)
a. C . . .
b. C[Q] . . . C . . .
c. C[Q] . . . C . . .

(54)
[CP C[*wh] . . . [CP spec-C C . . . wh[wh] ]]
(movement arrows, each labeled Mvt)

Thus there is a trade-off between the Extension Condition and Local Steps: just as there used to be a principle “Merge over Move,” we have a principle “Local Steps” ruling over the “Extension Condition.” A virtue of this theory is that there is no stipulated wh-feature on intermediate C-heads to derive the successive-cyclic steps. However, certain drawbacks speak against the account, as I will discuss now. The major problem of the analysis has been mentioned already: although it captures a crucial aspect – the relationship between the terminal C-head and the wh-word – the Extension Condition is violated. Moreover, the analysis depends on meta-conditions of movement like the Minimal Link Condition or the Shortest Steps requirement, i.e. extrinsic conditions on movement which do not follow from independent properties of the grammar.

4.3.2 Moving-Element-driven Approaches

Bošković (2007, 2011) copes with the problem of intermediate movement steps of successive-cyclic movement by resurrecting the notion Greed. He assumes that wh-words bear unvalued, uninterpretable [Q]-features. Their interpretable valued counterpart is located on the interrogative C-head, as is standard. Moreover, he adopts Chomsky’s (2000, 2001) phase model, according to which movement is restricted. Specifically, he adopts the strict version of the Phase Impenetrability Condition (PIC) repeated here: (55)

PIC: The domain of H is not accessible to operations outside HP; only H and its edge are accessible to such operations. (cf. Chomsky (2001))

Accordingly, an XP is accessible to a probing or attracting head only if the XP occupies a lower phase edge position. If the XP does not reach the phase edge position, the operation Transfer hands it over to the interfaces as part of the complement of the phase head (the phase head’s domain). So a phase edge is the only way for a moving element to escape a phase. Furthermore, he adopts the operation Agree (Chomsky (2001)), which is a unidirectional relation and requires that the [uQ] on the wh-word c-command the corresponding interpretable [Q]-feature on the C-head. From this it follows that the wh-goal must reach a position c-commanding the [iQ]-feature/the probe on a C-head; otherwise it will not receive a value, leading to a crash. Movement, in turn, rescues the wh-word from being transferred along with a respective domain without the [uQ]-feature receiving a value; and movement will continue phase by phase until the wh-word moves into the position where [uQ] can probe (i.e. c-command) [iQ] on the interrogative C-head. This is captured in the following formulation of Last Resort: (56)

X can undergo movement iff without the movement the structure will crash. Bošković (2011, 332)

“In the hope of better days,” wh thus moves as a means to guarantee eventual Agree:

(57)
[ wh[uQ] [CP C[iQ] [ . . . t_wh . . . ]]] → [ wh[uQ] [CP C[iQ] [ . . . t_wh . . . ]]]

If CP and v*P are taken to be phases, successive-cyclic movement of a goal must proceed through phase edges – boxed in (58) – dictated by the PIC. If a Goal does not reach the phase edge of a phase head PH1, it gets lost in a domain impenetrable at the next higher phase level (PH2, (58-c)): the domain that is Transferred is opaque to subsequent syntactic operations.

(58)
a. [PH2 . . . [ GOAL [ PH1 [domain . . . ⟨GOAL⟩ . . . ]]]]
b. [PH2 . . . [ GOAL [ PH1 [domain . . . ⟨GOAL⟩ . . . ]]]] (domain of PH1 transferred)
c. [PH2 . . . [ SPEC-PH1 [ PH1 [domain . . . GOAL . . . ]]]] (domain of PH1 transferred, GOAL trapped)

Successive-cyclic movement must thus proceed from edge to edge (ignoring the v*P-phases here): (59)

a. did John say [CP C=that Mary likes what]
   → movement of what into the matrix clause is barred by the PIC
b. (i) [CP what C=that Mary likes t_what ]
   (ii) did John say [CP what C=that Mary likes t_what ]
   → movement of what into the matrix clause is fine
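The edge-to-edge logic of (58) and (59) can be sketched as a short simulation. This is a deliberately simplified illustration, not Bošković's own formalization: phases are reduced to a stack of CPs numbered from the lowest (0) upward, v*P-phases are ignored as in (59), and the names `derive` and `moves_to_edge` are hypothetical. The only thing encoded is that a goal with an unvalued [uQ] must reach the edge of every phase, since otherwise it is transferred with the phase domain (or, in the topmost phase, fails to c-command [iQ]) and the derivation crashes.

```python
def derive(n_phases, moves_to_edge):
    """Simulate a wh-goal with unvalued [uQ] starting inside the lowest phase.

    n_phases      -- number of stacked phases (CPs), lowest phase is 0
    moves_to_edge -- function from a phase index to True/False: does the
                     goal move to the edge of that phase?
    Returns a short description of the outcome.
    """
    for phase in range(n_phases):
        if not moves_to_edge(phase):
            # The goal stays inside the domain of this phase head. In a
            # non-final phase the domain is transferred and becomes opaque
            # (PIC); in the final phase [uQ] fails to c-command [iQ].
            # Either way [uQ] is never valued.
            return f"crash: goal remains in the domain of phase {phase}, [uQ] unvalued"
    return "converges: goal at the edge of the top phase, [uQ] valued"


# (59-b): what moves to the embedded SPEC-CP, then to the matrix SPEC-CP.
print(derive(2, lambda phase: True))
# (59-a): what stays in situ in the embedded clause and is trapped there.
print(derive(2, lambda phase: phase != 0))
```

Nothing in the sketch decides whether the goal moves; that decision is simply passed in from outside, which is one way of seeing the look-ahead issue raised for this type of account.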

Bošković’s analysis solves the problem of the trigger of movement of wh-elements to intermediate landing sites without resorting to the EPP or Edge Features: wh needs to move in order to eventually establish an Agree-relationship. A sample derivation is given below:¹⁵

15 Understrike indicates a valued feature.

(60)
What do you think that Mary bought?
a. [CP C=that . . . what[iWH][uQ]]
   → [uQ] on wh forces wh-movement to SPEC-CP (lower CP-phase)
b. [CP what[iWH][uQ] C=that . . . t_what ]
   → Building of matrix clause
c. [CP C[uWH][iQ]=did . . . [CP what[iWH][uQ] C=that . . . t_what ]]
   → Agree(C[uWH], wh[iWH])
d. [CP C[uWH][iQ]=did . . . [CP what[iWH][uQ] C=that . . . t_what ]]
   → [uQ] forces movement of what to SPEC-CP (upper CP-phase)
e. [CP what[iWH][uQ] C[uWH][iQ]=did . . . [CP t_what C=that . . . t_what ]]
   → Agree(wh[uQ], C[iQ])
f. [CP what[iWH][uQ] C[uWH][iQ]=did . . . [CP t_what C=that . . . t_what ]]

Like the Minimal Chain Link approach, Bošković succeeds in developing an analysis without resort to spurious intermediate features to trigger intermediate movement steps. However, there are a number of reasons why the approach is untenable. The criticism that follows carries over to the work by Zeijlstra (2012), who develops a mirror version of the Boškovićean analysis in an effort to capture additional phenomena. After presenting my own analysis in section 4.4, I extend the criticism of Bošković’s approach by contrasting it with my own.

The account just presented suffers from at least three grave problems: first, phrases need to be able to probe, which is technically in contradiction to Bare Phrase Structure; secondly, the analysis involves look-ahead; and thirdly, it effectively violates the PIC and thus undermines the very motivation for movement. Bošković’s treatment implicitly but crucially relies on the following assumption: moving phrases as a whole are able to probe, i.e. heads must be able to project their features up to a maximal projection, and this XP – i.e. the [uF] that percolates up to the highest node – must be able to probe for corresponding interpretable features. As (61-a) shows, the probe [uQ] on X is too deeply buried within XP to c-command [iQ]. Consequently, a mechanism as illustrated in (61-b) must be invoked, allowing the probe to “telescope”:

(61)
a. [ [XP X[uQ] ] [CP C[iQ] . . . t_XP . . . ]]
b. [ [XP[uQ] X[uQ] ] [CP C[iQ] . . . t_XP . . . ]]

In Bošković’s analysis this hidden assumption is disguised by the use of simplex-looking symbols, but wh-movement is obviously phrasal movement in most, if not all, cases. Thus feature percolation or projection is a necessary ingredient in this analysis. However, both feature percolation and projection have been subjected to severe criticism (cf. Cable (2007, 2010); Heck (2004); Chomsky (2013)), the most principled being that percolation is an additional mechanism in the grammar, which cannot be derived from or reduced to other operations (Merge, Move, Agree). Even if one were to object that the tree graph representation above contains the information that the category label X is really the feature bundle which includes the feature Q, it is unclear how to translate this information into Bare Phrase Structural terms: in the set (62), which is a representation of (61) in the sense of the old Bare Phrase Structure system of Chomsky (1995a), where the underlined instance of X represents the projected label, it is unclear in which way this label can c-command – hence probe – material outside of XP. (62)

{X, {X . . . }}=XP

This has always been a correspondence problem between arboreal representations and set theoretic renderings, which, however, has never been solved. If we take seriously Bare Phrase Structure and the set theoretic language it is formulated in, no process along the lines of (61) is possible. Under the assumption that no such mechanism exists, Bošković’s analysis is simply unworkable. Notice that none of these extra mechanisms are available in a label-free syntax approach, which the analysis I suggest below is couched in, and none need be, as I will show. A second problem is the look-ahead character of this analysis: prior to reaching any phase edge it is already clear that movement of the wh-element must take

place, because the need to eventually undergo agree with the [iQ]-feature on the terminal C-head is encoded in the feature structure of the wh-word from the start. Let me finally address an issue with Bošković’s (and Zeijlstra’s) system which strikes me as probably the most severe and profound one, and which, to the best of my knowledge, has not been discussed extensively. If both phases and copies are ingredients of the analysis, the PIC appears to be violated inevitably, due to the need to “update” lower members of the movement chain. Consider a structure like (63):

(63) [ XP[uF:_] [ Y[iF:v] . . . ⟨XP[uF:_]⟩ ]]

X bears a uF, and Y, which bears a valued (v) interpretable feature iF, is a phase head, triggering transfer of Y’s complement. Within this system, the information that uF on the upper copy of XP has received a value upon agree between uF on X and iF on Y (or the reverse, as Zeijlstra suggests) somehow needs to be transmitted to the uF on the lower copy of XP:

(64) [ XP[uF:v] [ Y[iF:v] . . . ⟨XP[uF:_]⟩ ]]

To count as identical with the upper copy of XP, the lower copy must countercyclically receive the same value on uF as the upper one, in violation of the PIC – in (64), the uF whose value must be updated is the one on the lower copy of XP. Since (lower copies of) XP can be arbitrarily deep in the structure in successive-cyclic movement, the problem is aggravated. In fact, the observation hints at what seems like a wide-ranging implication: a feature on the head of a non-trivial movement chain must not receive a value as the result of agree, otherwise the problem I have just described arises – this appears to hold at least for any movement that reaches beyond a phase edge. One could resolve the problem by endorsing a multi-dominance view of movement. However, as I have discussed before, this conception of movement encounters independent problems (cf. chapter 3.2).

Summarizing this section, all previous takes on successive-cyclic Ā-movement which I have considered here fall short of being principled and suffer from technical or conceptual problems. Let me now turn to the new analysis, which I believe has virtues while opening a host of novel questions. I will address some of them in the passages to come.

4.4 A Novel Approach to Successive-cyclic Ā-Movement

In this section, I develop my own account of the syntax of successive-cyclic Ā-movement. I will start with an overview of the development from phrase structure grammar to more modern conceptions of labelless Bare Phrase Structure theories, as some background might be required. Moreover, some comments on the notion of chains are appropriate to lay the ground for certain aspects of the analysis, which follows the section on chains. As the mechanism that triggers the intermediate movement steps is quite powerful, it is necessary to say something about how the movement ever stops, which I do in a subsequent section. Finally, some speculations about the extension of this approach will follow, and I close with a summary of the chapter.

4.4.1 From PS-Grammar to a Labelless Bare Phrase Structure

In the Standard Theory (Chomsky (1965)), phrase structure rules (PSRs) were involved in generating expressions (sentences, phrases). At the time, recursive rewriting rules of the familiar format in (65) and lexical insertion rules together generated output strings:

(65) a. S → NP, VP
     b. VP → V, NP
     c. ...

These rules encode dominance and precedence relations among the nodes and furthermore specify categorial information (a verbal node VP dominates V, etc.). There are numerous problems with this view of phrase structure grammar. Take the VP-rule (65-b) above: formally, there is nothing in the system that prevents us from reformulating that rule as in (66), in which the initial symbol VP is headed by an adjective, i.e. by a terminal node whose categorial value is not identical to the categorial value of the initial symbol:

(66) VP → A, NP

In other words, the fact that the output of the VP-rule (65-b) comprises a verb is a mere accident and does not follow from independent properties of the grammar, because there is no constraint on the use of symbols in the rewrite rules. A related drawback of this system is that PSRs restate other lexical properties by syntactic means (cf. Grewendorf (2002, 33)) and are thus redundant. Take, for instance, the transitive lexical item love: its categorial information and its argument structure are lexical¹⁶ properties. Doubly encoding this information both in the lexicon and in the categorial component of syntax is clearly undesirable if it turns out to be sufficient to let one component alone do the work.
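The point about (66) can be made concrete with a small Python sketch. It is purely illustrative: the encoding of rules as a dictionary and the names `RULES_65`, `RULES_66`, `is_well_formed`, `head_category` and `is_endocentric` are assumptions introduced here, not part of any of the theories discussed. The sketch shows that both rule sets are equally admissible as rewrite systems, and that headedness can only be checked by a condition stated outside the rule format.

```python
# Toy phrase structure rules in the format of (65) and (66).
# Each rule maps a mother category to its sequence of daughters.
RULES_65 = {"S": ["NP", "VP"], "VP": ["V", "NP"]}
RULES_66 = {"S": ["NP", "VP"], "VP": ["A", "NP"]}


def is_well_formed(rules):
    """A rule set is formally admissible if every rule rewrites one symbol
    as a non-empty sequence of symbols. Nothing more is required."""
    return all(isinstance(lhs, str) and len(rhs) > 0 for lhs, rhs in rules.items())


def head_category(phrase_label):
    """Hypothetical helper: strip a final 'P' to get the expected head (VP -> V)."""
    return phrase_label[:-1] if phrase_label.endswith("P") else None


def is_endocentric(rules):
    """Check, from outside the formalism, whether each XP has an X daughter."""
    return all(
        head_category(lhs) in rhs
        for lhs, rhs in rules.items()
        if head_category(lhs) is not None
    )
```

On this encoding, `is_well_formed` accepts both rule sets, whereas `is_endocentric` accepts only `RULES_65`. The headedness requirement is thus an extra condition that the rewrite format itself does not impose.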

16 However “lexicon” is conceived, cf. Halle and Marantz (1993) et seq.

X̄-theory (Chomsky (1970), Jackendoff (1977)) represented a step forward in this regard in that it economically divides the labor between the lexical and the phrase structure component of the grammar: lexical properties like categorial values, the need for an internal argument or lack thereof, etc. project into the syntax, mediated by the Projection Principle, which dictates that lexical information is syntactically represented throughout the derivation (cf. Chomsky (1981b, 29 ff)). The universal schema for phrase structures is (67), where the variable X is instantiated by the specific categorial value that the lexical entry happens to have; by stipulating that X is constant throughout the projection line, X̄-theory is more restrictive than PSRs:

(67) a. XP → ZP, X’
     b. X’ → X’, WP
     c. X’ → X⁰, ZP
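As a hedged illustration of how the schema in (67) differs from free rewrite rules, the following Python sketch instantiates the schema for a given categorial value. The function names and the string encoding of bar-levels (`"V'"`, `"V0"`) are assumptions made for exposition only.

```python
def xbar_rules(x, spec="ZP", adjunct="WP", comp="ZP"):
    """Instantiate the schema in (67) for a categorial value x (e.g. 'V')."""
    return [
        (f"{x}P", [spec, f"{x}'"]),     # (67-a) XP -> ZP, X'
        (f"{x}'", [f"{x}'", adjunct]),  # (67-b) X' -> X', WP
        (f"{x}'", [f"{x}0", comp]),     # (67-c) X' -> X0, ZP
    ]


def projection_line_is_constant(rules, x):
    """True if every mother is a projection of x and has a daughter
    that is also a projection of x."""
    projections = {f"{x}P", f"{x}'", f"{x}0"}
    return all(
        mother in projections and any(d in projections for d in daughters)
        for mother, daughters in rules
    )


V_RULES = xbar_rules("V")
```

Because the rules are generated from a single variable, a rule like (66), with a VP mother and an A head, cannot be produced by `xbar_rules`; it is also rejected by `projection_line_is_constant` when checked against the value V.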

By this format, phrases are obligatorily and uniquely headed (endocentricity), and the terminal node, in turn, dominates the lexical item. Guided by lexical information (such as argument structure), the head may combine with a complement to form the obligatory intermediate bar-category, and this bar-category may combine with a specifier to form the obligatory maximal projection. Notions like specifier, head, complement and adjunct can be defined on the basis of hierarchical relations (the complement is the sister of the head, the specifier is the sister of a bar-level category and immediately dominated by the maximal projection, etc.). The phrase structure component still encodes dominance relations; precedence relations have been proposed to be subject to the directionality parameter (cf. Chomsky (1981b)). However, values of syntactic categories (V, N, A . . . ) are not doubly determined by the PS-component and the lexicon, but are solely lexical information and enter the syntax by projection.

4.4.1.1 Outline of a Merge-based Grammar

More recently,¹⁷ Chomsky has advocated a view according to which both precedence relations and even projection rules are outsourced from syntax, which is based primarily on a set-forming operation called Merge. Merge is one name for an unbounded generative procedure that creates hierarchical syntactic structures, represented by sets; being recursive and applying to lexical items or complex syntactic objects, this procedure yields discrete infinity. External Merge (EM) refers to creating a set {X, Y}, where X and Y are distinct.

17 Chomsky (1995b, 2004, 2005, 2007, 2008, 2013).

In the late 1990s, it was recognized that the syntactic phenomenon of displacement (transformations) can be recast as Merge which targets an element inside the structure already built;¹⁸ indeed, not allowing for this option would need justification – in this sense, movement comes “for free” (...).¹⁹ Internal Merge (IM) thus refers to creating {X, Y} where one is contained in the other – say, X in Y. If X inside Y is not tampered with (cf. Chomsky (2005)), IM of X to Y yields a set containing two copies of X, one in the “First Merged” position of X, and one being the sister of Y. As Merge/narrow syntax creates unordered sets, linear order is not encoded in syntactic structures, at least if Merge is not constrained in any further ways²⁰ – and it has been suggested that precedence relations are determined when syntactic structures are transferred to the sensory-motor system.²¹

Let us now turn to the abandonment of projection rules. Chomsky motivates this move on the one hand on the basis of familiar problematic aspects of X̄-theory, namely the introduction of bar-levels or heads that are not part of the lexicon and thus violate the inclusiveness condition (cf. Chomsky (1995b, 249)). On the other hand, viewed from the relational perspective of Bare Phrase Structure (Chomsky (1995a)), the projection rules stated in (67) are quite stipulative: why is it that ZP is the specifier of X⁰ but not XP the specifier of Z⁰? I.e. why is ZP subordinate to

18 Cf. Boeckx (2008a) for the formulation that Merge is “source independent.”

19 This “freedom” has been interpreted as unconstrained application of Merge (Merge α, cf. Boeckx (2014)), as opposed to freedom in the sense that its being part of the theory of syntax comes at no cost. I tend to side with Chomsky (2004, 111) in assuming that the application of IM is constrained – in ways to be understood and specified: “Both external and internal Merge are constrained in how they apply. We would like to show that the constraints are principled, deriving from” interface conditions on the one hand, and general conditions of computational efficiency on the other. For now, I take “for free” to mean that IM is not added to the system as a theoretical primitive, but is logically implied in Merge, its absence requiring stipulation.

20 As in Kayne (2008). Interestingly, a set-based view of the “categorial component” has historical antecedents as early as the 1960s, cf. Chomsky (1965, 124–127)’s discussion of Curry (1961) and Saumjan and Soboleva (1963), who, according to Chomsky, had proposed rules like S → {NP, VP} instead of the concatenative PSRs.

21 We could formulate a conceptual economy argument in favor of a postsyntactic linearization procedure: both morphology and phonology exhibit linear order, and as for morphology, there are reasons to believe that it (partially) takes place postsyntactically as in Distributed Morphology (cf. Halle and Marantz (1993) et seq or in particular Raimy (2000) for a linearization-based analysis of reduplication). Now a priori it seems to me that an architecture of the grammar that places all temporal ordering procedures after syntax is to be preferred over a grammar that divides linearization into syntax-linearization and morphology/phonology-linearization, because ideally there is just one linearization component. However, the picture is likely to be more complex than this, in particular as temporal order is not uniform, as Idsardi and Raimy (2010) emphasize, but locating temporal relations solely where they are needed no doubt has conceptual appeal.

XP but not the other way around? If Merge(X,Y) yields no more than {X, Y}, dominance relations between the members Merged are not specified. But so sparse a conception of Merge is arguably the null hypothesis and shifts the locus of explanation away from UG to syntax-external features. As Ott (2011a, 61) stresses in his outline of this theoretical development: “If phrase-structure grammar [including its stipulations such as specifier, head, complement (A.B.)] can be dispensed with in toto, this implies a significant reduction of the explanatory burden placed on UG.”

The initial step towards a purely Merge-based grammar was early Bare Phrase Structure theory, which still makes use of a concept of projection. In this theory, phrases still provide information about dominance relations among nodes; linear relations, by contrast, are not encoded in syntax,²² but are taken care of in the phonological component (PF). However, recently even projection was argued to be dispensable in favor of a label-free system (cf. Collins (2002), Chomsky (2004) et seq, Seely (2006)). To illustrate this development, suppose α and β undergo Merge, yielding the syntactic object γ (={α, β}). Let β be a phrasal syntactic object (SO) and α unspecified (i.e. either a lexical item (LI) or an SO). In the early days of Bare Phrase Structure (Chomsky (1995a)), Merge was enriched with a mechanism that provides for information about the label of the output category γ, as represented in set theoretic notation in (68), where the underlined instance of α represents the label of γ; the tree in (68) corresponds to γ. If α = LI, the label of γ is α; if α is an SO, the head of α is the label of γ. What is crucial is that Merge is in a sense smarter than what we would minimally expect: in addition to taking two SOs and forming a set from them, this mechanism attaches a label to the set, using (the head of) either member as the label.

(68) Merge(α, β) → {α̲, {α, β}}

          α
         / \
        α   β

By contrast, in a label-free system, the projection rule is unavailable, as (69) shows – here, the output of the Merge-operation delivers nothing but the category-free set of elements (or a label-less, “flat” tree γ):

(69) Merge(α, β) → {α, β}

         /\
        α  β
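The contrast between (68) and (69) can be sketched in a few lines of Python, modeling sets as `frozenset` objects and lexical items as strings. This is only an expository sketch under these encoding assumptions; the function names `merge` and `merge_with_label` are introduced here for illustration, and the labeled variant assumes for simplicity that α is a lexical item and therefore projects itself.

```python
def merge(a, b):
    """Label-free Merge as in (69): form the unordered set {a, b}."""
    return frozenset({a, b})


def merge_with_label(a, b):
    """Early Bare Phrase Structure Merge as in (68): {a, {a, b}},
    where the outer instance of a plays the role of the label.
    Simplifying assumption: a is a lexical item and projects itself."""
    return frozenset({a, frozenset({a, b})})


# Hypothetical lexical items standing in for alpha and beta.
ALPHA, BETA = "alpha", "beta"

# Recursive application: the output of Merge can be merged again.
GAMMA = merge(ALPHA, BETA)
DELTA = merge("delta", GAMMA)
```

On this encoding, `merge(ALPHA, BETA)` and `merge(BETA, ALPHA)` are the same object, which reflects the symmetry of (69), while `merge_with_label` distinguishes the two orders of its arguments, since it must choose one member as the label.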

22 Pace Kayne (1994) and subsequent works.

When we compare (68) and (69), it becomes clear that (69) is significantly simpler in terms of what the operations have to shoulder: in order to derive a structure like (68), we need some mechanism that picks the most prominent element to become the label of the phrase; (69), by contrast, is the bare output of Merge. While (68) provides the information that γ is headed by α, (69) does not, and in this sense is not endocentric per se, i.e. if endocentricity means that the most prominent element in a given set is structurally represented. What Chomsky suggests is that the process of projection which underlies the representation in (68) is dubious and should be unavailable if we want to reduce the inventory of our theoretical primitives. Unlike (68), (69) represents the output of a radically symmetric syntax, in which neither temporal order nor inclusion relations are specified. And if the search for symmetry in the laws of nature is any good as a guideline in doing science, a symmetric conception of Merge is to be preferred over an asymmetric one.

However, we know that for certain syntactic processes and for semantic interpretation (such as semantic selection) we need to know what the most prominent element in a given phrase is, something which (68) gives us by means of a projection rule, but something we must somehow compensate for if we do not want to resort to projection. In other words, we have to make sure that projection falls out as an effect, using no more than a system as represented in (69). To render projection an epiphenomenon, Chomsky (2007, 23) proposes that a labeling algorithm detects the most prominent element in a given set by an overarching principle of Minimal Search, a third factor:²³

“Note that the notion ‘label’ is playing only an expository role here. In constructions of the form H-XP (H a head), Minimal Search conditions determine that H is the designated element (label) that enters into further operations. H will be the probe, and wherever selection enters - possibly only at the CI interface - H is the only functioning element, whether selecting or selected.”

What these remarks suggest is that labeling is very similar to probing by uninterpretable, lexically unvalued ϕ-features for the purposes of agreement and thus proceeds downward. In recent phase theory (Chomsky (2004) et seq, Richards (2007)) ϕ-features effectively define restrictions on the space of syntactic operations (e.g. phases), and their valuation by the operation agree coincides with

23 Ott (2011a, 64, fn.10) recognizes two interpretations of labeling by Minimal Search, one that does and one that does not do without labels represented in the structures that Merge creates. Effectively, he works with a system where labels are represented – but they still come about at the phase level by Minimal Search. It is unclear to me, however, in which sense these two interpretations might make a difference, either theoretically or empirically.

cyclic transfer to the interfaces. If the same is true for the labeling algorithm, endocentricity is effected at the phase level by identifying a designated element in each set within the phase head’s c-command domain. I.e. just as agree and movement result from an “all-powerful” phase head, imposing cyclicity on the derivation, endocentricity too comes about in a periodic fashion, as part of cyclic transfer. Chomsky suggests that the computational system requires identification of a label of a unit only if it makes further use of it or if the unit enters into semantic interpretation. If so, and if all syntactic operations and semantic interpretations happen at the phase level, this suggests that label search takes place at the phase level, i.e. where the decision for further computation is made and where syntax feeds semantic interpretation by means of transfer. Finally, we do not want structures to be superfluously labeled (as an automatic reflex of Merge, as it were) which at the point of phasal transfer turn out not to undergo further computation or which do not receive an interpretation. For the ongoing discussion I follow Ott (2011a) in assuming that structures are labeled at the phase level and use Chomsky (2008, 145)’s labeling definition as a starting point:

(70) Detect label (DL) at the phase level
     For each {H, α}=γ, H an LI and α an SO, detect H as the label of γ.

This way of formulating the labeling procedure adheres closely to the inclusiveness condition, because no extra symbols or diacritics are introduced into the derivation, i.e. syntax takes LIs from the lexicon, forms sets from them and rearranges these sets. In addition, no separate projection rule is involved. The current perspective turns upside down the view that “projection” takes place bottom-up from the lexicon into syntactic structures: just like probing, DL scans the phasal domain downward, identifying each LI in a given set as the only “substantial” category (sets/non-LIs being invisible) and hence as the label of that set. Now suppose P is a phase head and DL applies to the node we have called γ above, and suppose α=LI:

(71) [ P . . . {α, β} ]  →DL  [ P . . . {α̲, β} ]

γ becomes virtually labeled, i.e. the node is not structurally altered to accommodate a label α, but the system identifies γ as containing α as its most prominent member (here mnemonically tagged by underlining). Upon DL, the computational system treats γ as αP by virtue of α being the LI in γ.
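The top-down character of DL in (70) and (71) can be sketched as follows. The sketch is a simplification under stated assumptions: lexical items are modeled as strings, complex SOs as `frozenset` objects, and the names `is_li`, `detect_label` and `label_domain` are hypothetical. In line with the description above, the structure itself is never altered; labels are merely recorded in a separate table.

```python
def is_li(x):
    """Lexical items are modeled as strings, complex SOs as frozensets."""
    return isinstance(x, str)


def detect_label(so):
    """(70): for a set {H, alpha} with H an LI and alpha a complex SO,
    return H. Returns None for sets not of the format H-XP (e.g. XP-YP)."""
    lis = [m for m in so if is_li(m)]
    phrases = [m for m in so if not is_li(m)]
    if len(lis) == 1 and len(phrases) == 1:
        return lis[0]
    return None


def label_domain(so, labels=None):
    """Scan a phasal domain downward, recording the label of each set.
    The sets themselves are left untouched ('virtual' labeling)."""
    if labels is None:
        labels = {}
    if is_li(so):
        return labels
    labels[so] = detect_label(so)
    for member in so:
        label_domain(member, labels)
    return labels


# Hypothetical structures: beta = {X, {Y, {W, V}}}, gamma = {alpha, beta}.
INNER = frozenset({"Y", frozenset({"W", "V"})})
BETA = frozenset({"X", INNER})
GAMMA = frozenset({"alpha", BETA})
XP_YP = frozenset({BETA, frozenset({"Q", INNER})})
LABELS = label_domain(GAMMA)
```

Here `LABELS[GAMMA]` is `"alpha"` and `LABELS[BETA]` is `"X"`, whereas `detect_label(XP_YP)` returns `None`: a set consisting of two complex objects contains no LI that the search could pick out.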

After these considerations of how endocentricity derives from efficient computation, let me now make some remarks on chain formation in syntactic displacement. This will form the background for how to cope with the problem of labeling structures which have the format XP-YP.

4.4.1.2 Theoretical Preliminaries: A Remark on Chains

From the perspective of the copy theory of movement, where a chain of α can be contextually defined as the set of all occurrences of α (Chomsky (1995b, 251 ff), Chomsky (2000, 115)), at least three effects have been claimed to ensue (cf. Chomsky (2000, 115 ff), Ott (2011a, 62)): trace-invisibility in ϕ-agreement, symmetry-breaking movement as a means to render an XP-YP structure labelable, and blocking movement of lower copies (Chomsky (1995b, 304)). I will here make an attempt to explain and illustrate how trace invisibility and symmetry-breaking movement theoretically follow from the copy theory and this conception of chains. The issue is not trivial and has engendered so much explicit or implicit disagreement and alternative conceptions over the past years (until very recently, cf. Martin and Uriagereka (2011)), as well as rejection of some consequences (such as trace invisibility, cf. Richards (2004)), that I will be somewhat lengthy, elaborate and repetitive in the discussion that follows (for a lucid exposition, cf. Ott (2011a, 62)). My approach to a class of transformations crucially hinges on these considerations and assumptions.²⁴

The motivation for defining chains in the above way has to do with a problem the copy theory of movement poses: in the absence of coindexation or a Numeration that employs indices to “count” access to lexical items,²⁵ how can multiple occurrences of α in a movement chain be distinguished from multiply extracting α from the lexicon? The problem is that in a given structure introducing α n times by EM instead of moving α n-1 times yields the same structure. But uncontroversially, we want to draw a distinction between the two instances of John in (72-a) and (72-b) respectively:

(72) a. John₁ sees John₂
     b. John was seen ⟨John⟩

24 Notice that any theory involving copies needs to address the problems that I will discuss here. If, as Chomsky has repeatedly argued, copies are the null hypothesis to account for movement, conceptions of transformations involving different means of movement require justification.

25 Cf. Chomsky (2000, 114).

For example, suppose we externally merge XP₁ and YP, yielding {XP₁, YP}.²⁶ We add intermediate Mergers to form an object ZP (={Z . . . }), and we then internally merge XP₁ to ZP, to give (73):

(73) {XP₂, {Z . . . {⟨XP₁⟩, YP}}}

IM of XP₁ yields two copies of XP, XP₂ being the upper one.²⁷ XP₁ and XP₂ as such cannot be distinguished and no instruction with regard to their interpretation at the interfaces is available. In other words, unless we say something further, the operations IM or EM are not informative as to whether we are dealing with two distinct entities of XP (outputs of EM) or essentially the same object occurring in two positions (outputs of IM). Consequently, we need a way to tell the system to distinguish applying IM to an XP on the one hand from multiply applying EM to an XP on the other. This is so because for both the sensory-motor and the conceptual-intentional systems the two instances of, say, the DP John in (72-a) mean something completely different from externally merging John once and then applying IM to it, as in (72-b). The obvious task is to make syntax deliver to the interfaces the right kinds of instructions to make this difference.

One way to do so is to define a movement chain by the ingredients the system already gives us, as in Chomsky (2000, 115): an occurrence of α is its sister; thus an occurrence of α can be represented as the pair ⟨α, β⟩, where β is the sister of α. Now a chain is defined as the set of all the occurrences of α, which for (73) means that CH(XP)={⟨XP, ZP⟩, ⟨XP, YP⟩}, i.e. the set of pairs which represent each occurrence of XP. How can we tell which member is the head and which is the foot of the chain (the set is unordered)? Chomsky suggests that this follows from the fact that ZP properly contains the lower member. This whole movement chain is a new object that is created by IM (and not if EM applies to XP twice, although I will qualify this momentarily) – it is as though XP now “stretches” over all its occurrences. Transparently, this definition of chains gets along without indices. Now XP₁ is by definition not the entire object, because the chain of XP is defined as the set of all – not just some – occurrences of XP. In other words, prior to IM of XP₁ in (73) we have three objects: ZP, XP₁ and YP. Somewhat paradoxically, after IM of XP₁ we are still left with three – not four – objects, despite there being two copies of XP; unlike EM, IM does not add an object to the structure, but rather

26 Subscripts for expository convenience only.

27 I abstract away from the possibility of merging XP₁ with ZP, yielding a multi-dominance tree; in such a conception there are no copies.

“extends” one of the objects internal to the structure. This extended object is a chain.²⁸ Obviously, what I have just said is not trivial. In particular, the question might need addressing how a syntactic object (SO) is defined,²⁹ and how this definition squares with this conception of chains, because it appears as though chain formation “overwrites” the initial definition of SO. Assume an SO is recursively defined as in (74):

(74) X is a syntactic object (SO) iff
     a. X is a lexical item (LI)
     b. X is a set {α, β}, where α and β are SOs.
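The recursive definition in (74), together with the occurrence-based notion of chains discussed above for (73), can be sketched in Python. The encoding is an assumption made for illustration only: LIs are strings, sets are `frozenset` objects, and the helper names `is_so`, `contains`, `occurrences` and the example objects are hypothetical. Note that on a set-based encoding the two copies of XP are literally one and the same object, which is exactly the indistinguishability problem described in the text.

```python
def is_so(x):
    """(74): X is an SO iff it is an LI (a string here)
    or a two-membered set whose members are SOs."""
    if isinstance(x, str):
        return True
    return isinstance(x, frozenset) and len(x) == 2 and all(is_so(m) for m in x)


def merge(a, b):
    """Set-forming Merge; covers EM and IM alike."""
    return frozenset({a, b})


def contains(whole, part):
    """True if part is properly contained in whole, at any depth."""
    if isinstance(whole, str):
        return False
    return any(m == part or contains(m, part) for m in whole)


def occurrences(root, x):
    """Collect the sisters of x inside root; each sister identifies
    one occurrence of x."""
    if isinstance(root, str):
        return []
    found = [m for m in root if m != x] if x in root else []
    for member in root:
        found.extend(occurrences(member, x))
    return found


# Building (73): {XP, {Z, {XP, YP}}}, with hypothetical contents for XP and YP.
XP = frozenset({"D", "N"})
YP = frozenset({"Y", "W"})
BASE = merge(XP, YP)   # {XP, YP}, formed by EM
ZP = merge("Z", BASE)  # {Z, {XP, YP}}
TOP = merge(XP, ZP)    # IM of XP to ZP

# The chain of XP: the set of pairs <XP, sister>.
CHAIN = {(XP, sister) for sister in occurrences(TOP, XP)}
```

`CHAIN` comes out as the two-membered set corresponding to CH(XP) = {⟨XP, ZP⟩, ⟨XP, YP⟩}. The head of the chain can be identified without indices, since `contains(ZP, YP)` holds while `contains(YP, ZP)` does not. Before IM applies, `occurrences(BASE, XP)` yields a single occurrence, i.e. a trivial chain.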

Now consider {XP₁, YP} prior to movement. Why is it that IM can change the status of XP₁, i.e. turn it from an SO to a “semi-SO” or “non-SO”? Saying that IM re-defines an SO is not really satisfying; definitions hold or they don’t, but they are not alterable at will. One way to approach the issue is to say that chains are created at every Merge-operation, irrespective of its source: both IM and EM create a chain. The interfaces are then able to make the necessary distinction by being delivered a trivial or a non-trivial chain: only in the latter case will the object be interpreted as the output of IM, and as the output of EM otherwise. I leave the issue at that and henceforth continue using the term “chain” in the sense of non-trivial chains.

The crucial formal asymmetry between the topmost member in a movement chain and any lower member is that the former – the sister of the moved element – properly contains all lower occurrences while this does not hold for the latter: in (73), {XP, YP} is contained in ZP, but the reverse does not hold: {XP, ZP} is not contained in YP. Now suppose Z in (73) is a matching probe for XP₁. The reason why Z fails to probe for XP₁ after the latter moves – leaving aside countercyclicity for the sake of the argument – is that the lower occurrence of XP₁ is not visible: it is not an object amenable to syntactic operations, but part of a now discontinuous object XP. As Ott (2011a, 65) puts it: the original {⟨XP₁⟩, YP} “is now asymmetric, in that it only properly contains YP.” {XP₂, ZP}, by contrast, is symmetric in that it properly contains both XP and ZP. It follows that agree with – or intervention by – the top

28 These considerations might have a bearing on the Conjunct Constraint (Grosu (1973)) which bars movement of one conjunct in a coordinate structure. If a copy left by movement were just the same element as an externally merged instance, it would be mysterious why something like the Conjunct Constraint is operative at all.

29 I thank Erich Groat (p.c.) for his critical and incisive remarks which have led me to include these ontological considerations.

member in a movement chain is possible. This asymmetry between a top member of a chain, which contains all occurrences of α, and any lower occurrence of α is the reason for trace invisibility and also for an asymmetry with respect to labeling. It is true that the lower occurrence of XP, XP₁, remains untouched by the application of IM – as required by the No Tampering Condition (Chomsky (2007, 2008)). XP is not properly contained in {⟨XP₁⟩, YP} and hence not an element subject to syntactic operations. These considerations on movement chains lead me to the question how XP-YP structures receive a label.

4.4.1.3 Merge-based Grammar and Labels

The labeling problem can be stated as follows: in a system entirely based on the set-forming operation Merge (internal, IM, and external, EM), objects formed by Merge need to be identified, i.e. labeled, if they are to participate in further syntactic operations (such as movement), or if they are to receive an independent semantic interpretation.³⁰ But what exactly is “labeling”? How does it work? When does it take place? What is its status in the architecture of the faculty of language?

Labeling replaces and derives certain properties of X̄-theory, one variant of phrase structure grammars: part of X̄-theory is the headedness principle, according to which every phrase has at least and at most one head. Further stipulations are the restriction to one specifier and one complement per XP. Moreover, trees are binary branching. Lexical features of the heads determine subcategorization, s-selection, the syntactic category, etc. So in a sense, X̄-theory is ‘head-driven’; relevant features are projected from the head to the intermediate and the maximal projection level. Two relations can be read off X̄-trees, namely head-complement and specifier-head. Finally, X̄-theory provides for recursiveness.

Under a Merge-based theory much of this is given up and derived: bar-levels and XP-levels violate the Inclusiveness Condition in being elements that are not part of lexical information. As such, they should be avoided under Minimalist guidelines. Merge is an operation that combines α and β, resulting in the two-membered set {α, β}; the restriction of Merge to two arguments yields binarity of branching. As the output set of a Merge operation can be subject to Merge again, recursiveness is derived. There is no upper limit to the number of specifiers, a notion that has no status in the theory; thus the restriction of X̄-theory to one specifier per phrase is dropped without replacement. Crucially for the present discussion, there is no concept of projection. Instead, the “telescope property” (Hornstein et al. (2006)) of X̄-theory is derived by Minimal Search, i.e. the identification of a distinguished element in a given set by a higher probing head. Endocentricity is thus an epiphenomenon.

30 Cf., for example, Chomsky (2007, 8) “If an element Z (lexical or constructed) enters into further computations, then some information about it [=the label, AB] is relevant to this option” (my emphasis). One could interpret this to mean that if the derivation terminates, i.e. if the root node is reached, labels are superfluous, cf. my approach to verb second in Blümel (2017).

Although Minimal Search is not formally defined in the works of Chomsky, we can get some clues as to what it might mean. First, he envisages Minimal Search to be a Third Factor concept, which is thus not part of the Faculty of Language in the Narrow Sense (FLN) but rather a language-independent principle: “Elementary considerations of efficient computation require that Merge of α to β involves Minimal Search of β.”³¹ If Merge is all there is to Narrow Syntax,³² Minimal Search (and hence “labeling”) cannot be part of it; rather, labeling can be part of Narrow Syntax only insofar as the elements undergoing Merge (i.e. the LIs) are concerned; the label identification mechanism itself results from the interaction of two Third Factor principles, phases/cyclic Transfer and Minimal Search, both of which serve to yield efficient derivations. Second, next to labeling, Minimal Search derives probe-goal relations³³ and also c-command.³⁴ Suppose P is a phase head that searches its domain, and suppose the set {H, XP}=γ:

(75) P . . . {H, XP}

H is the structurally closest Lexical Item (LI) relative to the c-commanding P. Now if γ comes about by EM(H,XP), it is clear at P that H assigns a θ-role and XP receives it, not the other way around, and thus that there is an asymmetry/dependency.³⁵ Suppose XP={X, ZP}; then X is buried too deep in γ to be the label of γ – the search is “minimal”/local and determines labels one set at a time, similar to probing. Label determination resembles probing – which terminates once the lowest matching goal is reached (cf. Hiraiwa (2001)) – in that once a label within a given set is defined, the label within the next lower set must be determined, all the way down until the edge of the next lower phase is reached. Thus if XP needs a label, P’s search goes on – and X will be determined as a label, namely the label of XP. What all of this also means is that labels are not determined before Transfer: Minimal Search is relevant for phase heads only. These are the heads that “search” (probe)

31 Chomsky (2004, 109), my emphasis, footnote omitted. Cf. also Epstein et al. (2009).
32 Cf. Chomsky (2005, 11/12).
33 Cf. Chomsky (2004, 113), and Chomsky (2007, 9).
34 Cf. Chomsky (2004, 113); Chomsky (2005, 14) and Chomsky (2007, 9).
35 Cf. also Chomsky (2005, 14).


and drive all operations; non-phase heads do not (or only derivatively). At Transfer – marked by the arrow –, these elements are determined, boldfaced in the schema below: (76)

a. P . . . {H, XP} → P . . . {H, XP}   (label of γ: H)
b. P . . . {H, {X, ZP}} → P . . . {H, {X, ZP}}   (label of XP: X)

The second part of Chomsky’s remark above concerns the label of a set comprising two complex sets and of sets that result from IM; I will say more about this below. Chomsky (2007, 12) hypothesizes that Narrow Syntax is optimized primarily with regard to the Conceptual-Intentional system. As such, an optimal outcome of labeling a set formed by Merge appears to be that no label is present at all. But if the Conceptual-Intentional system dictates that a set have one prominent element (an interface condition) or if the computational system needs to identify a label of a given set to function at all, i.e. to operate further on this object, the “identification of a label” is necessary. In the latter case, we expect labeling to be the computational system’s most efficient way of meeting its own demands, which means we are dealing with general conditions of computational efficiency (cf. Chomsky (2004, 106)). By the Inclusiveness Condition³⁶ we expect there to be no redundancy: if there are no semantic or syntactic reasons for it, no label should be provided at all; labeling should have a Last Resort-type character. From all of this it follows that there needs to be some algorithm determining the label of a set formed by Merge. Chomsky (2008) suggests the following two statements: (77)

If EM of XP and the lexical item (LI) H yields {H, XP}, then H is the label.

(78)

If α undergoes IM to β, forming {α, β} then the label of β is the label of {α, β}.

Omitting the phase heads, which are relevant in that (77) and (78) apply only at these cyclic nodes, (79) and (80) illustrate the workings of (77) and (78) respectively: (79)

a. { V, DP } = “VP”
b. { C, TP } = “CP”

(80)

a. { DP, { v* . . . ⟨DP⟩ . . . }} = “v*P”
b. { DP, { C . . . ⟨DP⟩ . . . }} = “CP”
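The two statements can be made concrete in a small computational sketch. It is only an illustration under assumptions that are not part of Chomsky’s proposal: the class names `LI`, `Phrase` and `Merged`, the `internal` flag that records IM, and the `use_78` switch are all hypothetical, and `Phrase` is a shortcut for a complex object whose head is simply taken as given.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class LI:
    """A lexical item (a head), e.g. V, C or v*."""
    name: str


@dataclass(frozen=True)
class Phrase:
    """Hypothetical shortcut: an unanalysed complex object with a known head."""
    head: str


@dataclass(frozen=True)
class Merged:
    """The set {a, b}; internal=True means a was internally merged to b."""
    a: "Node"
    b: "Node"
    internal: bool = False


Node = Union[LI, Phrase, Merged]


def label(node: Node, use_78: bool = True) -> Optional[str]:
    """Return the label of node, or None if no label can be identified."""
    if isinstance(node, LI):
        return node.name
    if isinstance(node, Phrase):
        return node.head
    heads = [m for m in (node.a, node.b) if isinstance(m, LI)]
    if len(heads) == 1:
        # Statement (77): in {H, XP} the lexical item H is the label.
        return heads[0].name
    if use_78 and node.internal:
        # Statement (78): after IM of a to b, the label of b is the label.
        return label(node.b, use_78)
    return None  # two complex members and no rule applies


vp = Merged(LI("V"), Phrase("D"))                      # as in (79-a)
v_star_p = Merged(LI("v*"), Phrase("V"))
raised = Merged(Phrase("D"), v_star_p, internal=True)  # as in (80-a)
print(label(vp), label(raised), label(raised, use_78=False))
```

With `use_78=False` the IM-configuration returns no label at all, which is the situation that arises once only (77) is available.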

36 Cf. Chomsky (1995b, 228).


Now recently³⁷ Chomsky (2010) has suggested that in certain cases units formed by Merge might not need a label at all. In a more general critique of the notion specifier,³⁸ he remarks that the set consisting of TP and its specifier is such a candidate, first, because this set never acts like a syntactic unit (it cannot be raised), and secondly because this entity has no independent semantic interpretation, i.e. independent of the presence of/selection by C. All of this evidently implies that he has given up (78) in this general form: if (78) were retained, the question whether {SPEC, TP} is labeled or not would not even arise. Conceptually, disposing of this condition is the right move, because it is a stipulation that has no independent justification: it cannot be derived from Minimal Search (cf. Richards (2010)). Clearly, giving up (78) yields a simpler grammar with fewer theoretical primitives, a grammar that optimally relies on (77) alone to identify the most prominent element in a given set. But once (78) is given up, we must ask: how do syntactic categories resulting from IM receive a label? Some of them uncontroversially participate in further syntactic operations or receive an independent semantic interpretation, such as CP (it can be raised or extraposed, its semantic interpretation is not contingent on other categories, etc.). For the case of raising a subject to SPEC-TP, Chomsky (2010) suggests that the connection between C and T is established prior to raising the subject,³⁹ a consequence of recent developments in phase theory (feature inheritance, FI, which I will not go into at this point). Thus there might not be a need for the set {SPEC, TP} to receive a label. What this appears to mean is that no ambiguity arises because C selects and renders T the label at a point when SPEC-TP is not yet introduced. How about CP? I will return to this case in more detail below.
Already in Chomsky (2008, fn.34) an example is given in which both (77) and (78) fail to apply: externally merging a complex subject to v*P yields an unlabelable, symmetric structure, because neither merge-member is an LI. (81)

{DP, v*P}

37 For programmatic remarks see already Chomsky (2004, 109).
38 The relation “specifier of a head” is an “illegitimate concept” (Chomsky (2010)) within a framework confined to Minimal Search, from which other operations and concepts derive (probing for goals, labeling, c-command, etc.). In the following I use the term specifier in a descriptive sense.
39 Which, in turn, he takes to be the first “solid” argument in favor of the predicate-internal subject hypothesis. For discussion of this countercyclic operation and an interesting theory in which the subject is really raised to form a separate ‘peak’ next to TP, cf. Epstein et al. (2009).


As the syntactic object in (81) must receive a label, something must happen. Adopting general ideas by Moro (2000, 2007),⁴⁰ Chomsky suggests that one of the two members must raise at the phase level, sometimes referred to as “symmetry-breaking movement”: (82)

C {DP . . . {⟨DP⟩, v*P}}

This specific instance of symmetry-breaking movement then partially derives the Extended Projection Principle (EPP), i.e. the requirement that SPEC-TP be occupied. In other words, the EPP is neither a principle of UG (as in GB/Principles and Parameters theories), nor a feature on T in need of checking (Chomsky (1995b, 232)), nor is it exclusively⁴¹ the effect of C transmitting its ϕ-feature set to T (FI, as in Chomsky (2008)). Instead, it results from the labeling ambiguity which arises when externally merging an external argument with v*P. This ambiguity is resolved by movement in that the set (81) is labeled by v*P upon movement of DP.⁴² The reason why movement qua copy theory solves a labeling problem has to do with the creation of chains/discontinuous objects. Consider the following tree:

40 See already Chomsky (1995a), who attempts to accommodate certain consequences of Bare Phrase Structure theory with the Linear Correspondence Axiom (LCA, Kayne (1994)). In Moro’s monograph raising of either member must take place so as to render a too symmetric structure linearizable by the LCA, which maps LIs in a hierarchical structure to linear precedence by asymmetric c-command. “Too symmetric” is to be understood in terms of mutual c-command in a {XP, YP} (or {X,Y}-) configuration, which yields a linearization contradiction: by the LCA, XP both precedes and linearly follows YP. Hence movement is triggered to destroy such symmetries and make linearization possible.
41 As we will see later, ϕ-feature inheritance does play a role in the EPP in that IM of XP to YP is subject to a matching requirement of the prominent feature (cf. Chomsky (2008, 2013), Miyagawa (2010)).
42 This explanation may be principled, but it is partial, because a number of questions remain: How about elements like proper names or pronouns whose structure has been argued to be simplex? What is it that forces internal arguments to raise to SPEC-TP in passives and unaccusatives, constructions that have been associated with defective v? (Legate (2003) argues that in these constructions the object moves successive-cyclically, i.e. via the vP-edge. If so, maybe only the terminal movement step to SPEC-TP is an instance of symmetry-breaking – yielding a uniform SPEC-vP to SPEC-TP-pattern –, while the one raising the object to the vP-edge is plausibly of a different kind.) Why do we sometimes need to insert expletives? Why is it that the subject – and not the v*P – raises?


(83)

[β a [ . . . [α b c ]]]

Using non-identical symbols only for exposition, suppose that a is created by IM of b; then b and a are occurrences of a discontinuous object A, which is the movement chain comprising all occurrences of A. Under the contextual definition of chains, an occurrence of, say, b is individuated by the sister of b; informally, Occurrence(b)=c and A=CH=(Occurrence(a), Occurrence(b)). Transparently, all occurrences of the chain A are included in β, while it is not the case that all members of A are included in α: a(’s occurrence) is not. By contrast, c is fully included in α. It is this asymmetry between syntactic objects that include all members of a movement chain and those that do not which is responsible for visibility for syntactic operations: the entire movement chain – the topmost occurrence – is visible for Agree, movement and labeling, while lower occurrences evade search. Thus topmost occurrences of movement chains can induce intervention effects, render a structure unlabelable, etc., while lower occurrences of chains cannot (cf. Chomsky (2000, 115)’s contextual definition of chains and Ott (2011a, 62)). Returning to the general case, (77) cannot label structures of the form {XP, YP}, because no element is more prominent than the other. This necessitates movement of XP (or YP). With XP a discontinuous object, YP, the sole properly contained element in {XP, YP}, effectively becomes the label because the algorithm cannot “see” the discontinuous object, as we have established above: (84)

XP . . . {⟨XP⟩, YP}

In (84), {⟨XP⟩, YP} is labeled Y. Can movement of an XP to ZP, at least in principle, feed back into a “too symmetric”, unlabelable structure {XP, ZP}? This would then again be in need of desymmetrization by IM as in (85). I will call such a derivation “symmetry-creating movement”: (85)

{XP, {Z . . . {⟨XP⟩, YP}}}
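The role that invisible lower occurrences play in (84) and (85) can be sketched as follows. This is an illustration only, not an implementation of any published algorithm: the `Member` record and its fields `is_li` and `lower_copy` are hypothetical names, and a phrase is reduced to the name of its most prominent head.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Member:
    head: str                 # most prominent LI inside this member
    is_li: bool = False       # True if the member is itself a lexical item
    lower_copy: bool = False  # True for a non-topmost occurrence such as <XP>


def label(pair: Tuple[Member, Member]) -> Optional[str]:
    """Label a two-membered set, ignoring lower occurrences of chains."""
    visible = [m for m in pair if not m.lower_copy]
    if len(visible) == 1:
        # Only one member is fully contained in the set: it provides the label.
        return visible[0].head
    heads = [m for m in visible if m.is_li]
    if len(heads) == 1:
        # The {H, XP} case of (77).
        return heads[0].head
    return None  # symmetric: no member is more prominent than the other


print(label((Member("X"), Member("Y"))))                   # {XP, YP}
print(label((Member("X", lower_copy=True), Member("Y"))))  # {<XP>, YP}
print(label((Member("X"), Member("Z"))))                   # {XP, ZP}
```

On these assumptions the base set becomes labelable once XP has raised, while the set created at the landing site is again symmetric.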

Richards (2010) makes the same observation and argues that giving up (78) raises the problem that moving an XP to SPEC-ZP results in a new labeling problem, a problem that is reiterated infinitely many times. He dismisses the possibility:


movement no longer becomes a coherent strategy for solving the instability problem, since now any specifier formed by merging [a syntactic object SO] to SO - be it by IM or by EM - creates the same problem. Movement only shifts the problem up a step, to a higher projection, requiring further IM.

The observation is to the point, but I do not share the view that propagated symmetries or instabilities are a bad result; quite to the contrary. What is clear is that this powerful mechanism needs taming – and below I will suggest just that: a restriction on or halt to the problem-solving-by-problem-creating loop: not all specifiers are unstable, otherwise they all would be forced to move infinitely, invariably yielding XP-clusters at roots. Consequently, conditions on and justifications of stable specifiers have to be given. But rejecting the consequence of a feedback movement altogether deprives us of a precious empirical possibility: this process and pattern underlies intermediate steps of successive-cyclic Ā-movement. If we allow for symmetry-creating movement, we have a straightforward answer to the question what triggers intermediate movement steps in successive-cyclic movement. I hasten to add that I am not claiming that symmetry-breaking is the only trigger of movement; I here confine myself to “residual EPP”-cases, i.e. movement types which defy an explanation in terms of “effect on outcome” (Chomsky (2001)).

4.4.1.4 Propagating Symmetry

In the literature, labeling failures are considered for structures of the form {XP, YP} that are formed by EM. It appears natural to ask whether {XP, YP}-structures formed by IM can lead to labeling failures, a question I have answered affirmatively above. Once EM and IM are considered variants of a single operation Merge targeting different sources, this question is justified. A consequence of this idea is that movement solves the labeling problem downstairs, but only to create a new one in the target, i.e. symmetry feeds back into itself. I claim that this in principle infinite symmetry-destroying/symmetry-creating character of derivations is the source, i.e. the ‘trigger’, of intermediate steps of successive-cyclic Ā-movement and lies at the heart of the phenomenon. Suppose Ph stands for phase head. (86)

a. { ZP, YP }
b. ZP, { Ph . . . { ⟨ZP⟩, YP }}

The symmetry between ZP and YP needs to be broken because no element is more prominent than the other, i.e. no label can be identified. IM of ZP at the phase level (86-b) achieves this, because now ZP is discontinuous and thus its lower


occurrence is invisible for labeling. The structure that results from this symmetry-breaking movement parallels the initial problematic/symmetric structure (compare (86-a) and {ZP, PhP}, a representation of (86-b) which is more perspicuous for comparison with (86-a)) once the stipulation that the “target projects” is given up: (87)

ZP { Ph . . . { ⟨ZP⟩, YP }}

Consequently, ZP must move further at the phase level in order to break the newly created symmetry, i.e. to render {ZP, PhP} labelable: (88)

ZP { Ph1 . . . {⟨ZP⟩, { Ph . . . { ⟨ZP⟩, YP }}}}

The unbounded character of successive-cyclic movement follows (illustrated with Ā-movement here): (89)

a. Who did Mary [v*P ⟨who⟩ say [CP ⟨who⟩ that John [v*P ⟨who⟩ saw ⟨who⟩]]]
b. Who did Fred [v*P ⟨who⟩ claim [CP ⟨who⟩ that Mary [v*P ⟨who⟩ said [CP ⟨who⟩ that John [v*P ⟨who⟩ saw ⟨who⟩]]]]]
c. ...
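The loop from (86) to (88) can be unrolled mechanically. The following sketch merely prints the symmetric sets that arise at successive phase edges; the function name `propagate` and the parameter `halts_at` are hypothetical, and the halting phase head is stipulated here purely for illustration, since what actually brings movement to a halt is a separate question.

```python
from typing import List


def propagate(phase_heads: List[str], halts_at: str) -> List[str]:
    """Return the sets {ZP, PhP} created at each phase edge, bottom-up."""
    structure = "{<ZP>, YP}"  # base set after ZP has raised out of it
    steps = []
    for ph in phase_heads:
        structure = f"{{{ph} . . . {structure}}}"  # the phase head merges on top
        steps.append(f"{{ZP, {structure}}}")       # ZP lands at the phase edge
        if ph == halts_at:                         # assumed stable configuration
            break
        structure = f"{{<ZP>, {structure}}}"       # ZP moves on, leaving a copy
    return steps


for step in propagate(["Ph", "Ph1"], halts_at="Ph1"):
    print(step)
```

Each pass through the loop makes the set just vacated labelable and creates a new symmetric set one phase higher, so the number of intermediate landing sites grows with the number of phase heads, as in (89).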

Remember the system of successive cyclicity by Bošković (2007) in section 4.3.2, arguably the most widely adopted view of successive-cyclic movement today: in his system, the trigger of movement – the uF – is built into the moving element right from the onset of the derivation, clearly a derivational look-ahead system. Movement, in that system, needs to take place to guarantee eventual Agree by the probe on the moving element. The current proposal, by contrast, does not involve a look-ahead problem: movement is a corollary of the need to identify a label, which is independently needed. Thus, no probe on the moving element needs to be postulated, and no mechanism is needed to let the probing feature percolate up to the phrasal level to ensure c-command between uF and the corresponding iF on the landing site. Numerous consequences follow. First, observe that the idea above forces me to say that wh-pronouns have a phrasal structure and are not heads; otherwise no intermediate symmetries ensue.⁴³

43 For wh-words this has been independently argued for on different grounds, cf. Barbiers et al. (2009). It also follows from the system that Marantz (1997) et seq. proposes, in which lexical categories derive from category-neutral roots plus category-defining phase heads (v, n, a). Then


Notice also that the symmetry can be broken by moving either element. The “wrong” choice yields (90-a) and (90-b), both of which are sharply out in English: (90)

(Mary said) {who, {that John saw t_who }}
a. *[[CP that John saw t_who ] Mary said who t_CP ].
b. *Mary [v*P [CP that John saw t_who ] said who t_CP ].

How can we exclude (90-a)/(90-b)? Under X̄-theoretic assumptions movement of this unit – the CP – could be ruled out by a general stipulation banning movement of bar-level categories. Within a purely Merge-based system we do not have such a restriction at our disposal, and indeed, this would be highly artificial: in {who, CP} no element is subordinate to the other and bar levels/endocentricity are derivative. (I will return to the matter in the next chapter.) Also, there are good empirical reasons for allowing both members to move: Ott (2011a, chp 3.2.6) argues convincingly and at length that the derivation which underlies split topicalization in German (91-a) involves what he calls Bare Predicate Structures of the form {DP, NP}, which are base-generated as the complement of the verb and subsequently split, surfacing as a fronted or topicalized part and a remainder, the former a property-denoting bare NP and the latter a full elliptical DP. Now the obligatory movement comes about by the need to label and asymmetrize the Bare Predicate Structure, as schematized in (91-b): (91)

a. [NP Französische Bücher] hat Amina bisher [DP nur wenige gute e] gelesen.
   French books has Amina so far only few good read
   ‘As for French books, so far Amina read only few good ones.’
b. NP hat Amina bisher {⟨NP⟩, DP} gelesen.

Crucially, he does not attribute the obligatory surface ordering of the topic and the remainder to syntax but to a pragmatic constraint requiring that topic and comment must be such that the comment is about a (previously mentioned) topic. Thus information structure – not syntax – requires that the NP and not the DP raises when the Bare Predicate Structure is split by symmetry-breaking movement

the phrasal status of wh-elements is indeed what we expect ({what/who} is at least [n √root]; {how/where/when} is at least [adv √root]; etc.). Notice secondly that there are wh-words that are clitics, which can undergo successive-cyclic movement (Ian Roberts and Cecilia Poletto, p.c.). Again, I see no other choice but to say that these are phrasal. At least for such cases, cliticization is a purely phonological process.


in (91-b): the NP sets the interpretive frame for the comment that contains the remainder-DP. He argues that raising the DP is possible as long as in the surface output, the pragmatic condition of topic-comment organization is met. This is the case, he shows, in cases like (92-a) with the VP-remnant movement derivation in (92-b): (92)

a. [VP [NP Französische Bücher] gelesen] hat Amina bisher [DP nur drei langweilige]
   French books read has Amina so far only three boring
   ‘As for reading French books, Amina only read three boring ones so far.’
b. (i) DP . . . {NP, ⟨DP⟩} gelesen
   (ii) [{NP, ⟨DP⟩} gelesen] hat Amina bisher DP

In line with the principled syntactic possibility of breaking the symmetry by raising either member, I take the problem posed by (90-a)/(90-b) to be real. There is at least one way to rule out “flip-flop” derivations like (90-a)/(90-b): what must not happen, intuitively, is that the sister of the head of a movement chain is the tail of a movement chain.⁴⁴ But the problem is more general and arguably falls under the rubric of the Proper Binding Condition.⁴⁵ (90-a)/(90-b) violate the generalization that remnant movement must be of a different movement type than the evacuation movement, and in these examples both evacuation movement and subsequent remnant movement are Ā-movement. I take whatever ultimately explains this condition to be the source of the ungrammaticality of (90-a)/(90-b). A number of further questions need to be addressed: (93)

a. What about selection, i.e. how can X select its sister, the set {YP, ZP}, before the latter is labeled?
b. What brings the moving element to a halt, say, in indirect questions?

44 Equivalently, a sister of a tail of a movement chain must not be a tail of a movement chain, i.e. derivations must not yield syntactic objects of the form {⟨XP⟩, ⟨YP⟩}.
45 Cf. Hiraiwa (2010), who arrives at the conclusion that effects of the Proper Binding Condition (PBC) and restrictions on remnant movement follow from cyclic computation, the PIC and what he dubs the Unique Path Principle. Cf. also Müller (1996), Grewendorf (2003) and Abels (2007) for related alternatives of restricting remnant movement. None of these proposals is without problems.


c. How is the initial movement triggered?

Concerning (93-a), take the embedded CP in (94) when the matrix VP is built: (94)

Who did Mary say that John saw?
a. say {who that John saw}

In the analysis developed here, the syntactic object {who, CP}=δ receives a label only after who raises: wh-raising creates a discontinuous element of which the occurrence that is the sister of CP is invisible, as we have established above; consequently C is the visible label of δ as a by-product of wh-raising. But how can say select δ at a point when it is still label-less? Selection steps in too early, it seems. As a response to this problem I suggest that selection is determined at the phase level, too. Rather than being a function of EM itself, semantic selection is read off of the syntactic configuration at the point of Transfer (cf. Chomsky (2004, 111)). At the next higher phase level (v*P) who must raise to SPEC-v*P, breaking the symmetry between CP and who, yielding {⟨who⟩, CP}. The syntactic unit that Transfer hands over to the interfaces exhibits the right predicate-argument configuration, namely the verb with specific semantic properties and its sister, a syntactic object that is now properly identified as CP (cf. also Ott (2011a, 71–72) on this point). On (93-b): what stops movement of the wh-element in intermediate positions in, say, indirect questions like (95-a)? (95)

a. John wondered what Mary has read.
b. *What did John wonder Mary has read?
c. ??What did John wonder whether Mary has read?

The configuration for wh-elements in intermediate positions is exactly such that further movement is expected to be enforced, because wh and CP are symmetric in both (96) and (97), incorrectly predicting (95-b)/(95-c) to be good:

(96) [ wh [CP C=that . . . ]]

(97) [ wh [CP C[Q] . . . ]]


One possibility I have been considering is to adopt (98) from Boeckx (2008b), based on Collins (2002): (98)

Probe-Label Correspondence Axiom (PLCA): The label of {α, β} is whichever of α or β probes the other, where the Probe = Lexical Item whose uF gets valued.

In projectionist terms, (98) predicts the target category of movement to project whenever the target involves a probe and the moving element serves as that probe’s goal. Consider (99):⁴⁶ (99)

C[Q] . . . {wh[Q], v*P}

(98) could then be utilized as follows: in the preterminal configuration of successive-cyclic movement (99), a [Q]-feature on the interrogative C-head probes for the [Q]-feature on the wh-element. Since [Q] on C probes for [Q] on wh, C becomes the label by (98), regardless of whether wh moves or not. But in {wh, v*P} the wh-element is forced to move, because that set is unlabelable. As (98) establishes a correspondence between labeling and probing, the label of {wh[Q], C[Q]P} can be predicted prior to movement of wh into SPEC-CP, i.e. the formation of a specifier merely extends the preestablished label C. Even though this appears to work, I refrain from adopting (98). Conceptually, it means introducing a stipulation, namely an extrinsic condition on the label of any IM-output involving a probing operation. Coupling probing with such a condition enriches the operations unnecessarily and goes beyond Minimal Search of a subsequent phase head; but Minimal Search is really all that label identification uniformly ought to follow from. In the scenario I have just presented, label identification is Minimal Search-driven up to the preterminal stage of successive-cyclic movement, where all of a sudden a different condition applies – a somewhat unnatural move. Joost Kremers (p.c.) suggests to me an alternative: the features of both respective heads in {wh[Q], C[Q]P} provide the label, i.e. [Q] (and maybe [wh]) on C and wh; in such a configuration the impossibility of unambiguously detecting the most prominent element in the XP-YP-set would not be a problem – Minimal Search detects both features ambiguously. If this can be sustained, selection by the higher head might be all that is needed. Chomsky (2011) suggests a similar idea: [Q] on C and wh respectively is ambiguously selected, bringing wh-movement to a halt. In indirect questions both

46 For the sake of simplicity I abstract from the external argument introduced by v* and the general question of how multiple specifiers are dealt with.


members of {wh, CP}, or rather their feature [Q], become the complex, “dual” label and get selected by interrogative-embedding predicates. How is this possible? Let me elaborate on this briefly. Chomsky (1995b, 244-245) discusses which element becomes the label when two elements α and β get Merged and dismisses the possibility that it is the union of α and β: “the union will be not only irrelevant but ‘contradictory’ if α, β differ in value for some feature, the normal case.” But what if they do not? I believe what Chomsky is aiming at is that here we have a deviance from “the normal case”: in a label-free, Minimal-Search-based framework, where we do not have projection of features at our disposal, we could now say that the [Q]-feature of the wh-element and the CP are in a sense equidistant from the next higher probing head, and thus the intersection – the most prominent element both XPs share – is really what the label amounts to: (100)

v*=wonder . . . {wh, CP}

(101)

[ . . . v+wonder [? [WhP Wh[Q] . . . ] [CP C[Q] . . . ]]] → [ . . . v+wonder [[Q] [WhP Wh[Q] . . . ] [CP C[Q] . . . ]]]
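The idea that the label of {wh, CP} is the intersection of what the two members share can be sketched in a few lines. This is a minimal illustration under assumptions of my own: prominent features are modeled as plain strings, the function name `dual_label` is hypothetical, and the feature `decl` for a declarative C is an invented placeholder.

```python
from typing import FrozenSet, Iterable, Optional


def dual_label(xp: Iterable[str], yp: Iterable[str]) -> Optional[FrozenSet[str]]:
    """Return the features shared by both members of {XP, YP}, if any."""
    shared = frozenset(xp) & frozenset(yp)
    # An empty intersection means no shared label: the set stays symmetric.
    return shared if shared else None


print(dual_label({"Q", "wh"}, {"Q"}))     # wh and interrogative C share [Q]
print(dual_label({"Q", "wh"}, {"decl"}))  # wh and declarative C share nothing
```

On this sketch the terminal position under wonder receives a shared label, whereas an intermediate position headed by that does not, so that movement is expected to continue there.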

The idea then is that DL can in principle search both complex phrases for a designated element, but specific (non-narrow syntactic) conditions determine whether the outcome of this procedure yields deviance or not. So within XP-YP, labeling is not per se impossible: it just yields a structure that meets or fails to meet demands external to syntax proper. I will elaborate on the details of the conception of dual labels as we go along; specifying which phrases are and which phrases are not eligible for dual labels is an empirical matter. I will argue that (an interpretation of) Cheng (1991)’s clause typing hypothesis helps to understand and clarify these conditions.

4.4.1.5 Empirical Evidence for Dual Labels

The typology of strategies of forming wh-questions in the languages of the world is at least fourfold, according to Sabel (1998, 2000): Languages that have obligatory wh-movement (e.g. English), languages that are wh-in situ only in direct


questions (e.g. French), languages that are wh-in situ in direct and indirect questions (e.g. Mandarin Chinese), and languages that have partial and full movement strategies (e.g. German). Sabel accounts for this typology in terms of strength ([±strong]) of the features [+wh] and [+foc], i.e. a version of the classical view that syntactic parametrization is located on features of functional heads. In this work, I would like to interpret at least one of those strategies in terms of the dual labeling strategy described above. In particular, French wh-interrogatives could provide support in favor of this approach: Lasnik and Saito (1992, 1/4)⁴⁷ observe that while wh-fronting is optional in root contexts (102), movement is forced if demande ‘wonder’ selects the interrogative clause (cf. the contrast between (103-a) and (103-b)): (102)

a. Qui as-tu vu?
   who have you seen
   ‘Who did you see?’
b. Tu as vu qui?
   you have seen who
   ‘Who did you see?’

(103)

a. *Je me demande (que) tu as vu qui.
   I myself wonder (C) you have seen who
b. Je me demande qui tu as vu.
   I myself wonder who you have seen

I take this to mean that wh-movement in (103-b) provides the embedded clause with a mark for grammatical function in the sense of Hasegawa (2005, 39) (a mark which is absent in (103-a)). In her important contribution on the ways the EPP manifests itself, Hasegawa suggests the following restrictive principle:⁴⁸

47 Thanks to Mingya Liu (p.c.) for notifying me of these facts.
48 Languages such as Dutch have constructions which prima facie violate (105), because they allow for overt wh-movement into the specifier of the interrogative complementizer of ‘if’ in non-V2 contexts:

(104) Ik weet niet wie of Jan gezien heeft.
      I know not who Q Jan seen has
      ‘I don’t know who Jan saw.’ (Haegeman (1994))

It appears as though Dutch may over-fulfill the need to overtly mark the CP-domain as interrogative (an instance of Rizzi (1997, 283)’s “rare case” of force specification simultaneously by specifier and head), suggesting that for Dutch (and other languages) exclusive or be replaced by an inclusive one in (105). I here abstract away from these complications.


(105)

Marking grammatical functions: Spec vs. Head A particular grammatical function, such as question, is morphophonologically marked either by Head or by Specifier.

(105) is a disjunctive interface condition on the realization of particular grammatical features on either a head or that head’s specifier. It does not mean that narrow syntax is sensitive to these morpho-phonological manifestations, but it means that the computational system does no more than required to meet (105). For example, take a wh-in-situ language which realizes a Q-morpheme at C. Then (105) predicts that this morpheme disappears when the wh-word moves to SPEC-CP, because overtly realizing interrogative force at C and simultaneously exploiting overt wh-movement is redundant for the purposes of clause typing (Cheng (1991)). As Hasegawa (2008) convincingly argues, this is the case in Japanese matrix sluicing contexts,⁴⁹ where overt wh-movement suppresses the realization of ka (Q, (106)); by contrast, in wh-in-situ contexts, the realization of ka is obligatory (107) in Japanese. What (105) is silent about is whether clauses – the C-domain – must be morpho-phonologically typed, and indeed we know from Mandarin Chinese that overt Q-marking in wh-questions is optional (cf. Cheng (1991)), i.e. the Q-element on Cs may remain null: (106)

A: Dareka-ga ki-masi-ta yo.
   someone.nom come.pol.pst I.say
   ‘Someone came.’
B: E? Dare-ga (*ka)? ↗ ↘
   Yeah who.nom q
   ‘Yeah? Who?’

(107)

Dare-ga ki-masi-ta *(ka) ↗ *↘
who.nom come.pol.pst q
‘Who came?’

Returning to (103), the sentence is insufficiently marked as interrogative, leading to a selection failure by demande. Even though there is a wh-element buried in the sentence in (103-a), this is not enough, presumably due to a sisterhood requirement of the matrix verb and an indirect question which is morpho-phonologically marked in the C-domain. Que/null-C do not meet this desideratum: what is at work in (103-a) is thus a failure to provide a transparent means of grammatically identifying the embedded sentence as a question – neither head (C) nor specifier are marked along Hasegawa’s lines. As (103-b) shows, overt wh-movement comes to a rescue: since the sister of CP is the wh-word, the clause is transparently marked

49 Embedded sluices in Japanese (and Mandarin Chinese; Mingya Liu, p.c.) are a separate matter. These involve copulas and are presumably cleft structures; they might not involve wh-movement at all.


as interrogative, satisfying the selectional needs of demande and fulfilling (105). This is not to say that movement happens because the clause needs to be so typed. Rather, only the overt wh-movement variant of wh-questions which French independently allows yields the sisterhood relation to the matrix predicate, which requires visible materialization of interrogative force. By contrast, no selection restrictions are imposed on main clauses, which is part of the reason for the non-obligatoriness of wh-fronting in (102). The fact that wh-fronting is optional in the context of root clauses might indicate that they do not need a label at all or receive a label in a manner different from Minimal Search. If, on the other hand, labeling uniformly comes about by Minimal Search induced by a higher phase head and if there is no phase head above root clauses, the conclusion that root clauses lack a label suggests itself independently. The role of specifiers in typing grammatical functions has never been entirely clear, unless a specifier-head relation is invoked. Abandoning the concept of specifiers wholesale now allows us to restate and simplify Hasegawa’s specifier-based principle as follows: (108)

Marking grammatical functions In {XP, YP} a particular grammatical function, such as question, is morpho-phonologically marked either by X or by Y.

Specifier-head relations and clause-/phrase-typing effects (by movement) now turn out to be effects of the simple fact that in {XP, YP} either X or Y must do the job of overtly realizing the grammatical feature needed for selection by higher heads. I.e. the simpler sisterhood relation suffices, with no inclusion relation between XP and YP implied and no mediation by specifier-head relations necessary. By contrast, (98), repeated as (109), is reflective of a specifier-based approach in that movement of a goal into a specifier position of a corresponding probe yields a phrase labeled by the probe, which means that the moved goal is included in and dominated by the phrase of the probe – i.e. specifier-hood is inevitably retained, which undesirably prevents the simplification of Hasegawa’s interface principle.

(109) Probe-Label Correspondence Axiom (PLCA):
The label of {α, β} is whichever of α or β probes the other, where the Probe = Lexical Item whose uF gets valued.

My initial idea of using (98)/(109) to stop wh-movement in successive-cyclic movement shares with Kremers’ and Chomsky’s the intuition that an escape from the symmetry-loop is provided by [Q] on terminal C, the head that marks the scope position for wh-elements; it parts with theirs in being unnecessarily stipulative: a specifier-less approach is conceptually simpler and receives empirical support


as I have just shown. It appears to me to be a rational choice, then, to adopt and try to sharpen their proposals. As I will show in subsequent chapters, doing so has further empirical advantages. From the perspective of Hasegawa’s (revised) condition we can make sense of Chomsky’s idea that the EPP-phenomenon results in a structure {EA, TP}=δ, whose label is the respective ϕ-feature set of EA and T. In other words, δ is a doubly labeled structure, reminiscent of early PSRs of the form S → NP, VP and earlier ideas (compare Bloomfield (1933, 194 ff) for the suggestion that English sentences are exocentric). A label for a given piece of structure is needed, according to Chomsky,

(110) a. for interpretive purposes or
b. if the element in question undergoes further computation.

Empirically, we know that δ does not move, so (110-b) is not the reason why it needs a label. Being left with (110-a), Chomsky gives the following empirical reason why EA provides its ϕ-set for labeling:

(111) PRO to seem to be intelligent is harder than you think.

PRO in the sister position of TP is responsible for a “secondary agency” reading, so EPP-raising has a semantic effect unavailable with PRO in-situ.⁵⁰ Now how about the ϕ-set of T? What I take (110-a) not to mean is that ϕ-features are needed at the conceptual-intentional interface, because it is precisely these that are deleted before they reach the semantic component.⁵¹ What I do take (110-a) to mean instead is that T’s ϕ-label is overtly needed to express the agreement relation between the subject/EA and the verb. Thus the “interpretive purpose” in (110-a) crucially includes grammatical/functional meanings and relations. The apparent redundancy of overtly raising EA to the specifier of a grammatically marked head (T) is in fact an economical and maybe optimal means of signaling two types of grammatical meanings: secondary agency and agreement relations. If it were not for EA-raising, the former would not obtain. If it were not for morphologically marking ϕ on T as a result of agree, the latter would be missing and, being reduced to mere argument structure provided by the lexical layer, language would lack “expressive power”

50 The insight that propositional units comprising T contribute an agentive meaning is traceable, at least, to Bloomfield (1933, 268), who remarks: “in English, a phrase consisting of the preposition to and an infinitive expression, belongs to the special form-class of marked infinitive phrases, whose function differs from that of unmarked infinitive expressions, since they serve as actors.”
51 I am grateful to Marc Richards (p.c.) for discussion of this point.


the functional layers (C/T and possibly v⁵²) provide (cf. Miyagawa (2010)): these give “rise to such notions as topic-comment, subject of a clause, focus, and content questions, among many other modes of expressions.” (Cf. Miyagawa (2010, 8) and the discussion of the Japanese subject oriented reflexive anaphor zibun for illustration of how functional relations are independent from lexical ones) In other words, being the targets of internal Merge, the C-T-complex provides means to express the one side of Chomsky’s duality of semantics (full argument structure being the other). Specifically, we can now say that both raised and unraised external arguments in English are “subjects of a clause” by virtue of morphological realization of ϕ-feature agree between T and EA, which is why the ϕ-label of T is needed. Second, the ϕ-label of a raised subject is needed to provide for the secondary agency reading above. (108) can thus be interpreted as a bi-uniqueness relation between a given semantic/grammatical meaning and its overt manifestation: whenever we see a given grammatical meaning overtly expressed, we expect it to be the case that narrow syntax provides the necessary means for this manifestation – and optimally also the sufficient ones. Any visible “double expression and externalization” of a given semantic/grammatical meaning is suspect from this theoretical point of view. This is especially true if Sigurðsson (2004, 11) is right in saying that “it is lexicalization that is last resort, requiring some licensing or justification [. . . ], whereas non-lexicalization [e.g. silence; non-pronunciation, AB] is the unmarked or the minimal strategy, applied whenever possible.” If an abstract morpho-syntactic feature is expressed more than once (say, by both agreement and simultaneous movement giving rise to surface specifier-head agreement), this directly contradicts Sigurðsson’s proposal that lexicalization has a “last resort” nature. 
Discovery of apparent redundancies in grammatical manifestations motivates the search for additional (semanto-grammatical) interpretations that might be their source (unmasking the possibility raised in footnote 48 as ill-founded or hasty, and instead urging us to retain the restrictive version of (105)/(108)). Finally, let me turn to the question of what initiates the movement, (93-c). The problem is pertinent for wh-objects because, not being specifiers,⁵³ they need to be introduced into the loop, i.e. moved to SPEC-v*P, as specifier-hood is implied in successive cyclicity. I suggest that movement to the edge of v* is free, and its licensing is determined by extra-syntactic conditions. And there is reason to believe that something along these lines is true: wh-in-situ is not categorically excluded in English; in the absence of movement, dialogues like (112) obtain:⁵⁴

(112) A: I am having a birthday party this weekend.
B: And you invited who to your party?

52 Miyagawa includes v as a possible candidate among the functional heads which establish functional relations. Assuming that they can bear uϕ-features and function as phase heads, this appears reasonable. However, if in addition v assigns external θ-roles, it appears like a semi-functional head. But crucially, v does not agree with the argument it assigns a θ-role to.
53 Which under the current assumptions are all unstable – a statement I qualify in the next section.

Such instances of wh-in-situ in English have been studied, among others, by Pires and Taylor (2007), who suggest a pragmatic principle for its licensing, based on work by Stalnaker (1978): as long as the information requested is expected by the speaker to be part of the Common Ground, wh-in-situ is licit. Given A’s remark, B can safely assume that A invited someone to the party. Thus B’s use of a wh-in-situ question is unproblematic. Let me now delve into some reflections on and consequences of the conjecture made in this section.

4.5 Some Consequences and Extensions

At first glance, the Chomsky/Kremers-conjecture appears to depart from the idea that what a labeling algorithm does is identify the most prominent element in a given set, which up to now I have assumed is the simplex element, the LI. Under the “dual label”-conception this most prominent element might not be unique. Instead, the label identification would proceed into both phrases in parallel, as it were, yielding convergent derivations only in certain cases (such that the interfaces can interpret). Moreover, it obviously means that XP-YP-structures are not unstable and unlabelable per se. Clearly, we need to address the criteria for telling unstable XP-YP-pairs from stable ones. Simple combinatorics delivers the following four possibilities for EM and IM respectively:

54 Remember that I am not suggesting that all movement is symmetry-breaking movement. A number of movement types are driven by other desiderata.

Table 4.1: Combinatorial Options for Outputs of XP-YP-Merger

Mode of Merger   State of Stability   Empirical Instantiation
EM(XP,YP)        unstable             EPP-raising of subjects (Chomsky (2013)); split topics (Ott (2011a))
EM(XP,YP)        stable               base-generated wh-phrases (e.g. English how come)
IM(XP,YP)        unstable             intermediate steps of successive-cyclic movement (Blümel (2012), Chomsky (2013)); stylistic fronting (Ott (2009a))
IM(XP,YP)        stable               final step in successive-cyclic movement (Chomsky (2013))

Labeling has been suggested to fulfill a function at the interfaces, i.e. it serves to feed them instructions. For instance, Chomsky (2013) suggests that {EA, v*P} needs to be labeled so that a proper predicate-argument structure is delivered. Now what goes wrong if EA remains in-situ? In line with the Strong Minimalist Thesis – the thesis that narrow syntax meets the demands of the interfaces in an optimal fashion –, a plausible guess is that the CI-interface imposes on narrow syntax the requirement to deliver an unambiguously predicational unit (v*): if v*P and CP align with Chomsky’s duality of semantics (full argument structure and Tense/event structure respectively⁵⁵), vacating the external argument might serve precisely the purpose of identifying the predicate (not the argument; within VP this is achieved by EM of the internal argument alone). Now if EA stays in-situ, the C-I-interface is either “overburdened” with the predicate-argument ambiguity or mistakes the set {EA, v*P} for an argument (if v*P alone raises).

55 Cf. Chomsky (2004, 124). For illuminating comments on the conceptual necessity of phases in a system of virtually unconstrained Merge, cf. also Ott (2009b, 260 ff). He gives the following 3 hypotheses:

(113) a. CP is a phase because it specifies force/full event structure.
b. v*P is a phase because it represents full argument structure.
c. DP is a phase because it is referential.

He then elaborates that “some notion of phase is conceptually necessary in any syntactic theory, in particular, when Merge is assumed to be unconstrained. There must be some way of ‘picking out’ those special structures that can instruct [Conceptual Intentional] systems; there is nothing in these structures themselves that marks them propositional [reference omitted]. In this sense, any generative model has to give substance to the notion of a ‘meaningful cycle’; the relevant [Conceptual Intentional] constraints being of course a matter of empirical discovery. Hence, it is of central importance to linguistic theory to develop a proper technical notion of propositionality that applies to those structures that instruct the conceptual systems.”


In turn, the “dual label” in {wh_Q, C_QP} meets a similar interface requirement: the CI-interface imposes on narrow syntax the condition to deliver an unambiguously interrogative unit (fitting, say, the semantic properties of an embedding predicate like wonder). Ambiguous selection of [Q] does just that – but the identity of the semantic features is crucial in this case.

4.5.1 On the Timing of Labeling and Bavarian wh-Questions

The purpose of this subsection is to show that a combination of the syntactic labeling-ideas of the previous section and recent ideas on the syntax-semantics interface by Marantz (2013) solves problems of Bavarian embedded wh-questions implicit in Bayer and Brandner (2008). In particular, I will show that Marantz’ notion of contextual allosemy can be fruitfully applied to what looks like a polysemous complementizer. Let me begin by describing the puzzle. Bavarian famously disregards the doubly-filled Comp filter ((114), Bayer (1984)), which bans the cooccurrence of a wh-phrase and a lexical complementizer within the same left periphery of a sentence. This is in contrast to standard German (115):

(114) I frog-me, fia wos dass-ma an zwoatn Fernseher braucht.
I ask-refl for what that-one a second TV needs
‘I wonder what one needs a second TV for.’

(115) *Ich frag mich, für was dass man einen zweiten Fernseher braucht.
I ask refl for what that one a second TV needs
‘I wonder what one needs a second TV for.’

How is it that a wh-phrase marks the complement clause as interrogative, while it is headed at the same time by dass, the declarative complementizer? Seemingly we are dealing with conflicting feature specifications in the left periphery of the subordinate clause. There are various strategies to tackle the problem. One is to say that dass is lexically ambiguous between a declarative and an interrogative variant, and that the latter features in indirect wh-questions. This strategy is immediately faced with the problem of how to account for the ungrammaticality of (116), which strongly suggests that dass plainly has no interrogative reading or variant:

(116) *I frog-me, dass-ma an zwoatn Fernseher braucht.
I ask-refl that-one a second TV needs


Grewendorf (2012) offers a different account, relying on a cartographic treatment in which the wh-operator part is sub-extracted from a dislocated XP. In effect, the left periphery is simultaneously interrogative (the upper part by virtue of the wh-operator) and declarative (dass in the lower portion Fin). This might give a handle on the contrast above. However, this account fails to be a principled explanation. I would like to propose a different solution, relying on a recent theory by Marantz (2013). Put somewhat abstractly, he argues that certain functional heads are semantically underspecified with respect to their presyntactic lexical properties, deriving their particular semantic interpretation by context, much as in conditioned allophony or allomorphy. A semantic exponent is inserted “late” at the syntax-semantics interface, determined by context-sensitive rules (cf. Wood (2012, chpt. 1) for an explicit exposition of these ideas). I believe we can say much the same with respect to the problem above. Bavarian C=dass is semantically underspecified. By default, it yields a declarative interpretation. wh-movement⁵⁶ conditions C to receive a different semantic realization postsyntactically. I state this relationship somewhat informally as follows:

(117) ⟦C=dass⟧ ↔ Q / {wh, {___ . . . }}

The rule specifies that dass be semantically interpreted as interrogative (encoded by the feature Q) if and only if a wh-phrase occupies the sister of the CP headed by dass. This yields the correct results. Notice that this implies that the label Q is generated postsyntactically, which means that the labeling algorithm must apply after application of the rule (117). It is only then that the correct label is identified in which both Q-features of the two phrases are present. How, then, do we account for run-of-the-mill embedded interrogatives like (118)?

(118) I frog-me, fia wos ma an zwoatn Fernseher braucht.
I ask-refl for what one a second TV needs
‘I wonder what one needs a second TV for.’

In particular, why is it that here C=∅ is semantically interrogative and thus has the same semantics as the semantic exponent which dass can have? I would like to respond to this question by saying that this is an instance of homosemy, i.e. C=∅ and C=dass accidentally mean the same.
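The conditioning stated in (117) can be pictured as a context-sensitive lookup that applies after syntax. What follows is only a minimal sketch under stated assumptions, not part of the analysis itself: the function names `interpret_c` and `selected_by_ask`, the string encoding of the C exponents and the boolean `wh_sister` are hypothetical conveniences, and null C is modelled only in its interrogative use as in (118).

```python
def interpret_c(exponent: str, wh_sister: bool) -> str:
    """Return the semantic value of Bavarian C (illustrative encoding only).

    exponent  -- "dass" or "" (null C); hypothetical string encoding
    wh_sister -- True if a wh-phrase is the sister of the CP headed by C
    """
    if exponent == "dass":
        # Rule (117): dass is read as Q iff a wh-phrase is the sister of CP;
        # otherwise the default declarative reading applies.
        return "Q" if wh_sister else "Decl"
    if exponent == "":
        # Assumption: only the interrogative null C of (118) is modelled,
        # which accidentally means the same as interrogative dass (homosemy).
        return "Q"
    raise ValueError(f"unknown C exponent: {exponent!r}")


def selected_by_ask(exponent: str, wh_sister: bool) -> bool:
    """An 'ask/wonder' predicate requires an interrogative complement."""
    return interpret_c(exponent, wh_sister) == "Q"
```

On this encoding, `selected_by_ask("dass", True)` corresponds to (114), `selected_by_ask("dass", False)` to the ungrammatical (116), and `selected_by_ask("", True)` to (118).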

56 Or base generation, which might be needed in the case of warum ‘why.’


4.6 Summary

In this chapter I have reviewed influential analyses of successive-cyclic Ā-movement and subjected them to critique. Using the idea that failure to label XP-YP-structures can trigger movement, I have presented a novel idea, namely that long-distance Ā-movement comes about by iterated XP-YP-structures: at the phase level a given {wh, XP} (where X={v*, C}) configuration is rendered labelable by movement of wh. The output of this movement operation, however, is equally unlabelable, thus in need of further wh-symmetry-breaking-movement. This loop continues up to the point of interrogative C, bearing [Q]. Interrogative embedding predicates ambiguously select for [Q] in wh and C, i.e. Minimal Search determines a “dual label” – a stable structure obtains so that wh-movement comes to a halt. The current analysis does not require countercyclic movement or insertion of intermediate traces as in Boeckx (2003)/Takahashi (1994): as movement is strictly phase-wise and subject to the PIC, intermediate steps must be taken at each phasal node. One of the advantages of the current analysis over Bošković’s account is that one less feature is necessary: not only can we dispense with EPP (an advantage Bošković’s analysis has over Chomsky’s system), but we can also dispense with wh features (not as a morphological category, but as a syntactically relevant feature; an advantage that in my opinion the present analysis has over Bošković’s system, cf. also Šimík (2011) for a related idea and evidence that the crucial syntactic feature involved in wh-movement is Q). Movement is a by-product of the need to unambiguously label a {wh, CP}-structure (or {wh, v*P}, for that matter). The current analysis also does not run into the aforementioned feature percolation/projection problems: as label identification is, in a sense, a top-down process, neither bottom-up projection nor feature percolation is needed, and no probing by the moving XP needs to take place.
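The loop summarized above can be traced step by step. The sketch below is a deliberately simplified illustration and makes several assumptions that are not in the text: phase heads are encoded as plain strings, `"C[Q]"` is a hypothetical encoding of interrogative C, and the function name `successive_cyclic_path` is invented for the purpose.

```python
def successive_cyclic_path(phase_heads):
    """Trace a wh-phrase through a bottom-up list of phase heads.

    phase_heads -- e.g. ["v*", "C", "v*", "C[Q]"]; "C[Q]" stands for
                   interrogative C (hypothetical string encoding).
    Returns a list of (phase_head, status) pairs.
    """
    steps = []
    for head in phase_heads:
        if head == "C[Q]":
            # {wh, CP}: wh and C share [Q], Minimal Search finds a
            # "dual label", the structure is stable and movement halts.
            steps.append((head, "stable: label Q"))
            break
        # {wh, XP} without a shared prominent feature is unlabelable,
        # so wh has to move on (symmetry-breaking movement).
        steps.append((head, "unlabelable: wh moves on"))
    return steps
```

For a long-distance question with one embedded declarative clause, `successive_cyclic_path(["v*", "C", "v*", "C[Q]"])` yields three unlabelable intermediate configurations followed by the stable final one.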

5 Shared Labels and Criterial Freezing

5.1 Introduction

This book revolves around recent ideas about how certain instances of syntactic movement are triggered, namely when Merge creates structures of the form {XP, YP}=α which cannot receive a label or head unless such structures are manipulated (cf. Chomsky (2013), Ott (2011a), based on pioneering work by Moro (2000)). Internal Merge at the phase level of one member (say, XP) remedies this problematic symmetry by creating a chain of XP which is not properly contained in the launching structure α and thus renders Y the label of α. In chapter 4 I have shown how recursive application of this mechanism in {XP, CP} yields the intermediate movement steps in successive-cyclic movement, converging with the independent findings in Chomsky (2013). I argue that this solution is simpler, less costly and more elegant than previous approaches to the phenomenon. The question of what stops the movement in, for example, indirect questions is tackled at the end of that chapter, where I compare two solutions. The first employs the Probe Label Correspondence Axiom PLCA (Boeckx (2008b)¹), which in effect functions similarly to Criterial Freezing by Rizzi (2006). The second is Chomsky’s suggestion that an embedded interrogative {DP, CP} is labeled by the shared prominent feature on interrogative C and the wh-D, namely Q (cf. Cable (2007)). Apart from a number of both empirical and conceptual problems the PLCA encounters, there are numerous independent advantages of pursuing the “shared label” solution, as I will show in this chapter, although aspects of what I will have to say about the shared label idea have similar effects as a PLCA-based solution. In this chapter, I will make use of a simple and restrictive principle, variants of which are explicitly or implicitly assumed in much work involving phrasal movement:

(1) X̄-Immobility (XIM)
[-Max]/[-Min]-categories cannot undergo IM.

I will show that, in tandem with (1), the joint label idea can derive Criterial Freezing without further ado: within {DP, CP}=β, where DP and CP share a prominent semantic feature, DP gets frozen in place after detecting the label of β, because its “projection status” (in traditional terms²) becomes intermediate. Criterial Freezing thus emerges as a corollary of the possibility of projecting the moving element and the dynamic system of cyclic label detection. I then provide a novel analysis of Hebrew “nested interrogatives” (Preminger (2006)), which makes crucial use of the way movement and Criterial Freezing are conceived in this work and which solves a subcategorization problem unaddressed in Preminger’s work. Following a suggestion in Chomsky (2013), I propose that (lack of) agree plays a key role in rendering {DP, CP} stable. Next to Ā-movement, I discuss A-movement and show how Criterial Freezing of grammatical subjects likewise follows from the shared ϕ-label in the structure {DP, TP} as suggested in Chomsky (2013). Before I develop these ideas, which have a common root, however, I would like to highlight what in my view is a major conceptual argument in favor of the “shared label”-idea.

1 Cf. also Cecchetto and Donati (2008) for a very similar idea.
DOI 10.1515/9783110522518-005

5.2 Shared Labels and Full Interpretation

The purpose of this section is to elucidate the workings of the labeling algorithm LA proposed in a recent paper by Chomsky (2013) and to give one major conceptual argument in favor of it, which remains unmentioned in that paper. Throughout, I will compare that LA to the traditional notion of projection, which suffers from a conceptual flaw, namely inevitably violating Full Interpretation. That paper suggests elaborations and refinements of a LA suggested earlier (cf. Chomsky’s 2007, 2008 papers), elaborating that “[t]here are many ways to describe how it [=labeling] might work, but we are interested in finding the most principled answer, the solution that most closely approximates [Strong Minimalist Thesis].” In which sense does the LA approximate the SMT? Before attempting to give an answer, let me show how the LA is formulated, how it works and in which ways it differs from projection in the traditional sense. The function of this algorithm is to capture the X̄-theoretic insight that phrases are headed by a lexical entry (functional or lexical). The need for a separate statement that delivers headedness stems from the fact that a system based entirely on the set-forming operation Merge has nothing to say about the prominence of one or the other element. Suppose the application of Merge is restricted to two arguments, α and β, where α and β are either both lexical items (LIs) or both XPs. Now if this procedure yields {α, β}, it is unclear which element

2 Except for section 4 where projection is contrasted with labeling by Minimal Search, I use the term projection for expository convenience, without actual bottom-up projection implied.


provides the head of this set – the structure as such is completely symmetrical.³ However, headedness as commonly understood introduces an asymmetry into such a structure, and the question is how this asymmetry is implemented. The algorithm Chomsky suggests for a trivial output of Merge is based on the notion Minimal Search and applies at the phase level. It is formulated in (2):

(2) In {H, XP}=α, H an LI, LB(α)=H.

What (2) says is that in a given phrase comprising a LI and a phrase, the LI – the element structurally closest to the phase head – is the head of the initial phrase. Taking this simple and rather uncontroversial case as given, Chomsky considers those possible outputs of Merge whose label is not directly predictable by (2), namely {X, Y} and {XP, YP} respectively, i.e. the primary step in the derivation and units formerly known as specifiers,⁴ here, the phrasal sister of an XP. Concerning the latter, he suggests two ways to render such structures labelable, namely symmetry-breaking movement (Moro (2000, 2007)) and the idea that such structures may be labelable if they share a prominent semantic feature.⁵ As a first stab, I formulate this latter idea as follows:

(3) Condition on shared Labels (to be revised)
In {XP, YP}=α, α’s label consists of all semantically prominent features in the intersection of X and Y.
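The division of labor between (2) and (3) can be made concrete in a few lines. This is a rough sketch under simplifying assumptions, not a definitive implementation: the classes `LI` and `Phrase`, the field `prominent` and the function `label` are hypothetical names, phrases are reduced to the head that Minimal Search would find in them, and the set of semantically prominent features of each head is simply taken as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LI:
    name: str
    prominent: frozenset  # semantically prominent features (assumed given)


@dataclass(frozen=True)
class Phrase:
    head: LI  # the LI that Minimal Search finds inside the phrase


def label(a, b):
    """Label the set {a, b}; None means that no label is found."""
    a_is_li, b_is_li = isinstance(a, LI), isinstance(b, LI)
    if a_is_li and not b_is_li:
        return a.name  # (2): in {H, XP}, H is the label
    if b_is_li and not a_is_li:
        return b.name
    if not a_is_li and not b_is_li:
        # (3): in {XP, YP}, the label is the set of shared prominent features
        shared = a.head.prominent & b.head.prominent
        return shared or None  # empty intersection: unlabelable
    return None  # {X, Y} is covered neither by (2) nor by (3)
```

With `wh = Phrase(LI("D", frozenset({"Q"})))`, merging with an interrogative CP gives the label `frozenset({"Q"})`, whereas merging with a declarative CP whose head has no prominent feature in common returns `None`, the unstable configuration that forces movement.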

Several properties of this view of labeling are noteworthy: In allowing “common labels,” these recent developments depart from earlier projectionist ideas of Bare Phrase Structure which limit a structure α={X, Y} to be labeled by either X or by Y, i.e. classical endocentricity holds: The options [intersection] and [union] are immediately excluded: the intersection of [X], [Y] will generally be irrelevant to output conditions, often, null; and the union will be not only

3 There is a considerable body of literature in which Merge is either inherently or derivatively asymmetrical (Kayne (2008), Narita (2010), Zwart (2011), inter alia), deviating, in my view, from the simplemost conception of Merge, while treating as axioms other properties of the grammar, such as an alleged unarity – as opposed to binarity – of Merge, LIs as the sole locus of Edge Features, etc. Needless to say, much of what I have to say in this thesis is not compatible with this conception of Merge.
4 The notion has no status in the simplemost version of Merge as it isn’t clear if XP is the specifier of Y or YP the specifier of X.
5 Chomsky insinuates that agree between the XP and YP, too, plays a role in making such structures stable, i.e. mere matching of features is insufficient. Although important, I here abstract away from this factor in the definition, returning to this theme in section 5.4.1.


irrelevant but “contradictory” if [X], [Y] differ in value for some feature, the normal case. (Chomsky (1995b, 244))

A simple example may illustrate a problem with the union option: Merge(V, N)={V, N}=ϵ. As V is a [+V]- and N a [-V]-category, ϵ’s label would include contradictory categorial features if union were possible. As for the intersection, (3) evidently allows for this option, presupposing that the intersection is non-null. That leaves us with “disjunctive” projection. I illustrate this early minimalist view of projection here with the familiar tree notation:

(4)
     the
    /   \
  the   book

The considerations in the quote above have partly lost their validity in recent developments if labeling takes place by Minimal Search instead of projection: unlike the indiscriminate notion of projection, LA does not necessarily pick α or β as a whole as the tree indicates (the orthographic words symbolize feature bundles), but can selectively spot a designated feature from the feature bundle of α and β(’s respective head).⁶ In several respects the Minimal Search-turn that endocentricity has taken in Chomsky (2008, 2013) strikes me as more economical and better equipped to meet minimalist desiderata than the notion of projection: labels do not enter syntactic structures blindly and as a byproduct of Merge/Narrow Syntax but at a point where they are actually needed, namely the mapping to the semantic component/transfer.⁷ What is more, if LA is as selective as Chomsky (2013) suggests, this makes it possible to investigate which specific and interface-relevant features endow syntactic objects, leaving aside all the other features a given head comes with. To put it differently: LA is economical in being able to ignore semantically irrelevant features, while the concept of projection invariably copies the whole lexical item and endows XPs with many superfluous features, e.g., structural Case and category values. This representational redundancy is actually what underlies a tree graph like (4). By contrast, the specific, Minimal-Search-derived category labels Chomsky proposes for stable XP-YP-structures are not syntactic category values like T, v, D, C, etc. This might suggest that these category labels are irrelevant for the conceptual-intentional system, and

6 The often underlined analogy to ϕ-probing should support this conclusion. Hale and Keyser (2002, 61 ff) also discuss the question of which features enter into labels in a fashion that is highly relevant here.
7 Cf. also Boeckx (2009, 2014) for a labeling idea that is similar in this respect.


consequently and ideally should not be part of the representation of syntactic structures. The remarks just made should make it clear that projection massively violates Full Interpretation (Chomsky (1986), Chomsky (1995b, 27)), which states that there can be no superfluous symbols in representations (or superfluous steps in derivations). The same is not true for a Minimal Search-based LA. Of course, it is possible to design projection in such a way as to remedy this defect, for example, by postulating that one of the features of a head A, F(A), moves or percolates to the node immediately dominating A to yield {F(A), {A, . . . }}. However, such a conception of projection is dubious for the same reason feature movement or percolation in general is dubious (cf. Chomsky (2000), Heck (2004)). As the LA sees only features relevant for CI-interpretation instead of indiscriminately projecting the entire head, Full Interpretation is respected. This observation actually leads to a problem with the formulation in (2): As (2) says that the entire LI is the label of {H, XP}=α, superfluous – i.e. CI-uninterpretable – features may undesirably become α’s label, yielding a representation that violates Full Interpretation. A slightly and trivially amended version of (2) could thus be: (5)

In {H, XP}, H an LI, a semantically prominent feature of H is the label.

For one thing, this formulation abides by Minimal Search – the head structurally closest to the transfer-triggering phase head is still the unit in which a label is looked for, while grammatical features of XP require deeper search. Moreover, it is supposed to guarantee that not all the features of H turn into the label, but only such that are legible by the C-I systems. To give a simple illustration, suppose that a nominal head n is the locus of lexical ϕ-features and uCase. Now if Merger of n(={n, uCase, ϕ, . . . }) and an XP yields {n, XP}, the resulting object is labeled ϕ at the p(hase) level. Thus an element of n, not all of n, is picked, and CI-irrelevant features of n can be ignored.

(6) [ p . . . [?? n[n,uCase,ϕ,...] XP ] ]  →  [ p . . . [ϕ n[n,uCase,ϕ,...] XP ] ]
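The contrast between projecting the whole head and labeling by a prominent feature, as in (5) and (6), can be spelled out as follows. This is a sketch under explicit assumptions: the function names are hypothetical, features are encoded as plain strings with `"phi"` standing in for ϕ, and which features count as semantically prominent (here only `"phi"`) is stipulated for the sake of the illustration.

```python
def project(head_features):
    """Traditional projection: the whole head is copied onto the phrase."""
    return set(head_features)


def label_by_search(head_features, prominent):
    """Reading of (5): only semantically prominent features of H label {H, XP}."""
    return set(head_features) & set(prominent)


def superfluous(label, prominent):
    """Features in a label that are not CI-legible (Full Interpretation)."""
    return set(label) - set(prominent)


# The nominal head n of the illustration: n = {n, uCase, phi}.
n = {"n", "uCase", "phi"}
PROMINENT = {"phi"}  # assumption: only phi is legible to the C-I systems
```

Here `superfluous(project(n), PROMINENT)` contains `"n"` and `"uCase"`, the redundancy attributed to projection above, while `superfluous(label_by_search(n, PROMINENT), PROMINENT)` is empty.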

In his 2013-paper, Chomsky remarks (p. 13):


For this approach to be tenable, it must be that LA seeks features, not only LIs – or perhaps seeks only features, in which case it would be similar to probe-goal relations generally, specifically, agree. That seems natural, though the implications remain to be explored.

In this section I have tried to come up with a conceptual argument in favor of this proposal: By introducing CI-redundant features into the representation of an XP, any labeling/projection conception based on lexical items (heads) as a whole inevitably runs into problems for Full Interpretation. A feature-based LA does not. Optimally, the feature-sensitive conception of labeling generalizes to all logical outputs of Merge ({LI, SO}, {SO, SO}, {LI, LI}). After these conceptual reflections on Full Interpretation and labeling, let me now turn to specific ramifications of labeling and XIM.

5.3 X̄-Immobility (XIM)

5.3.1 Introducing XIM

Chomsky (2013)’s recent proposal to restate projection in terms of Minimal Search for a label is not very explicit with respect to the following issue: ridding syntactic theory of projection raises the question of what happens to, and what the status is of, relational properties like [+/-minimal, maximal] (cf. Muysken (1982), Chomsky (1995b)). It is this gap I intend to address in this section. I show how Minimal Search-analogues of these notions allow a given constituent to vary with respect to its projection status, which plays a crucial role in rendering these constituents (im)mobile. The guiding principle for this chapter is the following descriptive generalization: (7)

X̄-Immobility (XIM)
[-Max]/[-Min]-categories cannot undergo IM.

The intuition behind (7) is that within Narrow Syntax lexical material and phrases projected from this lexical material get frozen for the computation once identified as part of a larger phrase which is projected from the initial lexical material. In other words, (7) captures the well-established generalization that intermediate projections do not move.

80 | 5 Shared Labels and Criterial Freezing

5.4 Criterial Freezing

In this section I show how certain cases of Criterial Freezing (CF; Rizzi (2005) et seq.) can be derived from independently available properties of the grammatical system.⁸ Once we accept the idea that a given XP-YP-structure can be labeled by a prominent joint feature on X and Y in combination with (7), CF turns into a theorem. CF is given in (8) (Rizzi (2006, 112)): (8)

A phrase meeting a criterion is frozen in place.

The intuitive idea behind CF is that just as a given argument-XP has a θ-related position, below which XP does not occur, this XP, if moved, likewise has a terminal, mostly discourse- or scope-related position – dubbed criterial position – above which XP does not occur. Thus the derivational history of an argument-XP is delimited by a bottommost and topmost position in the sentential spine; movement chains have a restricted stretch. Rizzi encodes criterial positions by means of a variety of functional features like Foc, Sub, Rel, etc., each of which defines a (Foc, Sub, Rel) criterion. The gist of (8) is that once the criterion in question is met, the moved XP is “arrested” in its criterial position, banning further movement. In this section I will take a look at two types of criteria: the Q- or wh-criterion and the subject criterion, the first of which I take to involve Ā-dependencies and the latter A-movement dependencies. For both types of movement I will show how CF derives from the shared labeling conception pursued in this work. After first presenting how this works for the Q-criterion, I will show how a new explication of the fairly intricate pattern of Nested Interrogatives in Hebrew can be developed from the system. Finally, I demonstrate how the idea carries over to A-movement.

5.4.1 Criterial Freezing and Ā-movement

Adherence to the Q-criterion and violations of CF are illustrated in (9-a)/(9-b) for Italian (from Rizzi and Shlonsky (2007, 117)), in (10-a)/(10-b) for English and in (11-a)/(11-b) for German respectively. The capitals in the Italian examples indicate focal stress:

8 Parts of the work from this section have been published in Blümel (2012). Rizzi (2012) independently arrives at a very similar conception. Luigi Rizzi (p.c.) comments: “the idea is very natural, given Chomsky’s recent approach to labeling, so, it’s not at all surprising that different people should come up independently with something very close.”


(9) a. Mi domandavo quale RAGAZZA avessero scelto, non quale ragazzo.
       I wonder which GIRL they had chosen not which boy
    b. *Quale RAGAZZA mi domandavo avessero scelto, non quale ragazzo

(10) a. Bill wonders which book she read.
     b. *Which book does Bill wonder she read?

(11) a. Klaus fragt sich welche CD sie angehört hat.
        Klaus asks refl which CD she listened to has
        ‘Klaus wonders which CD she listened to.’
     b. *Welche CD fragt sich Klaus sie angehört hat.
        which CD wonders refl Klaus she listened to has

Rizzi himself asks: “Is [(8)] an independent formal principle, or does it follow from other properties of syntactic computations or of the interface systems?” and goes on to argue in favor of the former. However, as we will see it is possible to derive (8). Let us consider cases that seem amenable to principled explanation under the current derivational conception of label identification: first, I follow Cable (2007) in assuming that wh-phrases are actually phrases headed by a question morpheme Q, which selects an XP containing the wh-word. This accounts fairly directly for cases of pied piping and indeed renders pied piping epiphenomenal: (12)

a. [QP Q=∅ [PP into [which garden]]]
b. [QP Q=∅ [DP whose brother ]]

I refer the reader to his work for argumentation and striking evidence from Tlingit and Quechua for the reality of Q. Turning to CF and taking (9-a) as an example, IM of the [+Max] wh-DP quale ragazza⁹ to the sister of the subordinate interrogative CP yields {DP, CP}=β, with the head of each member bearing [Q]. At the next higher phase level, LA probes both members of β in an effort to find a label for β, finding [Q] respectively. As a consequence, β equals [Q] – the intersection of DP and CP – and is treated as such in subsequent derivational steps:¹⁰

9 I am simplifying the QP/DP-structure somewhat for exposition.
10 I am grateful to Günther Grewendorf (p.c.) for pressing me on the issue of how the “shared label” is to be conceived formally (an element from the intersection).


(13) [β [DP+Max . . . ] [CP C[Q] . . . ]] → [v*P v*+domandavoᵢ tᵢ [[Q]+Max (=β) [DP−Max D[Q] . . . ] [CP−Max C[Q] . . . ⟨DP+Max⟩ . . . ]]]

Extraction of DP quale ragazza into the matrix clause is blocked because at the derivational point represented by the right tree, DP has intermediate [-Max] label status and cannot move by XIM. Without reference to descriptive notions like Criterial Freezing, we thus correctly derive why (9-b) (and equivalently (10-b)/(11-b)) is out.¹¹ Notice that layers of labels are taken for granted in most analyses. In the current system, by contrast, label levels, which can vary by derivational stage, play a crucial role in rendering phrases (non-)movable.
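The interplay of shared labeling and XIM in (13) can be pictured with the following sketch. It is an illustration only, under simplifying assumptions of my own: a phrase is reduced to the feature set of its head, [+/-Max] is a Boolean that the labeling step updates, and no Agree requirement is imposed yet. The names `SyntacticObject`, `label_xp_yp` and `can_undergo_im` are hypothetical.

```python
from typing import FrozenSet, Iterable, Optional


class SyntacticObject:
    """A phrase, reduced to the features of its head and a derived [+/-Max] status."""

    def __init__(self, name: str, head_features: Iterable[str]) -> None:
        self.name = name
        self.head_features = frozenset(head_features)
        self.is_max = True  # [+Max] until the phrase is labeled into a larger set


def label_xp_yp(xp: SyntacticObject, yp: SyntacticObject) -> Optional[FrozenSet[str]]:
    """Label {XP, YP} by the features shared by both heads, if there are any."""
    shared = xp.head_features & yp.head_features
    if not shared:
        return None  # nothing shared: the set stays unlabeled, both members stay [+Max]
    # Once the set is labeled, its members have intermediate status.
    xp.is_max = False
    yp.is_max = False
    return shared


def can_undergo_im(so: SyntacticObject) -> bool:
    """The [Max] half of XIM in (7): [-Max] categories cannot undergo IM."""
    return so.is_max


dp = SyntacticObject("quale ragazza", {"Q", "D"})
cp = SyntacticObject("embedded CP", {"Q", "C"})
movable_before = can_undergo_im(dp)   # DP is still [+Max]
beta_label = label_xp_yp(dp, cp)      # the label of beta is the shared [Q]
movable_after = can_undergo_im(dp)    # DP is now [-Max] and frozen
print(movable_before, sorted(beta_label), movable_after)  # True ['Q'] False
```

Nothing in the sketch refers to a criterion: the freezing of the DP is a side effect of label detection, which is the sense in which CF is treated as a theorem here.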

11 The above observation leads us to expect that β, now [+Max], can move as a whole into the matrix clause (clausal pied-piping). As Rizzi and Shlonsky (2007, 117/118) show, this is indeed – marginally – the case in Italian, say, for satisfaction of a [rel]-criterion in the matrix clause:

(14) ?(?) Gianni [quanti libri del quale siano stati censurati] non è ancora stato chiarito
     Gianni, how many books by whom have been censored not is been clarified
     ‘Gianni, how many books by whom have been censored it has not been clarified yet.’

A derivational sequence just described is sometimes dubbed “snowballing movement,” which numerous languages exhibit, such as Basque, Bavarian German, Finnish, etc. While the principle XIM provides the necessary condition for this movement, sufficient conditions for this movement remain to be elucidated. The secondary movement is of a different type than the internal one.


5.4.1.1 Nested Interrogatives, Criterial Freezing and Escape Hatches

I will now turn to a more complex case of Ā-dependencies, namely a specific subset of multiple wh-questions in Modern Hebrew (henceforth Hebrew) exemplified below:

(15) Yosi yada [CP [et ma] Dan šaxax [CP [le-mi] Rina natna ⟨le-mi⟩ ⟨et ma⟩]]
     Yosi knew acc what Dan forgot dat-who Rina gave
     ‘Yosi knew what the thing was such that Dan forgot to whom Rina gave it.’

The point and intent of the discussion is to elucidate some of the detailed workings of CF in the sense above, and also the limits of its applications. Do all sorts of structures of the form {XP, YP}, in which X and Y share a prominent feature, lead to CF, i.e. provide their label for that set? My answer will be negative. Using Hebrew “nested interrogatives” (Preminger (2006)) as evidence, I show that Agree sets the right boundary condition for stable and unstable XP-YP-structures. By highlighting the role Agree(FQ, wh)¹² plays in rendering a set {wh, FP} stable, I offer a new analysis of the phenomenon. By contrast, lack of such an Agree-relationship prior to forming {wh, CP} forces successive-cyclic movement, despite the presence of a Q-feature on C, the basis of the escape hatch property of CP. Also, by utilizing feature inheritance,¹³ my reanalysis sheds light on an issue that remains unaddressed in Preminger’s work, namely how to account for the seemingly non-local selection involved in nested interrogatives (and elsewhere in Hebrew, cf. Shlonsky (2006); Boeckx (2008a, 12 ff)). The analysis developed here is admittedly partial, but demonstrates that relevant aspects of the phenomenon can be made to follow from independent properties of the grammar: Merge, labeling, feature inheritance, Agree and a sparse inventory of features.¹⁴ Another result is that the notion CF is to be conceived dynamically: If the ideas presented here are on target, CF is not to be understood as the relationship between a designated position X in a rigid clausal skeleton and its associated phrase in SPEC-XP. Rather, CF derives from inheritance of a probe X or lack thereof, the goal in the sister position of XP, and dynamic labeling as laid out in the preceding section.

12 Cf. Cable (2007).
13 Cf. M. Richards (2007), Chomsky (2008).
14 This latter aspect merits further discussion in light of a plethora of conceptual arguments advanced by Boeckx (2014) against the ubiquitous use and purported importance of lexical features in Minimalist syntactic theorizing. A discussion of his arguments is way beyond the scope of this work. For current purposes, suffice it to say that in my analysis lexical features play a subordinate role as compared to the elaborate and sophisticated featural preencoding of the outcomes which Preminger employs in his analysis.

Let me begin by describing the relevant facts about wh-interrogatives and in particular nested interrogatives in Hebrew (based on Preminger’s work). First, Hebrew has obligatory overt wh-fronting in constituent questions: (16)

a. [et mi] Dan pagaš ⟨et mi⟩?
   acc who Dan met
   ‘Who did Dan meet?’
b. *Dan pagaš [et mi]?
   Dan met acc who

Locally (within a single clause), multiple wh-fronting is impossible as (17-a)/(17-b) show, regardless of the ordering of the wh-elements, i.e. regardless whether or not the dependency is “crossing” (17-a) or “nested” (17-b). However, this restriction is not due to a general ban on multiple wh-elements within a single clause as (18) shows, where just a single wh-word undergoes fronting while the other remains in-situ: (17)

a. *[ma] [le-mi] Dan natan ⟨ma⟩ ⟨le-mi⟩?
   what dat-who Dan gave
b. *[le-mi] [ma] Dan natan ⟨ma⟩ ⟨le-mi⟩?
   dat-who what Dan gave

(18) [ma] Dan natan ⟨ma⟩ [le-mi]?
     what Dan gave dat-who
     ‘What did Dan give to whom?’

Now the crucial pieces of data are (19-a)/(19-b), which reveal that multiple wh-movement from a single clause becomes possible, provided that the wh-elements do not end up in the same clausal periphery and that the wh-word whose base-position is structurally lower bypasses the other one, i.e. a nested dependency is obligatory. (20) shows that a crossing dependency is ungrammatical:


(19) a. ma Dina šaxexa [CP le-mi Dan natan ⟨le-mi⟩ ⟨ma⟩]?
        what Dina forgot dat-who Dan gave
        ‘What was the thing such that Dina forgot to whom Dan gave it?’
     b. [et ma] Rina xašva [CP še-Dan ša’al [CP [le-mi] Roni šalax ⟨le-mi⟩ ⟨et ma⟩]]?
        acc what Rina thought that-Dan asked dat-who Roni sent
        ‘What was the thing such that you knew that Dan asked to whom Roni sent it?’

(20) *mi Dan šaxax [CP [et ma] ⟨mi⟩ axal ⟨et ma⟩]
     who Dan forgot acc what ate

As the necessity to use a resumptive pronoun in the English gloss indicates, the Hebrew sentences (19-a) and (19-b) do not respect the wh-island condition (Ross 1967). However, the factual situation is more complicated, as Preminger shows: while Hebrew allows long-distance wh-extraction out of a locally created wh-island, wh-extraction via a wh-island created long-distance is out: (21)

a. *[eyze sefer] šaxaxta [CP [le-mi] Rina xašva [CP še-Dan šalax ⟨le-mi⟩ ⟨eyze sefer⟩]]?
   which book forgot.2sg dat-who Rina thought that-Dan sent
b. *[et ma] Rina ša’ala [CP [le-mi] Dan xošev [CP še-Roni šalax ⟨le-mi⟩ ⟨et ma⟩]]?
   acc what Rina asked dat-who Dan thinks that-Roni sent

The creation of a wh-island thus crucially hinges on the contrast between long vs. short wh-movement. For the moment, I will focus on the analysis and derivation of nested interrogatives, addressing the problem posed by long-wh-movement-induced islands at the end of this section. Schematically, the derivation of nested interrogatives is given in (22). (23) illustrates the illicit crossing derivation underlying examples like (20): (22)

[CP2 wh2 . . . [CP1 ⟨wh2⟩ [wh1 . . . ⟨wh1⟩ ⟨wh2⟩]]]

(23) *[CP2 wh1 . . . [CP1 ⟨wh1⟩ [wh2 . . . ⟨wh1⟩ ⟨wh2⟩]]]
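The difference between the nested schema (22) and the crossing schema (23) is purely a matter of how the two dependencies are arranged linearly, which the following sketch makes explicit. It is only a descriptive aid under assumptions of my own: each dependency is reduced to the pair of positions of the surface copy and the base copy, intermediate copies are ignored, and all positions are distinct. The name `classify` is hypothetical.

```python
from typing import Tuple

# (position of the moved phrase, position of its base copy)
Dependency = Tuple[int, int]


def classify(dep_a: Dependency, dep_b: Dependency) -> str:
    """Classify two filler-gap dependencies by their linear positions."""
    (a_start, a_end), (b_start, b_end) = sorted([dep_a, dep_b])
    if a_end < b_start:
        return "disjoint"   # the first dependency ends before the second begins
    if b_end < a_end:
        return "nested"     # the second dependency lies inside the first
    return "crossing"       # the two dependencies overlap without containment


# (22): wh2 ... wh1 ... <wh1> <wh2>, numbered 0, 1, 2, 3
print(classify((0, 3), (1, 2)))  # nested
# (23): wh1 ... wh2 ... <wh1> <wh2>
print(classify((0, 2), (1, 3)))  # crossing
```

The same classification applies to the monoclausal cases: (17-a) comes out as crossing and (17-b) as nested.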

Preminger (2006, 12) makes the following descriptive generalization: CP1 serves as the clausal escape-hatch, while C1’s complement provides for the landing site of the locally moved wh-word, an Ā-operator position, which he labels Foc. Evidence for the existence and characteristics of such a position comes from topicalization below the overt complementizer (24), which may license parasitic gaps (25) and which does not give rise to new binding options (26-b): (24)

Dan amar [CP še-[et ha-sefer limud] hu kvar kara ⟨et ha sefer limud⟩].
Dan said that-acc the book teaching he already read
‘Dan said that he had already read the textbook’

(25) Dan amar [CP še-[et ha-sefer limud] hu kvar kara ⟨et ha sefer limud⟩ (mi-)bli liknot e].
     Dan said that-acc the book teaching he already read from-without buy.inf pg
     ‘Dan said that he had already read this bookᵢ without reading itᵢ’

(26) a. Dan amar [CP še-Rinaᵢ ohevet et acmaᵢ]
        Dan said that-Rina likes acc refl
        ‘Dan said that Rinaᵢ likes herselfᵢ.’
     b. Dan amar [CP še-[et acmaᵢ] Rinaᵢ ohevet ⟨et acmaᵢ⟩]
        Dan said that-acc refl Rina likes
        ‘Dan said that Rinaᵢ likes herselfᵢ.’

Preminger points out that in a multiple specifier system any wh-island effect would be obviated, because there is always an additional edge position available, functioning as an escape hatch. Since Hebrew does exhibit some wh-island effects, it cannot have multiple specifiers. I will return to the way he excludes the derivation underlying the above example at the end of this section. Regarding the major theme of this book, I would like to refine the conditions on successive-cyclic movement as conceived in chapter 4 and the possibility of escape hatches. The tentative and necessarily partial account I will provide here also could give an answer to the puzzling fact that Hebrew apparently allows selection of an interrogative clause to apply “at a distance”,¹⁵ i.e. the selecting verb and the embedded interrogative clause appear not to be in a sisterhood relation. An issue that needs addressing in the analysis of Hebrew nested interrogatives is how multiple wh-phrases behave in the vP-edge if a phase-based system is adopted. First, I will review the treatment by Preminger, who also uses phases, and then describe my own approach. Preminger stipulates that nested interrogatives require that multiple wh-elements base-generated in the same clause reach the same vP-edge in a tucking-in fashion (cf. N. Richards (2001)). Specifically, consider the tree (27): DP2 tucks-in below the external argument DP1, the former of which is base-generated in a position structurally lower than DP1. As phase theory imposes a locality condition on movement (PIC), DP2 must touch down in a specifier of v* before moving on, to escape being trapped in v*’s domain, barring extraction:¹⁶

(27) [DP1 [DP2 [v* [√root ⟨DP2⟩]]]]

15 Cf. also Shlonsky (2006) for similar observations and a rather complicated analysis based on Grimshaw (1991)’s notion of extended projection – complications not needed here. The feature inheritance-based account endorsed here could also extend to certain kinds of subjunctive clausal complements from north-west Italian dialects Shlonsky considers, as Cecilia Poletto (p.c.) points out.

Tucking-in is necessary for Preminger to ensure that within the subsequent CP-cycle, the interrogative Q-probe undergoes Agree with the right goal DP1, and DP2 does not structurally intervene. I will now turn to an alternative account of the behavior of multiple wh-phrases in the v*P-edge in which the relative order of the arguments in the edge is reversed. The alternative is represented in the structure below:

(28) [DP2 [DP1 [v* . . . ⟨DP2⟩ ]]]

In scenario (28), one could stipulate that DP2 must vacate the v*P-edge. When C/T gets Merged, bearing a Q-probe, DP1 counts as Q’s goal:

(29) DP2 [C Q [T Q [⟨DP2⟩ [DP1 [v* . . . ⟨DP2⟩ ]]]]]

Agree between Q and DP1 then is the only possibility due to the invisibility of DP2’s trace (Chomsky (2000)): At the phase level when Agree is effected, DP2’s trace in the v*P-edge is invisible.
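The logic of (29) amounts to a search for the closest visible goal, which can be sketched as follows. This is an illustration under assumptions of my own, not an implementation of any particular theory of Agree: the search domain is a list ordered from structurally highest to lowest, and trace status is a flag on each element. The names `EdgeElement` and `closest_goal` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Set


class EdgeElement(NamedTuple):
    name: str
    features: Set[str]
    is_trace: bool


def closest_goal(probe_feature: str, domain: List[EdgeElement]) -> Optional[str]:
    """Return the highest visible element in the domain bearing the probed feature."""
    for element in domain:      # ordered from structurally highest to lowest
        if element.is_trace:
            continue            # traces are assumed to be invisible to the probe
        if probe_feature in element.features:
            return element.name
    return None


# The search domain of Q in (29): DP2 has left, only its trace remains above DP1.
after_movement = [
    EdgeElement("DP2", {"Q"}, is_trace=True),
    EdgeElement("DP1", {"Q"}, is_trace=False),
]
# The reverse timing: Q probes while DP2 still occupies the outer edge.
before_movement = [
    EdgeElement("DP2", {"Q"}, is_trace=False),
    EdgeElement("DP1", {"Q"}, is_trace=False),
]
print(closest_goal("Q", after_movement))   # DP1
print(closest_goal("Q", before_movement))  # DP2
```

As the second call shows, the outcome depends entirely on whether DP2 has moved before Q probes, so the scenario in (28)/(29) requires a fixed relative timing of the two operations.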

16 When nested interrogatives are formed by, say, two internal wh-arguments, the uppermost wh-argument is the closest to v* and thus moves first in accordance with the minimal link condition. Tucking-in of the originally structurally lower internal wh-argument follows. Thus the pre- and the post-movement (v*P-edge) configuration of the wh-elements is order-preserving.


However, I know of no independent properties of Hebrew grammar that could motivate this specific ordering of Q-probing in relation to DP2’s Agree-obviating movement: why is the reverse order not possible (Agree(Q, DP2) > DP2-movement)? The solution sketched appears to me no less ad hoc than the one opted for here. In the following I will essentially adopt Preminger’s tucking-in approach to the v*P-edge in Hebrew nested interrogatives, noting that problems remain.¹⁷ What about the syntactic domain above the verb phrase, the C/T-area? I assume that in local wh-movement interrogative C passes its Q-feature on to a proxy (Chomsky (2013)), which I suggest is a functional head F for lack of conclusive evidence that it is T;¹⁸ the curved arrow in the tree below (30) indicates feature inheritance:

(30) [CQ [α DP1 [FP FQ [ . . . [v*P ⟨DP1⟩ [DP2 . . . ]]]]]]
     (arrows of the original tree not reproduced)

Before justifying why Q starts out on C and not on the functional head Foc below C as in Preminger’s work, let me elaborate on clause-internal consequences of feature inheritance. The Q-feature on C/F probes and undergoes Agree with the structurally closest wh-word in the domain, DP1, marked by the dotted arrow. Subsequent local wh-movement of DP1 targets SPEC-FP, resulting in a stable configuration {DP1Q, FPQ}=α labeled by the shared feature Q. Notice now that it is Agree(Q, DP1) that guarantees that DP1, once raised to the sister position of FP, is criterially frozen in that the label of the ensuing set is shared in the relevant sense. I will say more about the interaction of Agree and labels below.

17 Grewendorf (2012) offers a sophisticated analysis of multiple wh-fronting without tucking-in.
18 Cf. Miyagawa (2010) for exploration of the idea that a head distinct from T may function as receptacle of the submitted feature. Preminger calls the complement of C Foc. In the current context, the exact inherent/lexical label of C’s complement is of no great importance, while its feature inheritance-derived label is crucial.


The argument for why the interrogative feature Q originates in C and then provides F with a probe, is simply that Q in C locally fulfills the selectional requirements of the matrix predicate. This maneuver has another benefit: As Shlonsky (2006, 2) observes, it appears as though interrogative-embedding predicates in Hebrew can select wh-clausal complements “across” a topicalized XP: (31)

ša’alta oti [et ha sefer] le mi le haxzir ⟨et ha sefer⟩
(you).asked me acc the book to whom to return
lit: ‘You asked me the book to whom to return’

According to Shlonsky this state of affairs represents a challenge to a straightforward cartographic treatment, because the interrogative wh or Q-feature associated with the specifier position occupied by the wh-element is evidently not in a sisterhood relation with the selecting verb ša’alta (‘ask’) because Top intervenes.¹⁹ If, on the other hand, Q starts out on C and is delivered to a lower functional head as I argue for here, this problem dissolves: the matrix predicate selects the right type of clause, while probe goal union²⁰ between Q and the wh-phrase obtains by way of feature inheritance. Here I follow Chomsky (2013) in assuming that feature inheritance involves copying – i.e. “sharing,” not “donating”²¹ – of the relevant feature. It must be noted that copying here implies more than providing a morphosyntactic feature to allow a morphological reflex, say, some residue of ϕ-features as in complementizer agreement. Instead, copying must mean that the semantically interpretable feature Q is present both on C and on F – on the former, to endow the clause with the right type for selection, on the latter, to allow labeling and CF. By contrast, ϕ-feature inheritance involves submitting an uninterpretable feature to T. Being lexically uninterpretable, the ϕ-probe on C cannot serve to satisfy selectional requirements of a matrix predicate, let alone function in labeling the CP.²²

19 Interestingly, putting the facts observed by Shlonsky together with Preminger’s actually reveals something of an ordering paradox: While overt complementizers may cooccur with topicalized phrases (C[+] > Top), moved wh-phrases block overt realization of C (C[−] > wh). Once wh-fronting and topicalization combine, C is muted again (C[−] > Top > wh). So apparently, the presence of wh in the left periphery conditions the form of C, irrespective of linearly intervening material. The current feature inheritance-based idea might make some sense of these facts in that Q-bearing C is null, independent of the presence of Top. I have to leave a deeper investigation of these issues for future research.
20 I am borrowing Miyagawa’s terminology here.
21 Cf. Ouali (2009)’s work.
22 The inherited ϕ-set on T is a separate issue; labeling of non-defective {DP, TP} might be contingent on the interpretable ϕ-set on the DP, cf. Chomsky (2013), (p.c.) “[I]t is the identical feature


Returning to the derivation of nested interrogatives, the second wh-element, DP2, may target SPEC-CP, from where, I suggest, it is forced to move successive-cyclically to make it possible to label {DP2Q, CPQ}=β, i.e. β is an escape hatch configuration:

(32) [CP C [ . . . [β DP2 [CP CQ [α DP1 [FP FQ [ . . . [v*P ⟨DP1⟩ ⟨DP2⟩ . . . ]]]]]]]]

How do α and β differ? Formally, Q is “shared” in both structures, while the derivational histories of α and β do indeed differ. Somehow the grammar – the LA – must tell Agree-derived XP-YP-structures apart from Agree-less ones. Intuitively, the reason why further wh-movement is prohibited from α and forced from β is that Agree induces CF in α, while its absence renders β as unstable as any other XP-YP-structure. Consequently, raising DP2 to sister-of-CP creates a problematically symmetric structure despite feature matching (cf. Chomsky (2013) for the relevance of this distinction). At the right level of abstraction, Preminger (2006, 16-17) actually arrives at a similar conclusion when he suggests that the ungrammaticality of multiple wh-fronting in monoclausal contexts in Hebrew (33-a)/(33-b) is “therefore of the same nature as the ungrammaticality of” intermediate halting as in (34):

on both DP and T – effectively, specifier-head agreement, interpretable by virtue of their inherently valued character on DP.” Thanks also to Marc Richards (p.c.) for stimulating discussion of this point.


(33) a. *[ma] [le-mi] Dan natan ⟨ma⟩ ⟨le-mi⟩?
        what dat-who Dan gave
     b. *[le-mi] [ma] Dan natan ⟨ma⟩ ⟨le-mi⟩?
        dat-who what Dan gave

(34) *I think who (that) Dan met.

Formulated in current terms, the problem with (33-a)/(33-b) is that Q-feature inheritance from C to F and Agree with a single wh-phrase allows F to host a single wh-phrase in its specifier. Q on C, by contrast, does not agree, so that raising of a second wh-phrase to SPEC-CP results in an unstable symmetry {DP2Q, CPQ}, which can neither be labeled unambiguously nor be repaired by movement (in monoclausal contexts); likewise, (34) represents an unrepaired unstable symmetric structure. To make this distinction explicit without at present being able to rationalize it or derive it from deeper properties of the grammar, let me revise (3) from section 5.2 below in (35): (35)

Condition on Shared Labels (revised)
In {XP, YP}=α, α’s label consists of all prominent features of the intersection of X and Y, where one such prominent feature is a probe on X agreeing with Y or vice versa.
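The revised condition in (35) can be stated as a small decision procedure, which may help to see how it separates α from β. The sketch below rests on an assumption of my own about bookkeeping, namely that the features for which a probe on one head has agreed with the other are recorded in a set; the name `shared_label` is hypothetical.

```python
from typing import FrozenSet, Iterable, Optional


def shared_label(
    x_features: Iterable[str],
    y_features: Iterable[str],
    agreed_features: Iterable[str],
) -> Optional[FrozenSet[str]]:
    """Return the label of {XP, YP} under (35), or None if the set stays unstable.

    x_features / y_features: prominent features of the heads X and Y.
    agreed_features: features for which a probe on one head has agreed with
    the other (an assumption about how Agree relations are recorded).
    """
    shared = frozenset(x_features) & frozenset(y_features)
    if shared & frozenset(agreed_features):
        return shared  # at least one shared feature is backed by Agree
    return None        # feature matching alone does not yield a label


# alpha = {DP1, FP}: the Q-probe on F has agreed with DP1, so alpha is labeled Q.
alpha = shared_label({"Q"}, {"Q"}, agreed_features={"Q"})
# beta = {DP2, CP}: Q matches on both sides, but no Agree relation holds.
beta = shared_label({"Q"}, {"Q"}, agreed_features=set())
print(alpha, beta)
```

On this rendering α receives the label Q, while β returns no label and therefore remains as unstable as any other XP-YP-structure.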

The condition in (35) allows us to exclude the crossing dependency pattern schematized in (23) and exemplified by (20), repeated here: (36)

*[DP1 mi] Dan šaxax [CP [FP [DP2 et ma] ⟨mi⟩ axal ⟨et ma⟩]]
who Dan forgot acc what ate

Remember that DP2 et ma vacates the interior of the v*P by first tucking-in below DP1 mi within the edge of the verbal phrase. As such the wh-subject is the first accessible goal for the Q-probe on embedded C/F for the purposes of agreement. Leaving aside, for the moment, Relativized Minimality issues, suppose it is the wh-object instead of the wh-goal which now raises to SPEC-FP as in (36). The reason (36) is underivable, then, is that in the absence of Agree(FQ, DP2), {DP2Q, FPQ} does not meet (35) and is thus not labelable by Q. Consequently, {DP2Q, FPQ} remains unstable and (36) cannot be derived, as desired. Let me summarize the results of the discussion and reanalysis of nested interrogatives obtained so far. I have shown that there is no contradiction between having interrogative C satisfy the selectional needs of the matrix predicate and at the same time providing an escape hatch in the left periphery of the embedded clause to yield the nested interrogative pattern of Hebrew: As the Q-probe agrees with DP1 and DP1 ends up in the specifier of C’s proxy, SPEC-CP turns into an escape hatch for DP2 insofar as no agreement obtains with DP2. A remaining issue is accounting for island effects induced by long-distance wh-movement (21), repeated here: (37)

a. *[eyze sefer] šaxaxta [CP [le-mi] Rina xašva [CP še-Dan šalax ⟨le-mi⟩ ⟨eyze sefer⟩]]?
   which book forgot.2sg dat-who Rina thought that-Dan sent
b. *[et ma] Rina ša’ala [CP [le-mi] Dan xošev [CP še-Roni šalax ⟨le-mi⟩ ⟨et ma⟩]]?
   acc what Rina asked dat-who Dan thinks that-Roni sent

Preminger analyzes these cases as ultimately PIC-violations: he stipulates that Hebrew CPs allow a single specifier only. Consequently, extracting both eyze sefer and le-mi via the most deeply embedded CP of e.g. (37-a) is not an option. It follows that once either of the wh-phrases escapes via the most deeply embedded CP, the respective other needs to leave the most deeply embedded CP without intermediate touch-down in the edge of that CP, violating PIC. This strikes me as a fairly elegant way of ruling out (37). However, adopting that proposal proves difficult given that the analysis relies on the notion specifier, which the current approach seeks to do without. Under current assumptions, the culprit might lie in the violation of the wh-island, i.e. the crossing of the upper CP in the examples in (37): wh-movement may not pass by a criterially satisfied position. The reasons for the single specifier restriction Preminger puts forth remain to be understood – particularly in light of the fact that it does not hold in the vP-phase as I have argued here. After having refined certain aspects of the analysis of Hebrew nested interrogatives and thereby revised the Condition on Shared Labels, I would now like to make a brief comment on the way the current analysis might extend to comparable constructions in other languages. After that, CF in the domain of A-movement is considered.

Nested Interrogatives beyond Hebrew

Hebrew nested interrogatives are not isolated cases, but are rather part of a larger class of constructions in languages allowing one local step of wh-movement and a long-distance wh-movement dependency, such that the latter “contains” the former. Consider the following examples from the West African language Akan (38) (from Sabel (1998, 35)), which allows extraction from wh-islands, and Bulgarian (39) (from Rudin (1988)), where a long distance Ā-dependency is created as part of relativization, crossing two wh-questions: (38)

Dɛnᵢ na Mary bisaa sɛ hena na oyɛe tᵢ
what foc Mary asked that who foc made
Akan

(39) Vidjah edna kniga, kojatoᵢ se čudja koj znae koj prodava tᵢ.
     saw-1s one book which refl wonder.1s who knows who sells
     ‘I saw a book which I wonder who knows who sells.’
     Bulgarian

Is the analysis proposed above for modern Hebrew ready to account for the behavior of nested interrogatives in Akan and Bulgarian? The question must be posed in conjunction with independent properties of the respective languages. In the generative literature, Bulgarian is fairly well described and documented. A first major difference between modern Hebrew and Bulgarian is, of course, that the latter is, while the former is not, a multiple wh-fronting language in local contexts, compare (40-a)/(40-b) repeated from (17-a)/(17-b), with (41): (40)

a. *[ma] [le-mi] Dan natan ⟨ma⟩ ⟨le-mi⟩?
   what dat-who Dan gave
b. *[le-mi] [ma] Dan natan ⟨ma⟩ ⟨le-mi⟩?
   dat-who what Dan gave
Hebrew

(41) Koj kakvo kupuva?
     who what buys
     ‘Who is buying what?’
     Bulgarian

Local multiple wh-fronting in Bulgarian exhibits a crossing rather than nested dependency pattern, or put differently, local multiple wh-fronting is structure preserving. This is unlike its long distance counterpart (39). Speculating about the difference between (39) and (41), one could argue that (39) indeed exhibits the same syntactic derivation as I proposed above for Hebrew nested interrogatives. How, then, does the derivation of (41) differ from the one of (39)? Considerable modifications are needed, in particular regarding the question of how the relationship between interrogative probes and wh-elements within the edge of vP is established. I will not be able to pursue these matters here. Suffice it to say that the analysis for Hebrew nested interrogatives might be applicable beyond Hebrew.


5.4.2 Criterial Freezing and A-movement

Turning now to cases of A-movement, Rizzi (2005) invokes the concept of Subject Criterion to give a name to the following contrast: (42)

a. *Qui crois-tu que t viendra?
   who believe you that come
   ‘Who do you believe that will come?’
b. Qui crois-tu que Marie va rencontrer t?
   who believe you that Mary will meet
   ‘Who do you believe that Mary will meet?’

In French, a wh-subject must not move long-distance, while wh-objects may do so. According to Rizzi, then, the Subject Criterion freezes the subject in the subordinate clause, preventing movement into the matrix clause. In the current context the Subject Criterion can be recast as follows: Raising the external argument DP to the sister of TP yields {DP, TP}=α, each member bearing a set of ϕ-features, both of which according to Chomsky (2013) derivationally become the label of α. I adopt this analysis without further argument here. After label detection, α equals [ϕ] for computational/interpretive purposes (using subscripts of categories for expository convenience). In X̄-theoretic terms, both DP and TP respectively become intermediate projections – again derivationally – and thus immobile. Freezing of derived subjects thus boils down to label identification of the DP’s relevant features by Minimal Search – the raised subject’s “reprojection” in projectionist terms (without actual projection implied).

(43) [C [T [DP[+Max] [v*P ...]]]] → [C [α=[ϕ][+Max] [DP[−Max] D[ϕ] ...] [TP[−Max] T[ϕ] [v*P ⟨DP[+Max]⟩ [v*P ...]]]]]

The graphs represent the transition from DP and TP being mobile (left tree) to freezing them as a consequence of labeling α; again, the indices [+/-Max] serve as descriptive devices, signaling results of Minimal Search, not of actual projection. α could move now, theoretically – not as a matter of empirical fact²³ –, but its terms, having intermediate status after detection of α’s label, may not. Freezing effects follow. Status of “projection” level is determined at a given derivational point, not statically; integration of a previously mobile XP into the projection of the target category leads to freezing of that XP (now intermediate). Famous cases of hyperraising in English can be captured by the approach suggested:

(44) a. *John seems is smart
b. John seems [α ⟨John⟩ is smart ]

23 What prevents α from moving? Anti-Locality (Abels (2003)) could be invoked to exclude movement of α. But as it stands, the principle is stipulative. Boeckx (2008b) attempts to derive the principle from the ensuing identity of (contextual) chain members – presumably a non-distinctness the interfaces fail to interpret as movement. In a somewhat similar spirit, Ott (2015), (p.c.) argues that Anti-Locality might reduce to a labeling problem, which too local movement fails to solve. Thus {XP, YP} is unstable, but too local YP-movement has no asymmetrizing effect and instead delivers a too symmetric structure: {YP, {XP, ⟨YP⟩}}. The labeling algorithm assigns this structure the format {YP, XP}, i.e. the same as the initial one, because the trace of YP is invisible. The solution is attractive and gives the right results for cases where YP does not move further. It is not clear, though, whether Ott’s solution is general enough to capture the problem at hand: anti-local movement of TP to SPEC-CP yields {TP, {C, ⟨TP⟩}}, where output and input structure are identical with respect to labeling, namely {C, TP}. Failure of symmetry breaking movement cannot be the reason why the derivation is out, because the input structure unambiguously receives the label C, irrespective of whether or not TP moves.

Hyperraising is out, because John’s ϕ-label is determined in the subordinate clause as part of the embedded finite α (={John, TP}). Concomitant to that label determination, John is integrated in α, arrested in place; extraction of John removes α’s label provider and thus destroys the label integrity of α (improper movement via an intermediate CP phase arguably contributes to the degree of deviance; the ungrammaticality of hyperraising in English thus involves at least two factors, one of which receives a principled explanation). This type of account concurs with the conclusion by Gallego and Uriagereka (2008) that Criterial Freezing is not a first factor but a Third Factor, i.e. reducible to minimal computation. It differs from theirs in locating the source of Criterial Freezing in Minimal Search, not in a violation of Full Interpretation, to which they – somewhat diffusely – attribute it.

Summarizing this section, we have seen that CF derives from the reconception of labeling structures of the form {XP, YP}, with agree rendering these structures stable. These are not endocentric as commonly understood, but a labeling algorithm spots a prominent semantic feature on both X and Y. As a consequence of this labeling, both XP and YP are criterially frozen since both are now [-Max][Min] categories. I have shown how both the wh-criterion and the Subject Criterion can be subsumed under these terms.
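The freezing logic summarized in this section can be illustrated with a small Python sketch. It is a toy model under strong simplifying assumptions, not an implementation of the labeling algorithm of Chomsky (2013): the class `Phrase`, the functions `label` and `can_move`, and the encoding of prominent features as plain strings are hypothetical names introduced here purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    """A phrase with the prominent features of its head (illustrative only)."""
    name: str
    features: frozenset      # e.g. frozenset({"phi"}); hypothetical encoding
    maximal: bool = True     # [+Max]: the phrase is still mobile


def label(xp: Phrase, yp: Phrase):
    """Return the shared label of {XP, YP}, or None if no feature is shared.

    Assumption: detecting a shared label demotes both members to
    intermediate ([-Max]) status, which is what freezes them.
    """
    shared = xp.features & yp.features
    if not shared:
        return None          # {XP, YP} stays unlabeled; nothing is frozen
    xp.maximal = False
    yp.maximal = False
    return shared


def can_move(p: Phrase) -> bool:
    """Only [+Max] phrases count as mobile in this toy model."""
    return p.maximal


# Derived subject: DP raised to the sister of TP, both bearing phi-features.
dp = Phrase("DP", frozenset({"phi"}))
tp = Phrase("TP", frozenset({"phi"}))
alpha_label = label(dp, tp)
```

After the call, `alpha_label` holds the shared feature set and both `dp` and `tp` report `can_move(...) == False`, mirroring the idea that the terms of α are arrested once α's label has been detected. A pair of phrases without a shared feature leaves both members mobile, which corresponds to the unstable {XP, YP} configuration.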

5.5 Summary

In this chapter I have shown, among other things, that the recent conception of labeling by Chomsky (2013) has a number of consequences: in conjunction with a restrictive principle, XIM, the idea that a complex phrase comprising two phrases XP and YP is labeled by a prominent feature which X and Y share gives a straightforward account of the notion of Criterial Freezing (Rizzi (2005)). I then suggested how Hebrew “nested interrogatives” can receive a new analysis, based on the assumptions of this book.

6 In Defense of Forked chains

6.1 Introduction

In this chapter I argue for the existence of “forked chains” (henceforth FC) in Across the Board (ATB) movement, applied to coordinate structures as illustrated in (1):

(1) What did John admire t and Mary despise t?

The account for FC crucially utilizes a symmetric structure of the conjuncts, which serve as the launching sites of ATB-movement. Forked chain formation is conceived as a secondary process to movement proper, following ideas of Martin and Uriagereka (2011). The connection to the overall theme of this book is that chain formation abides by Minimal Search and looks at identical syntactic objects in the symmetric conjuncts, analogous to the identification of shared labels as in the previous chapter. FCs are characterized by the fact that a single antecedent heads a chain with multiple tails, i.e. extraction proceeds simultaneously from more than one base position in coordinate structures. The idea of branching chains goes back at least to Ross’s (1967) work and has been defended, pursued or presupposed since by Williams (1977, 1978, 1990), Pesetsky (1982), Gazdar (1981) and Gazdar et al. (1985), as well as Goodall (1987).

The purpose of this chapter is to defend this old idea, though obviously not the technical implementation of preceding works, as the theoretical framework adopted here (essentially Chomsky (1995b) et seq) has changed considerably. I am thus not suggesting a new conception per se for extraction out of coordinate structures. Rather, the goal is first, to provide novel evidence in favor of FCs from German and English and secondly, to show avenues of analysis which couch and conceptualize FCs in a minimalist setting without jeopardizing basic tenets of the program, or so I hope. Finally, I will show that FCs relate in interesting ways to “shared labels” as discussed in chapter 5. In fact, the main analytical idea of this chapter was inspired by the impressionistically analogous “forked projections.” I would ask the reader eager to see right away the connection between FCs and labeling issues to jump to section 6.4 of this chapter. There are few things, if any, from section 6.2 and section 6.3 that are essential to understand the analysis in section 6.4.

DOI 10.1515/9783110522518-006


6.1.1 A Word of Motivation

Having said this, FCs have become disreputable in recent minimalist¹ work (Citko (2011), Ha (2008), Munn (1993), Hornstein and Nunes (2002), Zhang (2010)), and, as I will show, undeservedly so. A few examples illustrate this research trend. To take a quote from work which later entered into a book by the same author:

Conceptually, the stipulation of a forking chain is ad hoc, and incompatible to our well-recognized feature checking operations. Computationally, no multiple extraction operations from sub-constituents to the same damain [sic] are possible. (Zhang (2004, 1))

In a similar vein, Ha (2008, 197) writes:

[I]n case of ATB wh-movement, each conjunct contains an identical wh-phrase, and each wh-phrase leaves a trace in its conjunct but takes the same specifier position of CP as the other wh-phrase. This is conceptually odd on the grounds that the two independent movements of the copy end up with a single component.

Likewise, Munn (1993) developed a widely known account of extraction involving two independent movement chains in each conjunct, thus utilizing movement chains as they are commonly understood: one antecedent per gap. Finally, recent takes on ATB include Hornstein and Nunes (2002) and Citko (2005, 2011). All of these works explicitly or effectively deny the existence of FCs. It is this tendency to look for alternatives to FCs which forms the background and part of the motivation of the current defense of FCs. I set out to show that a search for alternatives is unwarranted. My aim is to show how FCs can be conceived and made to work, given certain minimalist guidelines with the goal in mind that there is nothing about them which would violate these principles. What is new in this chapter is empirical evidence in support of FCs and a conceptual argument in their favor in the light of Chomsky’s recent theory of labels, as well as a rebuttal of previous analyses, based mainly on empirical grounds.

6.1.2 Organization of this Chapter

This chapter is organized as follows: In section 6.2 I will survey some of the general properties of coordination and specify the empirical scope of my analysis, i.e. I will indicate which properties my analysis captures and which ones are not relevant. In section 6.3 I will describe and discuss previous accounts of ATB, including their merits and drawbacks. I show that they leave certain salient properties of ATB unaccounted for. I will develop my own analysis of ATB in section 6.4 and detail how it captures important features which I argued before are in need of explanation. In the subsequent section 6.5 remaining questions and open issues are addressed. Finally, I summarize the findings of this chapter in section 6.6.

1 It should be stressed that work within other frameworks such as HPSG has not abandoned FCs, cf. Chaves (2007, 2012). Moreover, in a lot of works the existence of FCs appears to be taken for granted. Mostly, however, these works do not have the nature of coordination/ATB as their focus but make an independent theoretical or empirical claim.

6.2 Properties of Coordination and ATB

It is hardly possible to say something about ATB without saying something about coordinate structures, the undisputed derivational source of ATB. In the following survey I give some salient properties of coordination and ATB, indicating at each point whether or not the feature described is relevant for my analysis.

6.2.1 General Properties of Coordination

In this subsection I will be concerned with general structural and widely known properties of coordination. The overall conclusion consistent with findings in other work will be that in languages like English and German, coordination involves a syntactic format [XP [& YP]], i.e. a structure where the first conjunct, XP, asymmetrically c-commands the last conjunct YP, which forms a constituent with the conjunction. As we will see further below, however, there are reasons to believe that this asymmetric format is not the result of base-generation of the coordinands and I will show that ATB involves at its core a structure in which XP and YP are symmetrically organized, structurally speaking. In the end, we will see how the asymmetric format mentioned can be derived.

Before proceeding, let me address the scope and limitations of the observations to follow. The prime focus of this work is the reality of and the mechanisms underlying ATB and I do not want to deviate from this focus if possible. Thus I will limit myself to conjunction of two coordinands and abstract away from the internal structure of coordination with more than two coordinands. That is, structures of the type John and Mary will be considered, while structures of the type John, Bill and Mary will not be considered. Furthermore, I restrict myself to examples of conjunction (‘and’) and won’t discuss disjunction (‘or’), adversative conjunction (‘but’) and the like.


Constituency

There is considerable distributional evidence that the syntax of coordination involves an asymmetric structural association of the conjunction with one coordinand to the exclusion of the other coordinand. This appears to hold crosslinguistically. For example, in English and German the conjunction forms a unit with the last conjunct to the exclusion of all preceding conjuncts. This asymmetry can be demonstrated by tests like parentheticals, independent sentences and movement. Thus, parentheticals and independent sentences can be preceded but not followed by the conjunction, suggesting that an underlying structural unit [Coord XP] exists but none of the type [XP Coord] (or, for that matter, a ternary branching [XP Coord XP]): (2)

Insertion of a parenthetical
a. Even Bill, and he is no fool, didn’t pass the test.
b. *Even Bill, he is no fool and, didn’t pass the test.

(3) a. Und ob ich schon wanderte im finsteren Tal, so fürchte ich kein Unglück.
and if I though wandered in the dark valley thus fear I no misery
‘Yea, though I walk through the valley of the shadow of death, I fear no evil.’
b. *Ob ich schon wanderte im finsteren Tal und, so fürchte ich kein Unglück.
if I though wandered in the dark valley and thus fear I no misery

Along the same lines, displacement shows that a unit [Coord NP] can move as a unit (4). However, the conjunction cannot be stranded with the first coordinate member (5-a) or moved along with the first coordinate member (5-b):

(4) Gestern sind [der Hans] angekommen und [der Bernd].
yesterday are the Hans arrived and the Bernd
‘Yesterday, Hans arrived, and Bernd.’

(5) a. *John gave [an article and] to Mary, [a squib].
b. *John gave [a squib] to Mary, [an article and].

It is interesting to note that, even though (5-a)-(5-b) point in the direction that the constituency of coordination in English is [XP [Coord YP]], with the last conjunct forming a constituent with the conjunction, it is impossible to displace XP rightward, stranding the unit [t_XP [Coord YP]]:


(6)

*John gave [and a squib] to Mary, [an article]

Leftward movement with conjunction, like passivization, wh-movement or topicalization, is impossible, as (15-a), (15-b) and (15-c) show:²

(15) a. *[and who] did John like what?
b. *[and Bill] was seen John.
c. *[and this book], Mary read this article.

2 In (4) I have chosen a “discontinuous coordination” example from German, which Prinzhorn and Schmitt (2010) convincingly argue involves genuine rightward movement of an NP and must be distinguished from English examples like (7) (which also exist in German):

(7) John gave an article to Mary, and a squib.

We might blame the ungrammaticality of (15-a)-(15-c) on independent constraints like the Coordinate Structure Constraint. However, this leaves unexplained the fact that the string and+NP can occur in a position displaced from the NP it is associated with, i.e. a position which looks like the one resulting from extraposition (7). Can these observations be captured in a unified fashion? I believe they can. Let us see how. Observe first that (2-a), (3-a) and (7) show that a string Coord+XP is possible in a clause peripheral position only (assuming that parentheticals are independent clauses not structurally integrated into the main clause). This is reminiscent of right and left dislocation phenomena and parentheticals, recently treated under a single umbrella by Ott (2014), Ott and de Vries (2014). Take, for instance, German right dislocation (8):

(8) Peter hat ihn gesehen, den Jungen.
Peter has him seen the boy

The analysis which has been suggested by Ott and de Vries (2014) to account for a variety of properties of this construction is a combination of juxtaposition of two CPs (CP1 and CP2), parallelism and sluicing within CP2. It is sketched in (9):

(9) [CP1 Peter hat ihn gesehen], [CP2 [NP den Jungen] C Peter t_NP gesehen hat]

There are reasons to believe that instances of “extraposition” of Coord+XP actually do not involve a constituent [Coord XP] which has been moved to the right periphery of the clause. Instead, right peripheral displacement like (7) might involve a biclausal structure along the lines of (9), involving two paratactically related CPs, CP1 and CP2 (essentially, the assumptions in Heim and Kratzer (1998, 249 ff) and the analysis for non-discontinuous coordination cases of German discussed in Prinzhorn and Schmitt (2010)). CP2 is introduced by a conjunction. Within CP2 leftward movement of XP takes place. In addition, sluicing (TP-ellipsis) applies to CP2. Thus (7) receives the analysis (10):

(10) [CP1 John gave [an article] to Mary], [and [CP2 [NP a squib], John gave t_NP to Mary]]

There is an interesting advantage from this angle. Aside from the arguments advanced by the authors mentioned, we can solve an old puzzle unearthed by Sag et al. (1985). They observe a subcategorization asymmetry in coordination. The preposition on in the collocation depend on selects an NP (11-a) but not a CP as (11-b) shows:

(11) a. You can depend on my assistance.
b. *You can depend on that he will be on time.

They observe, however, that once the CP and the NP are part of what looks like coordination of these two categories, as in (12-a), the structure becomes grammatical:

(12) a. You can depend on my assistance and that he will be on time.
b. *You can depend on that he will be on time and my assistance.
(Sag et al. (1985, 165) in Zhang (2009))

Reversal of the relative order of the coordinands is impossible as (12-b) shows. Sag et al. (1985) take this to mean that selectional requirements must sometimes be met in the first conjunct only, i.e. that there is an asymmetry of subcategorization with respect to the first and the last conjunct. Now under the analysis (10), (12-a) is not conjunction of an NP and a CP at all but rather a biclausal structure connected by a conjunction with a single NP in the first clause CP1 and a leftward moved CP that he will be on time in CP2, followed by sluicing in CP2. This is shown in (13):

(13) [CP1 You can depend on my assistance] and [CP2 [CP that he will be on time] you can depend on t_NP ]

Notice that there is no problem with the fact that depend on in the elided structure does not select a CP in-situ as we have seen in (11-b). As Webelhuth (1992) famously observed, satellite clauses generally have the distribution of NPs, captured under his Sentence-trace Universal (hence the NP-trace in the elided portion in (13); cf. Ott (2017) for an interesting proposal to explain Webelhuth’s puzzling generalization, which is compatible with the current analysis). Hence the analysis of (12-a) as [NP [& CP]] is wrong to begin with and the observation that these data show asymmetries between first and last conjunct by Sag et al. (1985) is wrong: the CP is a leftward moved unit within an elliptical CP2 subject to the independently operative Sentence-trace Universal. depend on invariably selects NPs and there is no need to invoke the assumption that (12-a) involves coordination of unlike categories and some sort of asymmetric selection. Under the biclausal hypothesis (12-a) is exactly predicted. Arguably, then, specifier-less Coordination phrases always embed clauses, but never subclausal units. Let us make the following descriptive generalization about conjunct phrases which lack a specifier:

(14) a. X [Coord CP]
b. *[Coord {NP/PP/AP/VP}]

I will not pursue this account further here, as the focus of this work is on ATB.

6.2.1.1 Asymmetries in Coordination

Asymmetries in Binding

Evidence from variable binding does not unequivocally provide an argument for the conclusion that the first conjunct asymmetrically c-commands the last, but not vice versa. A universally quantified expression as the first conjunct can bind a pronoun in the second, but the reverse is not possible as Munn (1993) shows:

(16) Variable Binding
a. Every_i man and his_i dog went to mow a meadow.
b. *His_i dog and every_i man went to mow a meadow.


The facts indicate little in favor of c-command, as a confounding factor for (16) is that it might involve a Weak Crossover violation if Quantifier Raising is assumed. Munn (1993) argues that the paradigm in (17) speaks in favor of a c-command relation between the first and the second conjunct: (17)

a. John_i’s dog and he_i went for a walk.
b. *He_i and John_i’s dog went for a walk.

He interprets (17-b) as a Condition C-violation, which presupposes that the first conjunct c-commands the last conjunct. However, Progovac (1998) questions the validity of the argument and argues that we find examples like (18):

(18) He_i finally arrived. John_i’s dog went for a walk.

If her judgments are correct, this indicates that similar effects can be observed across two sentences, where c-command between an element in the first clause and elements in the second clause does not hold. Moreover, she gives (19) to support her argument:

(19) John_i and John_i’s wife are certainly invited.

Here, no Condition-C-effect is induced, while within a sentence an effect is detectable:

(20) ?*John_i certainly likes John_i’s wife.

Summing up, we have little evidence from either variable binding or Binding Theory to support a c-command relationship between the first and the second conjunct.

Asymmetries in Agreement

Coordination structures of many languages exhibit asymmetries with respect to agreement and how it is resolved. An option many languages make use of is that the coordinate structure as a whole is the controller of agreement, so that number on the verb is always plural, i.e. that agreement is resolved by the semantic plurality contributed by the conjuncts, as shown by the Moroccan Arabic example (21-a). However, languages employ more possibilities. In particular, there are well-known cases of “closest conjunct” agreement, where the linearly closest conjunct controls agreement on the verb (21-b):

(21) Moroccan Arabic (from Benmamoun et al. (2009))
a. žaw ʕomar w Kariim
came.3.pl Omar and Karim
‘Omar and Karim came.’
b. ža ʕomar w Kariim
came.3.masc.sg Omar and Karim
‘Omar and Karim came.’

There are reasons to believe that syntax proper and the sensory-motor system³ share tasks in such phenomena as argued for by a number of researchers (cf. Bhatt and Walkow (2013) and references therein). In this chapter I will not be concerned with agreement asymmetries. The purpose of this subsection was to (re-)establish that coordinands feature an asymmetric structure. After this non-comprehensive survey of general properties of coordination, let me now turn to ATB-movement.

6.2.2 General Properties of ATB

Ross (1967) observed that ATB is an exception to the Coordinate Structure Constraint (CSC), which is stated in (22) and whose violation is exemplified by way of relativization in (23). Coordinate structures count as one of the classic islands identified by Ross and the CSC is often considered a cross-linguistically valid restriction on movement. I will assume so, despite well-known counterexamples, which were already discussed by Ross and many others.

(22) The Coordinate Structure Constraint
In a coordinate structure, no conjunct may be moved, nor may any element contained in a conjunct be moved out of that conjunct.

(23) *The madrigals which Henry plays the lute and sings sound lousy.

As (24) shows, if movement applies in an across-the-board fashion, extraction from a coordinate structure becomes possible: (24)

The madrigals which Henry writes and sings sound lousy.

Williams (1977) observed that the CSC is dispensable if the ATB-principle is adopted, which reads as follows:

3 Broadly comprising linear, morphological and phonological aspects.

(25) ATB-Principle
If a rule applies into a coordinate structure, then it must affect all conjuncts of that structure.
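The content of (25) can be made concrete with a minimal Python sketch. It rests on a deliberately crude assumption, namely that each conjunct can be summarized by a single boolean recording whether the rule has applied into it (i.e. whether it contains a gap); the function name `satisfies_atb` and this encoding are hypothetical and serve only to illustrate the logic of the principle.

```python
def satisfies_atb(gap_in_conjunct):
    """Check the ATB-Principle for one coordinate structure.

    gap_in_conjunct: one boolean per conjunct, True if the rule
    (e.g. wh-movement) has applied into that conjunct.
    The principle is conditional: if the rule applies into the
    structure at all, it must affect every conjunct.
    """
    affected = [bool(g) for g in gap_in_conjunct]
    if not any(affected):
        return True          # rule did not apply; (25) is vacuously met
    return all(affected)


# (23): a gap only in the second conjunct
example_23 = satisfies_atb([False, True])
# (24): a gap in both conjuncts
example_24 = satisfies_atb([True, True])
```

Under this encoding the check rejects the pattern in (23) and accepts the pattern in (24). The sketch says nothing about how gaps are identified or about the parallelism requirement, which are separate matters.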

As can be easily seen, (23) violates (25), because wh-movement has not applied in the first conjunct but only in the second one. wh-movement applies to all conjuncts in (24) in accordance with (25) and the sentence is grammatical. Thus for the few data considered so far, (25) is a correct generalization, rendering the CSC superfluous. I will say more about the ATB-Principle below. ATB movement can take place from any category which can be extracted out of in non-coordination contexts: (26)

Who did [TP Bill see] and [TP Mary like]

(27)

Who did Bill barely [vP see] and yet desperately [ vP like]

(28)

About what did John read [DP a book] and [DP an article]

(29)

Who did Mary say [CP that Bill likes] and [CP that John despises]

(30)

Who was Chris [AP proud of] and yet [AP mad at]?

Just as ATB is licit from categories from which extraction is fine in non-coordination contexts, ATB is bad from categories from which extraction is bad in non-coordination contexts. Thus ATB exhibits island sensitivity. For instance, ATB-Ā-movement out of subjects is impossible, as e.g. Katzir and Bachrach (2009) show by way of example (32):⁴

(32) *Who_i did [a man who loves t_i dance], and [a woman who hates t_i go home]?

Thus, it seems, ATB has all the properties of the “regular,” i.e. non-ATB movement type. Notice, finally, that most movement types can apply across-the-board, like A-movement (33) and head movement (34). (33)

John was registered and beaten up.

(34)

What did [John ∅ like] and [Mary ∅ hate]

4 Cf. work by Katzir and Bachrach (2009) for the exciting observation that provided the extraction proceeds from the rightmost position within both conjuncts, extraction becomes possible across certain islands like the complex NP-constraint: (31)

[Which book]_i did [John meet the man who wrote t_i ], and [Mary meet the woman who published t_i ]?


After having briefly surveyed some general syntactic properties of ATB, I will now turn to parallelism requirements the operation is subject to.

Parallelism requirement

Williams (1978) observed that ATB presupposes a kind of “parallelism” between the conjuncts from which the moved element originates. The parallelism is not met in (35), where local object and local subject relativization proceed simultaneously. However, it is met in (36), where a wh-object moves locally and a wh-subject moves long-distance:

(35) *I know a man who_i [Bill saw t_i ] and [t_i likes Mary].
(36) I know the man who [John likes t_i ] and [we hope t_i will win].

Obviously then, grammatical function is not a factor relevant for illicit combinations of links in ATB. Rather, some structural parallelism constraint determines whether or not ATB may apply.⁵ I will later show how the parallelism requirement can be reformulated in a phase-based system.

Single-identity Reading

Another striking fact about coordination concerns the interpretation of the moved element. For instance, the wh-operator in (39) allows for a single identity reading only: (39)

Who did John meet and Mary like?

What (39) denotes is presented in the sketch of a logical form below: (40)

For which x, x a person, John met x and Mary liked x

Crucially, what (39) does not mean is (41):

5 Later, George (1980) would take contrasts such as these to suggest the vacuous movement hypothesis according to which wh-subjects do not undergo movement to SPEC-CP, but stay in SPEC-TP instead. (35) is then accounted for by the fact that the coordinated TPs lack the required parallelism: (37)

I know a man [CP who C [TP Bill saw ] and [TP ⟨who⟩ likes Mary]].

By the same token, (36) meets parallelism: (38)

I know the man [CP who C [TP John likes] and [TP we hope will win]]

Nothing in my analysis of ATB hinges on the truth or falsity of the vacuous movement hypothesis.

(41) For which x, x a person, John met x and for which y, y a person, Mary liked y.

There are cases of multiple identity readings, especially in wh-adverbials as Munn (1999) discusses in detail. Thus (42-a) can be answered by (42-b), which shows that where asks for multiple locations instead of a single one: (42)

a. Where did Mary vacation and Bill decide to live?
b. Mary vacationed in Paris and Bill decided to live in Toronto.

In this chapter I will confine myself to single identity readings. It is, of course, not a priori impossible that the syntax of ATB with single identity readings and the one with multiple identity readings is identical. For the purposes of determining the nature and mechanisms of ATB, though, I will deal only with examples exhibiting a single identity reading. After describing some salient structural as well as semantic properties of ATB, let me now turn to analyses which aim at determining the structural nature of this phenomenon.

6.3 Previous Analyses

There are a number of previous approaches to ATB, some of which I discuss below. All the analyses chosen deny the existence of branching chains. Aside from this commonality, there are differences, and I classify the analyses into two groups, “asymmetrical” and “symmetrical” accounts. The asymmetric analyses are characterized by the fact that a “primary” movement dependency is established in the first conjunct and a “secondary” relation parasitic on this first-conjunct dependency is established in the second conjunct. The secondary relation is mediated either by an operator (Munn (1993)) or by a silent pronoun (Zhang (2004, 2010)). The symmetric analyses, by contrast, feature a true movement dependency between the gaps in both conjuncts and the landing site of movement, which is either direct (Citko (2005, 2011)) or indirect (Hornstein and Nunes (2002)). Finally, I discuss a recent approach by Ha (2008). His take on ATB is somewhere between the symmetrical and the asymmetrical approaches and I thus place his hybrid treatment at the end of this section. Instead of discussing each analysis directly after presenting it, an overall discussion of asymmetric analyses and then a discussion of symmetric analyses is placed at the end of section 6.3.1 and section 6.3.2 respectively.


6.3.1 Asymmetric Analyses

According to the asymmetrical approaches what is called ATB involves no movement out of the second conjunct at all, but only out of the first conjunct. In Munn (1993) there is a silent operator which undergoes movement in the second conjunct, which is associated with the moved element in the first conjunct. In Zhang (2010) there is a silent pronoun in the second conjunct which is bound by the fronted element of the first conjunct. Effectively then, the asymmetrical approaches deny the reality of ATB but instead claim that the observed extraction is a standard single displacement from the first conjunct, plus an additional relation between the moved element and a silent element in the second conjunct.

6.3.1.1 Munn 1993

According to an influential analysis by Munn (1993), coordination involves a “Boolean Phrase” BP, which is adjoined to the maximal projection of the first coordinated member. The head Boole⁰ takes the second member as a complement:

(43) [DP [DP John] [BP [B and] [DP Mary]]]

As the BP adjoins to the DP, the target of adjunction – here: the DP – projects. Thus the structure (43) captures directly and elegantly the fact that coordinate structures have the same distribution and meet the same selectional requirements as each coordinand alone. In short, BP is never selected. Munn took up an old and often expressed intuition, namely that the structure of ATB is in relevant respects parallel to the one of parasitic gaps, exemplified in (44-a):

(44) a. Which book did John file e without reading e?
b. Which book did John file e and read e?

Both constructions feature a single pronounced wh-expression and multiple gaps. Taking the intuition of this parallelism seriously, combined with the at the time standard analysis of parasitic gaps (Chomsky (1982), shown in (45)), led Munn to suggest that ATB involves a null-operator, associated with the fronted wh-phrase, which moves internally to the BP to SPEC-BooleP as given in (46):

(45) Which_i book did John [VP [VP file t_i ][PP without [CP OP_i C reading t_i ]]]

(46) [CP who_i [C did] [TP John T [VP [VP t_John [V’ [V see] t_who ]] [BP OP_i [B’ [B and] [VP ignore t_OP ]]]]]]

Aside from a number of descriptive parallelisms between parasitic gaps and ATB, a central piece of evidence in favor of the structure, which Munn gives, comes from reconstruction effects of Principle A of the Binding Theory. As (47-a) and (47-b) show, reconstruction for Principle A is possible in the first conjunct but obviated in the second: (47)

a. Which pictures of himself_i did John_i buy and Mary paint?
b. *Which pictures of herself_i did John buy and Mary_i paint?

If movement of the wh-expression takes place from the first conjunct only, the observation in (47-a) falls into place. By assumption, the operator in the second conjunct lacks relevant parts of the lexical information present in the first one, and thus no reconstruction is expected in the second conjunct. (47-b) is correctly predicted to be ungrammatical. After presenting Munn’s (1993) analysis I will now turn to Zhang (2010).

6.3.1.2 Zhang 2010

Zhang (2004, 2010) modifies Munn’s treatment of ATB in a number of respects. First, she follows Kayne (1994) and others and assumes that coordination involves an &P as shown in (48). She assumes that the categorial features of the first coordinand percolate up to the &P to account for the distributional and selectional facts mentioned before:

(48) [&P [DP John ] [&’ [& and ] [DP Mary ]]]

She follows Munn (1993) in assuming that extraction takes place from the first conjunct only. The novelty of her analysis of ATB lies in assuming that there is a silent pro-form in the second conjunct and a binding relation between the extracted element and this pronoun. Moreover, this pro-form pro-ϕP agrees in terms of ϕ-features with the binder and moves to the left periphery of the second conjunct. This is sketched below:

(49) a. [[ Mary did help who ] and [ Jane did ruin pro-ϕP ]]
b. [ who_i [ Mary did help t_who ] and [ pro-ϕP Jane did ruin t_pro-ϕP ]]

She then claims that ATB-constructions are in certain respects parallel to constructions involving the adjective same. She gives the following sketch of a derivation (50-a)⁶ and then proposes that silent counterparts of this adjective are part of ATB in general as schematized in (50-b): (50)

a. [[ Mary helped [the same man]_i ] and [ Jane ruined pro-ϕP_i ]]
b. [DP which ∅⟨same⟩ picture of himself ] [ did Tom paint t_DP ] and [ pro-ϕP_i Mary buy t_pro-ϕP ]

As a semantic motivation for this silent adjective she resorts to the single identity readings described above. For example, she cites example (51) from Heycock and Zamparelli (2000). The example is odd only because a single identity reading is forced: it is the same set of documents which John wrote today and Mary filed before that, which the speaker is interested in. However, this is nonsensical. If a multiple identity reading were available, the sentence would be just fine as it is perfectly possible for John to write one set of documents today and Mary file a different set of documents before that. (51)

#Tell me which documents John wrote today and Mary filed yesterday.

6 It is in fact unclear what construction she has in mind: topicalization in English? Relative clause formation?


What is in my view the most compelling piece of evidence in favor of a silent proform in the second conjunct is morphological in nature. Zhang (2010, 227) cites Icelandic examples from Rögnvaldsson (1982) showing that agreement within the second conjunct cannot be with the across-the-board-moved subject in (52-a), because it exhibits the same agreement form as when a quirky subject is expressed in the second conjunct (52-b): (52)

a. Margir stúdentar náðu prófinu og var hrósað fyrir það
many students passed test.the and were praised for it
‘Many students passed the test and were praised for it.’
b. Margir stúdentar náðu prófinu og theim var hrósað fyrir það
many students passed test.the and they-dat were praised for it
‘Many students passed the test and they were praised for it.’

Let me now turn to a discussion of her analysis.

6.3.1.3 Discussion of and Problems for Asymmetric Analyses

Here I would like to address properties and problems of both asymmetric accounts. Let me first address the issue of a pronominal in the second conjunct as suggested by Zhang (2010). This property of Zhang’s analysis at first sight looks like a virtue. Prima facie, one would expect that some languages feature overt realizations of the silent pronominal in the second conjunct that she posits. This expectation is confirmed by ATB in the Austronesian language Palauan (Georgopoulos (1983, 1985); taken from Goodall (1987, 67/68)⁷). The language is subject to the CSC and allows ATB as in English. However, a difference between the languages⁸ is that Palauan has two strategies to establish an Ā-dependency. Relativization or wh-interrogatives exhibit either a gap in the launching site of the moving element, or a resumptive pronoun. Interestingly, as (53) shows, it allows both strategies simultaneously under ATB movement, i.e. a gap in the first, and a resumptive pronoun in the second conjunct: (53)

akmediengelii a bilas_i [ el lebilʔerar a Cisko ] me [ a Ioseb a milngesbereber er ngii_i ]
I know boat C bought Cisko and Joseph painted P it
‘I know which boat Cisko bought and Joseph painted.’

7 Cf. Borsley (2013) for similar facts from Welsh.
8 Cf. Munn (1993, 59) for the corresponding ungrammatical English cases.


In other words, even though Palauan coordination is subject to the CSC, (53) is grammatical due to the independently available resumption strategy in Ā-dependencies. Insofar as the overt expression of a resumptive pronoun supports the postulation of a silent pro-form, (53) thus supports the approach by Zhang (2004, 2010). However, as the Palauan wh-question in (54) shows, the resumptive pronoun may likewise occur in the first conjunct: (54)

ngngerang_i [ mirruul er ngii_i a Sie ] e [ a ʔoʔodal a meʔerar ]
what made P it Sie and her sister bought
‘What did Sie make and her sister buy?’

Thus such data do not support her analysis but actually challenge it, because the analysis leads one to expect an asymmetry between the first and the second conjunct: The first is the locus of the movement-derived dependency while the second is the locus of the dependency between the fronted element and the resumptive pronoun.⁹ Factually, however, the freedom we see in the way Palauan forms two kinds of Ā-dependencies suggests that first and second conjunct are structurally symmetric.

Another problem that concerns both asymmetric analyses comes from morphological requirements of the moved element. The German data in (55), (56) and (57) show that the moved element needs to simultaneously satisfy the morphological case requirements of the predicates in both conjuncts, familiar from case matching effects in free relatives:¹⁰ (55)

a. [Welchem Jungen] hat Maria gehorcht und dann geholfen?
which.dat boy has Mary obeyed and then helped
‘Which boy did Mary obey and help?’
b. [Welchen Jungen] hat Maria gesehen und geliebt?
which.acc boy has Mary seen and loved
‘Which boy did Mary see and love?’

(56) a. *[Welchen Jungen] hat Maria gesehen und geholfen?
which.acc boy has Mary seen and helped
b. *[Welchem Jungen] hat Maria gesehen und geholfen?
which.dat boy has Mary seen and helped

9 There are in fact approaches which capture resumption by means of movement, cf. among others, Boeckx (2003). However, the point remains unaffected by questions of implementation.
10 Although there are differences not relevant for the ongoing discussion.

(57) a. Bären hat er geliebt und geholfen.
bears has he loved and helped
‘It is bears which he loved and helped.’
b. Was hat er gehört und hat die Leute beeindruckt?
what has he heard and has the people impressed

The examples in (56) are ungrammatical because this condition is not met: helfen (‘help’) assigns dative case while sehen (‘see’) assigns accusative. The moved wh-phrase bears morphological accusative and dative respectively, yielding ungrammaticality. gehorchen (‘obey’) and lieben (‘love’) assign dative and accusative respectively, and hence the examples in (55) are grammatical. As (57) shows, finally, this parallel case assignment requirement pertains to the morphological case form only and is not a requirement on the identity of the case feature borne by the across-the-board moved element. Bären (‘bears’) and was (‘what’) are syncretic between dative and accusative, and accusative and nominative respectively. The result is grammatical.

Now consider the asymmetric analyses. Structures along Zhang’s and Munn’s lines would be (58-a) and (58-b) respectively: (58)

a. [Welchen Jungen]_i hat Maria [ t_i gesehen ] und [ pro-ϕP geholfen ]
b. [Welchen Jungen]_i hat Maria [ t_i gesehen ] und [ OP_i [ t_OP geholfen ]]
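The surface-form matching condition behind (55)–(57) can be made concrete with a small sketch. It is an illustration only, written in Python: the names FORM_CASES and atb_form_ok are hypothetical, and the case sets listed are just the ones the text mentions for these examples, not full paradigms.

```python
# Hypothetical mini-lexicon (illustration only): each surface form is mapped
# to the set of cases it can realize, as described for (55)-(57).
FORM_CASES = {
    "welchem Jungen": {"dat"},
    "welchen Jungen": {"acc"},
    "Bären": {"acc", "dat"},  # syncretic between dative and accusative
    "was": {"nom", "acc"},    # syncretic between accusative and nominative
}


def atb_form_ok(form, gap_cases):
    """Return True if the single fronted form can realize the case
    required at every gap, i.e. in each conjunct."""
    realizable = FORM_CASES[form]
    return all(case in realizable for case in gap_cases)


# (55-a): gehorchen and helfen both require dative.
print(atb_form_ok("welchem Jungen", ["dat", "dat"]))  # True
# (56-a): sehen requires accusative, helfen requires dative.
print(atb_form_ok("welchen Jungen", ["acc", "dat"]))  # False
# (57-a): lieben (accusative) and helfen (dative) with a syncretic form.
print(atb_form_ok("Bären", ["acc", "dat"]))           # True
```

Stated this way, the condition is checked against the form of the one fronted element for all gaps at once, which is the property that a separate operator or pro-form carrying its own case does not obviously provide.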

As far as I can see, there is nothing in their analyses to rule out the ungrammatical sentences in (56), because the operator or pro-ϕP can both in principle bear any case. To stipulate that OP or pro-ϕP need to receive the same case as the DP in the first conjunct is clearly ad hoc and boils down to a restatement of the facts. Similar case-identity requirements between an antecedent and an operator or a bound pronoun are tellingly missing elsewhere, so that an operator- or pronoun-based analysis suffers from loss of plausibility.¹¹

In addition, Zhang adopts the theory by Adger and Ramchand (2005) of Ā-dependencies formed by an antecedent and a bound pronoun. However, aside from the fact that they argue in favor of a combination of base-generation plus pronoun binding and not of movement plus pronoun binding, they show that in Gaelic the antecedent in fact must differ in case from the case borne by the bound category in wh-questions. As (59) shows, ‘cut’ assigns genitive case to its object. Once a wh-question is formed, the wh-element must crucially bear nominative and cannot be genitive ((60-a) and (60-b)):

11 The analysis of facts like in (55), (56) and (57) is not trivial, but it seems to me that an operator- or pronoun-based analysis is particularly troublesome for them. I will make tentative suggestions towards analyzing such examples in section 6.4.

(59) Bha thu a’gearradh na craoibhe
be.pst you cutting the tree.gen
‘You were cutting the tree’

(60) a. De a’ chraobh a bha thu a’gearradh
which the tree.nom C.rel be.pst you cutting
‘Which tree were you cutting?’
b. *De na craoibhe a bha thu a’gearradh?
which the tree.gen C.rel be.pst you cutting
‘Which tree were you cutting?’

Thus the situation is exactly the reverse of ATB (in German, Polish and elsewhere). Consequently, Adger and Ramchand (2005)’s theory is ill-equipped to accommodate ATB; rather than providing an “independent motivation” for this treatment of ATB as Zhang has it, their observations provide a motivation not to adopt a pronoun-based analysis of ATB.

Let me finally make a remark on assimilating ATB-constructions to parasitic gaps. If the structure of ATB has the same ingredients as the structure of parasitic gaps, one could expect that whenever a language has the former, the latter should be available too. Reich (2007) makes this point in an effort to argue against Munn’s analysis of ATB: As German does not have parasitic gaps, crucial ingredients for a Munn-style analysis of ATB must be missing. But German ATB is perfectly fine. Hence, ATB cannot have the underlying derivation that Munn suggested. The actual facts in German are not as clear-cut, as some speakers allow parasitic gap constructions more readily than others. Reich’s argument holds, however, for those speakers that do not allow them. On top of this, the literature reports facts that seem more robust than the German ones. Welsh seems to be clearer. Thus Borsley (2013, 10/24) observes that the CSC is operative in Welsh (61-a) and can be circumvented by ATB (61-b): (61)

a. *y dyn [welais i a gwelaist tithau Megan]
the man see.pst.1sg I and see.pst.2sg you Megan
b. y dyn [welais i a gwelaist tithau hefyd]
the man see.pst.1sg I and see.pst.2sg you too
‘the man that I saw and you saw too’

However, parasitic gaps are impossible altogether: (62)

*Dyna ’r adroddiad dw i wedi ei daflu i ffwrdd [heb ddarllen].
there is the report be.prs.1sg I prf 3sg.m throw away without read


Concluding, Welsh provides evidence that there is no correlation between parasitic gaps and ATB. Such a lack of correlation could lead one to conclude that the constructions should not be treated on a par.

Summarizing this section, both asymmetric analyses are confronted with empirical problems: first, languages which actually make use of pronominals, i.e. resumptive pronouns, feature these pronominals symmetrically in both conjuncts; secondly, case matching effects cannot be captured by these analyses either. Thus, upon closer scrutiny, arguments in their favor collapse. I hasten to say that I am not claiming that the analyses cannot be repaired. However, the rescue comes at the price of construction-specific stipulations and assumptions which have no independent motivation. I will now illustrate and discuss symmetric analyses. As I go along, more problematic aspects of the asymmetric analyses will be touched on.

6.3.2 Symmetric Analyses

6.3.2.1 Citko 2011

Citko (2005, 2011) offers an intriguing analysis of ATB based on the notion of multi-dominance, which she conceptualizes as “Parallel Merge,” a third type of Merge which, next to external and internal Merge, shares features of both. So while external Merge picks α and β from independent sources to yield (63-a), internal Merge picks α and β, where one is contained in the other, to yield (63-b): (63)

a. [ α β ]
b. [ α [β … ⟨α⟩ … ]]

Parallel Merge, now, picks α and β to yield a structured object and remerges either α or β again with γ to yield (64):

(64) [ α β ] γ → [ α β ] [ β γ ], where the remerged element (here β) is a single node dominated by both resulting objects

Of course, the right tree in (64) is a multi-dominance structure. As the name suggests, in multi-dominance a single terminal node or syntactic object can be dominated by more than one mother node. Many other theoretical frameworks, such as the one adopted here, restrict syntactic objects to be dominated by at most one mother node. Aside from this difference, I would like to point out that in many respects Citko’s analysis comes closest to the one I will develop and advocate below. And yet, despite a big overlap in terms of the properties of her and my own analysis, they differ in respects which yield distinct predictions. It is these predictions which will lead me to argue against her analysis and in favor of my own in terms of FCs.

To illustrate how Citko treats ATB, let us take (65) as an example: (65)

What did John lose and Bill find?

Citko assumes that there is a CP selecting an &P, a conjunction of two TPs: (66)

[C [ TP [ & TP]]]

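There are numerous multi-dominance relations for ATB as the tree below shows.

(67) [CP C [&P [TP [T’ T [vP John [v’ v [VP lose what ]]]]] [&’ and [TP [T’ T [vP Bill [v’ v [VP find what ]]]]]]]]
T (= do), v and what are each a single node shared by the two conjuncts.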
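The sharing relations in (67) can be illustrated with a small Python sketch in which a tree is a graph of node objects and one object may appear as the daughter of several mothers. The class Node and the function shared_nodes are hypothetical helpers introduced for this illustration, not part of Citko's proposal.

```python
class Node:
    """A syntactic object; the same Node instance may be the daughter
    of more than one mother (multi-dominance)."""

    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)


def shared_nodes(root):
    """Return the sorted labels of all nodes with more than one mother."""
    mothers = {}  # id(node) -> (node, set of ids of its mothers)
    seen = set()
    stack = [root]
    while stack:
        mother = stack.pop()
        if id(mother) in seen:
            continue  # expand each node object only once
        seen.add(id(mother))
        for child in mother.children:
            mothers.setdefault(id(child), (child, set()))[1].add(id(mother))
            stack.append(child)
    return sorted(n.label for n, ms in mothers.values() if len(ms) > 1)


# Single instances of what, T and v, as assumed for (67).
what, T, v = Node("what"), Node("T"), Node("v")
vP1 = Node("vP", Node("John"), Node("v'", v, Node("VP", Node("lose"), what)))
vP2 = Node("vP", Node("Bill"), Node("v'", v, Node("VP", Node("find"), what)))
TP1 = Node("TP", Node("T'", T, vP1))
TP2 = Node("TP", Node("T'", T, vP2))
root = Node("CP", Node("C"), Node("&P", TP1, Node("&'", Node("and"), TP2)))

print(shared_nodes(root))  # ['T', 'v', 'what']
```

Every other node in the sketch has exactly one mother, so the structure departs from an ordinary tree only at the three shared terminals.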

In the base configuration lose as well as find take a single instance of what as their complement. As we can see, what is thus dominated by the VP of lose as well as the one of find. In addition, she assumes that the two TPs and vPs too dominate a single terminal node T and v respectively. As for the former, Citko gives (68) as a justification for this treatment (I will not go into the details concerning the latter):

(68) *What did John lose and Bill will find?

The T(ense)-nodes in the first and the second conjunct have to agree in terms of tense and (68) shows what happens if they do not: the first involves the value past while the second bears the value present. According to Citko, the analysis in (67) captures this due to the fact that T is shared in the two conjuncts. So how does (65) come about? In a fairly simple fashion: The single instance of what moves to SPEC-CP, leaving a trace or copy in the multiply dominated complement position. Thus, there is no branching chain, which is the putative advantage compared to classical analyses.

Citko (2005) shows that the analysis has further advantages for the treatment of ATB, namely the one of capturing matching effects which I briefly discussed above based on German. She uses Polish examples to exemplify case matching effects. I will not discuss her data in much detail here, because the pattern is basically identical to the German one. Instead I confine myself to presenting the analytical part by way of the Polish examples, i.e. to showing how multi-dominance can handle these effects. As (69-a) shows, ATB-wh-movement is fine as long as the fronted element satisfies the case requirements of the respective gaps, which kogo (‘who’) does: kogo is accusative and both verbs in the conjuncts assign accusative. Once the fronted element bears a case which does not satisfy the case requirements of one of the gaps, ungrammaticality results as in (69-b). Neither kogo (accusative) nor komu (dative) satisfies the case requirements of both gaps simultaneously: (69)

a. Kogo Jan lubi a Maria podziwia?
who.acc Jan likes and Maria admires
‘Who does Jan like and Maria admire?’
b. *Kogo/Komu Jan lubi a Maria ufa?
who.acc/dat Jan likes and Maria trusts
‘Who does Jan like and Maria trust?’

The trees in (70) show that the Parallel Merge approach captures the matching effects elegantly: a single complement merges with two verbs with conflicting assignment properties, which is illicit – lubi (‘like’) assigns accusative and ufa (‘trust’) assigns dative. In the tree (70-a) komu satisfies the case requirements of ufa but not of lubi, and vice versa for the wh-element kogo in tree (70-b):

(70) a. *[VP lubi komu_DAT ] [VP ufa komu_DAT ] (a single komu_DAT shared by both VPs)
b. *[VP lubi kogo_ACC ] [VP ufa kogo_ACC ] (a single kogo_ACC shared by both VPs)

As for the morphological case matching effects observed before for German, consider now the Polish example (71): (71)

Kogo Jan nienawidzi a Maria lubi?
who.acc/gen Jan hates and Maria likes
‘Whom does Jan hate and Maria like?’

The Polish animate wh-pronoun kogo is syncretic between genitive and accusative. Moreover, nienawidzi (‘hate’) assigns genitive case. Citko (2005, 487) suggests that such facts are captured if “the lexicon contains a single wh-form, underspecified in such a way that it is compatible with both genitive and accusative case features.” Syntactically, then, verbs assigning different case values to their single goal can do so as long as the morphological component (‘the lexicon’) provides only a single exponent for the syntactic object receiving the conflicting case specifications. A simplified tree of the relevant aspects for (71) then looks like (72), where the wh-element is represented abstractly as a feature matrix:

(72) [VP nienawidzi D ] [VP lubi D ], where the shared D = [ wh, Case:[acc, gen], ϕ ]

As can be seen, the wh-word receives the values accusative and genitive for its case feature.¹² As long as there is only a single exponent available which is underspecified with respect to the values in question, the results are grammatical. However, once more than a single wh-pronoun form is available and eligible for the differing case values, ill-formedness ensues. I have presented Citko’s treatment of ATB in terms of multi-dominance structures. I will now turn to Hornstein and Nunes’s analysis of ATB.

6.3.2.2 Nunes and Hornstein 2002

Hornstein and Nunes (2002) and Nunes (2004)¹³ develop an approach to ATB based on the operation sideward movement, which they posit involves the elementary suboperations Copy, Merge, and Delete. Sideward movement is movement of a category X to a position p which does not c-command the launching site of X. Thus sideward movement is movement between unconnected trees. I illustrate their approach to ATB by way of the example sentence (73). The numeration N (73-a) involves the items listed, and the derivation accesses the numeration, picking lexical items one by one. The result of the derivation of the second conjunct is object K, given in (73-b). The object L is created as part of an independent workspace, (73-c). The next step is merging read with which book, which is the sideward movement step: (73)

Which book did you read and Mary recommend?
a. N = { which_1, book_1, did_1, Q_1, you_1, read_1, and_1, Mary_1, recommend_1 }
b. K = [&P and [ Mary did recommend [ which book ] ] ]
c. L = read

The tree (74) shows the sideward movement step perspicuously:

12 This is a departure from or at least modification of standard assumptions about activating case features and the operation agree made in Chomsky (2000, 2001); Citko’s possibility of a one-to-many relation between Case on goals and ϕ-probes is not considered in his work. What is considered there is partial agree between a defective probe and a goal, followed by agree between a non-defective probe and that goal. As only the latter results in valuation of case on the goal, Chomsky’s case is clearly different from multiple case valuation on a single goal as assumed by Citko.
13 It should be stressed that despite serious shortcomings I identify in this section, their approach has gained recognition and found adherents, cf. Fernández-Salgueiro (2008).

(74) read   [&P and Mary did recommend what ]   →   [ read what ]   [&P and Mary did recommend what ]

The syntactic object L (read) is a workspace independent of &P. Sideward movement places what in the complement position of read, which subsequently projects. Merge and Move proceed to assemble the first conjunct, the TP you did read what. This TP gets connected to &P when the first conjunct is completely derived. Finally, wh-movement takes place, raising what to SPEC-CP:

(75) [CP what [C’ C [&P [TP you did read what ] [&P and … ]]]]
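The derivation just described can be traced in a small Python sketch. It is a simplification under stated assumptions: syntactic objects are nested tuples, the function names count and sideward_move are hypothetical, only Copy plus Merge are modeled (no Delete), and the final step simply merges another copy of what at the root rather than modeling which copy moves.

```python
def count(tree, item):
    """Count occurrences (copies) of `item` in a nested-tuple tree."""
    if tree == item:
        return 1
    if isinstance(tree, tuple):
        return sum(count(sub, item) for sub in tree)
    return 0


def sideward_move(source, target, item):
    """Copy `item` from `source` and Merge the copy with `target`,
    which lives in a separate workspace. `source` itself is untouched."""
    if count(source, item) == 0:
        raise ValueError("item not found in source")
    return (target, item)


# Second conjunct, as K in (73-b), simplified to `what` as in (74).
K = ("and", ("Mary", ("did", ("recommend", "what"))))
L = "read"                          # independent workspace, (73-c)

L = sideward_move(K, L, "what")     # ("read", "what"), the step in (74)
TP1 = ("you", ("did", L))           # first conjunct assembled
and_p = (TP1, K)                    # TP1 connected to &P
CP = ("what", ("C", and_p))         # wh-movement to SPEC-CP, as in (75)

print(count(CP, "what"))            # 3
```

The three copies counted at the end correspond to SPEC-CP, the complement of read and the complement of recommend; at no point does the sketch build two separate chains into the fronted position.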

Effectively then, there are no FCs but a configuration that resembles successive cyclic movement: the formation of an A-chain (i.e. sideward movement into a θ-position, required by the thematic requirements of the verb in the first conjunct), followed by Ā-movement out of the first conjunct to SPEC-CP. Fernández-Salgueiro (2008) advances empirical support for the sideward movement analysis. He points out that Weak Crossover (WCO) effects are absent in cases where the coindexed element is in the second conjunct (76-a) but do show up if the coindexed element is in the first conjunct (76-b):¹⁴

14 I supplemented the examples (76-a)/(76-b) in Fernández-Salgueiro (2008) with the slightly altered ones (77-a)/(77-b), containing a non-ambiguous predicate in the second conjunct; meet is ambiguous between at least ‘get to know’ and ‘get together with’.

(76) a. Who should Mary invite t_i and his_i best friend meet t_i?
b. *Who should his_i best friend invite t_i and Mary meet t_i?

(77) a. Who should Mary invite t_i and his_i best friend have a drink with t_i?
b. *Who should his_i best friend invite t_i and Mary have a drink with t_i?

Following Postal (1971), Fernández-Salgueiro suggests that WCO is a condition on operations and not (final) representations (contra Koopman and Sportiche’s 1982 Bijection Principle), i.e. the transformation to the left of a coindexed element is the culprit and not the ensuing configuration where a single operator binds more than one variable. This way, the contrast (76-a)/(76-b) receives an account under a sideward movement analysis: In (76-a) who moves from the complement position of meet to the complement position of invite, inducing no WCO effect. The reason is that the two trees of the conjuncts are unconnected at this point in the derivation, and thus sideward movement is not “to the left” of the coindexed pronoun. The same is not true in (76-b) where, after sideward movement, the WCO-violating Ā-movement to the left of the coindexed pronoun takes place. After presenting the sideward movement approach to ATB and the empirical support it has received, I will now turn to some of its drawbacks, including remarks on Citko’s analysis.

6.3.2.3 Discussion of and Problems for Symmetric Approaches

Both symmetric approaches have problems, which I will discuss in this section. As we have a single representation of the element which undergoes ATB-movement as part of the conjoined launching sites, there are problems which both analyses have in common, and I will indicate at the relevant points where the common property leads to common problems. The sideward movement analysis in my view exhibits the most substantial empirical problems. I will thus mainly be concerned with that one, touching on the problems of the multi-dominance analysis as we go along. Obviously, the derivation shown in (75) bluntly violates the CSC, leaving the exceptional character of ATB unexplained. But even if we were to accept this and trust special conditions which sideward movement is subject to and gives rise to, there are independent empirical problems. Specifically, a sideward movement-based analysis of ATB undergenerates. To anticipate data and a discussion below which, in my view, speak strongly in favor of FCs, consider the following German sentence, a combination of “wh-copying” (cf. Felser (2003)) and ATB, perfectly grammatical:

(78) Wen hat Maria gemeint [CP1 wen Peter gesehen hat] und [CP2 wen Jens betrogen hat]
who has Mary meant who Peter seen has and who Jens cheated.on has
‘Who did Mary say that Peter saw and that Jens cheated on?’

I assume here that the copy theory of movement (Chomsky (1995b)) is syntactically at the core of this phenomenon (and so does Nunes (2004)). For a sideward movement analysis it is unclear how the wh-element in SPEC-CP2 comes about, because sideward movement does not provide for “parallel” successive cyclic movement within both conjuncts. As far as I can see, there are two possible responses to this challenge. One response could be to say that within CP2, local Ā-movement occurs, followed by sideward movement of the wh-element to the complement position of the verb in CP1: (79)

[VP wen gesehen] und [CP2 wen [ C Jens ⟨wen⟩ betrogen hat]]

After sideward movement from SPEC-CP2 to the complement position of gesehen, the derivation proceeds as detailed above. This modification of Hornstein and Nunes’s analysis, then, provides the copy which we see pronounced in the left sentential periphery of the second conjunct in (78). However, the derivation in (79) is a clear case of improper movement as illustrated in (80):

(80) *[TP Mary seems [CP ⟨Mary⟩ [TP ⟨Mary⟩ is sick]]]

For current purposes I take improper movement to be combinations of movement types such that the head of an Ā-chain undergoes movement to form an A-chain; in (79), movement of wen from SPEC-CP2 to the internal argument position of gesehen constitutes the A-chain. To the extent that such combinations of movement types are unavailable, I take a derivation like (79) to be impossible and hence (78) to be underivable in this fashion. Alternatively, one could say that sideward movement precedes or simultaneously occurs with local Ā-movement in the second conjunct and that the base copy in CP2 is available for subsequent local Ā-movement as illustrated in (81): (81)

[VP wen gesehen] und [CP2 wen Jens ⟨wen⟩ betrogen hat]

This solution raises the question of how it is that the base copy in the θ-position of the second conjunct is available for movement twice, and in a counter-cyclic fashion at that: once for sideward A-movement and once for “standard” Ā-movement within the second conjunct. Such “recycling” or multiple manipulation of a single copy for independent movement operations clearly would need independent motivation and justification.¹⁵

A second and rather severe empirical problem comes from an observation which is new, to the best of my knowledge. Remnant movement (den Besten and Webelhuth (1990), Müller (1996), Grewendorf (2003), Abels (2007)) can apply across-the-board as the following German and English examples show: (82)

a. [VP X Gelesen ] hat [ Maria [X das Buch ] t_VP ] und [ Peter [X den Artikel ] t_VP ].
read has Mary the book and Peter the article
b. [VP Gründlich zu X lesen ] hat [ Maria [X das Buch ] t_VP ] und [ Peter [X den Artikel ] t_VP versucht ].
thoroughly to read has Mary the book and Peter the article tried

(83) [VP Written X for children ], [[X those books ] couldn’t possibly be t_VP ] and [[X these songs ] shouldn’t really be t_VP ].

I make the standard assumption that these examples involve phrasal movement with prior evacuation of the VP by the object and the subject respectively, i.e. I reject an analysis of such examples in terms of syntactic head movement (pace Trinh (2009)). The possibility of further material next to the verb in (82-b)/(83) strongly suggests a remnant movement account. The X in the fronted VP represents the category which vacates the VP to create a remnant.¹⁶ However, descriptively speaking, we see a single fronted category, the VP, but two – and in principle more – remnant-creating categories: the objects das Buch/den Artikel and the grammatical subjects those books/these songs respectively. It is interesting to note that ATB-remnant movement not only involves parallel movement of the fronted VP, but also prior parallel evacuation movement of the arguments.¹⁷

15 Simultaneous or parallel movement of a single copy to multiple positions is in fact assumed in Chomsky (2008), but there it is tightly related to the notion of phase. In the phase system, it is parallel probing by an Ā-probe on C and an A-probe on T which triggers parallel attraction of the goal. As Hornstein and Nunes do not make use of phases and probing at all, their sideward movement system does not appear to be compatible with phase-based cyclicity as envisaged by Chomsky.
16 The English example (83) is modeled after the one in Chomsky (2000, 120).
17 Formulated this way, ATB remnant movement requires a fair amount of look-ahead. A reviewer provided me with the following pair, in which the same verb is used transitively and intransitively (judgments are mine, and the reviewer pointed out that the pair seems “bad to [his/her] ear”):


Let me now address problems the above observation represents for both symmetric analyses, while noting in passing that the observation in fact extends problems to the asymmetric ones as well: Given that in these analyses a lexical representation of the fronted category is missing entirely in the second conjunct, there is nothing to evacuate either – the presence of, say, an object in the second conjunct as in (82) is completely mysterious from this perspective. If remnant-ATB comes about by Parallel Merge/multi-dominance, the fronted vP in (82)/(83) must be a single instance of a vP and, according to Citko, must furthermore be base generated in the complement positions of both T-heads. However, the condition cannot hold, because the two arguments to vacate the vP are each exclusively part of their own conjunct, i.e. while the conjuncts share the vP, a subpart of the vP is crucially not shared between the conjuncts. Similarly, if ATB comes about by sideward movement, a host of related questions and problems arise: are the arguments to vacate the vP part of it from the beginning of the derivation? Such a structure would, for instance, host both objects and look as in (85): (85)

[vP [OBJ1, OBJ2 V ] v ]

It must be guaranteed that only one of the objects receives a θ-role from the verb while the other one is somehow exempt. Even if this implausible assumption is made it would mean for multi-dominance that OBJ1 and OBJ2 leave vP and land in their respective conjunct TP, whereupon vP with multiple object traces raises to SPEC-CP. For sideward movement it would follow that a successive evacuation of the vP needs to take place both before and after the sideward movement step applies to the vP. Using object evacuation as an example, such a derivation is schematized below:

(84) a. *Geschrieben hat er und sie das Buch.
written has he and she the book
b. *Gegessen habe ich und du das Steak.
eaten have I and you the steak

The observation appears to confirm the assumption that the construction under discussion ((82)) must indeed comprise remnants only and cannot mix a remnant and a non-remnant constituent in ATB. For more considerations on the necessary parallelisms, cf. the section on remnant movement below.

(86)

a. [TP2 [vP [OBJ1, OBJ2 V ] v ] ]
   OBJ1 vacates the vP →
b. [TP2 OBJ1 [vP [tOBJ1, OBJ2 V ] v ] ]
   sideward vP-movement →
c. [TP1 [vP [tOBJ1, OBJ2 V ] v ]] & [TP2 OBJ1 tvP ]
   OBJ2 vacates the vP →
d. [TP1 OBJ2 [vP [tOBJ1, tOBJ2 V ] v ]] & [TP2 OBJ1 tvP ]
   remnant vP-fronting →
e. [vP [tOBJ1, tOBJ2 V ] v ] [ C [TP1 OBJ2 t’vP ] & [TP2 OBJ1 tvP ]]

Problems and questions abound. For instance, if step (86-a) involves object scrambling out of the vP, of what movement type is the sideward movement step from (86-b) to (86-c)? For sure, it would have to abide by the restriction on remnant movement suggested in Müller (1998) and refined in Grewendorf (2003) and Abels (2007), and it is not clear that it does. An alternative to remnant movement would be to independently base generate the arguments in their host clauses so that vP sideward movement may apply without any created remnant. Again, taking multiple objects scattered over the conjuncts as in (82) as a model, thematic role assignment as well as case assignment to the argument in the first conjunct would then have to take place by the v(P) after that v(P) moves to the first conjunct. I will not go on to speculate here what this alternative scenario as well as the one sketched above would mean for theories of θ-role assignment and case licensing. In any event it should be evident that data like the ones in (82)/(83) represent so serious a challenge to both a multi-dominance approach and a sideward movement analysis of ATB that attempts at repair appear bleak and would force considerable modification or abandonment of standard assumptions. The data in (82)/(83) suggest an important conclusion which stands in conflict with all the asymmetric as well as the symmetric analyses discussed in this section: Unless we give up well-established assumptions about argument structure and thematic role assignment, remnant-ATB shows quite clearly that there must be two (or multiple) instances of vP. And if remnant-ATB shows the necessity of multiple instances of vP, it is natural to say that ATB in general requires multiple instances of the moved category, present in both conjuncts. And this is precisely what FCs are and what I defend in this chapter.

6.3.3 A Hybrid Analysis – Ha 2007

In this subsection I will very briefly present and comment on a recent account of ATB movement developed by Ha (2008), which shares aspects of both asymmetrical as well as symmetrical analyses. Ha claims that ATB (87) involves right node raising (RNR) as a substep in the derivation. An example of RNR is given in (88), in which capitals indicate a contrastive focus intonation:


(87)

What did the children choose and their mother read before going to bed?

(88)

THE CHILDREN CHOSE, and THEIR MOTHER READ – a couple of books before going to bed.

The gist of his analysis is that the derivation of (88) is a precondition for (87).¹⁸ Following proposals by Merchant (2001), Ha (2008, Chapter 5) proposes that RNR involves phonological deletion (ellipsis) of the constituent which remains unpronounced in the first conjunct. His analysis of RNR is schematized in (89), where strikeout indicates PF-deletion: (89)

a. THE CHILDREN CHOSE a couple of books and THEIR MOTHER READ a couple of books before going to bed.
b. THE CHILDREN CHOSE a̶ ̶c̶o̶u̶p̶l̶e̶ ̶o̶f̶ ̶b̶o̶o̶k̶s̶ and THEIR MOTHER READ a couple of books before going to bed. (RNR)

Ellipsis is licensed by particular syntactic and semantic conditions which need not concern us here. Suffice it to say that what we call ATB takes the output of RNR as its starting point. As Ha (2008, Chapter 7) argues, movement applies to the relevant unit in the second conjunct only. The full derivation from two conjoined sentences to RNR to ATB is given in (90): (90)

a. THEY CHOOSE what and THEIR MOTHER READ what
b. THEY CHOOSE w̶h̶a̶t̶ and THEIR MOTHER READ what (RNR)
c. What did THEY CHOOSE w̶h̶a̶t̶ and THEIR MOTHER READ ⟨what⟩ (“ATB”)

Thus, ATB involves independently available components of the grammar: conjunction, ellipsis (including its licensing conditions) to yield RNR and movement, which applies to the second conjunct.

6.3.3.1 Discussion of the Hybrid Analysis

Let me now briefly discuss Ha’s account. His main claim is that ATB is an epiphenomenon of independently available operations which yield RNR and, based on this, movement out of the second conjunct. Because of its lack of construction-specific elements, the analysis is, in my view, conceptually very appealing. Notice, furthermore, that the analysis involves a structural representation of the moved

18 As Ha observes, Williams (1978) took this idea into consideration but dismissed it on empirical grounds in conjunction with the assumption that RNR involves rightward displacement. Ha’s analysis of RNR is free of rightward movement and hence Williams’ reservation does not apply.


element in both conjuncts, which I have argued above is a desirable feature, given the evidence in favor of it from ATB-remnant movement. In this respect, his analysis could be categorized as a “symmetric” one. However, it is unclear how the analysis copes with e.g. German ATB-wh-copying, repeated here: (91)

Wen hat Maria gemeint [wen Peter gesehen hat] und [wen Jens betrogen hat]
who has Mary meant who Peter seen has and who Jens cheated on has
‘Who did Mary say that Peter saw and that Jens cheated on?’

Under Ha’s analysis ATB involves an elliptical variant of the moved element in the first conjunct. So how does a pronounced copy in the first conjunct come about? And what forces this element to obligatorily appear in a displaced position? Thus leaving the wh-phrase in the first conjunct in-situ yields strong deviance: (92)

*Wen hat Maria gemeint [Peter wen gesehen hat] und [wen Jens betrogen hat]
who has Mary meant Peter who seen has and who Jens cheated on has

After all, ATB appears to have all the standard movement properties, with the exception of a single antecedent heading a chain with a number of gaps. Notice also that Ha’s analysis has little to say with respect to case matching effects I have described above for German and repeated in one example below as (93-a) and supplemented with (93-b): (93)

a. Bären hat er geliebt und geholfen.
   bears.acc/dat has he loved and helped
   ‘It is bears which he loved and helped.’
b. {*Den/*Die} Bären liebt und hilft Hans.
   the.dat/the.acc bears loves and helps Hans

If the movement antecedent matches morphological case requirements of the case assigning predicates as in (93-a), ATB is licit. Determiners must equally abide by this requirement as (93-b) shows. If ATB involves RNR as a subderivation, similar case matching effects must hold in RNR. However, the examples (94-a) and (94-b)¹⁹ show that this prediction is not confirmed:

19 Thanks to an anonymous reviewer for these suggestions.


(94)

a. Hans liebt und hilft Bären.
   Hans loves and helps bears
b. Hans liebt und hilft {?den/*die} Bären.
   Hans loves and helps the.dat/the.acc bears

Evidently, case matching requirements are weaker in RNR than in ATB: the example (94-b) shows that accusative on the determiner is impossible (assigned by the first predicate) while dative (assigned by the second predicate) is reasonably acceptable. We can thus conclude that these facts undermine the idea that RNR is the basis for ATB. Under Ha’s analysis similar problems arise as with the asymmetric analyses presented and discussed before. To sum up, his analysis shares with the symmetric analyses the insight that the moved element needs to be present in both conjuncts. However, it runs into problems with accounting for, say, wh-copying and case matching effects. After presenting and discussing previous analyses of ATB, let me now turn to my own account.

6.4 The Current Analysis

The point of this section is to show that ATB fits perfectly well into a theory of syntax which recognizes Set-Merge as the fundamental structure building operation, with specific licensing conditions for all structures (labeling) and internal merge in particular (chain formation). Once the notions are properly formulated, conceived and differentiated, ATB is not a special ad hoc operation that involves an “intermediate” step of uniting two (or n) disjoint categories to surface as a single element (pace Zhang (2004)). Quite the opposite: If we make certain assumptions about what chains are (Chomsky (2000)) and how they are identified (Martin and Uriagereka (2011)), both of which have been independently motivated, ATB falls out quite naturally.

6.4.1 Coordinative Core

In the following I outline an approach to ATB which might solve some of the problems the construction poses. Somewhat paradoxically, in order to do so, I will initially abstract away from the syntax of coordinate structures. Within generative grammar, there is a long tradition of analyzing coordination in asymmetric terms.


Leaving aside slight differences of analysis, an asymmetric structure of the form in (95) reflects arguably the prevailing view of coordination up to this day: (95)

[&P [VP hit Bill] [&P &=and [VP kiss Mary]]]

X̄-theory (Chomsky (1970), Jackendoff (1977)) posits that phrases are headed and binary branching (Kayne (1984)), thereby introducing an asymmetry between the Merge mates, and thus (95) is a natural way to capture coordination. By the same token, the very term coordination belies such a treatment, suggesting a structural symmetry between the coordinates and accordingly, many traditional grammarians treated coordination by using “flat,” n-ary branching structures, in which the coordinate members are hierarchically on a par:

(96)

[VP [VP hit Bill] and [VP kiss Mary]]

The intuition behind this arrangement is that coordination differs from subordination precisely in that the coordinated members are hierarchically symmetric. The intuition is supported by a simple test which shows that conjuncts can be freely permuted without loss of grammaticality: (97)

a. John and Mary went to the cinema.
b. Mary and John went to the cinema.

Furthermore, both coordinands are θ-marked by the predicates, suggesting that none is hierarchically superordinate, in contrast to the observations we have made in section 6.2.1. In current terms, a natural way to capture this symmetry is to say that the coordinate members immediately merge with each other, because merge as conceived in this book and other works (Boeckx (2008a), Chomsky (2007, 2008, 2013)) is symmetrical. I will thus make the simplifying assumption that, say, VP-coordination like John hits Bill and kisses Mary has the syntactic format as shown in (98):


(98)

[CC [VP hit Bill] [VP kiss Mary]]

In other words, I am assuming a binary branching, non-headed (unlabeled) phrase comprising two VPs. Such a base structure is in fact what Chomsky (2013) assumes is the underlying form of coordinated structures (99-a) before raising of one of the coordinates to SPEC-CoordP takes place (99-b): (99)

a. {XP, YP}
b. {XP, {Coord, {⟨XP⟩, YP}}}

For lack of a better term, I will, for ease of reference, use “CC” as a descriptive – not a grammatical – label for this structure, which stands for Coordinative Core. The intuition behind CC is that coordination is literally simple merger of the coordinate members, which, as a consequence, mutually c-command each other. As I have been assuming Bare Phrase Structure theory throughout (cf. Chomsky (1995a)), there is furthermore no linear ordering imposed on the coordinates. I posit that this initial merger is subject to the constraint on or “law” of Coordination of Likes (Chomsky (1957)):²⁰ (100)

CC comprises like categories.
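To make the symmetry of CC concrete, the following Python sketch models Set-Merge as the formation of an unordered set and imposes (100) as a check on category identity. It is an illustration only: the class Phrase and the functions merge and coordinative_core are names introduced here for exposition, and phrases are reduced to a category label plus a string of words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    """A phrase reduced to its category and its words (illustrative assumption)."""
    category: str
    words: str


def merge(a, b):
    """Set-Merge: form the unordered, unlabeled set {a, b}."""
    return frozenset({a, b})


def coordinative_core(a, b):
    """Build CC = {a, b}, enforcing (100): CC comprises like categories."""
    if a.category != b.category:
        raise ValueError(f"unlike categories: {a.category} vs. {b.category}")
    return merge(a, b)


hit = Phrase("VP", "hit Bill")
kiss = Phrase("VP", "kiss Mary")

# No linear order is imposed on the coordinates: both calls yield the same set.
print(coordinative_core(hit, kiss) == coordinative_core(kiss, hit))  # True
```

Because the result is a set, neither coordinate is singled out as specifier or complement, which is the property the text attributes to CC.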

20 I am aware, first, that this constraint has exceptions (Bill is [a professor] and [proud of his job]) and, secondly, that this stipulation puts a restriction on Merge, which Chomsky (2004) claims is free. However, concerning the former point I hope that these exceptions can receive independent explanations, leaving the generalization basically intact (for instance, for the example mentioned, Bowers’ (1993) idea is suggestive, according to which predicates have their own functional category PredP, which in turn embeds a variety of lexical categories so that effectively, the example does involve coordination of likes). As regards the latter concern, I have to leave this stipulation as an axiom at present, without deriving it from deeper principles.

Unlike a structure like (95) which treats the coordinands as specifier and complement of & respectively, there is hope that the identity of categories which coordination requires can be captured if they are simple merge-mates. To state a categorical identity relation between two merged XPs strikes me as simpler than to state such a relation between specifiers and complements, especially within a framework which questions the relevance of specifiers altogether like the one of Chomsky (2007, 2008, 2013). This is not to say that coordinate members cannot eventually end up in a configuration like (95) – and below I will follow Chomsky (2013) in saying just this. The point is rather that the condition on Coordination of Likes must after all apply to some configuration or representation. And identity of categories combined by Merge can be easily stated, if stipulative as above, whereas stating that “specifier and complement of & must be categorically identical” is difficult if the notion of specifier is unavailable to begin with. In this sense, we have at least a conceptual motivation for the existence of CC, aside from the classical intuition that coordination involves a kind of juxtaposition of the coordinands. As will hopefully become clear, the initial abstraction from the syntax of coordinate structures paves the way to a better understanding and cleaner formulation of ATB. After showing how ATB works based on such a symmetric core, section 6.4.3.1 illustrates how this derivation can be integrated into a fuller theory of coordinate structures, which may also accommodate the well-known asymmetry effects in coordination, some of which I have shown in section 6.2.1. I will consider two possibilities for how CC relates to full asymmetric coordination: One is by direct ATB-movement to SPEC-CP, accompanied by parallel raising of one of the coordinate members to SPEC-CoordP (Chomsky (2008)). The other alternative addresses the symmetry problem created by the output of raising one of the coordinates to the sister position of CoordP (the structure shaped {XP, CoordP}, cf. Chomsky (2013)) and suggests a solution in terms of transfer (Ott (2011b)). The analysis necessarily involves intermediate ATB-extraction, which is why I dub it “indirect ATB-extraction.” Thus the presentation of my analysis proceeds in three steps: ATB from CC, direct ATB to SPEC-CP and indirect ATB. Under the current assumptions, the form of CC is crucially symmetric for ATB to work. So to the extent that the assumption about symmetry as a precondition of ATB is right, ATB serves in fact as a piece of evidence for the symmetric base form CC.
What I just said actually makes an empirical prediction: If we can find a language in which a symmetric base form CC is unavailable, ATB is expected to be unavailable, too.

6.4.2 ATB from the Coordinative Core

I will now progressively detail the workings of my own analysis of ATB. To do so, I will first have to say something about the Coordinate Structure Constraint (CSC) or its reinterpretations, because ATB is the very type of extraction to void the CSC. Take Williams’ (1977) ATB-principle, repeated here, which derives many of the CSC effects:


(101)

ATB-Principle
If a rule applies into a coordinate structure, then it must affect all conjuncts of that structure.

What is the source of this principle? As it stands it amounts to little more than a stipulation which ensures the right outcome but which is unlikely to be a candidate for an axiom in the theory. I will not be able to derive the principle here, but adopt a recent proposal by Kasai (2004), whose conception of parallelisms fits the current framework.²¹ Let me briefly review Kasai (2004). Based on a number of observations and previous work by Williams (1977) and Hornstein and Nunes (2002), Kasai suggests the following condition on ATB: (102)

Parallelism Condition on ATB movement (Kasai (2004, 181))
ATB movement must take place from syntactically parallel positions.

Kasai interprets this principle as being computed in a phase-by-phase fashion, i.e. within every v*P/CP-cycle (102) is checked. Let us go through an illustration. The grammatical sentence (103-a) involving ATB-extraction of a wh-object abides by (102) as the schema (103-b) illustrates, because the wh-expressions occupy the edges of the respective vPs: (103)

a. the man who John saw and Mary kissed
b. C [TP John2 [vP who1 [vP t2 saw t1]]] and [TP Mary3 [vP who1 [vP t3 kissed t1]]]

In (104-b) extraction proceeds from non-parallel positions as the grammatical object who in the first conjunct launches from the edge of vP, while the grammatical

21 The CSC or (101) is in fact derived in what in my view is a truly minimalist contribution from GPSG/HPSG (Gazdar (1981), Gazdar et al. (1985)). Roughly put, in GPSG/HPSG displacement – the fact that phrases can appear in one position in the structure while being interpreted in a different position – is implemented as follows: Whenever extraction from XP takes place, XP gets categorically marked, say, as XP’. This marking of a phrase is technically implemented by means of a so-called slash feature which has a specific value and which percolates up the tree until the antecedent of the movement dependency is reached. Now if conjunction combines XP’ with YP, where no extraction has applied to a phrase internal to YP, this conjunction yields [XP’ & YP]. This structure violates the law of coordination of like categories. Thus neither the CSC nor (101) need to be formulated independently of the principle of coordination of like categories. The insight from GPSG/HPSG that the CSC/ATB-Principle reduces to the Law of Coordination of like Categories is very elegant and presumably ultimately right in spirit. However, the technical details are not compatible with assumptions I am making in this book; thus, I am not assuming that movement of a phrase leaves a mark on every node passed.


subject who in the second conjunct launches from SPEC-TP. (102) is violated and ungrammaticality results as (104-a) illustrates: (104)

a. *I know a man who Bill saw and likes Mary.
b. C [TP Bill2 [vP who1 [vP t2 saw t1]]] and [TP who1 [vP t1 likes Mary]]

Moreover, (102) accounts for more intricate cases like (105-a), where local wh-extraction of the object applies to the first conjunct and long distance wh-extraction of the subject applies to the second conjunct: (105)

a. I know the man who John likes and we hope will win.
b. C [TP John2 [vP who1 [vP t2 likes t1]]] and [TP we3 [vP who1 [vP t3 hope [CP t1 [TP t1 [vP t1 will win]]]]]]
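The checks in (103)-(105) can be summarized in a small Python sketch. It is a simplification under stated assumptions: each conjunct is reduced to the position from which its wh-element launches when C is merged (read off the b-schemas above), and the phase-by-phase computation that Kasai proposes is collapsed into a single comparison. The function name satisfies_parallelism and the position labels are hypothetical.

```python
def satisfies_parallelism(launch_sites):
    """(102), simplified: every conjunct launches its wh-element from the same position."""
    return len(set(launch_sites)) == 1


# Launch sites as read off the schemas (103-b), (104-b) and (105-b).
examples = {
    "(103) the man who John saw and Mary kissed": ["vP-edge", "vP-edge"],
    "(104) a man who Bill saw and likes Mary": ["vP-edge", "Spec-TP"],
    "(105) the man who John likes and we hope will win": ["vP-edge", "vP-edge"],
}

for sentence, sites in examples.items():
    verdict = "parallel" if satisfies_parallelism(sites) else "violates (102)"
    print(sentence, "->", verdict)
```

Only (104) comes out as a violation, matching the judgments reported above; in (105) the long-distance subject has already reached the matrix vP-edge by the time the comparison is made.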

For more examples, I refer the reader to Kasai’s paper. I adopt the principle (102) for what follows.

ATB

Let us look in some detail at the derivation of a wh-ATB out of a VP-coordinated structure: Who did John see and ignore? (106) is an example representing the configuration before who1 and who2 ATB-move out of the v*P-edge; this is the configuration which is a necessary condition for (102) to apply, i.e. without the two instances of who present in their respective vP-edges, (102) is impossible to meet: (106)

[CC [v*P who1 see ⟨who1⟩] [v*P who2 ignore ⟨who2⟩]]

Let who1=who2, i.e. both elements are featurally identical. I am using the indices here just for expository purposes. Moreover, either may move at the phase level C. Now suppose who1 moves, while who2 stays in-situ to yield (107): (107)

[ who1 [C . . . [CC [v*P1 who1 see ⟨who1⟩] [v*P2 who2 ignore ⟨who2⟩]]]].

What happens now is that a chain needs to be licensed. Following Martin and Uriagereka (2011), this process involves Minimal Search (probing) for the wh-elements. Martin and Uriagereka’s prime concern is the problem of how to distinguish between copies (of a movement chain) and repetitions. I take the core of their proposal to be the following:


(108)

An element α constitutes a copy of α’ iff
a. there is no phase-boundary between α and α’ and
b. if Merge is accompanied by Minimal Search.

Conversely,

(109)

α and α’ are interpreted as repetitions whenever
a. a phase-node separates α from α’ and
b. no search for α’ takes place upon Merger of α to a phase-edge.
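The two definitions can be restated as a decision function. The Python sketch below is merely a paraphrase of (108) and (109) under the assumption that both conditions can be represented as Boolean values; the function name classify is hypothetical, and the third outcome covers combinations that (108)/(109) do not mention and is my own addition.

```python
def classify(phase_boundary_between: bool, minimal_search: bool) -> str:
    """Classify the relation between α and α' following (108) and (109)."""
    if not phase_boundary_between and minimal_search:
        return "copy"  # (108-a) and (108-b) both hold
    if phase_boundary_between and not minimal_search:
        return "repetition"  # (109-a) and (109-b) both hold
    # Mixed cases are not a copy by (108), but (109) does not classify them either.
    return "not a copy (status not stated)"


print(classify(phase_boundary_between=False, minimal_search=True))
print(classify(phase_boundary_between=True, minimal_search=False))
```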

What is crucial, really, is that internal merge comprises Minimal Search as one of its subcomponents, presumably, to identify the object we call a chain. I.e. internal merge involves a context-sensitive component which external merge does not. The natural thing to say is that chain formation is Minimal Search for the element identical to the one that is raised to SPEC-CP. Search for the identical element as the upper copy of who1, however, is ambiguous between who1 and who2 – the intersection of {v*P1, v*P2} – both of which are structurally equally close to C. As a consequence, both are identified as chain members of the chain headed by the upper copy of who1. The process of copy identification just described is effectively the same as label identification described in Chapter 3. Recall Chomsky’s idea picked up by Rizzi (2012) and myself in Blümel (2012) that in a syntactic structure which has set theoretical format {XP, YP}=α a prominent feature shared by both X and Y can function as the label of α. This is represented in the tree in (111) repeated from chapter 5. Minimal Search picks the intersecting element in both set members. Rizzi (2012) and I show that Criterial Freezing as exemplified in (110-a)/(110-b) follows naturally if this assumption is made. (110)

a. Mi domandavo quale RAGAZZA avessero scelto, non quale ragazzo.
   I wonder which GIRL they had chosen not which boy
b. *Quale RAGAZZA mi domandavo avessero scelto, non quale ragazzo
   Rizzi and Shlonsky (2007, 117)


(111)

[v*P v*+domandavo_i [VP t_i [Q(=α) [DP D[Q] . . . ] [CP C[Q] . . . ⟨DP⟩ . . . ]]]]
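The labeling step in (111) can be pictured as an intersection of the features borne by the two heads. The Python sketch below is only an illustration: representing a head as a set of feature names, and treating the category itself as one of those features, are assumptions made for the example, and shared_label is a hypothetical name.

```python
def shared_label(x_features, y_features):
    """Return the features shared by the heads X and Y of {XP, YP}.

    An empty result means that no shared feature is available as a label.
    """
    return set(x_features) & set(y_features)


# (111): D and C both bear [Q], so Q is available as the label of α = {DP, CP}.
D = {"D", "Q"}
C = {"C", "Q"}
alpha_label = shared_label(D, C)

# Two heads without a common feature leave the structure unlabeled.
no_label = shared_label({"D"}, {"v*"})
print(alpha_label, no_label)
```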

Now notice that we can say something very similar with respect to the identification of non-distinct copies: In ATB configurations one of the identical objects raises (112-a). This process is accompanied by Minimal Search at the phase level for the identical element lower in the structure. However, in ATB, Minimal Search detects two non-distinct elements (112-b), giving rise to “branching chains,” and the impression that a single antecedent heads a chain with multiple gaps (a correct impression): (112)

a. [ who1 [C . . . [CC [v*P1 who1 see ⟨who1⟩] [v*P2 who2 ignore ⟨who2⟩]]]].
b. [ who1 [C . . . [CC [v*P1 who1 see ⟨who1⟩] [v*P2 who2 ignore ⟨who2⟩]]]].
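The search step in (112-b) can be sketched as a breadth-first search that stops at the shallowest depth at which an identical element is found and returns everything found at that depth. This is an illustration under simplifying assumptions: structures are nested Python tuples (which impose an order that the theory does not), the two instances of who are represented by the same string because they are featurally identical, and minimal_search is a hypothetical name.

```python
def minimal_search(root, target):
    """Return the paths to every instance of `target` at the shallowest depth."""
    frontier = [((), root)]
    while frontier:
        hits = [path for path, node in frontier if node == target]
        if hits:
            return hits  # stop at the first depth that contains a match
        frontier = [
            (path + (i,), child)
            for path, node in frontier
            if isinstance(node, tuple)
            for i, child in enumerate(node)
        ]
    return []


# CC after who has reached both v*P-edges; the lower copies sit deeper.
vP1 = ("who", ("see", "who"))
vP2 = ("who", ("ignore", "who"))
CC = (vP1, vP2)

print(minimal_search(CC, "who"))  # [(0, 0), (1, 0)]
```

The search returns two paths, one into each conjunct, and never reaches the lower copies: the two edge occurrences are equally close, which is the ambiguity that gives rise to the forked chain.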

Thus the step (112-b) is for movement and chain formation what (111) is for “projection.” Notice that no movement proper applies to who2, yet it becomes a member of a movement chain. Let me stress that the above kind of derivation solves the issue of the single identity reading in a straightforward way²² and without the need for such dubious devices as a silent adjective same (pace Zhang (2010)): As the chaining mechanism integrates who2 into the chain headed by who1, a single identity reading is expected (I know of no claim in the literature where distinct members of a movement chain can be referentially independent). The chain identification process is equivalent to the “shared label” idea in Chomsky (2013): The head of the movement chain is a single element just as a single label can be yielded if in {XP, YP}=β Y agrees in feature F with X (or vice

22 A property the current analysis shares with the symmetric analyses discussed in this chapter.


versa) and the intersection of X and Y in β, F, becomes the label of β. Notice also that the chain identification mechanism is very natural, given its equivalent, the label identification procedure: We know what copies and chains are – i.e. how they are defined –, but the question how they come about derivationally in the phase based system is rarely addressed. An exception is Martin and Uriagereka (2011), whose insight that chain identification is Minimal Search-based fits exactly our current assumptions about labeling and the general label-chain equivalence. What goes wrong if both who1 and who2 move to the edge of C? As we now see, the question as such is misguided. A related question is what motivates who1 and who2 to move in the first place. Let us again consider movement of one element, say who1, first. Let us assume that who1 moves in order to solve a labeling problem: What I have termed v*P above is in fact an unlabeled structure comprising a complex element, a phrase, who1 and a v*P.²³ In κ={who1, v*P} no element is more prominent than the other so that no label can be assigned. Movement of who1 salvages this defect because then κ can be labeled v*. Is there any justification for who2 to move as well? After all, it too is contained in a symmetric and seemingly unlabelable structure. Once we take a look at the chain that is formed after moving only who1, we see that there is not. For perspicuity, let us again take VP-coordination with a full tree as in (113) and go through the corresponding chain members in (114). Notice that angled brackets are missing around who2 to indicate that internal merge has not applied to this element. (113)

[CP who1 [ C=did [TP John [ T [CC [κ1 ⟨who1⟩ [v*P v* [VP see ⟨who1⟩]]] [κ2 who2 [v*P v* [VP ignore ⟨who2⟩]]]]]]]]

23 Again, I need to abstract away from issues related to the external argument within the VP.

(114)

CH(who1)={
a. did John see twho1 and ignore twho2,
b. v* see twho1,
c. v* ignore twho2,
d. see,
e. ignore}
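Taking an occurrence of α to be the sister of α (Chomsky (2000)), the chain in (114) can be computed mechanically from the tree in (113). The Python sketch below does so under simplifying assumptions: the tree is encoded as nested binary tuples without labels (tuples impose an order the theory does not), all instances of who are the same string because they are featurally identical, and the function names are hypothetical.

```python
def occurrences(node, target):
    """Collect the sister of every instance of `target` in a binary structure."""
    found = []
    if isinstance(node, tuple):
        left, right = node
        if left == target:
            found.append(right)
        if right == target:
            found.append(left)
        found += occurrences(left, target) + occurrences(right, target)
    return found


def contains(node, part):
    """True if `part` is `node` itself or is contained in it."""
    return node == part or (
        isinstance(node, tuple) and any(contains(child, part) for child in node)
    )


def head_of_chain(chain):
    """The occurrence that contains every member of the chain."""
    return next(occ for occ in chain if all(contains(occ, other) for other in chain))


# Structure (113), binary branching throughout, labels dropped.
vP1 = ("v*", ("see", "who"))
vP2 = ("v*", ("ignore", "who"))
CC = (("who", vP1), ("who", vP2))
C_bar = ("did", ("John", ("T", CC)))
CP = ("who", C_bar)

chain = occurrences(CP, "who")
print(len(chain))                      # 5
print(head_of_chain(chain) == C_bar)   # True
```

The five occurrences correspond to (114-a) through (114-e), and the head of the chain is the object traditionally called C’, since it contains all other occurrences.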

According to Chomsky (2000, 116) a “chain [. . .] is a set of occurrences of an object α in a constructed syntactic object K.” What is an occurrence? According to Chomsky, “an occurrence of α is the sister of α.” What this means for (113) is that the uppermost occurrence of who1 is as in (114-a), i.e. the syntactic object traditionally referred to as C’. As this object is not contained in any other occurrence and contains all the other members of the chain, it is the head of the chain.²⁴ What is crucial to see is that the occurrence of who1 in the edge in the v*P is actually ambiguous: not only does the chain linking procedure identify who1 – the sister of who1, i.e. the left v*P – as a member of the chain, but also the occurrence who2, i.e. the right v*P. The reason for this is that, first, who1 and who2 are identical, hence indistinguishable, and, secondly, they are equally close to the phase head C. So, crucially, the chain comprises (114-b) and (114-c) (and by transitivity of previous chain formation processes also (114-e)). Returning to the initial question why who2 need not move – i.e. break the problematic symmetry in κ2={who2, v*P} – we can now formulate an answer: since (114-c) too is a (lower) member of the discontinuous object headed by who1, κ2 becomes labelable without the need to apply internal merge to who2 in κ2’s edge. As who2 is integrated into the chain and need not move, I take it that who2 must not perform this superfluous step (aside from the independent impossibility of multiple wh-fronting in English). A remark on the θ-criterion (Chomsky (1981b)) is in order: The unique chain above is associated with two thematic roles in violation of the biuniqueness holding between arguments (and the chains they are part of) and θ-roles, a standard assumption. However, consider the linking of chains as described above and in addition the notion of phases more carefully. 
Chomsky (2001) suggests that the definition of phase relates to interpretive properties at the interface: accordingly, phases are defined by being the units which are interpreted at the semantic interface as argument structure on the one hand (vP) and information structural properties such as focus, topic, givenness and newness on the other (CP). Notice

24 This is how the set-based definition of chains makes it possible to define notions like head or tail of chain; the arguably more complex sequence/pair-based definition of chains is thus unnecessary, as Chomsky points out (curiously, this is an issue not acknowledged in Martin and Uriagereka (2011)).


that the derivation as described above involves two independent chains as far as the derivation prior to the CP-phase is concerned. Because of this, the θ-criterion is abided by within the respective vP-phase, the unit of argument structure and thematic configurations: within κ1 the wh-chain bears a single θ-role by virtue of originating in the complement position of see, and within κ2 the wh-chain bears a single θ-role by virtue of originating in the complement position of ignore. Unification of these two independent chains takes place afterward, within the next, non-thematic cycle. In this sense, the analysis of ATB involving FCs does not violate well-established principles. With the proper conception of chain linking by Minimal Search, we see that having a single antecedent with two (or n) gaps is nothing troublesome, but very natural. As non-top chain members are treated alike, the upper copy of who2 is treated just as any non-maximal chain member. A more appropriate representation of the tree (113) is thus (115), where this copy does not receive a phonological matrix, and which resembles more closely representations of ATB as we know them: (115)

[CP who1 [ C=did [TP John [ T [CC [κ1 ⟨who1⟩ [v*P v* [VP see ⟨who1⟩]]] [κ2 ⟨who2⟩ [v*P v* [VP ignore ⟨who2⟩]]]]]]]]

After outlining what a general theory of ATB under minimalist assumptions looks like, I will now show how the analysis captures specific ATB-phenomena discussed before.

wh-Copying

The point of this section is to provide another argument in favor of the view and analysis defended above. The evidence comes from wh-copying in German. As


is well-known and briefly reviewed in chapter 4 of this book, German allows pronounced instances of wh-elements under long-distance Ā-movement: (116)

Wen hat Maria gemeint wen Peter gesehen hat?
who has Mary meant who Peter seen has
‘Who did Mary say that Peter saw?’

Now it is perfectly possible to combine wh-copying with ATB-movement to yield ATB-wh-copying:²⁵ (117)

Wen hat Maria gemeint [wen Peter gesehen hat] und [wen Jens betrogen hat]
who has Mary meant who Peter seen has and who Jens cheated on has
‘Who did Mary say that Peter saw and that Jens cheated on?’

What these data show is that a copy of wen (‘who’) shows up in the left peripheries of the respective coordinated subordinate clauses. An FC analysis can make immediate sense of such data in that there are two independent Ā-chains in the subordinate CPs, followed by ATB/FCing as described above of the respective intermediate copies to matrix SPEC-CP (and, in addition, whatever conditions and mechanisms are responsible for the pronunciation of intermediate copies in this construction, cf. Nunes (2004), Boef (2012), Pankau (2013)): (118)

Wen hat Maria gemeint [wen Peter ⟨wen⟩ gesehen hat] und [wen Jens ⟨wen⟩ betrogen hat]

Needless to say that the asymmetric analyses as well as the sideward movement analysis discussed above run into serious trouble, given such facts: for the operator-based and the pro-based analysis, the question is why there is a pronounced copy of the wh-element in the second conjunct to begin with. An analysis based on FCs by contrast needs to invoke no more than what is independently needed in wh-copying.

25 Felser (2004) remarks that there might be a preference for a multiple identity reading in such examples, where the person asked for by wen in the first conjunct is a different one from the one asked for by the instance of wen in the second conjunct. I concur with her intuition. However, I restrict myself to single identity readings here. (117) exhibits an ambiguity which includes a single identity reading, and Felser seems to think so, too. Thus there is a reading according to which the speaker wants to know about a specific person who Mary thought Peter has seen and who Mary thought Jens cheated on. The argument advanced here is therefore unaffected.


Remnant Movement

Recall the discussion from section 6.3.2.3 regarding remnant movement, exemplified by the German example (82), repeated here as (119):

(119) Gelesen hat Maria das Buch und Peter den Artikel.
      read has Mary the book and Peter the article
      ‘As for reading, Mary has read the book and Peter has read the article.’

Remember the main conclusion from that discussion: as I argued, (119) clearly shows the necessity of analyzing this construction as involving two vPs, one within each conjunct. This is contrary to the expectations raised by analyses which do not assume FCs and, as I will now show, in line with an analysis that involves branching chains in the sense described here. First, the CC-structure of (119) looks like (120):

(120) [CC [TP1 [vP1 OBJ1 V=gelesen ] ] [TP2 [vP2 OBJ2 V=gelesen ] ] ]

This structure comprises two vPs, vP1 and vP2, one within each coordinated TP. It is important to note that while the respective verbs within each vP are identical, the respective objects OBJ1 and OBJ2 differ, the former being das Buch (‘the book’) and the latter being den Artikel (‘the article’). Thus while the numerical indices on TP and vP merely serve exposition, the numerals on the internal arguments indicate that the syntactic objects are not identical. As I am making the standard assumption that it is remnant movement which underlies (119), each object needs to scramble out of its respective vP, as illustrated in (121), before CC is derived, an instance of parallel evacuation: (121)

[das Buch [v*P1 Maria [[VP tOBJ1 [V gelesen]] v*]]]
[den Artikel [v*P2 Peter [[VP tOBJ2 [V gelesen]] v*]]]

After CC is assembled and C merges with CC, one of the remnant vPs Ā-raises to SPEC-CP. Assume, for concreteness, that vP1 moves: (122)

[CP v*P1 [C=hat [CC [TP Maria [das Buch [v*P1 tMaria [[VP tOBJ1 [V gelesen]] v*]]]] [TP Peter [den Artikel [v*P2 tPeter [[VP tOBJ2 [V gelesen]] v*]]]]]]]

At this point in the derivation, again, search for the identical copy of the Ā-raised element sets in. What is the identical copy of dislocated vP1? We face a seeming problem: the object raised to SPEC-CP includes the trace of OBJ1 or, more specifically, a deleted copy of das Buch. The question which arises at this point is whether vP2 qualifies as identical to vP1, given that vP2 includes the trace of OBJ2, a deleted copy of den Artikel.²⁶ As we will see in a moment, the problem is comparable to the one which case matching effects pose for the current analysis of ATB.

Let me suggest a solution to this challenge in terms of “trace invisibility” as propounded in Chomsky (2000). Remember that the identification of a copy as being part of a chain involves the notion of Minimal Search. What other processes involve Minimal Search? In the literature, I know of two cases: Agree and the identification of labels. In both of these processes it is assumed that a moved category is invisible to search. In Agree, the trace of a moved category is invisible and hence does not give rise to intervention effects, so that agreement with a lower category is possible. Raising constructions from Icelandic with quirky Case subjects illustrate the point (from Holmberg and Hróarsdóttir (2005)):

26 For the way I presented the remnant movement analyses, the same applies to the (copies of) vP-internal subjects. However, it is not entirely clear whether it is actually vP that undergoes remnant movement, or a smaller verbal category, cf. Müller (1998).


(123) a. Mérᵢ virðast tᵢ [hestarnir vera seinir]
         me.dat seem.pl [the horses.nom be slow]
         ‘It seems to me that the horses are slow.’
      b. það virðist/*virðast einhverjum manni [hestarnir vera seinir]
         expl seems/seem some man.dat [the horses.nom be slow]
         ‘It seems to some man that the horses are slow.’

In (123-a) the matrix verb agrees with the nominative argument lower in the structure. The trace indicates the original site of the quirky subject, which raises into the specifier of TP in the matrix clause. In (123-b) the quirky experiencer remains in situ. The matrix verb cannot agree in number with the nominative argument low in the structure but must exhibit singular number, suggesting that the experiencer controls agreement. Chomsky (2000) interprets such facts as indications that the trace of the quirky subject does not cause an intervention effect between the ϕ-probe on matrix T and the argument further down. However, once the experiencer stays low as in (123-b), the experiencer’s full (trivial) chain intervenes between ϕ on matrix T and the lower argument the horses, such that only the experiencer counts as a possible goal.²⁷

Now if these assumptions are right, similar conclusions suggest themselves for the identification of copies to yield movement chains. I would like to propose that Minimal Search for an identical category XP (a copy) is likewise “blind” to phrases previously moved out of XP, i.e. traces within XP do not enter the computation of identity of occurrences of XP. What this means for the derivation in (122) is that, next to vP1, vP2 is identified as a copy of vP1 in SPEC-CP. Hence both the lower occurrence of vP1 and vP2 get integrated into the movement chain. In this sense, vP2 is “sufficiently identical” to, or sufficiently non-distinct from, vP1 to count as a copy of vP1.

One may wonder what happens if “parallel evacuation” does not apply to both vP1 and vP2, i.e. what happens if only a single remnant (instead of two) is created, as schematically exemplified in (124): (124)

[CC [TP1 OBJ1 [vP1 tOBJ1 V=gelesen ] ] [TP2 [vP2 OBJ2 V=gelesen ] ] ]
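The identity computation proposed here can be made concrete with a small Python sketch. It is an illustration under simplifying assumptions, not part of the analysis itself: the class `SO`, the functions `visible_form`, `non_distinct` and `find_copies`, and the string labels are hypothetical names, and syntactic objects are modelled as labelled trees in which traces carry a flag.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SO:
    """Toy syntactic object: a label, its daughters, and a trace flag (all assumed)."""
    label: str
    daughters: Tuple["SO", ...] = ()
    is_trace: bool = False


def visible_form(so: SO):
    """Structure of `so` as seen by search: traces inside it are skipped."""
    kept = tuple(visible_form(d) for d in so.daughters if not d.is_trace)
    return (so.label, kept)


def non_distinct(a: SO, b: SO) -> bool:
    """Two objects count as copies if they are identical once traces are ignored."""
    return visible_form(a) == visible_form(b)


def find_copies(moved: SO, candidates: List[SO]) -> List[SO]:
    """Collect every candidate that search would integrate into the chain of `moved`."""
    return [c for c in candidates if non_distinct(moved, c)]


gelesen = SO("V:gelesen")
# Parallel evacuation, as in (122): both objects have left their vP.
vp1 = SO("vP", (SO("DP:das Buch", is_trace=True), gelesen))
vp2 = SO("vP", (SO("DP:den Artikel", is_trace=True), gelesen))
# Asymmetric remnant creation, as in (124): OBJ2 is still inside vP2.
vp2_unevacuated = SO("vP", (SO("DP:den Artikel"), gelesen))

forked_chain = find_copies(vp1, [vp1, vp2])                 # both occurrences found
failed_chain = find_copies(vp1, [vp1, vp2_unevacuated])     # only vP1 itself found
```

In this toy model `forked_chain` contains both vPs, whereas `failed_chain` contains vP1 only, because the unmoved object remains visible inside the second vP. The latter corresponds to the configuration in (124).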

In this case, subsequent movement of either vP1 or vP2 will not yield a movement chain: if vP1 undergoes fronting, search does not recognize vP2 as non-distinct due to the presence of OBJ2 within it. If vP2 moves, vP1 is not identified as a copy, because it lacks OBJ2. However, I take chain formation to be necessary in order to meet (102). Thus, “asymmetric” remnant creation ultimately violates (102). Despite some necessary maneuvering, I hope to have shown that the current analysis can capture remnant ATB, which proved to be problematic for other accounts. I now turn to case matching effects as described in this chapter for German and Polish.

27 For recent arguments from tough-constructions in favor of this line of reasoning, cf. Hartman (2011).

Case Matching Effects

Let me now address case matching effects as discussed before, taking German examples. I repeat the relevant sentences here:

(125) a. [Welchem Jungen] hat Maria gehorcht und dann geholfen?
         which.dat boy has Mary obeyed and then helped
         ‘Which boy did Mary obey and help?’
      b. [Welchen Jungen] hat Maria gesehen und geliebt?
         which.acc boy has Mary seen and loved
         ‘Which boy did Mary see and love?’

(126) a. *[Welchen Jungen] hat Maria gesehen und geholfen?
         which.acc boy has Mary seen and helped
      b. *[Welchem Jungen] hat Maria gesehen und geholfen?
         which.dat boy has Mary seen and helped

(127) a. Bären hat er geliebt und geholfen.
         bears.acc/dat has he loved and helped
         ‘It is bears which he loved and helped.’
      b. Was hat er gehört und hat die Leute beeindruckt?
         what has he heard and has the people impressed

My take on morphological case matching effects will be very similar to Citko’s, although in my analysis there is no multiple case assignment to a single case feature, because my analysis involves multiple instances of the moved item. I adopt a realizational approach to morphology, i.e. Distributed Morphology (Halle and Marantz (1993) et seq.). A simplified syntactic tree with the relevant featural information for (127-b), for instance, is given in (128), where the wh-element in the vP-edge of the first conjunct bears accusative case while the one in the vP-edge of the second conjunct bears nominative:

(128) [CP DP[wh,G:neut,uC:Nom] [CC [C’ C=hat [v*P DP[wh,G:neut,uC:Acc] [v*’ er tDP gehört]]] [C’ C=hat [v*P DP[wh,G:neut,uC:Nom] [VP die Leute beeindruckt]]]]]

As can be seen, the item from the second conjunct undergoes movement to SPEC-CP (which of the two undergoes movement is immaterial, as noted before). Chain identification is ambiguous, as before, between the two items in the vP-edges. This is because only the features themselves, not their values, are computed for identity. I follow Citko in assuming that, essentially, there is nothing wrong with the cases in (126) as far as the syntax is concerned. In other words, I am assuming that a wh-chain can be properly formed. What goes wrong in these examples is that the morphological realizations of the wh-expressions cannot meet the case requirements of both verbs at the same time. Hence the sentences in (126) are morphologically ill-formed. I express this with the following stipulation:

(129) A movement chain must
      a. comprise non-distinct members (i.e. they must be featurally identical)
      b. be headed by a syntactic object which receives an exponent compatible with all lower chain members.

The examples in (126) meet condition (129-a) but violate condition (129-b); hence they are ungrammatical. As for the syncretism case in (127-b), I suggest for concreteness that the inanimate wh-pronoun was (‘what’) of German is negatively specified for oblique case (and probably not specified at all for person and number features, though it is specified for gender). The vocabulary item for this pronoun is given in (130):

(130) [D, wh, +neut, -oblique] ↔ was


The feature specifications of this vocabulary item dictate that it can be inserted into morphemes (terminal nodes) specified for nominative or accusative. So (130) can be inserted in the upper or the lower copy of the wh-item in (128) which bears nominative case, but it can, as a matter of principle, also be inserted in the lower copy in the vP-edge of the first conjunct, meeting both conditions (129-a) and (129-b). The value [-oblique] is supposed to guarantee that the vocabulary item is not inserted in contexts assigned oblique case (like genitive or dative). For the non-syncretic cases, I assume that all other wh-pronouns in German are positively specified for case and hence their insertion is required if the feature specifications allow their insertion by the subset principle.
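The interplay of vocabulary insertion under the subset principle with condition (129-b) can be illustrated with a short Python sketch. Several points are assumptions made for the sake of the example: only the item for was is taken from (130), while the entries for wen and wem are hypothetical stand-ins for the case-specified wh-expressions in (126); the helper `terminal` derives [±oblique] from the case value; and “compatible” in (129-b) is read as “the head’s exponent is the one each lower chain member would also receive”.

```python
from typing import FrozenSet, List, Optional, Tuple

Features = FrozenSet[str]

# Vocabulary items. The first is (130); the other two are hypothetical,
# positively case-specified items added for illustration only.
VOCABULARY: List[Tuple[Features, str]] = [
    (frozenset({"D", "wh", "+neut", "-oblique"}), "was"),
    (frozenset({"D", "wh", "-neut", "acc"}), "wen"),
    (frozenset({"D", "wh", "-neut", "dat"}), "wem"),
]


def terminal(neuter: bool, case: str) -> Features:
    """Feature bundle of a wh-terminal; obliqueness is derived from case (assumed)."""
    oblique = "+oblique" if case in {"dat", "gen"} else "-oblique"
    gender = "+neut" if neuter else "-neut"
    return frozenset({"D", "wh", gender, case, oblique})


def insert(node: Features) -> Optional[str]:
    """Subset principle: the most specific item whose features are a subset of the node."""
    candidates = [(feats, exponent) for feats, exponent in VOCABULARY if feats <= node]
    if not candidates:
        return None
    return max(candidates, key=lambda item: len(item[0]))[1]


def chain_exponent(chain: List[Features]) -> Optional[str]:
    """Exponent of the chain head if it also fits all lower members, else None."""
    head = insert(chain[0])
    if head is not None and all(insert(member) == head for member in chain[1:]):
        return head
    return None


# (127-b)/(128): nominative head, nominative and accusative lower copies.
syncretic = chain_exponent(
    [terminal(True, "nom"), terminal(True, "nom"), terminal(True, "acc")]
)
# Pattern of (126): accusative head, accusative and dative lower copies.
mismatch = chain_exponent(
    [terminal(False, "acc"), terminal(False, "acc"), terminal(False, "dat")]
)
```

Under these assumptions `syncretic` is realized as was, since the underspecified item fits nominative and accusative terminals alike, while `mismatch` receives no exponent because no single item fits both the accusative and the dative member of the chain.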

6.4.3 Splitting up CC

6.4.3.1 Direct ATB

A fuller theory of ATB from coordinate structures includes the above derivation as one of its parts. In addition, the coordinating element must obviously be part of the structure. In the following, I take up ideas from Chomsky (2013), who suggests that one coordinand within CC must raise to the specifier of Coord after the latter merges with CC. To make things specific, suppose that it is the member out of which wh-extraction takes place which raises. (131)

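[Tree diagram: who1 in SPEC-CP, C=did, and TP with the subject John and T; T's sister α contains CoordP, which is headed by Coord and takes CC as its complement; CC comprises κ1 and κ2, the two v*Ps (with VPs headed by see and ignore) containing the copies ⟨who1⟩ and ⟨who2⟩.]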


First, a number of operations take place simultaneously or in parallel. wh-extraction and the formation of chains need to apply to the symmetric CC-representation, because after raising of one of the coordinate members the coordinate structure is asymmetrical, such that one wh-phrase is closer to the probe and chain formation will detect the higher wh-phrase only. So to ensure ambiguous Minimal Search, CC is the relevant unit for chain formation. Secondly, it must be the case that the familiar asymmetry effects of coordination are “surfacy” effects, i.e. they result not from CC but from the derived CoordP. Thirdly, both raising of the coordinate member and wh-extraction need to take place in parallel at the C-phase level, where all operations are effectuated (cf. Chomsky (2008)).

Let me finally address the issue of labeling. κ1 and κ2 are part of a symmetric structure CC and no agreement holds between the two members, just accidental identity of category, as it were. Under the current assumptions this means that CC cannot receive the label shared by the two members. A solution is indicated in (131): CC can receive a label if one member moves. In this particular case, CC effectively obtains the label v*, i.e. the label of the category that remains in situ. After raising κ1 to the sister-of-CoordP position, α needs to receive a label. Again, we have a symmetric structure, in which no element is more prominent than the other. For lack of a better understanding, I will here assume that raised κ1 labels the structure, i.e. T selects a vP.

6.4.3.2 Indirect ATB

We know that the distribution of a coordinate structure is the same as that of its members (cf., among many, Munn (1993)). Chomsky (2013) remarks on this issue:

We know what the right answer is: the label is not Conj but rather the label of [the raised coordinate], typically shared with [the in-situ coordinate]; if the coordinated expressions are APs, then [α] is an AP, etc. It follows that Conj and the construction [ConjP] that it heads are not available as a label, so that [α] receives the label of [the raised coordinate].

The remarks make sense, but it is unclear how to get to the result. To complicate matters, Chomsky (2013, fn. 40) states: “The assumption under consideration is that although [Coord] is not a possible label, it must still be visible for determining the structure. Otherwise [both coordinates] would be equally prominent.” Let me rephrase the problem: CC, say {vP, vP}, is problematically symmetric and hence requires raising of one member to yield {vP, CoordP}, where CoordP = {Coord, {⟨vP⟩, vP}}. The output of this raising needs to be labeled by the raised vP, as we know that such units have the distribution of vPs. This requires Coord to be unavailable for labeling. However, if Coord is unavailable for labeling, we must still guarantee that {vP, CoordP} is not equivalent to CC as far as labeling is concerned (which would effectively reiterate the original symmetry problem!). How can we effectuate the unavailable-but-visible property? Which feature is it at all that Coord bears?

I will here consider an analysis that uses an idea recently suggested by Ott (2011b). Based on the syntax of free relatives, he argues that if a phase head carries no features necessary for subsequent computation, such as free relative C after feature inheritance, it must be removed from the workspace by transfer along with TP, in order to comply with Full Interpretation. From the current perspective, and maybe more generally, coordination represents a comparable state of affairs: Coord, while being lexically interpretable, bears no features which are needed for subsequent derivational steps (cf. Zhang (2010, 65)); it is not selected and has no grammatical function for the ongoing derivation. Thus in α it is CoordP as a whole which gets removed by transfer (not just Coord’s complement), leaving only v*P available for subsequent selection (say, by T). Thus the current conception of coordination neither resorts to adjunction of a Boolean phrase to the first conjunct (Munn 1993), which faces independent problems, nor employs stipulated feature percolation by the first conjunct to account for the distribution of conjoined phrases (pace Zhang (2010)). Under the current view, the set-forming operation Merge and transfer suffice to yield the correct outcome.

(132) [Tree diagram: who1 has been extracted; α comprises the raised v*P and CoordP, which is headed by Coord and takes CC as its complement; the two v*Ps (with VPs headed by see and ignore) contain the copies of who1 and who2.]

CoordP undergoes transfer as a whole, analogous to CP in free relatives, leaving only the raised constituent available for subsequent selection, which is the right result. Although conceptually appealing, the idea might not be compatible with current considerations because of the additional cycle which gets introduced, rendering the symmetric CC unrecoverable at the C-phase level.

6.5 Open Issues and Remaining Questions

There are a number of questions which remain open, given the analysis developed here. In particular, I have not accounted for reconstruction effects, island phenomena and multiple wh-fronting phenomena. However, I believe that this does not point to a principled defect of the analysis I have suggested. Rather I think that a proper understanding of these constructions is still pending.

Reconstruction

As noted in the exposition of the asymmetric analyses, there are reconstruction asymmetries with respect to the first and the last conjunct. Thus (133-a) exhibits a Condition C effect induced by the coreferential pronoun in the first conjunct, while no such Condition C effect shows up in (133-b), where the coreferential pronoun is the subject of the last conjunct:

(133) a. *Which picture of Johnᵢ did heᵢ like and Mary dislike?
      b. Which picture of Johnᵢ did Mary like and heᵢ dislike?

Likewise, (134-a) is accounted for if the complex wh-expression including the anaphor is reconstructed in the first conjunct, while, apparently, reconstruction is obviated in the second conjunct as (134-b) suggests:

(134) a. Which pictures of himselfᵢ did Johnᵢ buy and Mary paint?
      b. *Which pictures of herselfⱼ did John buy and Maryⱼ paint?
      from Munn (1993, 52)

The picture is not so clear, however. In Citko (2011, 220, fn. 17), data like (135-a) and (135-b) are given:

(135) a. Which picture of Johnᵢ do you think heᵢ liked and Mary disliked?
      b. Which picture of Johnᵢ do you think Mary liked and heᵢ disliked?

She comments that these examples feature “an extra level of embedding, which could decide whether the issue with [(134-a)/(134-b)] is reconstruction into the first conjunct, or some kind of proximity effect. The fact that they are possible (on the coindexed reading) suggest that the latter is more likely to be the case.”


Effectively, then, reconstruction data appear not to be decisive and do not provide conclusive evidence for a structural asymmetry which treats the second conjunct as somehow special: both conjuncts might still involve extraction, i.e. ATB. Given the independent evidence I have presented in favor of forked chains, which strongly suggests a structural symmetry between the first and the second conjunct, reconstruction asymmetries must receive an account different from the asymmetric ones I have discussed.

Islandhood

Sensitivity to islands has been claimed to provide clues with respect to the question whether extraction from the first or the second conjunct is genuine or not. I will comment directly on the claims in the literature, because the picture is far from clear already at the observational level. Anticipating my view, the argument is at present ill-suited for strong conclusions with respect to the true extraction site and the issue requires more systematic and thorough investigation. According to Ha (2008, 256), extraction from the second conjunct is real. He gives the following examples to make the point:

(136) a. *Who did JOHN THINK, but BILL WONDER went to Prof. Williams’ talk last night?
      b. Who did JOHN WONDER, but BILL THINK went to Prof. Williams’ talk last night?

In (136-a) wh-extraction of a subject crosses an interrogative island in the second conjunct and the result is bad. In the first conjunct, the subordinate clause introduced by a declarative embedding predicate has undergone ellipsis and is understood as semantically parallel to the subordinate clause in the second conjunct. In (136-b), the situation is reversed: in the first conjunct, wonder embeds an interrogative island, which, however, is elided and which is semantically parallel to the pronounced subordinate clause in the second conjunct. The latter, in turn, is selected by think. Ha interprets the examples as showing that the crossing of the interrogative island in (136-b) is repaired by ellipsis. That the island condition holds in (136-a) indicates that true extraction takes place from the second conjunct, or so Ha claims. The judgments are taken over from his work.

A problem this sort of argument encounters has to do with the observations themselves: native speakers of English I have consulted vary greatly in their judgments of the above sentences, ranging from outright support for the judgments that Ha gives, through rejection of both sentences, to a reversal of the pattern. One speaker judged (136-b) as worse than (136-a): he had strong difficulties parsing (136-b) beyond wonder, while he felt that (136-a) fared better in comparison (though it was still not great).


The uncertainty regarding the factual situation is corroborated by examples Zhang (2010, 226) gives. She presents (137-a) and (137-b) to make the exact opposite point:²⁸

(137) a. *Who did Bill lose business because he hired and Mary praise a lot?
      b. Who did Bill praise a lot and Mary lose business because she hired?

Here wh-movement crosses an adjunct island in the first conjunct in (137-a) and none in the second conjunct. The result is ill-formed. Reversing the order of the conjuncts results in an improvement. Zhang argues that this shows that real extraction takes place from the first conjunct, but not from the second. Again, as with the reconstruction effects, the evidence in favor of an asymmetry between the first and the second conjunct is far from conclusive. The evidence certainly does not speak against the idea of forked chains or a symmetric treatment respectively.

Multiple wh-Fronting and Lower Copy Pronunciation

An (2008) makes an interesting observation regarding ATB in Slavic languages. As is very well known, Slavic languages form multiple wh-questions by employing obligatory multiple wh-fronting (138)/(139). An provides data by Bošković (2002) from Romanian, Bulgarian, Russian and Serbo-Croatian, all of which pattern the same with regard to the properties described here. I confine myself to the representative Serbo-Croatian and Romanian:

(138) a. Ko šta kupuje?   (Serbo-Croatian)
         who what buys
         ‘Who buys what?’
      b. *Ko kupuje šta?
         who buys what

(139) a. Cine ce precede?   (Romanian)
         who what precedes
         ‘Who precedes what?’
      b. *Cine precede ce?
         who precedes what

28 Marc Richards (p.c.) judges (137-a) just about as good or as bad as extraction over any kind of because-adjunct clause, i.e. in his view not too bad. However, an asymmetry between (137-a) and (137-b) is detectable.


As is fairly well known, multiple wh-movement becomes impossible once the fronted wh-elements are homophonous, as (140-a)/(141-a) show. Interestingly, in this case one of the wh-words is pronounced in its in-situ position (140-b)/(141-b). (138-b)/(139-b) show that this is impossible whenever the wh-words differ with respect to their form:

(140) a. *šta šta uslovljava?   (Serbo-Croatian)
         what what conditions
         ‘What conditions what?’
      b. šta uslovljava šta?
         what conditions what

(141) a. *Ce ce precede?   (Romanian)
         what what precedes
         ‘What precedes what?’
      b. Ce precede ce?
         what precedes what

Bošković (2002) famously puts forth an analysis for these cases in which, syntactically, multiple wh-fronting uniformly takes place in all the cases mentioned. To avoid linear adjacency of phonologically identical elements, PF makes different choices with respect to which copies to pronounce. All fronted copies are pronounced whenever the fronted categories differ phonologically. However, once they are phonologically identical, as in (140-b)/(141-b), one of the upper copies obligatorily gets phonologically deleted to prevent a crash at PF, so that the complement copy in the low position is pronounced. This is sketched in (142):

(142) [šta SUBJ [šta OBJ uslovljava šta OBJ]]
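The PF choice just described can be approximated by a few lines of Python. This is a deliberately crude sketch and not Bošković’s formulation: the function name `spell_out` is hypothetical, homophony is checked by string identity, and the lower copy of a wh-phrase is simply assumed to sit directly after the verb.

```python
from typing import List


def spell_out(fronted: List[str], verb: str) -> List[str]:
    """Choose which copy of each fronted wh-word PF pronounces.

    A wh-word is pronounced in its fronted position unless it would end up
    adjacent to a homophonous fronted wh-word; in that case its lower copy
    (assumed here to follow the verb) is pronounced instead.
    """
    high: List[str] = []
    low: List[str] = []
    for wh in fronted:
        if high and high[-1].lower() == wh.lower():
            low.append(wh)   # upper copy deleted, lower copy pronounced
        else:
            high.append(wh)  # upper copy pronounced
    return high + [verb] + low


distinct = spell_out(["Ko", "šta"], "kupuje")        # pattern of (138-a)
homophonous = spell_out(["šta", "šta"], "uslovljava")  # pattern of (140-b)
```

With phonologically distinct wh-words both upper copies are pronounced; with homophonous ones the second surfaces after the verb, as in (142).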

To corroborate the claim that overt multiple wh-movement takes place even in cases like (140-b)/(141-b), Bošković shows that parasitic gaps can be licensed by a wh-object pronounced in its thematic position. The fact that licensing of the gap is impossible in a non-multiple-wh-fronting language like English (144), in turn, suggests that overt movement is part of the licensing of parasitic gaps. It would follow, then, that Slavic provides evidence for overt movement and lower copy spell-out:

(143) Ce precede ce fără să influenţeze?   (Romanian)
      what precedes what without sbjv-ptcl influence.3sg
      ‘What precedes whatᵢ without influencing itᵢ?’

(144) *Who bought what after finding.


Now An observes that multiple ATB-fronting is possible in Slavic:

(145) Cine ce a spart şi a distrus?   (Romanian)
      who what has broken and has destroyed
      ‘Who broke what and destroyed it?’

As before, homophony of the fronted wh-words yields deviance:

(146) *Ce ce a spart şi a distrus?
      what what has broken and has destroyed

The strategy Slavic uses in this situation is to realize the phonologically identical copy in the base position of the second conjunct:

(147) Ce a spart şi a distrus ce?
      what has broken and has destroyed what

I will not try to analyze these phenomena here, as an analysis of multiple wh-fronting phenomena exceeds the confines of this book. Suffice it to say that they raise further questions for asymmetric analyses of ATB: why does a pronounced instance of one of the wh-words appear in the second conjunct? While I cannot provide a detailed treatment of these phenomena here, I do not see a principled reason why the analysis I have proposed should not accommodate them.

6.6 Summary

In this chapter I have first given a number of empirical arguments for FCs in ATB-movement and shown why the data represent problems for previous accounts of ATB, which seek to dispense with FCs. Secondly, I have given an analysis of ATB which involves FCs. As a way to implement FCs, I have shown that a recent idea by Martin and Uriagereka (2011) can fruitfully be employed, an idea which also abides by basic minimalist desiderata: Internal Merge is a composite operation involving Minimal Search as one of its components, used to identify the copy and the chain.

7 Summary and Outlook

DOI 10.1515/9783110522518-007

On the theoretical side, I have in this book provided arguments for a symmetric nature of basic properties of phrase structure as derivable by Merge and the Third Factor principle Minimal Search. Thus the component of the grammar responsible for hierarchical structure building, Merge (both external and internal), does not yield a labeled set (and concomitantly, specifiers) per se. Instead, labeling of such structures is derivative of a labeling algorithm which detects the most prominent element within the set at the phase level, guided by Minimal Search (Chomsky (2008, 2013)).

On the empirical side, I have first shown that successive-cyclic Ā-movement can derive from the iterated need to label structures of the form {XP, YP} in the edges of phases. The cyclic and recursive application of symmetry-breaking movement yields the pattern we see in various phenomena such as long-distance wh-interrogatives, long-distance relativization or long-distance comparative operator movement. I have shown that this treatment of long-distance dependencies is parsimonious and employs fewer technical means than a number of previous analyses of successive-cyclic Ā-movement. In addressing the “halting problem” evident from indirect questions, which exhibit the form {DP, CP}, I discuss two approaches to labeling such structures. One essentially stipulates that the probe-bearing head is the label (Boeckx (2008a)), so that the structure receives the label CP by virtue of C bearing a Q-probe. Another approach says that in {DP, CP}=α, a feature which D and C have in common becomes the label of α (Chomsky (2013)); the feature in question is Q. In combination with the condition XIM, I show that the latter option is to be preferred as it derives Criterial Freezing without further assumptions: if α’s label is Q, α’s terms DP and CP cannot move in that they are intermediate categories in that derivational cycle. As part of that chapter, I develop a new analysis of Hebrew “nested interrogatives,” showing that an interplay between feature inheritance of Q, Criterial Freezing and symmetry-breaking movement by and large suffices to describe the construction.

Finally, I suggest a novel approach to ATB constructions in coordination, based on an analogy between “shared labels” and “shared launching sites.” I discuss previous approaches to ATB which deny the existence of “forked chains” (Ross (1967), Williams (1977)) and demonstrate that they all fall short of accounting for various empirical properties of ATB, some of which are originally observed within this book. Among the novel observations is the existence of remnant ATB, which poses a problem for most analyses discussed here. Having shown that forked chains are indispensable for ATB, I offer a new implementation of ATB: ATB involves a parallelism condition, applying at the phase level (Kasai (2004)), which demands that the structures before the actual ATB-step be parallel. ATB then is movement of a single instance of the phrase in question, which goes hand in hand with Minimal Search for the displaced element (Martin and Uriagereka (2011)). Search detects both members in the parallel structure to form a chain (“forked chains”). I propose that the phenomenon of remnant ATB receives a treatment in terms of movement of one of the two remnant categories. Search for the identical element then must ignore the specific character of the trace, providing additional support for the idea of “trace invisibility” (Chomsky (2000)).

Within an approach to phrase structure without specifiers, there are numerous empirical and conceptual issues that are still pending. Concerning syntax proper, questions arise such as: how can XP-YP-structures be labeled without movement and without seeming “sharing” of features in the sense described above? Pertinent cases are VP-internal subjects in languages like German or Japanese, cf. also Ott (2011a), Chomsky (2013) for these and many more problems. Concerning the interpretation of syntactic structures at the interfaces, a problem is how to linearize stable structures of the form {XP, YP}. How do we know which phrase linearly precedes which? The issue is most pressing for base-generation of the two terms. If XP merges with YP and is internal to YP, we know that the prevalent case is XP>YP, i.e. the filler precedes the gap. But how exactly is this formalized? And how can we account for the exceptions (e.g. remnant movement and the like)? These and many more issues besides need to be addressed in work to come.

Bibliography

Abels, K. (2003). Successive Cyclicity, Anti-locality, and Adposition Stranding. Ph. D. thesis, University of Connecticut.
Abels, K. (2007). Towards a restrictive theory of (remnant) movement. Volume 7 of Linguistic Variation Yearbook, pp. 53–120.
Abels, K. (2012). Phases. An essay on cyclicity in syntax. Berlin: de Gruyter.
Adger, D. and G. Ramchand (2005). Merge and Move: Wh-Dependencies Revisited. Linguistic Inquiry 36(2), pp. 161–193.
Alexiadou, A. and E. Anagnostopoulou (2001). The subject-in-situ generalization, and the role of Case in driving computations. Linguistic Inquiry 32(2), pp. 193–231.
Alexiadou, A. and E. Anagnostopoulou (2007). The subject-in-situ generalization revisited. In H.-M. Gärtner and U. Sauerland (Eds.), Interfaces + Recursion = Language?, pp. 31–60. Mouton de Gruyter.
An, D.-H. (2008). Lower copy pronunciation and multiple wh-fronting in ATB contexts. Volume 3 of Nanzan Linguistics: Special Issue, pp. 1–15.
Barbiers, S., O. Koeneman, and M. Lekakou (2009). Syntactic doubling and the structure of wh-chains. Journal of Linguistics 45, pp. 1–46.
Barss, A. (1986). Chains and Anaphoric Dependence: On Reconstruction and Its Implications. Ph. D. thesis, MIT, Cambridge, Massachusetts.
Barss, A. (2001). Syntactic reconstruction effects. In M. Baltin and C. Collins (Eds.), The Handbook of Syntactic Theory, pp. 670–696. Oxford: Blackwell.
Bayer, J. (1984). Comp in Bavarian syntax. The Linguistic Review 3, pp. 209–274.
Bayer, J. (2006). A note on targets of Ā-movement in the left periphery of German sentences. In P. Brandt and E. Fuss (Eds.), Form, Structure, and Grammar. A Festschrift Presented to Günther Grewendorf on Occasion of his 60th Birthday, pp. 119–128. Berlin: Akademie Verlag.
Bayer, J. (2010). Discourse particles in questions. In R. Mohanty and M. Menon (Eds.), Proceedings of GLOW in ASIA VII. Hyderabad: EFL University Press.
Bayer, J. and E. Brandner (2008). On wh-head-movement and the Doubly filled Comp Filter. In Proceedings of WCCFL 26, pp. 87–95. Somerville, MA: Cascadilla Press.
Benmamoun, E., A. Bhatia, and M. Polinsky (2009). Closest conjunct agreement in head final languages. Number 9 in Linguistic Variation Yearbook, pp. 67–88.
Bhatt, R. and M. Walkow (2013). Locating Agreement in Grammar: An argument from Agreement in Conjunctions. Natural Language and Linguistic Theory 31, pp. 951–1013.
Bloomfield, L. (1933). Language. New York: Henry Holt and Co.
Blümel, A. (2012). Successive Cyclic Movement as Recursive Symmetry-Breaking. In N. Arnett and R. Bennett (Eds.), Proceedings of the 30th West Coast Conference on Formal Linguistics, pp. 87–97.
Blümel, A. (2017). Exocentric Root Declaratives: Evidence from V2. In L. Bauke and A. Blümel (Eds.), Labels and Roots. Mouton de Gruyter. To appear.
Boeckx, C. (2003). Islands and Chains: Resumption as Stranding. Amsterdam: John Benjamins.
Boeckx, C. (2008a). Bare Syntax. Oxford: Oxford University Press.
Boeckx, C. (2008b). Understanding Minimalist Syntax: Lessons from Locality in Long-Distance Dependencies. Oxford: Blackwell.

DOI 10.1515/9783110522518-008

Boeckx, C. (2009). On the Locus of Asymmetry in UG. Catalan Journal of Linguistics 8, pp. 41–53.
Boeckx, C. (2014). Elementary syntactic structures. Cambridge: Cambridge University Press.
Boeckx, C. and K. Grohmann (2007). Putting Phases in Perspective. Syntax 10(2), pp. 204–222.
Boeckx, C. and N. Hornstein (2010). The varying aims of linguistic theory. In J. Bricmont and J. Franck (Eds.), Chomsky Notebook, pp. 115–141. Columbia University Press.
Boef, E. (2012). Doubling in relative clauses. Aspects of morphosyntactic microvariation in Dutch. Ph. D. thesis, Meertens Instituut (KNAW)/Universiteit Utrecht.
Borer, H. (1984). Restrictive relatives in Modern Hebrew. Natural Language and Linguistic Theory 2, pp. 219–260.
Borsley, R. D. (2013). On the nature of Welsh unbounded dependencies. Lingua 133, pp. 1–29.
Bošković, Ž. (2002). On multiple wh-fronting. Linguistic Inquiry 33(3), pp. 51–83.
Bošković, Ž. (2007). On the locality and motivation of Move and Agree: An even more minimal theory. Linguistic Inquiry 38(4), pp. 589–644.
Bošković, Ž. (2011). Last resort with Move and Agree in derivations and representations. In C. Boeckx (Ed.), The Handbook of Linguistic Minimalism. Oxford University Press.
Bowers, J. (1993). The syntax of predication. Linguistic Inquiry 24(4), pp. 591–656.
Cable, S. (2007). The Grammar of Q: Q-Particles and the Nature of Wh-Fronting, as Revealed by the Wh-Questions of Tlingit. Ph. D. thesis, MIT.
Cable, S. (2010). Against the Existence of Pied-Piping: Evidence from Tlingit. Linguistic Inquiry 41, pp. 563–594.
Cecchetto, C. and C. Donati (2008). On Labeling. In Studies in Linguistics, Volume 1 of CISC Working Papers, University of Siena, pp. 16–38.
Chaves, R. P. (2007). Coordinate Structures - Constraint-Based Syntax-Semantics Processing. Ph. D. thesis, University of Lisbon.
Chaves, R. P. (2012). On the grammar of extraction and coordination. Natural Language and Linguistic Theory 30.2, pp. 465–512.
Cheng, L. (1991). On the Typology of Wh-Questions. Ph. D. thesis, MIT.
Chomsky, N. (1957). Syntactic structures. The Hague: Mouton.
Chomsky, N. (1965). Aspects of the Theory of Syntax. M.I.T., Cambridge, Mass.
Chomsky, N. (1970). Remarks on nominalization. In R. A. Jacobs and P. S. Rosenbaum (Eds.), Readings in English Transformational Grammar. Waltham: Ginn.
Chomsky, N. (1973). Conditions on Transformations. In S. R. Anderson and P. Kiparsky (Eds.), A Festschrift for Morris Halle, pp. 232–286. New York: Holt, Rinehart and Winston.
Chomsky, N. (1981a). Knowledge of language: Its elements and origins. Philosophical Transactions of the Royal Society of London B: Biological Sciences 295(1077), pp. 223–234.
Chomsky, N. (1981b). Lectures on Government and Binding: The Pisa Lectures. Mouton de Gruyter.
Chomsky, N. (1982). Some Concepts and Consequences of the Theory of Government and Binding. Cambridge, Mass.: MIT Press.
Chomsky, N. (1986). Barriers. Cambridge, MA: M.I.T. Press.
Chomsky, N. (1993). A Minimalist Program for Linguistic Theory. In K. Hale and S. J. Keyser (Eds.), The View from Building 20. Essays in Linguistics in Honor of Sylvain Bromberger. Cambridge: MIT Press.
Chomsky, N. (1995a). Bare phrase structure. In G. Webelhuth (Ed.), Government and Binding Theory and the Minimalist Program. Cambridge, MA: MIT Press.
Chomsky, N. (1995b). The Minimalist Program. M.I.T., Cambridge, Mass.

Chomsky, N. (2000). Minimalist inquiries: The framework. In R. Martin, D. Michaels, and J. Uriagereka (Eds.), Step By Step: Essays In Syntax in Honor of Howard Lasnik, pp. 89–155. MIT Press.
Chomsky, N. (2001). Derivation by phase. In M. Kenstowicz (Ed.), Ken Hale: A Life in Language. Cambridge, MA: MIT Press.
Chomsky, N. (2004). Beyond explanatory adequacy. In A. Belletti (Ed.), Structures and Beyond, pp. 104–131. Oxford: Oxford University Press.
Chomsky, N. (2005). Three factors in language design. Linguistic Inquiry 36(1), pp. 1–22.
Chomsky, N. (2007). Approaching UG from Below. In U. Sauerland and H.-M. Gärtner (Eds.), Interfaces + Recursion = Language?: Chomsky’s Minimalism and the View from Syntax-Semantics. Berlin, New York: Mouton de Gruyter.
Chomsky, N. (2008). On phases. In C. P. Otero, R. Freidin, and M. L. Zubizarreta (Eds.), Foundational Issues in Linguistics, pp. 133–166. Cambridge, Mass.: MIT Press.
Chomsky, N. (2010). Restricting stipulations: Consequences and challenges. Talk given at the University of Stuttgart, March 27th 2010. URL http://infostream.rus.uni-stuttgart.de/lec/534/3637/aufzeichnung.flv.
Chomsky, N. (2011). Problems of Projection. Talk given at the University of Leiden, March 14th 2011. URL http://chomsky.nl/component/content/article/137-lezing-14-maart-van-noamchomsky-in-leiden-over-syntax.
Chomsky, N. (2013). Problems of projection. Lingua 130, pp. 33–49.
Chomsky, N. (2015). Problems of projection, extensions. In E. Di Domenico, C. Hamann, and S. Matteini (Eds.), Structures, Strategies and Beyond: Studies in honour of Adriana Belletti, Volume 223. Linguistik Aktuell/Linguistics Today.
Chung, S. (1994). Wh-Agreement and “Referentiality” in Chamorro. Linguistic Inquiry 25(1), pp. 1–44.
Citko, B. (2005). On the Nature of Merge: External Merge, Internal Merge, and Parallel Merge. Linguistic Inquiry 36(4), pp. 475–497.
Citko, B. (2011). Symmetry in Syntax: Merge, Move and Labels. Cambridge: Cambridge University Press.
Clements, G., J. McCloskey, J. Maling, and A. Zaenen (1983). String-Vacuous Rule Application. Linguistic Inquiry 14(1), pp. 1–17.
Collins, C. (1993). Topics in Ewe syntax. Ph. D. thesis, MIT.
Collins, C. (2002). Eliminating labels. In S. D. Epstein and T. D. Seely (Eds.), Derivation and explanation in the minimalist program, pp. 42–64. Oxford: Blackwell.
Curry, H. B. (1961). Some logical aspects of grammatical structure. Structure of Language and Its Mathematical Aspects, Proceedings of the Twelfth Symposium in Applied Mathematics, pp. 56–68. Providence, R. I.: American Mathematical Society.
den Besten, H. and G. Webelhuth (1990). Stranding. In G. Grewendorf and W. Sternefeld (Eds.), Scrambling and Barriers. NY: Academic Press.
den Dikken, M. (2009). On the nature and distribution of successive cyclicity. Paper presented at NELS 40, MIT, November 2009.
Epstein, S., H. T. Kitahara, and D. Seely (2009). The Necessity, but Invisibility of Counter-Cyclic Outputs: Deducing Extraction Constraints and Transfer-Application from 3rd Factor Conditions on Language Design. 2009 GLOW Newsletter.

Epstein, S. D., H. T. Kitahara, and D. Seely (2014). Labeling by Minimal Search: Implications for Successive Cyclic A-Movement and the Conception of the Postulate “Phase”. Linguistic Inquiry 45, pp. 463–481.
Epstein, S. D., H. T. Kitahara, and D. Seely (2016). Phase cancellation by external pair-merge of heads. The Linguistic Review 33(1), pp. 87–102.
Epstein, S. D. and D. Seely (2002). Rule application as cycles in a level-free syntax. In Explanation and derivation in the minimalist program, pp. 65–89. Oxford: Blackwell.
Fanselow, G. and D. Çavar (2002). Distributed deletion. In A. Alexiadou (Ed.), Theoretical Approaches to Universals, pp. 65–107. Amsterdam: Benjamins.
Felser, C. (2003). Wh-Copying, Phases, and Successive Cyclicity. Manuscript, Department of Language & Linguistics, University of Essex.
Felser, C. (2004). Wh-copying, phases, and successive cyclicity. Lingua 114, pp. 543–574.
Fernández-Salgueiro, G. (2008). Deriving the CSC and Unifying ATB and PG Constructions through Sideward Movement. Proceedings of the 26th WCCFL, pp. 156–162.
Fox, D. (1998). Economy and Semantic Interpretation. Ph. D. thesis, MIT.
Gallego, A. (Ed.) (2012). Phases: Developing the Framework. Mouton de Gruyter.
Gallego, A. and J. Uriagereka (2008). Freezing effects. Talk given at the XVIII Colloquium on Generative Grammar, Universidade de Lisboa, Lisboa (Portugal), 17-19 April 2008.
Gazdar, G. (1981). Unbounded Dependencies and Coordinated Structure. Linguistic Inquiry 12(2), pp. 155–184.
Gazdar, G., E. Klein, G. Pullum, and I. Sag (1985). Generalized Phrase Structure Grammar. Basil Blackwell.
George, L. (1980). Analogical generalization in natural language syntax. Ph. D. thesis, MIT.
Georgopoulos, C. (1983). Trace and resumptive pronouns in Palauan. In A. Chukerman, M. Marks, and J. F. Richardson (Eds.), Papers from the Nineteenth Regional Meeting, Chicago: Chicago Linguistic Society, pp. 134–145.
Georgopoulos, C. (1985). Variables in Palauan syntax. Natural Language and Linguistic Theory 3, pp. 59–94.
Goodall, G. (1987). Parallel structures in syntax: coordination, causatives and restructuring. Cambridge University Press.
Grewendorf, G. (2002). Minimalistische Syntax. Francke Verlag, Tübingen.
Grewendorf, G. (2003). Improper remnant movement. Gengo Kenkyu 123, pp. 47–94.
Grewendorf, G. (2012). The internal structure of wh-elements and the diversity of wh-movement. Manuscript University of Frankfurt.
Grewendorf, G. and J. Kremers (2009). Phases and cycles. Some problems with Phase Theory. The Linguistic Review 26:4, pp. 385–430.
Grimshaw, J. (1991). Extended projections. Manuscript Brandeis University.
Grosu, A. (1973). On the nonunitary nature of the coordinate structure constraint. Linguistic Inquiry 4, pp. 88–92.
Ha, S. (2008). Ellipsis, Right Node Raising and Across-the-Board Constructions. Ph. D. thesis, Boston University.
Haegeman, L. (1994). Introduction to Government and Binding Theory. Blackwell Textbooks in Linguistics.
Hale, K. and S. J. Keyser (2002). Prolegomenon to a Theory of Argument Structure. Cambridge: MIT Press.
Halle, M. and A. Marantz (1993). Distributed Morphology and the Pieces of Inflection. In K. Hale and S. J. Keyser (Eds.), The View from Building 20, pp. 111–176. MIT Press, Cambridge.

Hartman, J. (2011). (Non-)intervention in A-movement – Some cross-constructional and cross-linguistic considerations. Volume 11 of Linguistic Variation, pp. 121–148.
Hasegawa, N. (2005). The EPP Materialized First, Agree Later: Wh-Questions, Subjects and Mo ‘also’-Phrases. Scientific Approaches to Language 4, pp. 33–80.
Hasegawa, N. (2008). Wh-Movement in Japanese: Matrix Sluicing is different from Embedded Sluicing. Number 4 in The proceedings of the Workshop on Altaic Formal Linguistics, pp. 63–74. MITWPL, MIT.
Heck, F. (2004). A Theory of Pied-Piping. Ph. D. thesis, Universität Tübingen.
Heim, I. and A. Kratzer (1998). Semantics in Generative Grammar. Blackwell.
Henry, A. (1995). Belfast English and Standard English: Dialect variation and parameter setting. OUP: Oxford.
Heycock, C. and R. Zamparelli (2000). Friends and Colleagues: Plurality and NP-Coordination. Proceedings of the North East Linguistic Society 30.
Hiraiwa, K. (2001). Multiple Agree and the Defective Intervention Constraint in Japanese. In The Proceedings of the MIT-Harvard Joint Conference (HUMIT 2000), MITWPL #40, pp. 67–80. Cambridge, MA: MITWPL.
Hiraiwa, K. (2010). Scrambling to the Edge. Syntax 13(2), pp. 133–164.
Holmberg, A. and T. Hróarsdóttir (2005). Agreement and movement in Icelandic raising constructions. Lingua 114, pp. 651–673.
Hornstein, N. and J. Nunes (2002). On Asymmetries between Parasitic Gap and Across-the-Board Constructions. Syntax 5(1), pp. 26–54.
Hornstein, N., J. Nunes, and K. Grohmann (2006). Understanding Minimalism. Cambridge University Press, Cambridge.
Idsardi, W. J. and E. Raimy (2010). Three types of linearization and the temporal aspects of speech. In T. Biberauer and I. Roberts (Eds.), Principles of linearization. Berlin: Mouton de Gruyter.
Jackendoff, R. (1977). X-bar-Syntax: A Study of Phrase Structure. Linguistic Inquiry Monograph 2. Cambridge, MA: MIT Press.
Jiang, L. J. (2008). SUO in Chinese and Phase Edges. Manuscript Harvard University.
Kasai, H. (2004). Two Notes on ATB Movement. Language and Linguistics 5(1), pp. 167–188.
Katzir, R. and A. Bachrach (2009). Right-Node Raising and Delayed Spellout. In K. K. Grohmann (Ed.), InterPhases: Phase-Theoretic Investigations of Linguistic Interfaces. Oxford University Press.
Kayne, R. S. (1984). Connectedness and Binary Branching. Foris Publications, Dordrecht.
Kayne, R. S. (1994). The Antisymmetry of Syntax. M.I.T., Cambridge, Mass.
Kayne, R. S. (2008). Antisymmetry and the Lexicon. In Linguistic Variation Yearbook, Volume 8, pp. 1–31.
Kayne, R. S. and J.-Y. Pollock (1978). Stylistic Inversion, Successive Cyclicity, and Move NP in French. Linguistic Inquiry 9(4), pp. 595–621.
Kiziak, T. (2007). Long Extraction or Parenthetical Insertion? Evidence from Judgement Studies. In N. Dehé and Y. Kavalova (Eds.), Parentheticals, pp. 121–144. Benjamins, Amsterdam/Philadelphia.
Larson, B. (2016). The representation of syntactic action at a distance: Multidominance versus the Copy Theory. Glossa: a journal of general linguistics 1(1), pp. 1–18.
Lasnik, H. (1995). Case and expletives revisited: On greed and other human failings. Linguistic Inquiry 26(4), pp. 615–633.

Lasnik, H. and M. Saito (1992). Move Alpha: Conditions on Its Application and Output. MIT Press.
Legate, J. (2003). Some interface properties of the phase. Linguistic Inquiry 34(3), pp. 506–516.
Marantz, A. (1997). No Escape from Syntax: Don’t Try Morphological Analysis in the Privacy of Your Own Lexicon. In Working Papers in Linguistics, Philadelphia, Proceedings of the 21st Penn Linguistics Colloquium, pp. 201–225.
Marantz, A. (2013). Locality Domains for Contextual Allomorphy across the Interfaces. In O. Matushansky and A. Marantz (Eds.), Distributed Morphology Today, pp. 39–58. Cambridge, MA: MIT Press.
Martin, R. and J. Uriagereka (2011). Chains in Minimalism. Manuscript based on a talk given at the workshop The Minimalist Program: Quo Vadis? - Newborn, Reborn, or Stillborn? in Potsdam.
McCloskey, J. (2000). Quantifier Float and wh-movement in an Irish English. Linguistic Inquiry 31(1), pp. 57–84.
McCloskey, J. (2002). Resumption, successive cyclicity, and the locality of operations. In S. D. Epstein and D. Seely (Eds.), Derivation and explanation in the minimalist program, pp. 184–226. Oxford: Blackwell.
Merchant, J. (2001). The syntax of silence: Sluicing, islands, and the theory of ellipsis. Oxford University Press: Oxford.
Miyagawa, S. (2010). Why agree? Why move? Unifying agreement-based and discourse-configurational languages. Cambridge, MA: MIT Press.
Moro, A. (2000). Dynamic Antisymmetry. M.I.T., Cambridge, Mass.
Moro, A. (2007). Some notes on unstable structures. Ms., Università Vita-Salute San Raffaele/Harvard University.
Müller, G. (1996). A Constraint on Remnant Movement. Natural Language and Linguistic Theory 14, pp. 355–407.
Müller, G. (1998). Incomplete Category Fronting. A Derivational Approach to Remnant Movement in German, Volume 42 of Studies in Natural Language and Linguistic Theory. Dordrecht/Boston/London: Kluwer.
Müller, G. (2010). On Deriving CED Effects from the PIC. Linguistic Inquiry 41(1), pp. 35–82.
Müller, G. (2011). Constraints on Displacement. A Phase-Based Approach. Amsterdam/Philadelphia: John Benjamins.
Müller, G. and F. Heck (2000). Successive cyclicity, long-distance superiority, and local optimization. In R. Billerey and B. D. Lillehaugen (Eds.), Proceedings of WCCFL, Volume 19, pp. 218–231. Somerville, MA: Cascadilla Press.
Munn, A. (1993). Topics in the Syntax and Semantics of Coordinate Structures. Ph. D. thesis, University of Maryland.
Munn, A. (1999). On the identity requirement of ATB movement. Natural Language Semantics 7, pp. 421–425.
Muysken, P. (1982). Parametrizing the notion ‘head’. Journal of Linguistic Research 2, pp. 57–75.
Narita, H. (2010). Phasing in Full Interpretation. Ph. D. thesis, Harvard University.
Nunes, J. (2001). Sideward movement. Linguistic Inquiry 32(2), pp. 303–344.
Nunes, J. (2004). Linearization of chains and sideward movement. Cambridge, MA: MIT Press.
Ott, D. (2009a). Stylistic fronting as remnant movement. In Working Papers in Scandinavian Syntax, Number 83, pp. 141–178.

Ott, D. (2009b). The conceptual necessity of phases: Some remarks on the minimalist enterprise. In K. K. Grohmann (Ed.), Explorations of phase theory: Interpretation at the interfaces (= Interface Explorations 17), pp. 253–275. Berlin/New York: de Gruyter.
Ott, D. (2011a). Local instability: The syntax of split topics. Ph. D. thesis, Harvard University.
Ott, D. (2011b). A note on free relative clauses in the theory of phases. Linguistic Inquiry 42, pp. 183–192.
Ott, D. (2013). Review of Angel Gallego (ed.), Phases: Developing the framework (de Gruyter, 2012). Language 89.2, pp. 357–360.
Ott, D. (2014). An ellipsis approach to Contrastive Left-dislocation. Linguistic Inquiry 45(2), pp. 269–303.
Ott, D. (2015). Symmetric Merge and local instability: Evidence from split topics. Syntax 8(2), pp. 157–200.
Ott, D. (2017). Clausal arguments as syntactic satellites: a reappraisal. In L. Bauke and A. Blümel (Eds.), Labels and Roots. Mouton de Gruyter. To appear.
Ott, D. and M. de Vries (2014). A biclausal analysis of right-dislocation. In H.-L. Huang, E. Poole, and A. Rysling (Eds.), Proceedings of NELS, Volume 43, pp. 41–54. Amherst, MA: GLSA.
Ouali, H. (2009). On C-to-T Phi-Feature Transfer: the nature of Agreement and Anti-Agreement in Berber. In R. D’Alessandro, G. H. Hrafnbjargarson, and S. Fischer (Eds.), Agreement Restrictions, pp. 159–180. Mouton de Gruyter.
Pankau, A. (2013). Replacing Copies: The Syntax of Wh-Copying in German. Ph. D. thesis, University of Utrecht.
Pesetsky, D. (1982). Paths and Categories. Ph. D. thesis, MIT.
Pires, A. and H. L. Taylor (2007). The syntax of wh-in-situ and common ground. In Chicago Linguistic Society Meeting, Volume 43, pp. 201–215.
Postal, P. (1971). Cross-over phenomena. New York: Holt, Rinehart, and Winston.
Preminger, O. (2006). Nested Interrogatives and the locus of wh. In Y. Falk (Ed.), Proceedings of the 22nd Conference of the Israeli Association for Theoretical Linguistics (IATL 22), Jerusalem.
Prinzhorn, M. and V. Schmitt (2010). Discontinuous DP-coordination in German. Number 40 in Linguistic Variation Yearbook, pp. 161–200.
Progovac, L. (1998). State-of-the-article. Glot International 3(7), pp. 3–6.
Rackowski, A. and N. Richards (2005). Phase edge and extraction: a Tagalog case study. In M. McGinnis and N. Richards (Eds.), Perspectives on Phases, Volume 49 of MIT Working Papers in Linguistics.
Raimy, E. (2000). The Phonology and Morphology of Reduplication. Mouton de Gruyter.
Reich, I. (2007). From Phases to ATB-Movement. Number 43 in Proceedings from the Annual Meeting of the Chicago Linguistic Society, pp. 217–232.
Reis, M. (1995). Wer glaubst du hat recht? On So-called Extractions from Verb-Second Clauses and Verb-First Parenthetical Constructions in German. Sprache und Pragmatik 36, pp. 27–83.
Richards, M. (2004). Object shift and scrambling in North and West Germanic: A case study in symmetrical syntax. Ph. D. thesis, University of Cambridge.
Richards, M. (2007). On feature inheritance: An argument from the phase impenetrability condition. Linguistic Inquiry 38, pp. 563–572.
Richards, M. (2010). Stabilizing Syntax: On instability, optionality, and other indeterminacies. Handout syntax workshop, Stuttgart.

Richards, M. (2012). On feature inheritance, defective phases, movement and morphology. In Á. Gallego (Ed.), Phases: Developing the framework. Mouton de Gruyter.
Richards, N. (2001). Movement in Language: Interactions and Architectures. Oxford University Press.
Rizzi, L. (1997). The fine structure of the left periphery. In Elements of Grammar. Kluwer, Dordrecht.
Rizzi, L. (2005). On some properties of subjects and topics. Proceedings of the XXX Incontro di Grammatica Generativa. Venezia, Cafoscarina.
Rizzi, L. (2006). On the form of chains: Criterial positions and ECP effects. In L. Cheng and N. Corver (Eds.), WH-Movement: Moving On, pp. 97–133. Cambridge, MA: MIT Press.
Rizzi, L. (2012). Cartography, criteria, and labeling. Talk given at the EALING 2012 – Blaise Pascal Lectures, Sept 11-13, 2012.
Rizzi, L. and U. Shlonsky (2007). Strategies of subject extraction. In H.-M. Gärtner and U. Sauerland (Eds.), Interfaces + Recursion = Language?, pp. 115–160. Berlin: Mouton de Gruyter.
Rögnvaldsson, E. (1982). We Need (Some Kind of a) Rule of Conjunction Reduction. Linguistic Inquiry 13, pp. 557–561.
Ross, J. (1967). Constraints on Variables in Syntax. Ph. D. thesis, MIT.
Rudin, C. (1988). On multiple questions and multiple Wh-fronting. Natural Language and Linguistic Theory 6, pp. 445–501.
Sabel, J. (1998). Principles and Parameters of Wh-Movement. Habilitationsschrift Frankfurt am Main.
Sabel, J. (2000). Partial wh-movement and the typology of wh-questions. In U. Lutz, G. Müller, and A. von Stechow (Eds.), Wh-scope marking, pp. 409–446. Amsterdam/Philadelphia: John Benjamins.
Sag, I., G. Gazdar, T. Wasow, and S. Weisler (1985). Coordination and how to distinguish categories. Natural Language and Linguistic Theory 3, pp. 117–171.
Saumjan, S. K. and P. A. Soboleva (1963). Applikativnaja porozdajuscaja model’ i iscislenie transformacij v russkom jazyke. Moscow: Izdatel’stvo Akademii Nauk SSSR.
Seely, T. D. (2006). Merge, derivational c-command, and subcategorization in a label-free syntax. In C. Boeckx (Ed.), Minimalist Essays. John Benjamins.
Shlonsky, U. (2006). Extended projection and CP cartography. Nouveaux cahiers de linguistique française 27, pp. 83–93.
Sigurðsson, H. A. (2004). Meaningful silence, meaningless sounds. Number 4 in Linguistic Variation Yearbook, pp. 235–259.
Simík, R. (2011). The elimination of formal wh-features and a theory of free wh-movement. Manuscript University of Potsdam.
Stalnaker, R. (1978). Assertion. In P. Cole (Ed.), Syntax and Semantics, Number 9. New York: Academic Press.
Starke, M. (2011). Towards elegant parameters: Language variation reduces to the size of lexically stored trees. Unpublished Ms.
Stepanov, A. and P. Stateva (2006). Successive cyclicity as residual wh-scope marking. Lingua 116, pp. 2107–2153.
Svenonius, P. (2004). On the Edge. In D. Adger, C. de Cat, and G. Tsoulas (Eds.), Peripheries: Syntactic Edges and their Effects, pp. 261–287. Kluwer.
Takahashi, D. (1994). Minimality of Movement. Ph. D. thesis, University of Connecticut, Storrs.
Trinh, T. (2009). A constraint on copy deletion. Theoretical Linguistics 35, pp. 183–227.

van Urk, C. and N. Richards (2015). Two components of long-distance extraction: Successive cyclicity in Dinka. Linguistic Inquiry 46(1), pp. 113–155.
Webelhuth, G. (1992). Principles and Parameters of Syntactic Saturation. Oxford University Press, Oxford.
Williams, E. (1977). Across-the-Board Application of Rules. Linguistic Inquiry 8(2), pp. 419–423.
Williams, E. (1978). Across-the-Board Rule Application. Linguistic Inquiry 9(1), pp. 31–43.
Williams, E. (1989/1990). The ATB Theory of Parasitic Gaps. The Linguistic Review 6, pp. 265–279.
Wood, J. (2012). Icelandic Morphosyntax and Argument Structure. Ph. D. thesis, NYU.
Zeijlstra, H. (2012). There is only one way to agree. The Linguistic Review 29, pp. 491–553.
Zhang, N. N. (2004). Against Across-The-Board Movement. Concentric: Studies in Linguistics 30.2, pp. 151–185.
Zhang, N. N. (2009). Explaining the Immobility of Conjuncts. Studia Linguistica 64.2, pp. 190–238.
Zhang, N. N. (2010). Coordination in Syntax. Cambridge Studies in Linguistics Series 123. Cambridge: Cambridge University Press.
Zwart, J.-W. (2011). Structure and order: asymmetric merge. In C. Boeckx (Ed.), The Oxford Handbook of linguistic minimalism, pp. 96–118. Oxford: Oxford University Press.

Index

X̄-Theory 3, 10, 12, 59, 75, 94, 129
Akan 92, 93
Arabic (Moroccan) 103
ATB 4, 7, 8, 15, 97–99, 102, 104–111, 114–117, 119, 121, 123–128, 131–133, 135, 138–140, 142, 143, 145, 149, 150, 152–155
Bare Phrase Structure 2, 3, 6, 39, 40, 42, 44, 45, 55, 76, 130
Basque 82
Bavarian German 71, 72, 82
Belfast English 23
Binding Theory 10, 12, 22, 23, 25, 29, 86, 109
Bošković, Željko 16, 37, 38, 58, 73, 151
Boeckx, Cedric 4, 5, 9, 11, 18, 35, 36, 44, 62, 73, 74, 77, 83, 95, 112, 129, 154
Bulgarian 93
C-command 29, 37–40, 47, 52, 54, 55, 58, 99, 102, 119, 130
Cable, Seth 6, 40, 74, 81, 83
Chain 4, 5, 7, 8, 35, 36, 39, 41, 42, 48–51, 55, 56, 60, 74, 80, 95, 97, 98, 107, 117, 120, 122, 127, 128, 133–138, 140, 142–146, 149, 150, 153–155
Chamorro 18
Chomsky, Noam 2–14, 16–19, 22, 24, 28, 29, 35, 37, 40, 42–49, 51–57, 61–63, 67–70, 73–80, 83, 87–90, 94, 96–98, 108, 119, 122, 123, 128–131, 134, 136–138, 142, 145–147, 155
Citko, Barbara 7, 98, 107, 115–119, 121, 124, 144, 149
Coordinate Structure Constraint 8, 101, 104, 105, 111, 112, 114, 121, 131, 132
Criterial Freezing 3, 6, 7, 74, 75, 80–83, 88–90, 92, 94, 96, 134, 154
Dinka 30, 32, 33
Dutch 64

Endocentricity 2, 3, 5, 16, 43, 46–48, 52, 59, 76, 77, 96
English 5, 16, 17, 20, 23, 28, 29, 58, 59, 61, 67, 69, 70, 80, 96, 97, 100–102, 104–110, 116, 117, 119, 121–123, 125, 126, 129, 132, 133, 137, 138, 145, 147–150, 152
Ewe 18
Extended Projection Principle 4, 12, 31, 32, 36, 38, 55, 57, 64, 67, 70, 73
Extension Condition 36, 37
Finnish 82
French 9, 18, 66, 94
Full Interpretation 6, 75, 78, 79, 96, 147
Gaelic 113
German 7, 18, 24, 25, 33, 59, 71, 80, 100, 101, 112–114, 117, 118, 121, 123, 127, 139, 140, 143, 155
Grewendorf, Günther 18, 24, 25, 33, 42, 60, 72, 81, 88, 123, 125
Ha, Seungwan 125, 126, 149, 150
Hasegawa, Nobuko 64–67
Hebrew 6, 7, 18, 80
Hiraiwa, Ken 52
Icelandic 111, 142
Irish 27
Italian 80, 82, 86, 134
Japanese 65, 68, 155
Kasai, Hironobu 8, 132, 133, 154
Kikuyu 18
Kremers, Joost 18, 62, 66, 69
Linear Correspondence Axiom 55
Müller, Gereon 13, 30, 60, 123, 125, 142
Mandarin Chinese 30, 32, 65
Martin, Roger 97, 128, 133, 136, 137, 153

Minimal Search 3–6, 8, 46, 52, 75–79, 95–97, 134, 135, 138, 142, 143, 146, 153–155
Moro, Andrea 55, 74, 76
Multi-dominance 13–15, 41, 49, 115–117, 119, 121, 124, 125
Nunes, Jairo 7, 52, 98, 107, 119, 122, 123, 132, 140
Ott, Dennis 16, 18, 24, 45–48, 50, 56, 59, 61, 70, 74, 95, 101, 102, 131, 147, 155
Palauan 111, 112
Parasitic gaps 86, 108, 109, 114, 115, 152
Phase 4, 5, 8, 16, 18–21, 27, 28, 30, 31, 37–41, 46, 47, 52–55, 57, 58, 61, 62, 66, 68, 70, 73, 74, 76, 78, 81, 86, 87, 92, 106, 123, 132, 154
Polish 114, 117, 118, 143
Preminger, Omer 4, 6, 7, 75, 83–90, 92
Principles and Parameters 2, 9, 10, 12
Probe Label Correspondence Axiom 5, 62, 66, 74
Quantifier Raising 103

Reconstruction 23, 25, 28, 29, 109, 148–150
Richards, Norvin 17, 32–35, 87
Right Node Raising 125, 126, 128
Rizzi, Luigi 3, 64, 74, 80–82, 94, 96, 134
Romanian 151, 152
Serbo-Croatian 151
Sideward Movement 7, 119–125, 140
Subjacency 18
T-Model 10, 22
Transfer 19–21, 37, 38, 41, 44, 47, 52, 53, 61, 77, 78, 131, 147, 148
Tucking-in 87, 88, 91
Uriagereka, Juan 97, 128, 133, 136, 137, 153
van Urk, Coppe 32–35
Weak Crossover 103, 120, 121
Webelhuth, Gert 102, 123
Welsh 111, 114, 115
West Ulster English 23, 24, 26
Zhang, Niina Ning 7, 98, 102, 107–109, 111–114, 128, 136, 147, 150