The Role of Economy Principles in Linguistic Theory 9783050072173, 9783050028972



English Pages 298 [312] Year 1996




Chris Wilder, Hans-Martin Gärtner and Manfred Bierwisch (Eds.) The Role of Economy Principles in Linguistic Theory

studia grammatica. Edited by Manfred Bierwisch with the collaboration of Hubert Haider, Stuttgart; Paul Kiparsky, Stanford; Angelika Kratzer, Amherst; Jürgen Kunze, Berlin; David Pesetsky, Cambridge (Massachusetts); Dieter Wunderlich, Düsseldorf

studia grammatica 40

Chris Wilder, Hans-Martin Gärtner and Manfred Bierwisch

The Role of Economy Principles in Linguistic Theory

Akademie Verlag

Die Deutsche Bibliothek - CIP cataloguing-in-publication record: The role of economy principles in linguistic theory / Chris Wilder ... (ed.). - Berlin: Akad. Verl., 1996 (Studia grammatica; 40) ISBN 3-05-002897-1. Added entries: Wilder, Chris [ed.]; G T

ISSN 0081-6469 © Akademie Verlag GmbH, Berlin 1997. Akademie Verlag is a company of the VCH publishing group. Printed on chlorine-free bleached paper. The paper used conforms to the American standard ANSI Z.39.48-1984 and the European standard ISO TC 46. All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form - by photoprinting, microfilm, or any other means - nor transmitted or translated into a machine language without written permission from the publishers. Printing and binding: Druckhaus "Thomas Müntzer" GmbH, Bad Langensalza. Printed in the Federal Republic of Germany

Preface

Following the appearance of Chomsky's Minimalist Program, much work in generative linguistics has focussed on the role of Economy, both in the sense of a heuristic principle, i.e. one that guides theory construction, and as a constitutive principle in the design of language, i.e. as a property of the object under study. In the latter sense, Economy has previously figured in explanations in phonology, morphology and lexicon theory, in the guise of underspecification hypotheses and the 'elsewhere condition'. The Minimalist Program itself is an attempt to derive major properties of linguistic representations, and of their syntactic mode of construction, from Economy-related principles which interact with morphological properties of lexical items.

The papers in this volume represent differing views on the scope and nature of Economy principles in the domains of syntax, morphology and the lexicon, offering extensions of and alternatives to the version outlined by Chomsky. They include case studies of specific phenomena as well as discussion of theoretical issues. The at times highly complex empirical discussion ranges across such diverse topics as A'-movement constructions in English, French, German, Japanese and Korean; pleonastic subjects in English and Icelandic; the clitic pronoun system of Spanish; and inverse-linking morphology in Georgian and Sioux. The theoretical issues underlying this discussion include specific aspects such as the role of the theta-criterion, the nature of c-command, and aspects of the syntax-morphology and syntax-phonology interfaces. The common theme is the evidence for assuming competition among syntactic or morphological derivations, and the nature of the economy or optimality principles that choose among them.

This volume grew out of a conference of the same title which we hosted in Berlin in February 1995, with the generous support of the Max-Planck-Gesellschaft. Speakers at that meeting whose contributions are not included here were Gisbert Fanselow, Jane Grimshaw and Jan Koster. The paper by Gereon Müller, who did not participate in the conference, is included here for its particular relevance to the theme of the book. It essentially complements Wolfgang Sternefeld's paper, a reflection of the long-standing collaboration between these two authors.

For their help in the preparation of the manuscript, we would like to thank the individual authors, and also our exemplary editorial assistants, Andrea Johanna Hess and Jutta Romberg.

Chris Wilder
Hans-Martin Gärtner
Manfred Bierwisch

Contents

Chris Wilder and Hans-Martin Gärtner: Introduction  1
John Frampton: Expletive Insertion  36
Günther Grewendorf and Joachim Sabel: Wh-Scrambling in the Minimalist Framework  58
Wolfgang Sternefeld: Comparing Reference Sets  81
Gereon Müller: Optional Movement and the Interaction of Economy Constraints  115
Tanya Reinhart: Interface Economy and Markedness  146
Juan Uriagereka: Formal and Substantive Elegance in the Minimalist Program  170
Hubert Haider: Economy in Syntax is Projective Economy  205
Manfred Bierwisch: Lexical Information from a Minimalist Point of View  227
Dieter Wunderlich: A Minimalist Model of Inflectional Morphology  267
Addresses of Contributors  299

Introduction

Chris Wilder and Hans-Martin Gärtner

Generative linguistics, it is fair to say, is dominated by developments in syntactic theory; and for the past few years, work in generative syntax has been heavily influenced by what has come to be known as 'the Minimalist Program' (MP). The seminal paper that ushered in the MP was Chomsky's "A minimalist program for linguistic theory" (1993); this was followed by two papers defining a 'second phase': "Bare Phrase Structure" (Chomsky 1995a) and "Categories and Transformations" (Chapter 4 of Chomsky 1995b).¹

At the core of the MP lies the idea that economy is a central property of the system of language. This is fleshed out in terms of concrete principles of UG that instantiate the overarching economy idea. Although economy principles have previously figured in explanations in phonology, morphology and lexicon theory (underspecification, elsewhere condition), as well as syntax, it is only with the MP that the role of economy has become a central topic on the linguistic agenda. This is especially true of syntax, but there is also convergence with developments in neighbouring subfields, as this volume illustrates. The papers collected here present differing views on the scope and nature of economy principles in syntax, morphology and the lexicon, offering extensions of and alternatives to the version outlined by Chomsky.

In the following, we set out basic dimensions of the economy concept, and draw attention to some antecedents to the economy principles of the MP. We then plot some of the main features of the MP itself. In part II, we introduce the contributions to the volume, relating them to the larger context. (Throughout, we refer to papers in this volume by author's name(s) in small capitals.)

I Background

1 Economy: From Heuristic to Object of Study

In simple, informal terms, an economy principle can be expressed through an injunction like (1). This in turn can be decomposed into a 'last resort' principle and a 'laziness' ('least effort') principle, as in (2):

(1) 'Do only what needs to be done.'

(2) a. Last Resort: 'Do what must be done.'
    b. Least Effort: 'Do not do what need not be done.'

(2a) differs from (2b): compliance with (2a) is necessary to attain some goal, while noncompliance with (2b) does not necessarily lead to failure. But under certain conditions (scarcity of resources, a large number of goals to be fulfilled), (2b) can join (2a) as necessary for success. Taken together, these constitute a general economy principle (in practice, Last Resort is usually understood as implying (2b), and Least Effort as implying (2a), so that the two end up describing the same coin from its two sides). In science, (1)/(2) usually serves as a heuristic guiding the construction of theories, commonly referred to as Occam's razor. But such maxims may have a different (deeper?) import, reflecting principles underlying the design of the cognitive systems which the theories are about. Throughout, we need to distinguish these two aspects of the MP:²

(3) a. Economy (I): A heuristic, a principle governing the way we go about constructing theories (explanations).
    b. Economy (II): A property of the object of study, i.e. a constitutive principle in the design of language (UG).

The importance of economy principles in explaining language use, and cognitive processes in general, has long been recognised.³ Such principles have also made regular appearances in models of specific competence domains, cf. the Elsewhere principle in the lexicon, and minimality and locality principles in syntax and phonology. The major innovation of the MP has been the elevation of Economy to a fundamental constitutive principle in the design of the language faculty as a whole.⁴ This nontrivial step from Economy I to Economy II is much in line with Chomsky's scientific realism:

A naturalistic approach to linguistic and mental aspects of the world seeks to construct intelligible explanatory theories, taking as "real" what we are led to posit in this quest ... (1995c: 1)

Technically, it is important to recognize the comparison (or competition) inherent in the economy concept. A general format for economy principles is a simple injunction: "Minimize X". Minimization is a process of comparison and selection, presupposing (i) a set of alternatives, comparable in terms of bearing some property X, and (ii) a scale that orders these in terms of amount or degree of X: a 'price-list'. Comparison of the alternatives determines which is the minimal ('cheapest') one. This alternative then forms the correct choice by virtue of its properties in comparison with the other candidates in its set. In an alternative image, selection can be envisaged as the outcome of competition among alternatives, the 'winner' having scored the most (or least!) points on the scale. This characterization leaves open parameters (the set of alternatives, the X of the price-list) that can be defined according to the nature of the system being described. It lends itself to the description of optimization effects in all kinds of complex systems. The explanatory value of such a description (assuming it is accurate) will depend largely on the plausibility and naturalness of the assumptions regarding the sets of alternatives and the X of the price-list, on which appeals to economy feed.⁵

The relevance of economy to a (sub)system entails a minimal degree of complexity inside that system, in the sense that alternatives must exist. The raison d'être of an economy principle is selection among alternatives: minimally two. Suppose we have a system S' which is simple, in that it always generates a singleton output for each goal (no comparable alternatives). If we add an economy principle, S' only ever makes a singleton set available to it. Then, of course, each output is a 'born winner' as far as economy is concerned, since it necessarily has less X than all the alternatives in its set (there are none!). Economy is irrelevant to such a system: if it always applies vacuously, it cannot be detected. Now in theory, S' could still exercise the choice between having that output and having none. The latter option would imply failure of the system with respect to the goal at hand. It is crucial that this option be denied to economy, as a matter of principle. If we allow economy to choose no output, then no output will be the universal outcome, and economy will prevent the system from fulfilling its goals. To illustrate the point with Occam: a theory with zero axioms is no theory at all.⁶

As a heuristic guiding our construction of theories (3a), conceptual economy (Occam's razor) enjoins us: do not multiply your axioms beyond necessity. The theorist seeks to reduce his assumptions to the minimum necessary to account for a body of data. The set of alternatives consists of the various theories under consideration; the price-list is the scale defined by the number of axioms.
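The 'Minimize X' format described above can be illustrated with a small selection function. This is only a sketch under stated assumptions: the candidate set is finite, the price-list is modelled as a function returning a number, and the name `select_optimal` and the axiom counts for the theories T1-T3 are hypothetical. The function keeps ties and raises an error on an empty set instead of treating 'no output' as the cheapest choice, mirroring the point that economy must not be allowed to select failure.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def select_optimal(candidates: Iterable[T], cost: Callable[[T], float]) -> List[T]:
    """Return all candidates with minimal cost ('Minimize X'); ties are kept.

    An empty candidate set is an error: 'no output' is never a winner.
    """
    pool = list(candidates)
    if not pool:
        raise ValueError("economy cannot select from an empty set of alternatives")
    best = min(cost(c) for c in pool)
    return [c for c in pool if cost(c) == best]


# Hypothetical theories in one comparison class; the price-list counts axioms.
axiom_counts = {"T1": 7, "T2": 4, "T3": 5}
print(select_optimal(axiom_counts, cost=axiom_counts.__getitem__))  # ['T2']

# A singleton set: the sole candidate is a 'born winner', whatever its cost.
print(select_optimal(["only-output"], cost=len))  # ['only-output']
```

With a singleton set the function returns its only member regardless of cost, which is exactly the case in which economy applies vacuously and cannot be detected.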
Assuming that the theories fall into the same comparison class (i.e. to the extent that their goals are similar, that they share empirical coverage, that their axioms are comparable, and so on), the optimal theory will be the one deploying the fewest axioms. So much is familiar. So also are the difficulties of putting the principle into practice. How do we establish the comparability of theories? How do we count axioms? And so on. And beyond the practicalities lurks the conceptual issue: why should we seek elegance in our theories anyway? Why should the simplest theory be the correct one?⁷

The phenomena in a particular domain may demand theories in which the economy notion is significant in a different sense. The data may be such that axioms are needed which generate sets of alternatives in the sense described above. The theorist may then decide to include an economy principle of the form "Minimize X", operating over such sets, among his axioms, turning economy into a property of the object under study (Economy II). Of course, there will exist alternative theories which do not include any such principle. But suppose that such alternatives involve more, or more complex, axioms than the version including such a principle. Then Economy I comes into play: the theory including an economy principle among its axioms is preferred for reasons of conceptual economy. Natural language, as already indicated above, seems to be such a domain. Linguistic objects (words, sentences) involve combinations of primitives (features, etc.) that are organized in complex ways. Descriptive goals require the postulation of those primitives and the formulation of axioms (rules, well-formedness conditions) governing possible combinations. In some cases, it has long been claimed that correct
generalization requires the invocation of economy principles, for the sorts of reason just described. One such case is blocking in morphology, interpreted as a reflex of the Elsewhere condition by Kiparsky (1982).⁸ Pairs of V-stems and past tense forms like move-moved, walk-walked motivate the postulation of a V-stem for each pair and an affix (-ed [+PAST]) common to all, along with a rule for combining the affix with V-stems. In its simplest formulation, the rule attaches the affix to any V-stem:

(4) /.../; +V  →  /...+ed/; +V+PAST

A problem arises with irregular past forms in pairs like go-went, swim-swam, etc. The problem is not that an extra [+PAST] stem (e.g. went) or an additional past-formation rule (e.g. ablaut) must be assumed, but that 'regular' affixation must be prevented from applying to the V-stem in such pairs (*goed, *swimmed). The solution lies in assuming that past-formation constitutes a domain in which various alternatives (-ed-affixation, ablaut, suppletive forms) are engaged in a competition governed by an economy principle, i.e. the Elsewhere condition. Suppletive past forms like went can be analysed as a rule in the format (4):

(5) /go/; +V  →  /went/; +V+PAST
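Rules (4) and (5), and the Elsewhere competition between them described in the paragraph that follows, can be sketched in a few lines. The sketch makes assumptions the text does not: each rule is a hypothetical `Rule` record whose input condition is either an explicit set of stems or unrestricted (None), 'input specificity' is approximated by the size of that set, and orthographic adjustments (move → moved) are ignored. The ablaut rule for swim is included only as an illustrative third competitor.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional


@dataclass(frozen=True)
class Rule:
    """A past-tense formation rule with an input condition (hypothetical encoding)."""
    name: str
    inputs: Optional[FrozenSet[str]]  # None = applies to any V-stem
    output: Callable[[str], str]

    def applies_to(self, stem: str) -> bool:
        return self.inputs is None or stem in self.inputs

    def specificity(self) -> float:
        # Fewer admissible inputs = more specific; an unrestricted rule is least specific.
        return float("inf") if self.inputs is None else float(len(self.inputs))


def past(stem: str, rules: List[Rule]) -> str:
    """Elsewhere condition: among applicable rules, the most specific one wins."""
    applicable = [r for r in rules if r.applies_to(stem)]
    if not applicable:
        raise ValueError(f"no rule applies to {stem!r}")
    winner = min(applicable, key=lambda r: r.specificity())
    return winner.output(stem)


RULES = [
    Rule("(4) -ed affixation", None, lambda s: s + "ed"),  # orthography ignored
    Rule("(5) suppletion", frozenset({"go"}), lambda s: "went"),
    Rule("ablaut (illustrative)", frozenset({"swim"}), lambda s: "swam"),
]

for verb in ["walk", "go", "swim"]:
    print(verb, "->", past(verb, RULES))
# walk -> walked
# go -> went
# swim -> swam
```

*goed never surfaces because (4) loses to the more specific (5) for go, while for walk the suppletion rule is not in the competition at all, so the general rule applies as the default.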

(4) and (5) compete to apply to incoming V-stems; the economy principle orders them on the scale of 'input specificity': a rule with a more specific input condition, i.e. (5), will win out against a rule with less specific input (4). The output *goed is blocked, since (5) takes precedence over (4) and gives an output for go. If, however, the incoming stem does not match a rule's input condition, that rule is no longer in the competition (it can give no output); e.g. there is no rule relating walk with went. In that case, the most specific of the remaining rules which can apply will win out. This economy principle is the Elsewhere condition, which turns the general rule into a default that applies only where no more specific rule can apply, thus explaining the blocking effect.⁹ It is hard to imagine a simpler alternative account of these facts. Without the Elsewhere condition, we either give up the affixation rule (thereby multiplying lexical entries, i.e. 'axioms'), or we have to state the rule in a more complex fashion, i.e. with exceptions attached, which is equivalent to the addition of specific blocking 'axioms' for exceptions. Taking on board the way the Elsewhere account generalizes to numerous similar cases across languages, the argument from conceptual economy for adopting it becomes overwhelming.

Another case for which an economy-style account has been invoked, this time in the syntactic domain, concerns the paradigm (6):

(6) a. Johnᵢ V₁ [PROᵢ to leave]
    b. *Johnᵢ V₂ Maryₖ [PROᵢ to leave]
    c. Johnᵢ V₂ Maryₖ [PROₖ to leave]


The subject argument of the infinitive verb in (6a) is controlled by the subject of the superior verb (try, hope, intend, etc.). When the matrix verb takes an object (persuade, force, ask, etc.), however, control by the matrix subject is not possible: rather, the controller may only be the object. A control rule based on the pattern (6a) ("the subject of the higher clause controls the infinitive subject PRO") is falsified by (6b), while a rule based on (6c) ("the object of the higher clause controls ...") cannot apply generally. The nature of the problem is familiar: it is the closest potential controller that controls. There is object control where this is possible (6c), the possibility of object control blocking subject control in (6b), while there is subject control otherwise (6a). Rosenbaum (1967) proposed just such an account in terms of a 'Minimal Distance Principle'.¹⁰

Until recently, instantiations of economy in the syntactic theoretical apparatus were fairly isolated.¹¹ Before considering the emergence of 'internal economy' as a pervasive factor in syntax in the MP, it is useful to consider the role of Economy I (3a), in rhetoric and in substance, in the emergence of the MP out of its predecessor, the "Government-Binding" (GB) model.
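Rosenbaum's Minimal Distance Principle, as characterized above for (6), can be rendered as one more 'Minimize X' selection, with X the distance to PRO. In this toy sketch, distance is simply linear position among the matrix arguments preceding the infinitive; that is an assumption for illustration, not a claim about how structural distance should be measured, and the function name `controller` is hypothetical.

```python
from typing import List


def controller(matrix_args: List[str]) -> str:
    """Toy Minimal Distance Principle over [arg_1, ..., arg_n, PRO].

    Distance is linear: the number of positions between an argument and PRO.
    The argument at minimal distance is chosen as controller.
    """
    if not matrix_args:
        raise ValueError("no potential controller")
    pro_position = len(matrix_args)  # PRO follows all matrix arguments
    closest = min(range(len(matrix_args)), key=lambda i: pro_position - i)
    return matrix_args[closest]


print(controller(["John"]))          # John  -> subject control, as in (6a)
print(controller(["John", "Mary"]))  # Mary  -> object control (6c), never (6b)
```

The object wins whenever there is one, so subject control survives only where no closer candidate exists, which is the blocking pattern of (6).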

2 From GB to the Minimalist Program

A recurrent theme in linguistic theory is the tension between descriptive and explanatory adequacy that accompanies the search for the right model of grammar. The facts of natural languages necessitate complex descriptive mechanisms. Conceptual economy dictates that these be generalized and reduced as much as possible.¹² Alongside the incorporation of empirical findings on particular phenomena in individual languages, the elimination of redundancy in theoretical machinery has always been a driving force in the development of the theory. The move away from construction-centred rule-systems in the late 1970s was accompanied by a shift of focus from rules that derive representations to principles that govern representations. The GB-system (Chomsky 1981) rested on the interaction of the 'principles' of the 'principles-and-parameters' label: Case Filter, Theta Criterion, Binding Conditions, ECP, etc., essentially well-formedness conditions on representations. This system attained a degree of deductive depth, especially in the Binding-Case-Theta-A'/A nexus, which enabled it (extended through the late 1980s with the 'Barriers' model; Chomsky 1986a) to serve as a research platform for a decade or more (indeed, it still does so). Nevertheless, transformational grammar has been a derivational system from its inception. Despite its emphasis on 'principles', the derivation (Move-α) plays a central role in the GB-system, too. This is reflected in the 'T-model' architecture that held sway from the late 1970s up to Chomsky (1993), with its four levels of representation at which well-formedness conditions apply (i.e. D-structure, S-structure, Logical Form (LF), and Phonetic Form (PF)).

In GB, principles are formulated in a very specific fashion. While concepts of 'minimality' and 'locality' figure in many places, principles such as Subjacency, ECP, and the Case Filter appear highly specific (language-faculty-specific and, internally, module- and level-specific), with intricate and seemingly arbitrary internal structure,
not relatable to external concepts. A central role in defining principles was accorded to the government relation, a 'derived primitive' of some complexity, especially in its 'Barriers' incarnation. One aim (only partially realized) of the MP is to reduce this specificity to its 'bare essentials'.

In the MP, Chomsky 'goes back to basics'. His approach to the minimal theory of grammar starts from "virtual conceptual necessities": basic elements that (virtually) anyone would agree on, ones that are incontrovertibly forced on us by the facts. Sentences, built from words, relate sound and meaning; that is, they are complex representations that are interpreted at LF and at PF, the two independent interfaces with extra-linguistic systems. Thus, we need (i) a lexicon, listing the atomic elements that may enter linguistic representations, and (ii) a way of combining these into (PF, LF) pairs. Furthermore, those representations must encode in some way (iii) the phrase-head distinction (i.e. we need X'-theory), and (iv) the fact that elements within those expressions may be displaced from the position where they are interpreted (i.e. we need movement/chains). Hence, the basic essentials are:

(7) a. units: lexical items (heads), phrases, chains
    b. levels: PF and LF
    c. a construction/assembly mechanism (forming phrase structures from lexical items)
    d. movement (forming chains in trees)

'Minimalism', so the rhetoric goes, dictates that the 'perfect theory' will contain as little more than this as possible. In this sense, the emergence of the MP model was explicitly and fundamentally influenced by Economy I:

... further questions arise, namely those of the Minimalist program. How "perfect" is language? One expects "imperfections" in ... the lexicon and aspects of ... the A-P interface. The essential question is whether, or to what extent, these components of the language faculty are the repository of departures from virtual necessity ... Looking at the same problem from a different perspective, we seek to determine just how far the evidence really carries us toward attributing specific structure to the language faculty, requiring every departure from "perfection" to be closely analyzed and well-motivated. (Chomsky 1995b: 9; our emphasis)

Taking the challenge seriously, the MP proceeds with a wholesale revision of the architecture and core concepts of the GB-model. This extends to levels, principles and relations. Firstly, if LF and PF are the only externally motivated levels, then DS and SS can be dispensed with, and so should be. The adoption of Generalized Transformations (GT) as a tree-construction and lexical insertion mechanism (required anyway) allows D-Structure to be dispensed with. At the same time, the introduction of GT permits a partial unification of structure-building and movement, and a reinterpretation of the cycle as a consequence of the tree-building procedure (more on this in section 4 below). Secondly, with D-structure eliminated, S-structure is reinterpreted as a 'stage in a derivation' (Spell-Out). This opens the program of reinterpreting other conditions defining SS and DS as independent levels. One aspect consists in showing that
principles such as the Case Filter or Binding conditions can (if possible: must) apply at LF, rather than SS. Another concerns the reinterpretation of 'S-structure parameters' governing overt vs. covert application of Move-α (in the sense of Huang's 1982 proposals on wh-movement) via conditions on PF and LF. A third key task is to eliminate government as a core relation, replacing it with a locality relation constructed in terms of independently given X'-theoretic relations (cf. Chomsky's 'Minimal Domain'). It then becomes necessary to restate Case Theory, Binding Theory, θ-theory, ECP, etc. (core concepts in the GB-model that relied on government) using these bare minima. The terms for explanations are limited to (i) properties of lexical items themselves, (ii) conditions on LF (or PF), (iii) X'-theoretic locality, and (iv) the workings of derivational operations: GT and Move-α.

Crucially, the MP puts forward the hypothesis that economy-based principles play a major role in the computational system of UG. Principles of the form "minimize X" constrain representations and the operations that build them. In this sense, we move away from Economy I: economy becomes a property of the object of study, the design of language itself.

It is instructive to see how much of the MP, including economy, was already implicit in the syntactic analyses of the GB framework. Take for example the analysis of wh-movement in simple and multiple interrogatives. Adapting May's (1985) or Rizzi's (1991) WH-criterion, assume the level-specific principles in (8), coupled with the lexical specifications (9):

(8) a. [+WH] in C must match with [+WH] in an XP in Spec,CP (at SS)
    b. [+WH] in XP must match with [+WH] in C (at LF)

(9) a. C: [+WH] Ø / [-WH] that, etc.
    b. D: [+WH] which, who, what, etc. / [-WH] this, the, he/she/it, etc.

These suffice to account for (10): (8a) ensures that movement of what applies before S-structure in (10a), (8b) then being satisfied at LF. The SS-representation determines the given order of elements at PF; the LF provides a basis for interpretation as indicated.

(10) a. what did John say that he saw
     b. 'for which thing x, John said he saw x'
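The division of labour between (8a) at SS and (8b) at LF can be checked mechanically on toy representations. The sketch below is a simplification under explicit assumptions: a hypothetical `Rep` record tracks only whether C is [+WH], which phrases sit in (or are adjoined to) Spec,CP, and which remain in situ; the lexical specifications (9) are reduced to a small set of [+WH] words, and adjunction to Spec,CP is treated as being in Spec,CP.

```python
from dataclasses import dataclass, field
from typing import List

# Toy version of the lexical specifications (9): which D-elements are [+WH].
WH_WORDS = {"who", "what", "which"}


def is_wh(phrase: str) -> bool:
    return phrase in WH_WORDS


@dataclass
class Rep:
    """A hypothetical, drastically simplified representation of one clause."""
    c_is_wh: bool                                     # C is [+WH] (Ø) or [-WH] (that)
    spec_cp: List[str] = field(default_factory=list)  # phrases in/adjoined to Spec,CP
    in_situ: List[str] = field(default_factory=list)  # phrases left in base position


def check_8a(ss: Rep) -> bool:
    """(8a), at SS: a [+WH] C needs a [+WH] XP in its Spec,CP."""
    return not ss.c_is_wh or any(is_wh(p) for p in ss.spec_cp)


def check_8b(lf: Rep) -> bool:
    """(8b), at LF: every [+WH] XP must sit in Spec,CP of a [+WH] C."""
    if any(is_wh(p) for p in lf.in_situ):
        return False
    return lf.c_is_wh or not any(is_wh(p) for p in lf.spec_cp)


# (10a): 'what' raised overtly, so SS and LF coincide for the wh-phrase.
ss_10 = Rep(c_is_wh=True, spec_cp=["what"])
lf_10 = Rep(c_is_wh=True, spec_cp=["what"])
print(check_8a(ss_10), check_8b(lf_10))  # True True

# Leaving 'what' in situ at SS violates (8a).
print(check_8a(Rep(c_is_wh=True, in_situ=["what"])))  # False
```

Relocating `check_8a` from SS to LF is the toy counterpart of the Japanese-type parametrization discussed below.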

Assuming that LF wh-raising adjoins a wh-phrase to Spec,CP in case Spec,CP is already occupied, and that adjunction to Spec is sufficient to satisfy (8b), (8)-(9) also nearly suffice to ensure the derivation of (11).

(11) a. who said that he saw what
     b. 'for which thing x, for which person y, y said he saw x'

The appearance of two wh-phrases in (11), however, makes available an alternative derivation satisfying (8), in which what raises to satisfy (8a) at S-structure, with who adjoining to what covertly, giving *what did who say that he saw. To exclude this
derivation, various accounts have been offered (cf. Chomsky's 1973 Superiority Condition, Pesetsky's 1982 ECP account). Further, cross-linguistic variation is accommodated by parametrization of the principles (8). Given a division into English-type languages (one wh-phrase must raise before SS) and Japanese-type languages (all wh-phrases remain in situ at SS), the latter type can be described simply by relocating (8a) at LF in the grammars in question. In sum, an elegant analysis for a complex array of facts.¹⁴

The movement parameter rests on a principle (8a) that can alternatively apply at SS or LF, an option not available in the MP. Chomsky (1993) reinterprets this account in terms of the strong-weak feature distinction (interface-relevant properties of lexical items), interacting with an economy principle ('Procrastinate') that selects among potential derivations. As well-formedness conditions on representations, the matching requirements (8) are interpreted as a reflex of representational economy holding at the interfaces (LF/PF), requiring the removal of formal (later: 'uninterpretable') features introduced into derivations by lexical items (9). The PF-filter is 'turned on' only by the presence of a feature lexically specified as 'strong'. The effect of the SS-parameter is thus obtained by attributing a property 'strength' to features of lexical items (here: the WH-feature of C in English) to which PF well-formedness is sensitive.¹⁵ This aspect of the reorganisation does little more than restate the GB analysis: an SS filter is removed by declaring it to hold at PF.¹⁶

The role of economy in the account is more substantial. (12) represents a 'Japanese' derivation, (13) an 'English' derivation. The filters (8) ensure that the LF is the same; the derivation (12) is excluded for English by the SS-filter. However, while (12) is permitted for Japanese, (13) is not yet excluded.

(12) a. SS: C John said that he saw what
     b. LF: what C John said that he saw

(13) a. SS: what C John said that he saw
     b. LF: what C John said that he saw

Similarly, nothing that has been said so far prevents (11) from surfacing as *what who said that he saw. The logic of freeing move-α from level-specific constraints demands an answer to such questions; stipulating that wh-movement in Japanese must be delayed is unsatisfactory. Chomsky (1993) proposes that the computational system is 'lazy': it acts to satisfy filters 'as late as possible'. Early (pre-SS) movement generally incurs 'extra cost', which permits the derivations to be ordered. The intuition is expressed as the Procrastinate Principle: in the unmarked case, derivation (12a) is cheaper than (13a) (preferred by Procrastinate). In the presence of 'strong' [+WH] in C, however, (12a) leads to a 'crash' at PF, leaving (13a) as the cheapest 'convergent' derivation. The logic of the approach is now familiar: descriptive contingencies indicate alternative derivations; an economy principle selects among costed alternatives, such that cheaper derivations block more expensive derivations, the latter being permitted only when a cheaper derivation is not available.
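The Procrastinate logic (cheaper derivations block costlier ones, but only among convergent derivations) can be sketched for (12) and (13). All of the following are hypothetical simplifications: each derivation is summarized by its number of overt (pre-Spell-Out) movements and by whether the wh-feature is checked overtly; only overt movement is costed; and a strong [+WH] feature on C causes a PF crash unless it is checked overtly.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Derivation:
    name: str
    overt_moves: int          # movements applied before Spell-Out
    wh_checked_overtly: bool  # is [+WH] on C checked before Spell-Out?


def converges(d: Derivation, strong_wh_c: bool) -> bool:
    # A strong [+WH] on C left unchecked at Spell-Out makes the derivation crash at PF.
    return d.wh_checked_overtly or not strong_wh_c


def cost(d: Derivation) -> int:
    # Procrastinate (toy version): only overt movement is costed.
    return d.overt_moves


def optimal(derivations: List[Derivation], strong_wh_c: bool) -> Derivation:
    """Cheapest convergent derivation; crashing derivations are not competitors."""
    convergent = [d for d in derivations if converges(d, strong_wh_c)]
    if not convergent:
        raise ValueError("no convergent derivation")
    return min(convergent, key=cost)


D12 = Derivation("(12) wh in situ at SS", overt_moves=0, wh_checked_overtly=False)
D13 = Derivation("(13) overt wh-movement", overt_moves=1, wh_checked_overtly=True)

print(optimal([D12, D13], strong_wh_c=False).name)  # (12) wh in situ at SS
print(optimal([D12, D13], strong_wh_c=True).name)   # (13) overt wh-movement
```

With weak [+WH] (the Japanese setting) both derivations converge and Procrastinate selects (12); with strong [+WH] (English) (12) crashes, leaving (13) as the cheapest convergent derivation.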


Chomsky (1993) points out that there is also an apparently simple 'economy' account for superiority effects, in terms of the 'minimal' legitimate derivation blocking all others. The derivation (11) in which (8a) (i.e. strong [+WH] in C) is satisfied by who rather than what, is more minimal than the alternative in the sense that the path traversed by who on raising to C is shorter than the path traversed by what. This then becomes a shortest path condition on movement. The implementation of this idea, discussed in S T E R N E F E L D ' S paper, is by no means trivial. H A I D E R offers evidence from German that the idea may be simply false. In the GB-model, derivational mechanisms (e.g. move-a) apply freely to generate a linguistic representation (a quadruple {D,S,P,L}) which independently satisfies levelspecific constraints. Research focussed on well-formedness condition on representations, that conspire to constrain movement; but the problem of overgeneration attendant on the conception o f ' f r e e rule application' remained. In the MP, overgeneration is curbed by the elevation of economy ('Last Resort') to an all-constraining principle. Where an operation is not motivated ('triggered'), it is not licensed. The paradigm case illustrating Last Resort is Case-driven NP-movement. In GB-terms, the need for a representation to satisfy the Case Filter at the level of SStructure acts as a trigger for movement of a NP in a Case-less position into a position where it can be assigned a Case Feature: (14) a. John was told t that it was raining b. *It was told John that IP The Last Resort nature of NP-movement is shown by comparison with (15). (15) a. It seemed to John that it was raining b. * John seemed to t that IP Even though the subject position is a legitimate target for NP-movement, and even though English permits preposition-stranding under NP-movement, movement of John in (15b) is blocked. 
The NP satisfies its Case requirements in its base position (15a), so movement is unnecessary; economy ensures that the derivation leading to (15b) is ungrammatical.¹⁷ Chomsky (1993) points out that the paradigm motivates a strengthening of Last Resort. A structure like (16) (= (15b) before movement) is ungrammatical as it stands: the clause needs an overt subject (cf. the Extended Projection Principle of the GB model), which is attributed to the need for the (nominative) Case feature in Infl to match with an NP in Spec,IP.

(16) (*) seemed to John that it was raining

Movement of John to the subject position would be one possibility for saving (16) from ungrammaticality, and in this sense it would not be a gratuitous operation. But movement is blocked: the NP does not move, since it satisfies its own Case requirement in situ. It appears that move-α cannot apply where α does not need to move to satisfy its own requirements. This 'self-serving' property of Last Resort ('Greed') is held to constrain all applications of move-α:

(17) Greed: Move-α must result in satisfaction of requirements of α
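The difference between Greed and a weaker, 'altruistic' version of Last Resort can be made concrete with a toy Python sketch. The representation is an assumption made only for illustration: an NP is reduced to one Boolean recording whether its Case feature is already satisfied, and the target's need for a subject (the Infl requirement in (16)) is a second Boolean. The names `NP`, `last_resort_weak` and `greed` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NP:
    name: str
    case_satisfied: bool   # True if the NP already has Case in its base position


def last_resort_weak(np: NP, infl_needs_subject: bool) -> bool:
    """Movement is licensed if it satisfies *any* requirement, the mover's or the target's."""
    return (not np.case_satisfied) or infl_needs_subject


def greed(np: NP, infl_needs_subject: bool) -> bool:
    """Greed (17): only the mover's own requirements count.

    The target's requirement (infl_needs_subject) is deliberately ignored.
    """
    return not np.case_satisfied


john_14 = NP("John", case_satisfied=False)  # (14a): no Case as object of passive 'told'
john_15 = NP("John", case_satisfied=True)   # (15): Case assigned by 'to' in situ

print(greed(john_14, infl_needs_subject=True))             # True: (14a) is derived
print(greed(john_15, infl_needs_subject=True))             # False: *(15b) is blocked
print(last_resort_weak(john_15, infl_needs_subject=True))  # True: would overgenerate *(15b)
```

Under the weak version, the need of Infl in (16) alone would license raising John; Greed rules this out because John itself gains nothing from moving.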

Adopting (17) immediately restricts the class of permissible derivations; in particular, it reduces the role of true derivational economy (comparison of possible derivations) in calculating the 'optimal' one. (However, Greed is revised in Chomsky (1995a, 1995b), as also discussed by FRAMPTON.) This conception also means that optional (unmotivated) movement is not possible. Optimal derivations are 'pre-determined' by the interaction of lexical properties (triggers) with output conditions (FI) and derivational economy principles. For NP-movement, wh-movement and V-movement, the move seems successful. But in the face of apparent 'optional movement' constructions such as Scrambling (and other 'stylistic rules' such as extraposition, or the negation data discussed by HAIDER), sticking to this view might involve unacceptable cost in terms of assumptions about lexical properties (triggers). In this volume, 'optional' word order alternations are variously treated as movement triggered by optional morphosyntactic features (GREWENDORF & SABEL, MÜLLER, and STERNEFELD) and as base-generated orders in free variation (HAIDER, REINHART). Reinhart suggests a different role for economy, that of choosing among the outputs at the PF interface. We return to economy in syntax and competing derivations in section 6 below.

3 Lexicon, Morphology, and Syntax

The syntactic theory of the MP places a heavy burden of explanation on the lexicon, specifically on properties of lexical items. This reliance on the lexicon has three interlocking sources in the MP. Firstly, syntactic operations respond to properties of lexical items. There are no construction-specific rules; only (morphological properties of) lexical items (closed-class functional items) can be the source of specific syntactic patterns. Secondly, the hypothesised cross-linguistic invariance of syntax (the computational system) means that much of the work in accounting for cross-linguistic syntactic diversity is delegated to the lexicon. Properties of lexical items interact with invariant syntactic principles to produce different syntactic patterns.¹⁸ (The rest of the work must be done by phonology, a second locus of cross-linguistic variation.) The third source is the hypothesis of inclusiveness (Chomsky 1995b:228), according to which syntactic objects created during the derivation, and in particular interface representations, contain nothing more than the lexical material, arranged in certain ways by the computational system:

A "perfect language" should meet the condition of inclusiveness: any structure formed by the computation ... is constituted of elements already present in the lexical items selected ... no new objects are added in the course of computation apart from rearrangements of lexical properties ...

I.e. syntax can manipulate (concatenate, copy, delete) lexical items, but may not add new items or features. Phonology forms the major exception to inclusiveness: the phonological derivation must have the capacity to add features and structure (e.g. metrical structure) to lexical material, if any version of underspecification is correct. Full Interpretation (FI) dictates that interface representations contain only items interpretable there. Given inclusiveness, all such elements must have entered from the lexicon (or have been created from lexical input). While features contained in interface representations (particularly LF) come from the lexicon, not all features from the lexicon are present in the interface. Phonological features are 'stripped' at Spell-Out; morphological features trigger movement that causes their own elimination. What is left is 'interpretable' at LF.

However, the MP has little to say about the structure and organization of the lexicon itself. Lexical items are viewed as simple sets (of sets) of elements of the usual sort, i.e. phonological, morphological, syntactic and semantic features. Except for 'strength', no new lexical properties or entities are assumed for purely syntactic reasons; no additions are made to the special 'syntactic' features of the GB system (e.g. the [pronominal] and [anaphoric] features dedicated to Binding theory). On the other hand, neither are there substantive constraints, beyond a general 'minimalist' prerogative, on what one may assume. The logic of Minimalist explanation encourages the addition of new features or properties to the inventory: if there is a movement, there must be a feature to trigger it; likewise, if movement has a certain property, perhaps this is a property of the trigger. This tendency is illustrated in this volume by GREWENDORF & SABEL's Agr-features and MÜLLER's distinction between optional and obligatory features; both are attempts to come to grips with complex movement patterns within the confines of minimalist syntax. Yet the game with features is by no means restricted to syntax: in his analysis of complex inflectional patterns in Georgian and Potawatomi, WUNDERLICH relies on minimal morphological operations, but appeals to unorthodox 'complex' features that denote relative positions in theta- and animacy hierarchies. It is fair to say that descriptive necessity is the mother of invention here.

Beyond issues of the interaction of lexical items with syntax, the question of the role of economy has independent significance in lexicon theory. Economy has precursors in morphology and lexicon theory: concepts that originated from attempts to clarify the principles governing the format of lexical representations (underspecification, markedness) have a clear affinity to syntactic economy principles and to the distinctions (e.g. 'strong' vs. 'weak' features) they respond to. The Elsewhere condition and the blocking effects it accounts for are clearly related to principles like Procrastinate and Shortest Move, which are involved in ordering and selecting derivations in the syntax of the MP.

Underspecification forms a major component of hypotheses about the format of lexical entries. The assumption that lexical entries provide only information about marked feature values, with unmarked values being filled in 'later' by (default) rule application, is central in phonology. Similar ideas have been applied to morphological, syntactic (categorization and subcategorization), and semantic information in lexical entries (cf. BIERWISCH, this volume). The choice of underspecification as a descriptive strategy reflects conceptual parsimony (Economy I): the theorist chooses the minimal format for lexical entries, in the attempt to derive predictable information and generalizations by maximal appeal to general rules. To the extent that the strategy is successful, it appears that underspecification is a real factor in the organization of lexical knowledge. Taken as a property of the domain of study, underspecification may then be viewed as a reflex of a substantive economy principle governing the possible format of lexical entries (Economy II). Thinking of the lexicon as the repository of 'permanent knowledge', underspecification may be explained as a reflex of 'representational economy': the effect of ensuring maximally underspecified representations is to minimize the space needed to store them. Hence, lexical representations may be subject to a specific economy principle (18):¹⁹

(18) Minimize storage.
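The storage side of underspecification can be illustrated with a toy Python sketch: a fully specified entry is reduced to its marked values, and the unmarked values are restored by default rules. The feature names, the values and the `DEFAULTS` table are hypothetical choices for illustration, not a claim about any particular language or about the analyses in this volume.

```python
# Hypothetical default (unmarked) feature values, supplied 'later' by rule.
DEFAULTS = {"number": "singular", "person": "3", "case": "nominative"}


def stored_entry(full_entry: dict) -> dict:
    """Keep only marked (non-default) values: the underspecified lexical entry."""
    return {feature: value for feature, value in full_entry.items()
            if DEFAULTS.get(feature) != value}


def fill_defaults(entry: dict) -> dict:
    """Recover a fully specified entry by default rule application."""
    return {**DEFAULTS, **entry}


full = {"number": "plural", "person": "3", "case": "nominative"}
stored = stored_entry(full)
print(stored)                          # {'number': 'plural'}
print(fill_defaults(stored) == full)   # True
```

Only one of the three values has to be stored; the other two are predictable, which is the sense in which (18) trades storage for rule application.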

A second feature of the organization of lexical information consists in the blocking effects found in derivational and inflectional morphology, which Kiparsky (1982) attributes to the Elsewhere condition ('if a specific form is listed, don't create new ones by regular means'). Viewing word-formation processes as operations (taking a 'derivational view'), the Elsewhere condition can be interpreted as a reflex of a general constraint (19):

(19) Minimize computation.

Both these ideas are implemented in WUNDERLICH's paper.
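The Elsewhere condition, read as a reflex of 'Minimize computation', can be sketched in a few lines of Python: retrieving a listed form pre-empts application of the regular rule. The listed forms, the suffixation rules and the function `inflect` are hypothetical toy examples from English, chosen only to show the blocking logic (e.g. went blocks *goed).

```python
# Hypothetical listed (irregular) forms: retrieving these requires no computation.
LISTED = {("go", "past"): "went", ("child", "plural"): "children"}

# Hypothetical regular rules, applying 'elsewhere'.
REGULAR = {"past": lambda stem: stem + "ed", "plural": lambda stem: stem + "s"}


def inflect(stem: str, feature: str) -> str:
    """Elsewhere condition: a listed specific form blocks the regular rule."""
    listed = LISTED.get((stem, feature))
    if listed is not None:
        return listed
    return REGULAR[feature](stem)


print(inflect("go", "past"))       # went     (blocks *goed)
print(inflect("walk", "past"))     # walked
print(inflect("child", "plural"))  # children (blocks *childs)
```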

To the extent that the system of lexical knowledge displays properties not reducible to principles underlying the syntax, the question arises as to the source of such differences. Both the derivational and representational economy principles governing syntax may be exhausted by (19), assuming 'interpretation' (FI) can be read as 'interface computation'. The lexicon differs from syntax (the 'computational system') in its function as a permanent repository of linguistic knowledge. When applied to the lexical domain, the economy requirement may yield a distinct principle (18). It is tempting to speculate that the special principles governing the organization of lexical knowledge are related to, or even reducible to, (18).

There are further questions concerning lexicon-syntax relations which complicate the picture. Some spring from a basic tension between various notions of what the lexicon is. Minimally, the lexicon is simply a store of irreducibly idiosyncratic information: it lists arbitrary associations of form and meaning, plus non-predictable information about those forms and those meanings. All predictable properties of words are then to be derived by rule (or are governed by principles) belonging to 'syntax' in a wide sense, including phonological and interpretive rules/principles as well as those governing sentence-construction ('syntax proper'). On the other hand, the elements stored in the lexicon are treated as syntactic atoms; the lexicon precedes sentence grammar, i.e. it feeds the syntactic derivation. Serious questions arise as to what properties such syntactic atoms may and must have, beyond the minimal, purely idiosyncratic form-meaning specifications. Words are more than unstructured bundles of features, and their internal structure goes beyond the intrinsic structure that emerges from the ontology of feature types (phonemes are sequentially ordered; words may be morphologically complex; predicates have argument structure). To the extent that those properties are non-idiosyncratic, it becomes necessary to assume rules/principles governing the lexicon. Questions then arise concerning the nature of those principles: are they reducible to universal principles that govern 'sentence grammar', or are there special universal principles dedicated to the lexicon? These issues are taken up by BIERWISCH.

Related questions concern the status and place of morphology (word-formation and inflectional morphology). Is there a morphological component independent of the lexicon? Does morphology happen before, after, or even during syntax? In the MP, inflection is assumed to happen prior to syntax, in the lexicon (or in the process of forming the 'numeration'; see section 6 below) prior to derivational operations, while room is left both for syntactic word-formation (head-movement in the sense of Baker 1988) and for post-syntactic morphology in the phonological component. The latter is developed by Halle and Marantz (1993) into a model in which the determination of word-forms takes place in an autonomous Morphology component after syntax. In this volume, WUNDERLICH adopts an opposite, 'lexicalist' stance in which all of inflectional morphology precedes syntax, and takes issue with some of Halle & Marantz's assumptions. One issue in this debate concerns the account of syncretisms. These are handled by Halle & Marantz as the effect of 'impoverishment rules', which reduce the information passed to Morphology by syntactically fully specified trees. For Wunderlich, syncretisms reflect feature underspecification in word-forms, which is then inherited by the syntactic structures containing them.

4 Dynamic Conceptions of Phrase Structure

As discussed above, the MP reorganizes the base component of the grammar.²⁰ D-structure being eliminated, the base rules (called phrase structure rules in earlier models, later generalized as the X-bar format) are left dangling in the air, so to speak. It has become superfluous to generate D-structure as an output that feeds lexical insertion²¹ on the one hand and the transformational rules which derive surface structure on the other. A different perspective emerges instead. Lexical items can be taken to make up the input of a generalized syntax, the computational system C_HL, which maps a set ('array') of such items directly into interface representations. In fact, this development took place in two phases. The first consists in postulating three operations: Select, Project, and Generalized Transformation (GT). Select takes an item X from the lexicon. Project moulds X according to X-bar structure, that is, X becomes any of (20a-c):²²

(20) a. X°
     b. [X' X°]
     c. [XP [X' X°]]

GT is a more complex operation that targets a structure K, adds the designated empty symbol ∅, substitutes a structure K' (K' ≠ K) for ∅, and forms K*, which must satisfy X-bar theory. An example is given in (21):

(21) a. Target K (= X°)              X°
     b. Add ∅                        X° ∅
     c. Substitute K' (= YP) for ∅   X° YP
     d. Form K*                      [X' X° YP]
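The four steps of (21) can be mirrored in a short Python sketch, with structures modelled as tuples whose first element is the label. The tuple encoding, the helper name `gt` and the explicit `label` argument for step (d) are assumptions made for illustration; step (d) is exactly where Project-like labelling has to be invoked.

```python
EMPTY = "∅"   # the designated empty symbol added by GT


def gt(target, new, label):
    """Toy generalized transformation following steps (21a-d)."""
    k = [target]                 # (a) target K
    k.append(EMPTY)              # (b) add the empty symbol
    k[k.index(EMPTY)] = new      # (c) substitute K' for the empty symbol
    return (label, *k)           # (d) form K*, labelled as an X-bar projection


result = gt("X°", "YP", "X'")
assert result == ("X'", "X°", "YP")   # the tuple counterpart of [X' X° YP] in (21d)
```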

Given that (21d) seems to employ the operation Project, the relation between GT and Project seems to be amenable to further refinement. The role of Project is taken to be the preservation of certain uniformity properties, such as the requirement that elements in specifier and complement position be of XP status even if made up of a single terminal. We return to some revision of these issues below. There is a simple argument that something like GT would have been necessary in pre-minimalist syntax to account for the facts under (22) (Lasnik & Uriagereka 1988:147; Frank & Kroch 1995):

(22) a. It is easy to please John.
     b. John_j is easy [CP OP_j [IP PRO to please t_j]].
     c. [That economy was difficult to control] is easy to see.

(22a) shows that the matrix predicate assigns only one θ-role. Thus, the NP John in (22b) does not occupy a θ-position and therefore cannot have been present at D-structure, (22b) corresponding to the standardly assumed S-structure analysis of tough-movement constructions. To assume that John has been inserted transformationally further implies that even complex structures, as illustrated in (22c), must be allowed to be generated in parallel. Hence, an operation that combines such structures, traditionally called a generalized transformation, is required. In a second phase of refining the base component of UG, the X-bar format is fully eliminated as an independent mediator between lexicon and syntax. Lexical items now belong to the class of syntactic objects which are directly accessible to a modified form of GT called Merge. Merge is a binary operation that takes two syntactic objects, α and β, and creates a labeled set from them:

(23) Merge (α, β) → {α, {α, β}}

Labelling is achieved by taking the set of the constituent syntactic objects to be a member of another set, which contains the label as an additional element. The label is a copy of (the head of) α or β; in (23) it is α, for the sake of concreteness. If α is the label, it can be said that the entire structure is a projection of α. Thus, Merge incorporates Project, a development foreshadowed by step (21d) of GT above. In order to render the objects created by Merge commensurable with traditional notions of syntactic tree structures, Chomsky defines terms, corresponding to nodes (1995b:247):

(24) For any structure K:
     a. K is a term of K.
     b. If L is a term of K, then the members of the members of L are terms of K.

Accordingly, {α, {α, β}}, the product of merging α and β, consists of three terms: (i) {α, {α, β}}, (ii) α, and (iii) β. These could be mapped into a standard tree, as in (25):²³

(25)     {α, {α, β}}
          /       \
         α         β
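Definitions (23) and (24) can be prototyped in Python. The encoding is an assumption: lexical items are plain strings, a merged object is a frozen dataclass `SO` holding its label and the frozenset {α, β}, and the label is always taken from the first argument, as in (23). Following the text, `terms` descends only into the constituent set, so labels are never terms.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class SO:
    """A syntactic object built by Merge: {label, {a, b}}."""
    label: str
    members: frozenset


Obj = Union[str, SO]   # lexical items are modelled as plain strings


def head(x: Obj) -> str:
    return x if isinstance(x, str) else x.label


def merge(a: Obj, b: Obj) -> SO:
    """(23): Merge(a, b) = {a, {a, b}}, with (the head of) a as the label."""
    return SO(label=head(a), members=frozenset({a, b}))


def terms(k: Obj) -> set:
    """(24): K is a term of K; members of the constituent set of a term are terms.

    The label is skipped, so labels never count as terms.
    """
    result = {k}
    if isinstance(k, SO):
        for member in k.members:
            result |= terms(member)
    return result


vp = merge("saw", "Mary")
print(vp.label, len(terms(vp)))          # saw 3

clause = merge(vp, "John")               # the VP projects, so the label is still 'saw'
print(clause.label, len(terms(clause)))  # saw 5
```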

Crucially, labels do not constitute terms. Bar-level status is no longer considered an inherent property of syntactic objects but is computed contextually:

A category that does not project any further is a maximal projection XP, and one that is not a projection at all is a minimal projection X^min, any other is an X' ... (ibid.:242)
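The relational definition can be checked on a small tree with a Python sketch. Here a merged object is a tuple (label, left, right) and a lexical item is a string; `bar_levels` and `show` are hypothetical helper names, and the sketch assumes that every head string is unique in the tree, so that comparing labels identifies projection.

```python
def show(obj) -> str:
    """Render a lexical item (str) or a merged object (label, left, right)."""
    if isinstance(obj, str):
        return obj
    label, left, right = obj
    return f"[{label} {show(left)} {show(right)}]"


def bar_levels(obj, parent_label=None, out=None):
    """Compute bar levels contextually (Chomsky 1995b:242).

    Maximal: the category does not project further (its mother has another label).
    Minimal: the category is not a projection at all (a lexical item).
    """
    if out is None:
        out = []
    label = obj if isinstance(obj, str) else obj[0]
    is_max = parent_label != label
    is_min = isinstance(obj, str)
    if is_max and is_min:
        level = "XP and Xmin"
    elif is_max:
        level = "XP"
    elif is_min:
        level = "Xmin"
    else:
        level = "X'"
    out.append((show(obj), level))
    if not is_min:
        bar_levels(obj[1], label, out)
        bar_levels(obj[2], label, out)
    return out


vp = ("saw", "saw", "Mary")
clause = ("saw", "John", vp)
for node, level in bar_levels(clause):
    print(node, "=", level)
# [saw John [saw saw Mary]] = XP
# John = XP and Xmin
# [saw saw Mary] = X'
# saw = Xmin
# Mary = XP and Xmin
```

No category needs a specifier to count as maximal: John and Mary come out as both maximal and minimal purely from their position.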

This definition puts an end to questions such as whether a category needs a specifier in order to count as maximal, issues that led to sometimes quite artificial debate. At the same time, generalizations that concern X-bar status can still be expressed. More generally, the development sketched above is driven by two forces. First, the feature bundles called lexical items are elevated to central agents in syntax, allowing a greater focus on features per se and acknowledging the priority of this substantive core of language (cf. section 3 above):

I ... will therefore keep to the minimalist assumption that phrase structure representation is "bare", excluding anything beyond lexical features and objects constructed from them. (ibid.:245)

Another guideline for modeling the computational system is stressed by FRAMPTON, who observes:

We are certainly not at the point where we can pretend to be modeling mental computations at the level of algorithm. But feature-driven syntax may offer the possibility of taking some steps in this direction.

Given the central role of Merge, it can be asked whether there are any consequences for core notions of syntax like the concept of constituent and the dominance, precedence, and c-command relations. In fact, Kayne (1994) showed how to derive the precedence relation from (asymmetric) c-command with the help of a Linear Correspondence Axiom (LCA), together with the requirement that terminals be totally ordered:²⁵

(26) LCA: If a node α asymmetrically c-commands a node β, then everything reflexively dominated by α precedes everything reflexively dominated by β.
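A minimal Python sketch of the LCA follows. It assumes that trees are given as a mapping from each node to its daughters, that c-command is as in definition (32) below (every node dominating X dominates Y, and X and Y are disconnected), and that the resulting pairs must form a total, antisymmetric and transitive order on the terminals. The function names are hypothetical. The test trees are (28a) and the three structures described for (27), reconstructed from the prose; the name `R` for their root is an arbitrary choice.

```python
from itertools import combinations


def parents_of(tree):
    return {child: mother for mother, kids in tree.items() for child in kids}


def dominators(node, parents):
    """All nodes properly dominating `node`."""
    result = set()
    while node in parents:
        node = parents[node]
        result.add(node)
    return result


def c_commands(x, y, parents):
    """X c-commands Y iff every Z dominating X dominates Y, and X, Y are disconnected."""
    dom_x, dom_y = dominators(x, parents), dominators(y, parents)
    disconnected = x != y and x not in dom_y and y not in dom_x
    return disconnected and dom_x <= dom_y


def terminals_under(node, tree):
    """Terminals reflexively dominated by `node`."""
    kids = tree.get(node, [])
    if not kids:
        return {node}
    return set().union(*(terminals_under(k, tree) for k in kids))


def linearize(tree):
    """Return the LCA order of the terminals, or None if it is not a linear order."""
    parents = parents_of(tree)
    nodes = set(tree) | set(parents)
    terminals = sorted(n for n in nodes if not tree.get(n))
    pairs = set()
    for x in nodes:
        for y in nodes:
            if c_commands(x, y, parents) and not c_commands(y, x, parents):
                pairs |= {(a, b) for a in terminals_under(x, tree)
                          for b in terminals_under(y, tree)}
    for a, b in combinations(terminals, 2):
        if ((a, b) in pairs) == ((b, a) in pairs):   # unordered, or contradictory
            return None
    order = sorted(terminals, key=lambda t: -sum((t, u) in pairs for u in terminals))
    if all((order[i], order[j]) in pairs
           for i, j in combinations(range(len(order)), 2)):
        return order
    return None


print(linearize({"K": ["L", "M"], "M": ["N", "O"], "O": ["P"]}))  # ['L', 'N', 'P']
print(linearize({"R": ["L", "M", "N"]}))                          # None  (27a)
print(linearize({"R": ["L", "M", "N"], "N": ["P"]}))              # None  (27b)
print(linearize({"R": ["L", "M", "N"], "M": ["O"], "N": ["P"]}))  # None  (27c)
```

For (27a) and (27b) the failure is missing order (L and M c-command each other); for (27c) it is a contradiction, since O must both precede and follow P.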

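It is possible to derive a large part of X-bar theory from the LCA. In particular, it follows that syntactic structures are at most binary branching, i.e. the number of nodes immediately dominated by another cannot exceed two. Take the structures in (27), in which a single root immediately dominates L, M and N:

(27) a. [ L M N ]
     b. [ L M [N P] ]
     c. [ L [M O] [N P] ]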

In (27a), trivially, L, M and N cannot be linearly ordered, because none asymmetrically c-commands any of the others. Adding P in (27b) makes L and M asymmetrically c-command, and thus precede, P; but L and M remain unordered with respect to each other. Now, adding O in (27c), although it yields a precedence ordering for L and O, undoes the ordering between M and P: since N now asymmetrically c-commands O, P is required to precede O, while M's asymmetric c-command of P requires O to precede P, which results in a contradiction. Obviously, adding further elements will not remedy the situation. Binary branching is the limit for syntactic structures, a fact that is axiomatic in the theory of Merge. Sidestepping a number of technical issues about specifiers and intermediate projections, we may wonder to what extent the LCA is compatible with Chomsky's minimalist enterprise. The most natural interpretation seems to be to regard the LCA as a PF principle (Chomsky 1995b:334ff). Under that view, the sister relations created by Merge are unordered; precedence only comes to be defined by an operation governed by the LCA in the PF wing of the grammar. The LCA is possibly a candidate for interpretation as an economy principle. There are 2^n choices for linearizing the terminals of a binary tree with n branching nodes (without crossing the branches); the terminals of (28a) have the four possibilities shown in (28b):

(28) a. [K L [M N [O P]]]
     b. L-N-P, L-P-N, N-P-L, P-N-L
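The count of 2^n competitors can be checked with a small Python sketch that generates every branch-preserving order by letting each branching node order its two daughters either way. The tree encoding (a mapping from mother to daughters) and the function name `orders` are assumptions for illustration; non-branching nodes, like O in (28a), contribute no choice.

```python
from itertools import product


def orders(tree, node):
    """All linearizations of the terminals under `node` that keep constituents contiguous."""
    kids = tree.get(node, [])
    if not kids:
        return [[node]]
    if len(kids) == 1:               # a non-branching node offers no choice
        return orders(tree, kids[0])
    if len(kids) != 2:
        raise ValueError("only binary branching is handled")
    left, right = kids
    result = []
    for a, b in product(orders(tree, left), orders(tree, right)):
        result.append(a + b)         # daughters in one order...
        result.append(b + a)         # ...or the other
    return result


tree_28a = {"K": ["L", "M"], "M": ["N", "O"], "O": ["P"]}
candidates = ["-".join(o) for o in orders(tree_28a, "K")]
print(len(candidates))     # 4  (= 2^2, two branching nodes)
print(sorted(candidates))  # ['L-N-P', 'L-P-N', 'N-P-L', 'P-N-L']
```

Of these four candidates, only L-N-P respects the LCA: L asymmetrically c-commands N and P, and N asymmetrically c-commands P.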

Regarding these as 'competitors', the LCA serves to distinguish one as correct. The question that arises is why linearization is based on this correspondence (c-command maps to precedence), and not some other. Is there a natural interpretation of the LCA mapping as defining the most 'economical' route from a hierarchical arrangement to a linear sequence of terminals? URIAGEREKA suggests such an interpretation, under which the LCA falls out as a consequence of more basic assumptions.

Further reduction has been envisaged by Epstein (1995), who observed that c-command itself is closely related to the operation Merge. Given that a node α c-commands a node β iff β is the sister of α or is dominated by the sister of α, and given further that sisterhood is what results from Merge, c-command can be conceived of as dependent on Merge. A precise formulation of this insight is far from trivial, and head-movement adjunction does not easily lend itself to Epstein's analysis: the c-command domain of X° adjoining to Y° is standardly assumed to comprise Y' and whatever Y' dominates, although X° merges only with Y°, not Y'. Nevertheless, Epstein's theory highlights another important current in the MP: a dynamic view of syntax. One might ask why syntax does not make use of just any relation definable over fully-fledged tree structures. It appears that the window of syntactic processes is somehow smaller: relations that are not established dynamically, that is, by the operations that determine the course of a derivation, play no role.²⁶

A dynamic way of handling syntactic information had already served as an empirical motivation for adopting GT during the first developmental phase under consideration here. Lebeaux (1988) used GT to derive the contrast in (29):

(29) a. *[CP1 [DP Which [NP claim [CP2 that John_j was asleep]]]_k was he_j willing to discuss t_k]?
     b. [CP1 [DP Which [NP [NP claim] [CP2 that John_j made]]]_k was he_j willing to discuss t_k]?

It was stipulated that adjuncts, unlike arguments, could be inserted into the structure after movement has taken place.
This again presupposes that syntactic structures can be generated in parallel and that GT is available. Consequently, t_k does not contain a copy of CP2 in (29b), as opposed to (29a), and no Principle C violation can arise.²⁷ Chomsky (1993) captured these facts by exempting adjunction from the general requirement that pre-Spell-Out operations extend their target. Thus, K' in (21) above has to become the sister of the root node of K, with a new root immediately dominating both K and K' as a result. Likewise, the transformation Move is subject to this so-called "extension" requirement on overt operations, which therefore follow the principle of strict cyclicity. Again, adjunction is the exception, for obvious reasons: since X°-movement is adjunction to a c-commanding head Y°, it cannot target the root node of the structure, as illustrated in (30).

(30) a. [Y' [Y° X°_i Y°] [XP ... t_i ...]]
     b. *[Y' X°_i [Y' Y° [XP ... t_i ...]]]
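The 1993 extension requirement, with its exemption for adjunction, can be sketched in Python as a check on where an operation attaches. Trees are nested tuples (label, left, right), a target is addressed by a path of daughter indices (0 = left, 1 = right), and `merge_at` is a hypothetical helper; the sketch models only the attachment site, not movement or the labelling of segments.

```python
def merge_at(tree, path, new, label, adjunction=False):
    """Attach `new` to the subtree reached by `path` (a tuple of 0/1 indices).

    Extension requirement (Chomsky 1993): an overt operation must target the
    root, i.e. path == (). Adjunction is exempt and may target an inner node.
    """
    if path and not adjunction:
        raise ValueError("extension requirement violated: non-root target")
    if not path:
        return (label, tree, new)
    mother, left, right = tree
    if path[0] == 0:
        return (mother, merge_at(left, path[1:], new, label, adjunction), right)
    return (mother, left, merge_at(right, path[1:], new, label, adjunction))


# Wh-movement has already built CP1 with the DP in its specifier.
cp1 = ("CP1", ("DP", "which", "claim"), "was he willing to discuss t")

# Late adjunction of the relative clause to the NP 'claim', as in (29b): allowed.
late = merge_at(cp1, (0, 1), "that John made", "NP", adjunction=True)
print(late)

# Late insertion of a complement, as (29a) would need, violates extension.
try:
    merge_at(cp1, (0, 1), "that John was asleep", "NP")
except ValueError as err:
    print(err)   # extension requirement violated: non-root target
```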

Indeed, cyclicity is an important ingredient of locality requirements on movement, and it finds its way into the discussions by HAIDER, MÜLLER, STERNEFELD and URIAGEREKA. Kitahara (1995) offers a mechanism sensitive to a fewest-steps metric to derive the cyclicity property from X-bar theoretic distinctions not available to Chomsky's conception of bare phrase structure. This is discussed by HAIDER, who adds some critical remarks on the concept of 'cyclicity' with respect to the architecture of UG. It has to be mentioned here that the account of (29) appears to be lost under the second phase of reforming phrase structure:

With regard to Merge, there is nothing to say. It satisfies the extension condition for elementary reasons already discussed. (Chomsky 1995b:328)

The account of (29), however, requires Merge to target a non-root node, namely the NP in (29b), which is already dominated by DP and CP1 at that point in the derivation, i.e. after wh-movement has taken place. So, unsurprisingly, no mention is made of the data in (29).

5 Adjunction

The issue of adjunction stands out as defying full integration with minimalist principles. It is an open question whether the cost incurred by the technical complications which adjunction necessitates might not actually make it desirable to dispense with adjunction altogether. Technically, adjoining β to α gives rise to a segment-category distinction:

(31) Merge (α, β) → {<α, α>, {α, β}}

Although α projects, the result is not a new category; instead, a two-segment category is formed. The special status of an adjunction projection of α, called the upper segment, is indicated by its label <α, α>. The definition of terms in (24) applies equally to (31), yielding three terms: (i) {<α, α>, {α, β}}, (ii) α, and (iii) β. These could in principle also be interpreted as nodes (but cf. Chomsky 1995b:247). A category must then be allowed to comprise a set of terms, containing {<α, α>, {α, β}} and α in our example. It remains to be seen whether allowing sets of terms to be functioning elements lies beyond what Chomsky calls

... a fairly disciplined minimalist approach. Thus, with sufficiently rich formal devices (say, set theory), counterparts to any object (nodes, bars, indices, etc.) can readily be constructed from features. There is no essential difference, then, between admitting new kinds of objects and allowing richer use of formal devices; we assume that these (basically equivalent) options are permitted only when forced by empirical properties of language. (ibid.:381, fn. 7)

Definitions have to be formulated carefully. The relational definition of X-bar status (see above) applies to categories only. Dominance and c-command are restricted to terms (ibid.:339). If it were not for X°-adjunction, crucially employed in head-movement, no further complication should arise.²⁸ With the notion "disconnected" taken to hold for two terms if neither dominates the other, the following (standard) definition of c-command suffices:

(32) X c-commands Y if (a) every Z that dominates X dominates Y and (b) X and Y are disconnected. (ibid.:339)

In head-movement adjunction (30a), however, the moved X° is standardly taken to c-command its trace. This has sometimes been ensured by index percolation, which transfers the c-command properties of Y° to X°. Yet c-command is not defined for a two-segment category in a system that restricts dominance and c-command to terms. If such definitions are added for categories as well, it is possible to make the necessary distinctions for an adjunct to c-command outside the category it is adjoined to (cf. ibid.:339). Thus X° c-commands its trace in (30a). However, complications arise in other areas of the phrase structure. If adjuncts are allowed to attach to intermediate projections (ibid.:330), they will c-command the specifier of the same projection:

(33) [XP YP [X' ADJ [X' X° ZP]]]

As soon as both YP and ADJ are complex, they cannot be ordered with respect to each other (assuming the LCA is adopted; see above), and no precedence ordering can be assigned to them. Similar effects, plus potential binding-theoretic consequences, will arise from multiple XP-adjunction to YP, as has often been proposed for scrambling:

(34) [YP XP_j [YP ZP_k [YP ... t_j ... t_k ...]]]

In (34), XP and ZP cannot be ordered with respect to each other. Even with these issues in mind, the considerable success of using adjunction in the area of movement theory and locality cannot easily be overlooked (see GREWENDORF & SABEL for further arguments that specifiers and adjunction sites should be kept apart). Still, radical departures have been proposed. Thus, the innovative work of Alexiadou (1995) reanalyzes all adjuncts as specifiers of separate (functional) projections. Zwart's (1993) pioneering study of Germanic SOV languages opens the possibility that what has been called scrambling is actually movement into functional specifiers, most prominently that of AgrOP. Even X°-movement can be conceived of as creating its own landing sites rather than adjoining to a target (Ackema, Neeleman & Weerman 1993). Moreover, much of the data associated with adjunction might pertain to a stylistic component of the language faculty dealing with marked phenomena, and thus belong outside the computational system as modelled by minimalist principles (Chomsky 1995b:324f). See also REINHART, who presents an integrative theory.

6 Competing Derivations, there-Sentences and FI

The introduction of economy principles into the computational system requires a reappraisal of global conditions. In order to be fully aware of the options available, one has to look closely at the reasoning that transformed "guidelines [which] have a kind of 'least effort' flavor to them" into "actual principles of language" (Chomsky 1991:418). A central case in point is the analysis of X°-movement in English and French. Thus, the following contrast has been explained by counting the number of operations required to derive the structures:

(35) a. Jean embrasse souvent Marie.
        John kisses often Mary
        'John often kisses Mary'
     b. *Jean souvent embrasse Marie.

Simplifying matters radically,³⁰ we can say that (35a) blocks (35b) because the derivation of (35a) employs one step fewer than the derivation of (35b). Both involve V°-to-I° movement, overt in (35a) but covert in (35b), the surface structure of (35b) having arisen from an additional, and therefore crucially superfluous, I°-to-V° lowering operation. It is immediately clear (although it was not discussed at the time when (35) was analyzed) that blocking effects cannot just be a global numerical property. Thus (36a), which arguably involves fewer derivational steps, certainly does not block (36b).

(36) a. Jean dort.
        'John sleeps'
     b. Je sais qu'il dort.
        'I know that he sleeps'
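The point can be made concrete with a Python sketch that compares step counts either globally or only among derivations built from the same lexical array, anticipating the notion of a numeration introduced below. The step counts are hypothetical placeholders chosen only to reflect the relative costs stated in the text, and the function name `winners` is illustrative.

```python
from collections import Counter


def winners(derivations, same_numeration=True):
    """Return the names of derivations that are not blocked.

    derivations: list of (name, lexical_items, steps). With same_numeration=True,
    a derivation competes only with derivations built from the same multiset of
    lexical items; otherwise every derivation competes with every other.
    """
    def key(items):
        return frozenset(Counter(items).items()) if same_numeration else None

    best = {}
    for name, items, steps in derivations:
        k = key(items)
        best[k] = min(best.get(k, steps), steps)
    return sorted(name for name, items, steps in derivations
                  if steps == best[key(items)])


candidates = [
    ("35a", ["Jean", "embrasse", "souvent", "Marie"], 1),  # overt V-to-I only
    ("35b", ["Jean", "embrasse", "souvent", "Marie"], 2),  # lowering + covert raising
    ("36a", ["Jean", "dort"], 1),
    ("36b", ["je", "sais", "que", "il", "dort"], 3),
]
print(winners(candidates, same_numeration=False))  # ['35a', '36a']: (36b) wrongly blocked
print(winners(candidates))                         # ['35a', '36a', '36b']
```

With a global count, the short (36a) would block (36b); restricting competition to a shared lexical array keeps (35a) blocking (35b) without that side effect.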

Trivial though this point may appear at first sight, it is essential that, for transderivational economy principles to apply properly, candidate or reference sets be determined, among which economy principles are supposed to select the optimal competitor (recall the discussion in section 1 above). This central task is confronted by STERNEFELD. The same issue underlies much of the debate in MÜLLER's and REINHART's papers. Chomsky's response to this question has been to introduce the concept of a numeration, i.e. the array of lexical items selected from the lexicon to be mapped into interface representations (see above). As a first approximation, economy conditions apply relative to such numerations. A second domain in which 'least effort' plays a role is the insertion of dummy elements: (non-emphatic) do-insertion in (37b), which otherwise has to occur in negative and interrogative contexts, must not block (37a):

(37) a. John wrote books.
b. *John did write books.

(37b) (again we simplify radically) involves only do-insertion to pick up inflectional material in I°, whereas (37a) requires overt I°-to-V° lowering plus V°-to-I° raising at LF. To prevent a blocking effect in this case, language-particular rules like do-insertion are assumed to be costlier than the universal movement rules operative in (37a). Do-insertion is only a last resort to salvage otherwise ungrammatical derivations. Another way of dissociating (37a) and (37b) would be to include do in the numeration for (37b). The numerations would then differ, and no competition between the derivations could arise. The ill-formedness of (37b), of course, would have to be explained independently (cf. Wilder & Cavar 1994 for a proposal).

Further refinement is needed to analyze there-insertion, a construction that has been at the core of much of minimalist syntax. Here it has been assumed that there is always part of the numeration of the sentences in which it appears. Thus, (38a) and (38b) do not compete:

(38) a. A unicorn is in the garden.
b. There is a unicorn in the garden.

Now, to account for the distribution in (39), Chomsky (1995b:344ff) assumes that economy conditions apply in a more local fashion.

(39) a. There_i seems t_i to be someone in the room.
b. *There seems someone_i to be t_i in the room.

Given the strictly bottom-up procedure of generating syntactic structures (see above), choice points arise at which the cheapest option possessing a well-formed continuation (of whatever complexity) must be selected (cf. FRAMPTON). As soon as (40) has been arrived at, the specifier of IP must be filled:

(40) [I' to be someone in the garden ]

Since there is part of the numeration, we have two options: either insert there or move someone. Independently, it is assumed that the operation Merge is cheaper than Move, Move being subject to Procrastinate. Arguably, it is also desirable to empty the numeration as quickly as possible so as to reach the end of the derivation. Consequently, there has to be inserted. When it comes to filling Spec,IP of the matrix clause in (41), no choice arises:

(41) [I' seems [ there to be someone in the garden ] ]

There, being the only element visible for the movement rule, will be attracted to that position. The result is that (39a) blocks (39b) as desired. Yet, things turn out to be more complicated than that. The reasoning employed to analyze the data in (39) predicts that (42a) should block (42b).
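The local, choice-point reasoning just described can be mimicked in a few lines. The Python sketch below is a toy under stated assumptions, not Chomsky's or Frampton's formalism: a numeration is modelled as a plain list of not-yet-merged items, the function names `fill_spec_ip` and `derive_39` are hypothetical, and "Merge before Move" is hard-coded as the only cost comparison.

```python
def fill_spec_ip(numeration, mover, expletive="there"):
    """One choice point: fill Spec,IP.

    If the expletive is still in the numeration, Merge it (Merge is assumed
    cheaper than Move); otherwise Move the closest available element.
    Returns (operation, filler, remaining_numeration).
    """
    if expletive in numeration:
        remaining = list(numeration)
        remaining.remove(expletive)  # Merge uses up the item
        return ("Merge", expletive, remaining)
    return ("Move", mover, list(numeration))


def derive_39(numeration):
    """Toy derivation of (39): embedded Spec,IP first, then matrix Spec,IP."""
    steps = []
    # (40): embedded clause; 'someone' is the candidate for Move.
    op, filler, numeration = fill_spec_ip(numeration, mover="someone")
    steps.append((op, filler))
    # (41): matrix clause; the element now in embedded Spec,IP is the closest one.
    op, filler, numeration = fill_spec_ip(numeration, mover=filler)
    steps.append((op, filler))
    return steps


print(derive_39(["there", "seems"]))  # [('Merge', 'there'), ('Move', 'there')]
print(derive_39(["seems"]))           # [('Move', 'someone'), ('Move', 'someone')]
```

With there in the numeration, the only derivation the sketch can produce is the one underlying (39a); the Move option that would lead to (39b) is never reached at the embedded choice point. Without there, someone raises twice, as in "Someone seems to be in the room".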


Wilder & Gärtner: Introduction

(42) a. [ A rumour [CP1 that there was a unicorn in the garden ] ]_i was t_i in the air.
b. There was [ a rumour [CP1 that a unicorn_i was t_i in the garden ] ] in the air.

Both, however, are fully grammatical. The logic of embedding requires CP1 to be completed first. Filling Spec,IP in CP1 gives rise to the first relevant choice point (43a). Again, there must be inserted, yielding (43b):

(43) a. [I' was a unicorn in the garden ]
b. [IP there was a unicorn in the garden ]

When it comes to filling Spec,IP of the matrix clause, once more no option is left. Since there has been taken out of the numeration, movement of the complex NP is all that can be done, turning (44a) into (44b) (= 42a):

(44) a. [I' was [ a rumour [CP1 that there was a unicorn ... ] ] in the air ]
b. [ A rumour [CP1 that there was a unicorn ... ] ]_i was t_i in the air.

The reverse ordering of Move and Merge operations required to derive (42b) is not available under the assumptions made. FRAMPTON offers a cross-linguistic analysis of there-constructions that does not run into the problem just described. Taking up the idea exploited for do-insertion above, there is not part of the numeration but can be inserted at stages where this is required as a last resort.

There-constructions also originally provided a core argument for the principle of representational economy (FI). The central facts of the construction, namely (i) verbal agreement is determined by the NP associate of there (45); (ii) there and its NP associate stand in the same relation to each other as links of an A-chain; and (iii) there occupies what used to be called a Case position, can be accounted for by assuming that there is replaced at LF by its associate via A-movement, the associate being in need of Case licensing.³²

(45) a. There is a unicorn in the garden.
b. There were some unicorns in the garden.

Replacing there would then independently satisfy the principle of Full Interpretation (FI), which requires that interface representations contain no symbols that cannot be assigned an interpretation, i.e. symbols may only carry articulatory-perceptual instructions at PF and conceptual-intentional instructions at LF. Thus, pleonastic elements, being semantically vacuous (Chomsky 1995b:287), do not qualify as legitimate LF-objects.³³

While the MP incorporates conditions on representations (FI), much of the burden of constraining derivational possibilities lies with specific 'derivational economy' principles governing the applicability and timing of derivational operations (Procrastinate, Shortest Path, Greed, etc.). In this, it diverges from the GB paradigm: research in the 1980's focussed on the formulation of principles governing representations to constrain movement. A natural tendency was to view the model as essentially equivalent to (or as a notational variant of) a purely representational model, with chains (a representational notion) replacing the derivational operation Move-α (however, as Chomsky has argued, there may be real differences between the two). A representational approach has been argued to be preferable in that it incorporates "radical derivational economy": derivational operations are minimized, i.e. there are none (cf. HAIDER'S paper; also Brody 1995). But that amounts simply to the claim that there is no derivational economy principle at all (given the point about zero output made in section 1). The real point is rather that proponents of representational models have to show how the insights are to be captured without appeal to derivational economy.

Even among the papers which adopt a derivational syntax, the tendency is to rein in the scope of true derivational economy (transderivational comparison). Thus STERNEFELD argues that the Shortest Path condition should be formulated as a derivational condition (an absolute restriction on movement) rather than as a transderivational condition (comparing potential derivations). FRAMPTON'S approach is also "motivated partly by . . . the intuition that the role of optimality in the theory should be minimized". This matches the direction taken in Chomsky (1995b:Ch.4). There, aspects of both 'derivational economy' (e.g. the Minimal Link Condition) and FI (the requirement for successful feature matching) are reanalyzed as conditions on the movement operation itself, such that 'violations' lead directly to termination (cancellation) of the derivation rather than to further computation to convergence; comparison among convergent derivations is now limited to Procrastinate. Considering this trend (what John Frampton (p.c.) has aptly called "the withering away of economy"), one might be drawn to conclude that HAIDER'S remarks are not too far off the mark:

If the UG-facilitated choice of a core grammar L is determined by economy principles on the derivational complexity, this does not necessarily mean that the core grammar of L contains economy principles.

II The Papers

The first six contributions are concerned with the role of economy principles in the domain of syntax proper. In his paper "Expletive Insertion", FRAMPTON takes issue with problems of computational complexity that arise from the assumption of economy conditions that compare alternative derivations. The paradigm example is (46); in Chomsky's analysis, the derivation leading to (46a) blocks the one leading to (46b) (cf. section 6 of part I above):

(46) a. There seemed to be someone in the room.
b. *There seemed someone to be in the room.

His main proposals are (i) that the associate nominal in this construction is not a DP but an NP, so that (46a) and (46b) do not involve identical numerations and hence do not stand in a blocking relation; (ii) (returning to an earlier idea) that expletives are not included in the numeration prior to the syntactic derivation, but are inserted in the course of the derivation; and (iii) that economy conditions include not only Avoid Movement (Chomsky's Procrastinate, i.e. merge < move) but also Last Resort Insertion, which renders expletive insertion possible only if neither merge nor move yields a convergent derivation (i.e. merge < move < insert). Empirical coverage includes the 'dative subject' constructions of Icelandic, which necessitate a broader concept of Case features, and some notoriously difficult 'exceptions' to the definiteness effect in there-constructions, which are analyzed on a par with Possessor Raising.

The paper tracks several important revisions of the MP (Chomsky 1993) that are adopted in Chapter 4 of Chomsky (1995b). A central innovation is the distinction between 'interpretable' ('intrinsic') and 'non-interpretable' ('formal') features. The latter (e.g. Case) retain the property of needing to be checked and eliminated, while the former (e.g. φ-features on nominals, which are clearly able to influence semantic interpretation) are assumed to survive at the LF-interface. Intrinsic features may therefore enter checking relations, but need not do so, and do not delete if they do. Hence, unlike formal features, intrinsic features may enter into more than one checking relation. This entails abandoning the strong version of the Greed principle ("move α only to satisfy properties of α") in favor of a weaker version that permits movement of α triggered solely by properties of a head at the target position. The shift in perspective from the properties of α to the properties of the target position as determining movement leads to its reformulation as Attract α.

The theoretical treatment of Scrambling phenomena found in German and in several East Asian languages forms a central theme in the next four papers.
Scrambling poses a challenge for the MP in at least three respects: (i) it shows strict locality effects; (ii) it is apparently optional; (iii) it is not obviously morphologically triggered. The status of scrambling with respect to the A/A'-distinction is also controversial. These issues are all addressed in GREWENDORF & SABEL'S paper "Wh-Scrambling in the Minimalist Framework" (= G&S). Most accounts treat Scrambling (e.g. in German and Japanese) as movement to an adjoined position. Compared with earlier models (e.g. Chomsky 1986a), the role of phrasal adjunction diminishes in the MP, which concentrates on Spec-to-Spec and head movement. But the account of long movement as iterated short steps, enforced by locality (Minimal Link Condition), is retained. Locality would then be expected to apply to movement to adjoined positions just as it does to movement to specifiers, permitting iterated local steps. G&S propose a radical alternative, the 'Constraint on Adjunction':

(47) α may undergo movement and adjunction only once in a derivation.
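Read as a derivational filter, (47) needs nothing more than a count of how often each element has moved by adjunction. In the Python sketch below, which is a toy encoding and not G&S's formalism, a derivation is reduced to a list of hypothetical (element, landing-site) pairs. Long scrambling by iterated adjunction is excluded, while steps through extra specifiers (the route G&S propose for Japanese long scrambling) do not count against (47).

```python
from collections import Counter


def violates_constraint_on_adjunction(steps):
    """steps: list of (element, landing_site) pairs, where landing_site is
    "adjunction" or "specifier". Returns True if any element undergoes
    movement-plus-adjunction more than once."""
    adjunctions = Counter(elem for elem, site in steps if site == "adjunction")
    return any(n > 1 for n in adjunctions.values())


# Short scrambling by a single adjunction: allowed.
print(violates_constraint_on_adjunction([("obj", "adjunction")]))  # False
# Long scrambling by iterated adjunction: excluded by (47).
print(violates_constraint_on_adjunction(
    [("obj", "adjunction"), ("obj", "adjunction")]))  # True
# Long scrambling through extra specifiers, one final adjunction: not excluded.
print(violates_constraint_on_adjunction(
    [("obj", "specifier"), ("obj", "specifier"), ("obj", "adjunction")]))  # False
```

Because (47) inspects only the history of a single derivation, it is an absolute (derivational) condition: no competing derivations need to be compared.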

The paper explores an empirical domain of special relevance to (47): the complex interaction of scrambling and wh-movement in Japanese. Scrambling in Japanese (and Korean; cf. MÜLLER'S and STERNEFELD'S papers) is far more liberal than in German: while scrambling is a phenomenon internal to the finite clause in German, phrases can 'scramble' out of finite clauses in Japanese ('long distance scrambling'). Also, wh-phrases may scramble in Japanese, but not in German. Additional complexities in Japanese include the possibility for a wh-phrase to scramble to a position outside of its scope domain, and the so-called 'anti-superiority effect' in multiple questions. In response to the 'optionality' and 'triggering' questions, G&S develop a distributionally intricate system of X- (= scrambling) and wh-features, associated with functional heads, which then drive overt scrambling and covert wh-movement. Further differences are derived by appeal to properties of those features and of the functional heads which host them. G&S argue that in Japanese, short scrambling is A-movement, while both short scrambling in German and long scrambling in Japanese are instances of A'-movement. The contrast with respect to short scrambling is related to the (non-)availability of 'extra' specifier positions as landing sites for scrambling (exploiting the possibility of multiple specifiers in the system of Chomsky 1995a). Assuming that in German specifier positions are not available as landing sites for scrambling, scrambled phrases can only adjoin; the strictly local nature of German scrambling then follows from (47). The properties of short scrambling in Japanese indicate that additional A-specifier positions are available, while the possibility of long scrambling is due to the availability of extra A'-specifiers, which enables the effects of (47) to be avoided. Within the premises of the MP, the differences between German and Japanese must be parametric, deriving from properties of functional items: G&S propose that AGR is the element involved.

The MP reopens the case for transderivational constraints on syntax; alongside Procrastinate, these include the principles of 'Fewest Steps' and 'Shortest Path' as conditions on syntactic operations. STERNEFELD formulates the notion more precisely as follows:

(48) Global Economy Condition: Given two derivations D1 and D2 in the same reference set RS, D1 is preferred over D2 if and only if D1 fares better than D2 with respect to some metrical measure M.
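The condition in (48) has two independent parameters: what puts two derivations into the same RS, and the measure M. The Python sketch below makes both explicit as function arguments. The derivation records, the LF labels and the path lengths standing in for Shortest Path are invented placeholders, not Sternefeld's analysis; they only anticipate the (51)/(52) contrast discussed in the next paragraph, where changing the RS key changes which derivation survives.

```python
from collections import defaultdict


def optimal(derivations, rs_key, metric):
    """Toy rendering of (48): derivations sharing rs_key form one reference set;
    within each set, only derivations with the lowest metric value survive."""
    reference_sets = defaultdict(list)
    for d in derivations:
        reference_sets[rs_key(d)].append(d)
    survivors = []
    for rs in reference_sets.values():
        best = min(metric(d) for d in rs)
        survivors.extend(d["name"] for d in rs if metric(d) == best)
    return sorted(survivors)


# Hypothetical records for (51)/(52): identical numeration, different LF,
# and made-up path lengths as the measure M.
NUM = ("bought", "what", "who", "who", "wonders")
ex51 = {"name": "(51)", "numeration": NUM, "lf": "LF of (51)", "path": 1}
ex52 = {"name": "(52)", "numeration": NUM, "lf": "LF of (52)", "path": 2}

by_numeration = optimal([ex51, ex52], lambda d: d["numeration"], lambda d: d["path"])
by_numeration_and_lf = optimal(
    [ex51, ex52], lambda d: (d["numeration"], d["lf"]), lambda d: d["path"])

print(by_numeration)         # ['(51)']  (52) is wrongly blocked
print(by_numeration_and_lf)  # ['(51)', '(52)']
```

Keeping the RS key separate from the measure mirrors Sternefeld's strategy: the same M yields different predictions depending solely on how reference sets are defined.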

His article "Comparing Reference Sets" tackles the main issue of syntactic globality, namely how to single out the right competitors (derivations, in the case at hand). Concentrating on superiority effects (which motivate the Shortest Path condition) and scrambling phenomena, Sternefeld subjects various formulations of the central notion of reference set to close scrutiny. Careful investigation of the Shortest Path approach to superiority phenomena reveals that neither numerations alone nor a combination of numeration plus LF-output furnish a sufficient basis for RSs. Consider (49)-(52):

(49) Whom_i did John persuade t_i to visit whom?
(50) *Whom_i did John persuade whom to visit t_i?
(51) Who wonders who_i t_i bought what?
(52) Who wonders what_i who bought t_i?

Competition on the basis of identical numerations alone, although adequate for (49) vs. (50), would wrongly predict (51) to block (52). Adding a requirement for identical LF-output to the definition of RSs would correctly place (51) and (52) in different RSs, but would in turn fail on the contrast (49)-(50). An alternative definition of RSs which invokes identity of (truth-conditional) meaning instead of identity of LF-output is considered and rejected. This would run into problems with such 'stylistic' movements as scrambling to IP in German or long distance scrambling in Korean: these displacements lack truth-conditional effects, and should thus be blocked by their counterparts (competitor derivations) displaying shorter scrambling. Given that superiority effects are absent (at least) in German (cf. also HAIDER'S paper), Sternefeld considers parametrizing the defining property of RSs. Reviewing the facts relating to cyclicity and Fewest Steps, he concludes that even for English no fully satisfactory definition of RSs emerges. The remainder of the paper is devoted to replacing transderivational constraints by derivational constraints, which enables the notion of RS to be dispensed with entirely. On the assumption that scope-taking elements possess scope indices, superiority can be assimilated to Crossover effects, and the contrast between English and German can be made to fall out directly. Sternefeld concludes that, given the computational complexity of transderivational constraints as well as the doubtful status of the concept of meaning in the treatment of syntactic anomaly, a rival account in terms of purely derivational constraints should be preferred.

MÜLLER'S "Optional Movement and the Interaction of Economy Constraints" develops an approach to 'optional' movement within the MP that permits an account of attested instances (53), while continuing to exclude unattested instances (54):

(53) a. Partial wh-movement in Iraqi Arabic and German
b. Intermediate steps in wh-movement targeting [-WH]-COMP positions

(54) a. Topicalization of wh-phrases
b. Scrambling of wh-phrases
c. Covert wh-movement from a [+WH]-COMP

This is done on the basis of the central definitions (55)-(57):

(55) Reference Set: Two derivations D1 and D2 are in the same reference set iff they yield the same LF output and converge at LF and PF.

(56) Fewest Steps: If two derivations D1 and D2 are in the same reference set and D1 involves fewer checking operations than D2, then D1 is to be preferred over D2.

(57) Last Resort: Move raises α to a position β only if β is a typical checking position for an unchecked morphological feature of α.
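Definitions (55) and (56) combine into a simple blocking test, sketched below in Python. This is a toy under stated assumptions: the `Derivation` record, its field names and the LF label "LF-1" are hypothetical, intermediate steps are simply not counted (as in Müller's idealization), and Last Resort (57) is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    name: str
    lf_output: str            # LF output, modulo intermediate traces
    converges: bool           # converges at both LF and PF
    checking_operations: int  # intermediate steps are not counted


def same_reference_set(d1, d2):
    """(55): same LF output, and both derivations converge."""
    return d1.converges and d2.converges and d1.lf_output == d2.lf_output


def is_blocked(d, candidates):
    """(56): d is blocked if a competitor in its reference set needs fewer checks."""
    return any(
        other is not d
        and same_reference_set(d, other)
        and other.checking_operations < d.checking_operations
        for other in candidates
    )


# Toy version of (54a): wh-topicalization followed by LF wh-movement,
# versus direct wh-movement to the [+WH]-COMP with the same LF output.
direct = Derivation("direct wh-movement", "LF-1", True, 1)
topicalized = Derivation("wh-topicalization + LF wh-movement", "LF-1", True, 2)
candidates = [direct, topicalized]

print(is_blocked(topicalized, candidates))  # True
print(is_blocked(direct, candidates))       # False
```

Since blocking requires a shared LF output, a derivation with a different LF (for instance a different scope assignment) never competes with these two, however many checking operations it involves.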


Given certain idealizations concerning intermediate traces in the LF-output, and assuming movement to be subject to the Minimal Link Condition (MLC), the following account emerges. (53b) is ruled in, since intermediate steps do not count as checking operations; movement obeys Last Resort (a weakening of the original version of Greed), since COMP is a typical checking position for wh-elements. Skipping intermediate COMPs, on the other hand, does not involve fewer checking operations, while it violates the MLC. Although the surface facts are somewhat tricky for partial wh-movement (53a), the account is fully analogous to that of (53b). Topicalization of a wh-phrase (54a) must be followed by wh-movement at LF, and requires two checking operations, [Spec,Top] not being a typical checking position of wh-elements. The same LF-output (modulo intermediate traces) can be arrived at by direct wh-movement into the [+WH]-COMP, requiring only one checking operation. Wh-topicalization is thus banned. Scrambling of wh-phrases (54b) is ruled out analogously. The case in (54c) concerns the absence of a wide scope reading for where in (58) (cf. Epstein 1992):

(58) Who wonders where we bought what?
(59) Who wonders what we bought where?

The derivation permitting a matrix scope reading of where and an embedded scope reading of what in (59) will block the derivation leading to the same reading for (58), because it involves fewer checking operations (assuming that where in (58) could bear two checkable [WH]-features). Müller's account of (54a)-(54c) constitutes a derivational version of the Principle of Unambiguous Binding (Müller & Sternefeld 1993), which rules out simultaneous binding of traces from different types of positions. The theory is accordingly extended to cases covered by that earlier account, including the ban on 'Super-Raising', 'Chain Interleaving' and long distance Scrambling in German, and the analysis of wh-island asymmetries.

REINHART'S paper "Interface Economy: Focus and Markedness" offers an intriguing perspective on the link between word order, stress patterns and Economy. The core claims are (i) that the assignment of phrasal stress patterns is regulated by Economy, and (ii) that alternative word orders can enter into that Economy calculation. Following Cinque (1993), neutral phrasal stress is assumed to be calculated from syntactic structure by an algorithm. 'Special' or marked patterns are assigned by 'discourse grammar', overriding the output of grammar, to meet discourse needs that cannot be expressed by the neutral pattern. The choice between 'costfree' neutral stress and 'costly' marked stress is governed by economy. Reinhart is concerned with the empirical content of 'markedness': how can a pattern be 'marked' when it is neutral in that context for that sentence? Markedness effects may be only indirectly observable: where a language has a word order option that permits neutral stress in a certain context, that option must be used, while the same context may force special stress in a language that does not have that word order pattern. This idea is explored by contrasting Dutch, which has object-scrambling, with English, which does not. In Dutch, neutral stress falls on the object in the 'unscrambled' order, but when the object is scrambled (in pre-adverb position), it falls on the verb:

(60) a. dat ik altijd het krant leze
   that I always the paper read
b. dat ik het krant altijd leze

Reinhart argues that there is no single determinant of scrambling to be found in properties of the object (contra e.g. de Hoop 1992). Rather, the common determinant (where there is one) is to be found in discourse properties of the clause as a whole. The choice between scrambled and non-scrambled orders is made at PF, where neutral stress determines overlapping but distinct 'appropriate contexts' for each. For the purposes of the economy calculation, both word orders are free, but special (shifted) stress is more costly than neutral stress. In a context that demands the use of a scrambled order in Dutch (60b), English can only use the basic order, and stress shift applies (61a). Crucially, non-scrambled order with stress shift cannot be used in the same context in Dutch (unlike (61a), (61b) permits only contrastive focus on the verb):

(61) a. I always read the newspaper (= 60b)
b. dat ik altijd het krant leze (≠ 60b)
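The economy calculation described above (free word order, costly stress shift, cheapest appropriate option wins) can be sketched as follows. The Python code is a toy under stated assumptions, not Reinhart's formalism: the `Option` record, the cost table and the context predicate `context_60b` are hypothetical, and "appropriateness" is stipulated directly rather than computed from stress.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    order: str   # "basic" or "scrambled"
    stress: str  # "neutral" or "shifted"


# Both word orders are free; only stress shift carries a cost.
STRESS_COST = {"neutral": 0, "shifted": 1}


def cheapest_appropriate(options, fits_context):
    """Pick the least costly option that is appropriate in the given context."""
    appropriate = [o for o in options if fits_context(o)]
    if not appropriate:
        return None
    return min(appropriate, key=lambda o: STRESS_COST[o.stress])


def context_60b(o):
    """Hypothetical context of (60b)/(61a): expressible either by scrambling
    with neutral stress or by basic order with shifted stress."""
    return (o.order, o.stress) in {("scrambled", "neutral"), ("basic", "shifted")}


dutch = [Option(order, stress)
         for order in ("basic", "scrambled")
         for stress in ("neutral", "shifted")]
english = [Option("basic", "neutral"), Option("basic", "shifted")]

print(cheapest_appropriate(dutch, context_60b))    # Option(order='scrambled', stress='neutral')
print(cheapest_appropriate(english, context_60b))  # Option(order='basic', stress='shifted')
```

Dutch, which has both orders, selects scrambling with neutral stress, so basic order with stress shift loses; English, lacking the scrambled option, has to pay for stress shift.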

In Dutch, the use of the scrambled order with neutral stress blocks the use of the basic order with stress shift, as dictated by economy. The proposal has consequences for the treatment of scrambling: not only is it 'optional', but it is also 'free' as far as syntax is concerned. This is not expected under a movement analysis, assuming that movement is a costly operation. Indeed, Reinhart favours the 'non-movement' view, in which 'basic' and 'scrambled' orders merely reflect options for the base-generation of objects.

In his paper "Formal and Substantive Elegance in the Minimalist Program", URIAGEREKA addresses both external and internal aspects of economy: the research strategy of eliminating redundancy, as well as the exploitation of economy as a principle within the system. Minimalism, he points out, directly invites the suspicion that linguistic phenomena do not directly instantiate principles of the underlying system. The search for deeper explanations involves continual reassessment of principles (including the economy principles of the MP) and of the generalizations that gave rise to them, as theorems of more abstract principles, such that the former become 'emergent properties' of the system (hence the subtitle: "on the emergence of some linguistic forms"). The paper contains two major proposals, one concerning the LCA, the other concerning Condition B of the Binding theory. First, a model is outlined in which the LCA emerges as a theorem, given reasonable assumptions concerning external necessities ('bare output conditions') and economy. Building on Epstein's (1995) proposal that c-command is a direct reflex of Merger, the key new feature is that spellout (including linearization) is interspersed with merger operations, and hence scattered through the derivation ('multiple spell-out'). There are far-reaching implications to be worked out; it is certainly a natural extension of the strictly derivational approach to the architecture of grammar pursued by Chomsky, Epstein and others. Against this background, Uriagereka approaches two seemingly dissociated aspects of grammar: local obviation effects, and the role of Case in the minimalist system. Among the aspects of grammar that are sensitive to locality, these two are conspicuous. Case is mysterious: it is the only clear instance of features that are always 'uninterpretable', with a solely system-internal function and no role to play at the interface. Condition B stands out as a local 'negative' condition, blocking binding relations for pronouns, which, unlike R-expressions, are not generally required to have distinct reference from c-commanding expressions. In Chomsky's (1995b) system, following movement to functional heads, the interpretable features of subject and object(s) end up in one checking domain, Case having been eliminated. When two such arguments are pronouns with the same (interpretable) φ-features, the question arises of how the two feature-sets are distinguished at all at the interface. Uriagereka's proposal is that Case has this function: to 'mark' the feature-sets of arguments within a single local domain as distinct from one another. A pronoun's Case will mark it as distinct from other feature sets in its domain. The local obviation property now follows with one additional, reasonable assumption: feature sets formally marked as distinct in the grammar receive distinct interpretations at the interface.

HAIDER'S paper "Economy in Syntax is Projective Economy" defends a view of syntax which diverges from mainstream models. Taking strings of terminals as its starting point, this view defines the task of syntax as assigning hierarchical structures to such strings.
Constraints are essentially constraints on representations, including constraints open to interpretation as being economy-based, which are responsible for banning string-vacuous chains, empty projections and non-branching structure. Haider presents arguments against (current versions of) derivational economy principles, based on head movement constructions in English that would have to be taken to arise from optional movement in a derivational model. The effect of optional movement is attributed in the 'projective' model to underspecification of the items involved. Further empirical difficulties for the derivational model are set out in an insightful analysis of superiority phenomena in German and English. Conceptually, projective economy is claimed to correspond best to UG's role as a "grammar co-processor"; this is Haider's own reconstruction of UG, primarily motivated by its being better designed to cope with questions of parameter fixing (cf. Haider 1991). Haider's string-to-structure economy further purports to be a natural extension of language processing, where time can be considered a limited resource. As the following passage makes clear, these proposals are founded on an unorthodox view of the competence-performance distinction, and hence of the domain of linguistic theory:

The problem addressed in the Chomskyan question "What is the structure of the grammar?" is directly connected with the question "How is the grammar put to use?". The grammar is to provide optimal data structures for actual usage. This implies that UG is the system of cognitive routines that guarantee this result, that is, grammars that determine optimal data structures for actual usage.

The final two papers move beyond the domain of syntax proper to explore the role of economy principles in morphology and the lexicon. In "Lexical Information from a Minimalist Point of View", BIERWISCH explores aspects of the tension between the conception of the lexicon as nothing more than a 'repository of idiosyncrasy', on the one hand, and as a 'system of entries' defining the atoms that enter the syntactic derivation, on the other. He concludes that "the assumption that lexical information is subject to economizing principles that properly belong to the Lexical System should be taken as an indispensible ingredient of the minimalist perspective". Both individual lexical entries (LEs) themselves and the lexical system (LS) as a whole are shown to be in fact finely structured. This structure is governed by principles that apply to LS (but not to syntax proper); and these principles may be invariant (universal), e.g. the principles determining the hierarchy of argument positions, or emerge in language-specific variants, e.g. the rules determining the Case-feature in the LE of a Case-assigning item. It is argued that "the notational convention of disjunctive ordering, independently motivated for phonology" should be "extended to lexical information in other domains, where they also seem to apply naturally". On this basis, the effects of economizing principles (underspecification, elsewhere) can be properly exploited in accounting for the actual patterning that occurs. Just as the phonological information stored in a LE is underspecified with respect to the information associated with that entry at later stages of the derivation and at the A-P interface, so is the semantic information in a LE underspecified with respect to the information necessary to compute its contribution to interpretation at the C-I interface. This is illustrated with subtle but general interpretive properties of change-of-state verbs and their 'unaccusative' cognates.
The use of elsewhere is illustrated, among other things, with Case-assignment by German prepositions. By exploiting default rules, Case-features need not be specified in lexical entries, except where a lexical specification is necessary to override the default. As in the case of affixation, closer examination of the interaction of subregularities with the Elsewhere condition reveals non-trivial effects. While the default Case for P is Dative, the Accusative assigned by directional variants of prepositions such as auf (on, onto) must be analyzed as following from a more specific default rule. This in turn means that the Dative assigned by directional prepositions such as zu (to) must be marked (lexically specified), despite the existence of the Dative default.

WUNDERLICH'S "A Minimalist Model of Inflectional Morphology" presents a general framework for accounting for the distribution and properties of word-forms in paradigms. Minimalist Morphology (MM) is based on (i) a maximally simple rule of combination; (ii) specific filters that are plausibly grounded in other systems; (iii) general principles of economy governing the selection of optimal word forms. The sources for word-forms are sets of stems and affixes, both maximally underspecified. Free combination of affixes and stems (i) represents the minimal assumption about the 'form-generation' component. Independent principles (ii) that determine the general shape of legitimate output forms include the Affix Order constraint (forbidding e.g. the attachment of an Aspect-affix to a V-stem already bearing a Tense affix), imposed by external, semantic factors. The potentially massive overgeneration that is the price of assuming free combination is further constrained by a few simple, natural economy principles (iii) (cf. the strategy for curbing Move-α in the MP). The key mechanism is the paradigm: only those forms that are legitimated in a paradigm will enter into the syntax (hence this is a lexicalist approach: word forms are determined prior to syntax). The dimensions of a paradigm are defined by the inherent features of the most specific candidate form; what paradigm slot(s) a given form eventually occupies is determined by a set of interacting economy principles. These set up a competition among the candidate forms generated to fill the slots opened in a paradigm, by ranking the forms according to the specificity of their inherent feature specifications and inserting them accordingly. The model is not only economical in the assumptions it makes (e.g. no rules other than free combination); it also relies crucially on both derivational and representational economy conditions, based on specificity, simplicity and number of operations. One of its attractive features is that it dispenses with a range of more powerful rules and mechanisms assumed by other authors to handle word-form selection. To the extent that it can stand up to the empirical challenge, MM thus seems to gain an intrinsic edge over competing models. It is with this in mind that the model is tested in case studies on two challenging verb inflection systems: Georgian and Potawatomi.
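Both Bierwisch's default Case rules and the specificity ranking in MM rest on the same intuition: among applicable specifications, the most specific one wins (the Elsewhere condition). The Python sketch below illustrates this with the German preposition facts described above. It is a toy under stated assumptions: the rule table and feature labels are hypothetical, and specificity is approximated by the number of conditions, which only coincides with the subset-based Elsewhere ordering when, as here, the rules are nested.

```python
def elsewhere(rules, features):
    """Apply the most specific rule whose conditions are all met."""
    applicable = [(cond, value) for cond, value in rules if cond <= features]
    if not applicable:
        return None
    # Specificity approximated as the number of conditions (rules are nested here).
    return max(applicable, key=lambda rule: len(rule[0]))[1]


CASE_RULES = [
    (frozenset({"P"}), "Dative"),                       # general default for P
    (frozenset({"P", "directional"}), "Accusative"),    # more specific default
    (frozenset({"P", "directional", "zu"}), "Dative"),  # lexical specification of zu
]

print(elsewhere(CASE_RULES, {"P", "auf"}))                 # Dative
print(elsewhere(CASE_RULES, {"P", "auf", "directional"}))  # Accusative
print(elsewhere(CASE_RULES, {"P", "zu", "directional"}))   # Dative
```

The Dative of directional zu cannot come from the general default, because the directional default pre-empts it; it has to be listed, which is exactly the lexical marking the text describes.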

Notes 1

2

3

4

5

6

Chomsky (1993) was circulated as Chomsky (1992), and reprinted as chapter 3 of Chomsky (1995b). We treat Chapter 4 of Chomsky (1995b) as the 'definitive' version of the 'second phase'. It is important to be aware that, although much of the "Bare Phrase Structure" paper (first circulated as Chomsky 1994) was incorporated into "Categories and Transformations", some proposals from the former were abandoned in the latter. Cf. Chomsky's (1995b:8-9) 'internal' and 'external' notions of 'simplicity', the former having "resurfaced [in the minimalist program] in the form of economy considerations that select among derivations"; the latter "operative as always . . .". Prime examples are parsing and pragmatics. The central role played by 'Minimal Attachment' and similar principles in theories of sentence processing is reviewed in Frazier & Clifton (1996:Ch. 1). In pragmatics, see especially Sperber and Wilson's Relevance Theory of communication and cognition, according to which "human cognitive processes [including utterance interpretation] . . . are geared to achieving the greatest possible cognitive effect for the smallest possible processing effort" (Sperber & Wilson 1986:vii). Economy is now writ large in phonology in the guise of Optimality Theory, following Prince & Smolensky (1993). Work is also being done on OT-approaches to syntax, e.g. Grimshaw (1995). Since phonology is not a topic of this volume, and none of the papers discuss optimality-theoretic approaches to syntax, OT is not discussed here. Cf. discussion in Shoemaker (1991). There are terminological traps to be aware of. The use of terms like 'best' / 'optimal' in their everyday (nontechnical) sense amounts to an interpretation of the ranking in terms of X, imposed from the outside. The (quantitative) ranking itself must not be confused with how it may be qualitatively evaluated.
In a similar vein, there is no question of the description of a system in terms of economy as discussed here involving attribution of 'conscious intentions' and the like, either to that system or to the 'competing' alternatives it generates; rather, such a description represents an analysis at a purely mechanistic level, e.g. an abstract characterization of some algorithm. The case where 'no output' can be legitimately selected by an economy principle is a different one: 'no output' must be among a set of alternatives which the system makes available for selection, i.e. the


7

8

9

10 11

12

13 14


system is not S'. Moreover, 'no output' represents an option for achieving the goal at hand, rather than failure with respect to that goal. Cf. section 6 below. A caveat is in order where elegance comes into the picture. It is by no means clear that elegance and economy considerations lead to the same results. The topic of simplicity in scientific theories goes beyond the scope of this introduction; an early serious treatment can be found in Goodman (1977). On blocking effects in derivational morphology, see Aronoff (1976:42-45); for example, while the affix -ity applies to adjectives in -ous to derive abstract nominals (curious - curiosity, porous - porosity), the existence of a corresponding unaffixed N-form blocks -ity-affixation to the N+ous adjective. Thus, glory blocks affixation of -ity to glorious (*gloriosity). Kiparsky (1982) applied the Elsewhere Condition, originally deployed to impose disjunctive ordering on phonological rules, to do the same for derivational and inflectional morphological processes, thereby deriving blocking effects in morphology. This illustration simplifies Kiparsky's (1982) formulation, while preserving the key intuition. Incorporating ablaut and other cases requires more complexity; cf. Halle & Marantz (1993) for a comprehensive treatment of English past-tense formation in this spirit. An account of exceptional 'subject control' with promise consistent with the Minimal Distance Principle is found in Larson (1991). Leading instances in the 1980s are also the immediate precursors to the MP: (i) the Last Resort condition for Case-driven NP-movement, discussed in Chomsky (1986b:137) and extended in Chomsky (1991); and (ii) the minimality condition on government of Chomsky (1986b), which developed into the theory of Relativized Minimality (Rizzi 1990). This aspect of the tension presupposes simplicity as a criterion for explanatory adequacy.
There is another aspect of the same tension, induced on the one hand by the facts of language variation, pulling us towards a diversity of grammars, and on the other by the facts of language acquisition, leading us to expect a single grammar. These two aspects are, strictly speaking, logically independent, though progress on each seems to lead in the same direction (cf. Chomsky 1995b:4-5). Koster (1987) had already forcefully argued that, of the three genuinely syntactic strata, one would be sufficient to handle the data: S-structure, on his proposal. There are of course many complications; for early discussion, cf. Lasnik & Saito (1984). These issues are taken up in this volume by STERNEFELD, GREWENDORF & SABEL, and MÜLLER.

15 Interface well-formedness is Full Interpretation (FI) (see below). The idea is that the morphosyntactic features that need to be matched (or assigned) are introduced into derivations as parts of lexical items, but are not interpretable at the LF interface, hence not tolerated there. Strong features are additionally not tolerated at PF. The matching requirements (8), recast as checking theory (matching and deletion of features, governed by X'-theoretic locality), still serve to 'motivate' movement; movement and checking are now interpreted as the means used by the computation to remove disturbing features. The checking theory of Chomsky (1993) is overhauled and refined in Chomsky (1995b: Ch. 4). 16 The cost is an extension of the 'lookahead' requirement: a successful computation must determine whether well-formedness at a non-syntactic level (PF) necessitates application of an operation. 'Strength' is reinterpreted in Chomsky (1995b:233) as affecting not PF but the computational operations themselves: the introduction of a strong feature from the lexicon into a derivation requires immediate satisfaction (checking). See URIAGEREKA for some speculations on the nature of 'strength'. 17 The role of the pleonastic subject in such paradigms is discussed in section 6 below; this also forms the topic of FRAMPTON's paper. 18 Note the conceptual closeness to the architecture of categorial grammar, where lexical categories (types) largely determine syntactic combination. 19 Given that the 'storage capacity' of the human brain is vastly underoccupied, there is no a priori reason to expect that minimization of storage space is an externally dictated necessity. 20 Of course, different approaches to phrase structure exist within the generative paradigm. See Baltin & Kroch (1989) and Brody (1995b).
21 This process is characterized as "an operation, call it Satisfy, which selects an array of items from the lexicon and presents it in a format satisfying the conditions of X-bar theory." (Chomsky 1993)


22 Project can also be considered a relation holding between X and X° in (20a), X° being, strictly speaking, [X° X]; between X° and X' in (20b); and between X' and XP in (20c). Additionally, this relation is transitively closed. 23 This is an informal translation indeed. The 'ontology' of minimalist syntax has yet to be spelled out in a transparent way. For a promising attempt, see Wartena (1994). 24 For an interesting discussion of these difficult and controversial matters, see Stabler (1983). 25 (22) is adapted from Kayne's originally more elegant formulation: (i) LCA

26

27 28 29

30 31

d(A) is a linear ordering of T (Kayne 1994:6)

A is the set of ordered pairs of nodes ⟨x, y⟩ such that x asymmetrically c-commands y; T is the set of terminals. Incidentally, there is considerable similarity to the compositionality principles advocated in Montague semantics and categorial grammar. Thus, the sisterhood relation is the canonical configuration for function application. See Safir (1986) for similar considerations and Brody (1995) for a critical review of the reconstruction facts. Recall that X°-movement adjunction also constitutes an exception to Epstein's (1995) theory. Globality here concerns transderivational conditions. Lakoff (1970) deals with derivational conditions only. The latter type of globality has since been incorporated into syntax by means of trace theory, feature percolation, and checking mechanisms, as well as conditions on representations. See Grewendorf (1990) and Lasnik (1994) for detailed discussion, including mention of some recalcitrant problems with this analysis. See Lasnik (1995) for some discussion of the same paradigm and an alternative analysis of there-constructions.

32 The analysis of ECM contexts in this regard is non-trivial: (i) I believe there to be a unicorn in the garden. 33 Cf. the general discussion of the interpretability of features by Chomsky (1995b:276ff). The analysis of there-constructions has seen a lot of technical refinement, far too detailed and controversial to be covered here. Again, we refer the reader to FRAMPTON, Lasnik (1995) and Chomsky (1995b, sections 4.5.3, 4.9, and 4.10.3). The tendency, however, seems to be to dissociate the properties of there and its NP associate, such that the role of FI diminishes. Note, incidentally, that applying the logic of there-constructions to do-insertion would require the English main verb to replace do at LF for reasons of FI, in structures like (i): (i) John does not write books. The crossing of NEG° by V° at LF would, however, result in exactly the kind of ECP violation that the entire analysis was supposed to avoid.

References
Ackema, Peter, Ad Neeleman and Fred Weerman. 1993. Deriving Functional Projections. OTS Working Papers TL-93-001.
Alexiadou, Artemis. 1994. Issues in the Syntax of Adverbs. Doctoral dissertation, University of Potsdam.
Aronoff, Mark. 1976. Word Formation in Generative Grammar. Cambridge, Mass.: MIT Press.
Baker, Mark. 1988. Incorporation: A Theory of Grammatical Function Changing. Chicago: Chicago University Press.
Baltin, Mark and Anthony Kroch, eds. 1989. Alternative Conceptions of Phrase Structure. Chicago: Chicago University Press.
Brody, Michael. 1995a. Lexico-Logical Form. Cambridge, Mass.: MIT Press.
Brody, Michael. 1995b. Towards Perfect Syntax. Working Papers in the Theory of Grammar, Vol. 2, No. 4. Budapest.
Chomsky, Noam. 1973. Conditions on Transformations. In A Festschrift for Morris Halle, ed. S. Anderson & P. Kiparsky, 236-286. New York: Holt, Rinehart & Winston.
Chomsky, Noam. 1981. Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, Noam. 1986a. Barriers. Cambridge, Mass.: MIT Press.
Chomsky, Noam. 1986b. Knowledge of Language: Its Nature, Origins and Use. New York: Praeger.
Chomsky, Noam. 1991. Some Notes on Economy of Derivation and Representation. In Principles and Parameters in Comparative Grammar, ed. R. Freidin, 417-454. Cambridge, Mass.: MIT Press.
Chomsky, Noam. 1992. A Minimalist Program for Linguistic Theory. MIT Occasional Papers in Linguistic Theory 1. (= Chomsky 1993)
Chomsky, Noam. 1993. A Minimalist Program for Linguistic Theory. In Hale and Keyser, eds., 1-51.
Chomsky, Noam. 1994. Bare Phrase Structure. MIT Occasional Papers in Linguistic Theory 5. (= Chomsky 1995a)
Chomsky, Noam. 1995a. Bare Phrase Structure. In Government-Binding Theory and the Minimalist Program, ed. G. Webelhuth, 383-439. Oxford: Blackwell.
Chomsky, Noam. 1995b. The Minimalist Program. Cambridge, Mass.: MIT Press.
Chomsky, Noam. 1995c. Language and Nature. Mind 104: 1-61.
Epstein, Samuel D. 1992. Derivational Constraints on A'-Chain Formation. Linguistic Inquiry 23: 235-259.
Epstein, Samuel D. 1995. Un-Principled Syntax and the Derivation of Syntactic Relations. Ms., Harvard University.
Frank, Robert and Anthony Kroch. 1995. Generalized Transformations and the Theory of Grammar. Studia Linguistica 49: 103-151.
Frazier, Lyn & Charles Clifton. 1996. Construal. Cambridge, Mass.: MIT Press.
Goodman, Nelson. 1977. The Structure of Appearance. Dordrecht: Reidel.
Grewendorf, Günther. 1990. Verbbewegung und Negation im Deutschen. Groninger Arbeiten zur Germanistischen Linguistik 30: 57-125.
Grewendorf, Günther. 1995. Sprache als Organ - Sprache als Lebensform. Frankfurt/Main: Suhrkamp.
Grimshaw, Jane. 1995. Projection, Heads, and Optimality. Ms., Rutgers University.
Haider, Hubert. 1991. Die menschliche Sprachfähigkeit - exaptiv und kognitiv opak. Kognitionswissenschaft 2: 11-26.
Hale, Kenneth and Samuel Keyser, eds. 1993. The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger. Cambridge, Mass.: MIT Press.
Halle, Morris and Alec Marantz. 1993. Distributed Morphology and the Pieces of Inflection. In Hale and Keyser, eds., 116-176.
Hoop, Helen de. 1992. Case Configuration and NP Interpretation. PhD dissertation, University of Groningen.
Huang, C.-T. James. 1982. Logical Relations in Chinese and the Theory of Grammar. PhD dissertation, MIT.
Kayne, Richard. 1994. The Antisymmetry of Syntax. Cambridge, Mass.: MIT Press.
Kiparsky, Paul. 1982. Lexical Morphology and Phonology. In Linguistics in the Morning Calm, ed. the Linguistic Society of Korea, 3-91. Seoul: Hanshin.
Kitahara, Hisatsugu. 1995. Target α: Deducing Strict Cyclicity from Derivational Economy. Linguistic Inquiry 26: 47-77.
Koster, Jan. 1987. Domains and Dynasties. Dordrecht: Foris.
Lakoff, George. 1970. Global Rules. Language 46: 627-639.
Larson, Richard. 1991. Promise and the Theory of Control. Linguistic Inquiry 22: 103-139.
Lasnik, Howard. 1994. Verbal Morphology: 'Syntactic Structures' Meets the Minimalist Program. Ms., University of Connecticut, Storrs.
Lasnik, Howard. 1995. Case and Expletives Revisited. Linguistic Inquiry 26: 615-633.
Lasnik, Howard & Mamoru Saito. 1984. On the Nature of Proper Government. Linguistic Inquiry 15: 235-289.
Lasnik, Howard & Juan Uriagereka. 1988. A Course in GB Syntax. Cambridge, Mass.: MIT Press.
Lebeaux, David. 1988. Language Acquisition and the Form of Grammar. Doctoral dissertation, University of Massachusetts, Amherst.
May, Robert. 1985. Logical Form: Its Structure and Derivation. Cambridge, Mass.: MIT Press.
Müller, Gereon & Wolfgang Sternefeld. 1993. Improper Movement and Unambiguous Binding. Linguistic Inquiry 24: 461-507.
Pesetsky, David. 1982. Paths and Categories. PhD dissertation, MIT.
Prince, Alan and Paul Smolensky. 1993. Optimality Theory: Constraint Interaction in Generative Grammar. Technical Report of the Rutgers Center for Cognitive Science, Rutgers University.
Rizzi, Luigi. 1990. Relativized Minimality. Cambridge, Mass.: MIT Press.
Rizzi, Luigi. 1991. Residual Verb Second and the Wh-Criterion. Technical Reports in Formal and Computational Linguistics 2, Université de Genève.
Rosenbaum, Peter. 1967. The Grammar of English Predicate Complement Constructions. Cambridge, Mass.: MIT Press.
Safir, Ken. 1986. Relative Clauses in a Theory of Binding and Levels. Linguistic Inquiry 17.4: 663-689.
Schoemaker, Paul. 1991. The Quest for Optimality: A Positive Heuristic of Science? Behavioral and Brain Sciences 14: 205-245.
Sperber, Dan & Deirdre Wilson. 1986. Relevance: Communication and Cognition. Oxford: Basil Blackwell.
Stabler, Edward. 1983. How are Grammars Represented? Behavioral and Brain Sciences 6: 391-421.
Wartena, Christian. 1994. Spelling Out the Minimalist Program. Doctoral dissertation, University of Nijmegen.
Wilder, Chris & Damir Cavar. 1994. Word Order Variation, Verb Movement and Economy Principles. Studia Linguistica 48: 46-86.
Zwart, Jan-Wouter. 1993. Dutch Syntax: A Minimalist Approach. Doctoral dissertation, University of Groningen.

Expletive Insertion

John Frampton

The goal of this paper is to advance the Minimalist Program (Chomsky 1993, 1994). The method is standard: the examination of weaknesses in the theory and the attempt to modify the theory to strengthen its conceptual foundations and improve its empirical adequacy. The particular problem which is the starting point is the contradiction between the assumption that phrases undergo movement in order to satisfy their own 'needs' and the apparent fact of movement which is driven by clausal Extended Projection Principle (EPP) requirements. Since expletives appear to be 'syntactic inventions' designed to satisfy the EPP requirement, a rethinking of the EPP leads naturally to a rethinking of expletive constructions. The most interesting theoretical innovation is an adaptation to the framework of the Minimalist Program of the old idea that expletives are inserted derivationally.

1

The Framework

For the most part, I assume the framework developed in Chomsky (1994). I will briefly review the general perspective, highlight some particularly relevant points, present some of Chomsky's recent modifications to the earlier theory (class lectures, Fall 1994), and make some modifications myself. The review is not comprehensive.

1.1

Well-formedness: Representations, Operations, and Derivations

Structures of increasing levels of complexity are recognized: lexical items, representations, operations (which can be viewed as ordered pairs of representations), and derivations. Well-formedness conditions are imposed at each level, and only well-formed structures are used to build structures at higher levels. The set of well-formed derivations (wfds) is therefore highly constrained. The well-formedness conditions on representations include assumptions about headedness, binary branching, maximal projections, etc. Crucially, representations may consist of a number of disconnected components. This allows various parts of a syntactic structure to be built independently. A complex DP, for example, can be fully assembled before it enters into syntactic relations with another component. During the course of a derivation, certain operations (merger operations) amalgamate separate components into more complex components, thereby reducing the number of components in the representation.


Two types of operations are defined, merger operations and movement operations. Merger operations transform a representation by taking two of its components and combining them into a single component. Movement operations transform a representation by acting on a single component of the representation. I will assume that there are no separate feature checking operations and that feature checking takes place automatically as an aspect of the operation which establishes the locality necessary for checking. The crucial well-formedness conditions on operations are the well-formedness conditions on movement operations. I will return to this shortly. A derivation is a sequence of well-formed representations such that successive representations are related by well-formed operations. A derivation is called well-formed if, additionally, it meets certain 'interface conditions' with the lexicon, with the conceptual-intentional system, and with the phonological system. The interface condition with the lexicon is a condition on the initial representation, which is assumed to be the only point of contact between a derivation and the lexicon. Specifically, the initial representation is assumed to be a simple list of lexical items. This is a well-formed representation whose components are lexical. The interface condition with the conceptual-intentional system is a condition on the terminal representation of the derivation, which is assumed to be the only point of contact between a derivation and this system. I will call the terminal representation the 'LF representation'. I will return below to the content of this interface condition. Chomsky uses the term 'convergent derivation' for what I call a well-formed derivation. I use the less colorful 'well-formed' because I want to stress that the well-formedness conditions are recursive in an important sense.
Much of the constraint on derivations stems from well-formedness conditions on operations, and much of the constraint on operations stems from well-formedness conditions on representations.
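The recursive picture of well-formedness can be made concrete for the merger half of the system. In the sketch below, which is ours and purely illustrative, a syntactic object is either a lexical item (a string) or an unlabeled pair produced by merger, and a representation is a tuple of disconnected components. The checker tests only two of the conditions mentioned above: that the initial representation is a list of lexical items, and that each later representation arises from its predecessor by a single merger. Labels, headedness, movement and the other interface conditions are deliberately left out, and all function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def merge(rep, i, j):
    """Combine components i and j of a representation into a single component.

    The result has one component fewer than the input."""
    if i == j:
        raise ValueError("merger needs two distinct components")
    merged = (rep[i], rep[j])
    rest = tuple(c for k, c in enumerate(rep) if k not in (i, j))
    return rest + (merged,)


def is_lexical(rep):
    """Interface condition with the lexicon: every component is a lexical item."""
    return all(isinstance(c, str) for c in rep)


def follows_by_merger(prev, nxt):
    """True if some single merger turns prev into nxt (component order is ignored)."""
    target = Counter(nxt)
    return any(Counter(merge(prev, i, j)) == target
               for i, j in permutations(range(len(prev)), 2))


def is_merger_derivation(derivation):
    """Check the lexical interface condition and every step of the derivation."""
    if not derivation or not is_lexical(derivation[0]):
        return False
    return all(follows_by_merger(a, b) for a, b in zip(derivation, derivation[1:]))


if __name__ == "__main__":
    r0 = ("the", "dog", "barks")
    r1 = merge(r0, 0, 1)   # ("barks", ("the", "dog"))
    r2 = merge(r1, 1, 0)   # ((("the", "dog"), "barks"),)
    print(is_merger_derivation([r0, r1, r2]))  # True
```

Because each step is checked only against its predecessor, the constraint on the derivation as a whole is inherited from the constraint on individual operations, which is the recursive sense of well-formedness stressed in the text.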

1.2

The Role of Optimality

All of these constraints on the possible well-formed derivations (wfds) are prior to any consideration of optimization. In order to specify the role of optimization precisely, it is helpful to define the notion of a well-formed continuation (wfc) of a derivation. Suppose $(\alpha_1, \alpha_2, \ldots, \alpha_n)$ is a derivation. Then the representation $\alpha$ is called a wfc of the derivation if there are representations $\beta_1, \ldots, \beta_m$ such that $(\alpha_1, \ldots, \alpha_n, \alpha, \beta_1, \ldots, \beta_m)$ is a wfd. It certainly may be the case that a derivation has no wfc; that is, it may be that a derivation cannot be continued to a wfd. It may also be the case that a derivation has multiple wfcs. Optimization principles come into play only in choosing between multiple wfcs. Operations are of two types, merger and movement. At this point, I assume a single optimality principle, Avoid Movement. (This will be revised later.) A wfc via movement is optimal only if movement cannot be avoided, that is, if there is no wfc via merger. A wfc via merger is always optimal. A wfd is called optimal if all of its representations, apart from the initial representation, are optimal well-formed continuations of the partial derivation up to that point. The major impact of imposing Avoid Movement is to prevent movement that is driven by weak features from taking place until after all merger and all movement driven by strong features has taken place. The distinction between weak and strong features is discussed below. It is striking that there are few examples which have a different form. Chomsky's (1994) analysis of the contrast below is a notable exception.

(1) a. There was believed t to be someone in the room.
    b. *There was believed someone to be t in the room.

Consider an intermediate representation of the derivation corresponding to (1a):

(2) {[to be someone in the room], there, was, believed, Infl}

There are two wfcs:¹

(3) a. {[someone [to be t in the room]], there, was, believed, Infl}
    b. {[there [to be someone in the room]], was, believed, Infl}

The continuation (3a) is by movement, while the continuation (3b) is by merger. Optimality chooses the desired continuation. This example plays an important role as a paradigm example in the theory. In large part, the view of optimal derivation which was outlined above is designed around this paradigm. I note this now, because a later section of this paper will propose an analysis of the contrast in (1) which does not rely on optimality considerations.
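Avoid Movement, as stated above, can be read as a filter on the set of wfcs available at a given step. The sketch below encodes it that way and uses (3a) and (3b) as data. The names (`Continuation`, `optimal_continuations`, `is_optimal_derivation`) are hypothetical, and deciding which continuations are well-formed in the first place is assumed to have been done elsewhere: each step simply lists the wfcs that were available and the one actually chosen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Continuation:
    operation: str  # "merge" or "move"
    result: str     # the resulting representation, written out as text


def optimal_continuations(wfcs):
    """Avoid Movement: a wfc via merger is always optimal; a wfc via movement
    is optimal only if there is no wfc via merger."""
    mergers = [c for c in wfcs if c.operation == "merge"]
    return mergers if mergers else list(wfcs)


def is_optimal_derivation(steps):
    """steps: (chosen, available_wfcs) pairs, one per non-initial representation.
    The derivation is optimal if every chosen continuation is optimal."""
    return all(chosen in optimal_continuations(available)
               for chosen, available in steps)


if __name__ == "__main__":
    move_3a = Continuation("move", "{[someone [to be t in the room]], there, was, believed, Infl}")
    merge_3b = Continuation("merge", "{[there [to be someone in the room]], was, believed, Infl}")
    print(optimal_continuations([move_3a, merge_3b]))  # only merge_3b survives
```

Applied to (2), the filter keeps only the merger continuation (3b), the step that leads towards (1a) rather than (1b). When no merger is available, every movement wfc passes, so the filter never removes the last way of continuing a derivation.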

1.3

Formal Features

In his early work on the Minimalist Program, Chomsky assumed that none of the features which enter into syntactic feature checking can be present in the LF representation. Chomsky's recent proposal that there is a split in the feature system is a very important modification of this assumption. Some features (φ-features are a paradigm example) not only enter into the syntactic computation, but persist at LF because they are relevant to the semantic computation as well. Other features have relevance only to the syntactic computation and should not be present in the representation which is the input to the semantic computation. Case features are a paradigm example of features of this kind. It contributes nothing to the semantics, for example, to know that a particular DP has nominative Case. Nominative Case, in effect, is not something that semantics knows anything about. We can sum up this point of view by saying that 'LF must be written in the language of semantics'. Features which are relevant only to the syntactic computation are called formal features. It is an interface condition on derivations that the LF representation contain no formal features. Chomsky assumes further that formal features must delete in checking operations, whereas non-formal features (which I will call intrinsic features) never delete. The intuitive basis for this is the idea that formal features should not be present in the LF representation. Unfortunately, the connection between the intuitive basis and the assumption is somewhat tenuous. In the first place, there is no reason to assume that formal features are utilized by the phonological computation either. The task of stripping away unneeded features in the interface with that computational system is assigned to the interface itself, not to the syntactic computation. In the second place, even if we grant that formal features must be stripped away by the syntactic computation, there are no grounds for assuming that formal features obligatorily delete in feature checking operations. If a feature does not delete, it can potentially enter a second checking operation. The LF condition would only require that a formal feature be removed in the last such checking relation it enters. In spite of these conceptual problems, I will adopt Chomsky's proposal. The requirement that formal features always delete not only appears to fit the facts, but does retain some measure of intuitive appeal. This understanding of the feature checking mechanism will play an important role in what follows. It will, for example, allow categorial features (i.e. category labels) to enter into checking relations. It will also allow φ-features to persist through checking operations. Furthermore, the assumption that formal features always delete in checking constrains the kinds of checking operations which we can expect to find. Of course, constraint is only welcome if it can accommodate the facts. I will assume that it does and attempt to show that this assumption is justified. There is a further partition of the feature system, the partition of the set of formal features into strong features and weak features. The distinguishing property of strong features is that they cannot be embedded. More elegant characterizations are possible, but the following is simple and will suffice for this paper: maximal projections bearing strong features are not permitted except as components of multicomponent representations. This is a condition on representations.
For example, an inflectional morpheme bearing a strong feature can project by merging with a complement phrase without satisfying this feature, but the feature must be satisfied before the maximal projection is closed off.²
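The division of labour between formal and intrinsic features can be sketched as follows. This is our own simplification, not a specification from the text: a feature is checked by name, a formal copy deletes when checked while an intrinsic copy persists, and the strong-feature condition is reduced to a test of whether a head still bears an unchecked strong feature when its projection is to be closed off. The class and function names, and the toy heads bearing a strong formal feature that checks a categorial D feature, are hypothetical. Because the intrinsic D feature survives checking, the same phrase can enter a second checking relation, exactly the option the text leaves open for non-deleting features; a formal Case feature, by contrast, can be checked only once.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Feature:
    name: str
    formal: bool          # formal features delete when checked; intrinsic ones never do
    strong: bool = False  # strength is only relevant for formal features


@dataclass
class Node:
    label: str
    features: set = field(default_factory=set)


def find(node, name):
    """Return the node's feature with this name, or None."""
    return next((f for f in node.features if f.name == name), None)


def check(head, phrase, name):
    """Check feature `name` of head against phrase; each formal copy deletes.

    Returns False (and changes nothing) if either side lacks the feature."""
    pair = ((head, find(head, name)), (phrase, find(phrase, name)))
    if any(feat is None for _, feat in pair):
        return False
    for node, feat in pair:
        if feat.formal:
            node.features.discard(feat)
    return True


def can_close_off(head):
    """Simplified strong-feature condition: no unchecked strong feature remains."""
    return not any(f.strong for f in head.features)


if __name__ == "__main__":
    dp = Node("DP", {Feature("D", formal=False), Feature("Case", formal=True)})
    infl_low = Node("Infl", {Feature("D", formal=True, strong=True)})
    infl_high = Node("Infl", {Feature("D", formal=True, strong=True)})
    print(can_close_off(infl_low))   # False: strong feature still unchecked
    print(check(infl_low, dp, "D"), check(infl_high, dp, "D"))  # True True
    print(find(dp, "D"))  # the intrinsic D feature is still there
```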

1.4

Movement

A second important change from earlier work which Chomsky has recently made is to disentangle the notion of a well-formed movement operation from considerations of economy of derivation. The interaction of considerations of economy of movement with other optimality considerations led to serious problems of computational complexity. The core idea of the change is that well-formedness conditions on movement are independent of optimality considerations for derivations. Whatever the well-formedness conditions on movement are, they simply help to specify the class of wfds. This view of the relation between movement and economy is implicit in the outline above. The optimality conditions (only Avoid Movement above) choose between sets of wfds which agree up to a certain point. An instance of movement moves a phrase into the checking domain of a head and carries out some feature checking computation. It is useful to shift the perspective from the phrase which moves to the head which is the target of movement. Rather than the operation Move (α), consider the operation Attract (to X). Candidates for attraction to X are phrases c-commanded by X which can enter into a checking relation with X and satisfy (i.e. erase) one of X's formal features. Note that a head which has no formal features is inert, in the sense that it cannot be a locus of attraction. Attract does not 'accept' all candidates. A candidate for attraction to X is not subject to Attract if there is a closer candidate for attraction to X.³ I omit the details, a discussion of 'equidistance' in particular, needed to make the notion 'closer candidate' precise. The essence is that if α and β are two candidates for attraction to X, then α is closer than β if α c-commands β and α and β are not equidistant from X. This is more than sufficient for what is needed in what follows.
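The definition of Attract lends itself to a small sketch. Purely for illustration, we assume a strictly binary tree in which every node is named by its path of 0/1 steps from the root; in such a tree α c-commands β exactly when β is α's sister or is dominated by it. 'Can check one of X's formal features' is simplified to sharing a feature name with X, and equidistance is ignored, as in the text. The function names and the toy tree are hypothetical.

```python
def c_commands(a, b):
    """In a strictly binary tree whose nodes are named by root-to-node paths of
    0/1 steps, a c-commands b iff b is a's sister or is dominated by it."""
    if not a:  # the root has no sister and c-commands nothing
        return False
    sister = a[:-1] + (1 - a[-1],)
    return b[:len(sister)] == sister


def attract(x_path, x_formal, phrases):
    """Names of the phrases that Attract can target for head X.

    phrases maps a name to (path, features). A phrase is a candidate if X
    c-commands it and it can check one of X's formal features; a candidate is
    excluded if another candidate c-commands it (equidistance is ignored)."""
    candidates = {name: path for name, (path, feats) in phrases.items()
                  if c_commands(x_path, path) and feats & x_formal}
    return sorted(name for name, path in candidates.items()
                  if not any(c_commands(other, path)
                             for o, other in candidates.items() if o != name))


if __name__ == "__main__":
    # [ Infl [VP DP1 [V' V DP2]] ]: Infl = (0,), DP1 = (1, 0), DP2 = (1, 1, 1)
    tree = {"DP1": ((1, 0), {"D"}), "DP2": ((1, 1, 1), {"D"})}
    print(attract((0,), {"D"}, tree))   # ['DP1']: DP1 is closer than DP2
    tree["DP1"] = ((1, 0), {"wh"})      # DP1 can no longer check X's feature
    print(attract((0,), {"D"}, tree))   # ['DP2']
```

The second call shows that only candidates compete: a phrase that cannot check any of X's formal features does not block attraction of a lower phrase that can.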

1.5

Computational Complexity

Transformational grammar has had an ambivalent attitude towards the computational complexity of its models. On the one hand, Chomsky realized early on that insisting on a theory which modeled mental computation down to the level at which computations are carried out would make it impossible to construct any theory at all. Correctly, Chomsky drew a distinction between competence and performance, insisting on deciding questions of competence before questions of performance. Nevertheless, the question of the extent to which the computation involved in competence theories models actual mental computation has always been in the background. We are certainly not at the point where we can pretend to be modeling mental computations at the level of algorithm. But feature-driven syntax may offer the possibility of taking some steps in this direction. This requires taking questions of computational complexity seriously. These questions become quite serious in a model which incorporates optimality computation: unless the range of alternatives is severely restricted, serious complexity problems arise. It may turn out to be premature to be concerned with computational complexity. Nevertheless, this paper is motivated partly by such concerns, specifically by the intuition that the role of optimality in the theory should be minimized.⁴

2

The EPP and the Principle of Greed

Within Government-Binding theory, A-movement was taken to be Case driven. Consider, for example:

(4) *It seems John to be believed t is happy.

How is this ruled out? John, which is Case-marked in the deepest clause, cannot raise to the intermediate clause to satisfy the EPP requirement of that clause. Chomsky (1986, for example) attempted to capture this fact by imposing the Chain Condition on A-chains. It specified (among other things) that the head of an A-chain must be in a Case position, and that this is the unique Case position in the chain. This aspect of the Chain Condition directly expresses the idea that A-chains are formed in order to move elements into positions where they can receive Case. Clearly, the Chain Condition rules out (4), since John is not in a Case position. In an attempt to achieve a less stipulative explanation of this phenomenon, Chomsky took a different approach in the Minimalist Program. The basic idea that A-chains are Case driven is retained, but it receives a different expression. Two interrelated proposals are made. First, that Case is represented on Case-assigners and recipients by syntactic features, and that Case-marking should be viewed as an agreement phenomenon, with Case-marking realized as the erasure of corresponding features under suitable locality conditions. Second, that phrases move only to satisfy their own features: the Principle of Greed. This provides an account of (4). The phrase John has its unsatisfied features erased in the deepest clause and is left with no unsatisfied features. It is therefore prevented by the Principle of Greed from further movement. It has proved difficult, however, to give a satisfactory account of the EPP in this framework. If the EPP is an effect of movement driven by the requirement of feature satisfaction, it is hard to see how it can be a feature of the moved element (as demanded by the Principle of Greed) which requires satisfaction. First, there are alternations which appear to be driven by a feature of the target (Infl, in this case), not the phrase which moves. Consider the much studied Dative-Nominative alternation in Icelandic. (Inflections are indicated in parentheses, with nominative and dative Case glossed as N and D.)

(5) a. Olafi var gefin bokin.
       Olaf(D) was given book-the(N)
    b. Bokin var gefin Olafi.
       book-the(N) was given Olaf(D)

It has been established that the dative phrase in (5a) is in [Spec,Infl]. Apparently, two possibilities are open. The indirect object can receive inherent dative Case directly from the verb and move to [Spec,Infl], satisfying the clausal EPP requirement, with the direct object receiving nominative Case covertly, presumably by covert adjunction to Infl. Alternatively, the indirect object can remain in place, with the direct object moving overtly to [Spec,Infl], where it both satisfies the clausal EPP requirement and receives nominative Case. Second, unlike Case satisfaction, a phrase is able to satisfy the EPP requirements of multiple clauses.

(6) Jack was believed t to be likely t to appear t to be guilty.

If an unsatisfied feature of the moved element is driving the movement, we conclude that this feature is not satisfied until the final step. In early Minimalist Theory, there was the notion of an unfilled specifier position. In that framework, the sequence of steps could be viewed as one multi-step operation, with that multi-step movement operation driven by the requirements of feature satisfaction. The development of the theory in Chomsky (1994), however, leaves no place for positions independent of the phrases that fill them. Movement, by nature, is one step at a time. In this framework, which is justified on other grounds, it is impossible to justify each movement step in (6) on the grounds of the Principle of Greed. Let us therefore discard the Principle of Greed. Before we consider the problem of accounting for examples like (4) without the Principle of Greed, let us first consider what we need to account for the EPP. Chomsky (class lecture, 1994) proposed that Infl has a strong formal feature that simply checks for a D category feature. Since a D category feature is intrinsic, and therefore is not erased when checked, this would explain why a phrase can satisfy the EPP requirement of multiple clauses. Let us adopt this proposal. The strong feature of Infl will simply be called the EPP feature. The problem now is to find a way to exclude (4) without excluding (6). It is tempting to rule out (4) by using Chomsky's idea that Avoid Movement forces early expletive merger, used earlier to rule out (1b). The derivation underlying (4) competes with the alternative:

(7) It seems t to be believed John is happy.

The possibility of merger with it makes the choice of raising John non-optimal: the derivation underlying (4) is well-formed, but non-optimal. This line can be pursued to provide an account of most Chain Condition effects without appeal to the Principle of Greed. If we take the problem of computational complexity seriously, however, there is reason to be skeptical of pursuing it. The optimality computation which is called for is complex, requiring significant lookahead. It would greatly simplify the computation needed to rule out (4) if it could be decided in a simple manner, at the point in the derivation where the component (8) is first built, that the derivation is destined to be ill-formed, regardless of what other material in the representation is available for merger.

(8) [John to be believed [t is happy]]

The Chain Condition had this virtue, as did the Principle of Greed. What we need is a way to rule out a derivation with the component (9a) without ruling out a derivation with the component (9b).

(9) a. [John to be believed [t is happy]]
    b. [John to be believed [t to be happy]]

Note that all of the formal features of John are satisfied in the embedded clause in (9a). Suppose that a phrase becomes inactive when its formal features are satisfied, in the sense that only phrases with formal features are subject to attraction (movement).⁵ This preserves something of the effect of the Principle of Greed, but is much weaker. While only phrases with formal features can be attracted, there is no requirement that a formal feature of the moved phrase actually be satisfied in the operation.

There is an immediate empirical concern. We have been focusing exclusively on the formal features associated with structural Case: the structural Case features themselves, the agreement features, and (I assume) the EPP feature. If we consider how movement to satisfy these features interacts with scopal movement, the distinction between active and inactive phrases does not yield the desired result. We want to make sure that the following is correctly ruled out:

(10) *Who do you think [t to be likely [t is guilty]]?

Frampton: Expletive Insertion


Consider the derivation at the point where the following component has been built:

(11) [to(Infl) be likely [who is guilty]]

From the standpoint of movement to Infl to satisfy its EPP feature, who must be inactive; otherwise it would be subject to attraction by Infl and we would end up judging (10) well-formed. From the standpoint of wh-movement, however, who must be active; otherwise it could not be attracted to whatever head drives wh-movement. What emerges from these considerations is something like the old A/A' distinction. I will assume that some notion of feature type along these lines is real, and that there are various types of formal features: Case-related features and scope-related features, at least. Further, attraction to a head is possible only if the attracting head and the attracted phrase have formal features of the same type. This solves the empirical problem: in (11), Infl has only Case-related features and who has only scope-related features. I will adopt the following well-formedness condition on movement (attraction):

(12) Attract to X
     A phrase is a candidate for attraction to a head X if it has a feature which could potentially satisfy a formal feature of X under movement of the phrase to the checking domain of X. The corresponding movement operation is taken to be well-formed if:
     1. there is no closer candidate; and
     2. the candidate and the attracting head have formal features of the same type.

The full range of Chain Condition effects is captured by (12). A DP cannot raise to satisfy the EPP feature of a higher clause unless it is 'Case active' (i.e. has a Case-related feature). Note that there is a conceptual subtlety in the formulation in (12), illustrated by the following example:

(13) *Jack was believed it is likely t to be happy.

The intervening it blocks movement of Jack to the matrix subject position, since it has a φ-feature which will satisfy a formal feature of the attracting head, Infl. Consider the relevant intermediate component:

(14) [was(Infl) believed [it is likely [Jack to be happy]]]

When Infl looks for something to attract, it does not look past it, because the φ-features of it satisfy one of Infl's formal features, its agreement features. On the other hand, if movement to an attractor is limited to phrases with formal features, Attract does not even consider it as a candidate for movement, since all of the original formal features of it have been satisfied (i.e. erased). There is therefore some contradiction with the idea that 'long-distance' attraction is blocked if there is something closer to attract. In this case, long-distance attraction is blocked by an element which is formally a candidate for attraction, but which is actually barred from being attracted because it does not have a formal feature of the same type as the attractor.
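The interaction of the two conditions in (12) can be made concrete with a small toy model. This is only a sketch under simplifying assumptions that are not part of the text: features are plain strings, each phrase lists the features it could offer an attracting head together with the types of its still-unerased formal features, and candidates are supplied nearest first. The names `Phrase`, `Head`, `attract`, and the wh-attracting head `C` are hypothetical illustrations.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Phrase:
    name: str
    # Features that could satisfy a formal feature of some head
    # (hypothetical encoding: "phi" satisfies agreement, "D" the EPP feature).
    offers: set[str]
    # Types of the phrase's remaining (unerased) formal features.
    formal_types: set[str] = field(default_factory=set)


@dataclass
class Head:
    name: str
    wants: set[str]         # what the head's formal features can be satisfied by
    formal_types: set[str]  # e.g. {"case"} for Infl, {"scope"} for a wh-attracting C


def attract(head: Head, phrases_nearest_first: list[Phrase]) -> Phrase | None:
    """Toy version of condition (12): return the attracted phrase, or None."""
    for phrase in phrases_nearest_first:
        if not (phrase.offers & head.wants):
            continue  # not a candidate at all; keep looking further down
        # Closest candidate found: condition 1 forbids looking past it.
        if phrase.formal_types & head.formal_types:
            return phrase  # condition 2 met: a same-type formal feature
        return None  # candidate, but barred from attraction; it still blocks
    return None


# (14): 'it' has phi-features but no remaining formal features.
infl = Head("was(Infl)", wants={"phi", "D"}, formal_types={"case"})
it = Phrase("it", offers={"phi"})
jack = Phrase("Jack", offers={"phi", "D"}, formal_types={"case"})
print(attract(infl, [it, jack]))  # None: (13) is excluded

# (11): 'who' keeps only a scope-type feature after its Case is checked.
to_infl = Head("to(Infl)", wants={"D"}, formal_types={"case"})
c_wh = Head("C", wants={"wh"}, formal_types={"scope"})
who = Phrase("who", offers={"D", "wh"}, formal_types={"scope"})
print(attract(to_infl, [who]))    # None: who is inert for the EPP
print(attract(c_wh, [who]).name)  # who: wh-movement remains possible
```

In the (14) case, `attract` returns `None` even though Jack would meet both conditions on its own: the closer `it` counts as a candidate, and its failure of condition 2 still ends the search, which is the subtlety just discussed.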

3 Expletives

If the EPP is a manifestation of the requirement that Infl be matched with a D, then it is natural to assume that expletive there, in English, is simply an expletive determiner. An expletive determiner is the minimal element required to satisfy the EPP. How is the relation between the expletive and its associate established? In the Scandinavian languages (and many others), overt N raising to D is obligatory in some configurations. The following paradigm, from Icelandic, is typical:⁶

(15) a. hinn sterki hestur
        the strong horse
     b. *hinn hestur
         the horse
     c. hesturinn
        horse-the

Longobardi (1994) discusses the internal structure of DPs and suggests that covert N raising to D is quite general. If movement is driven by unsatisfied features of the target heads, UG must clearly provide some feature which attracts Ns to D. I will assume that there is such an N checking feature, which can optionally appear along with the D category feature. In order to keep the N checking feature distinct from the N category feature which it attracts, I will denote the N checking feature by ~N. The ~N feature is assumed to be optional since pronouns, which are generally taken to be determiners, do not attract an N.⁷ Now consider a simple example:

(16) There are three women in the garden.

There are a number of possibilities that we need to consider. First, there is the question of the category of the phrase three women. The two possibilities are NP and DP. In the next section I will show that under very natural assumptions, DP is excluded. So let us put that case aside for the moment and consider the case where three women is an NP. There are two possibilities: the expletive may or may not bear a ~N feature. If it does, covert N adjunction to D will be forced. The structure is:

(17) there(+~N,+D) are [NP three women] in the yard


The feature ~N is the only formal feature of the expletive D. Adjunction of the head of the NP to the expletive is a valid instance of Attract, since there are no intervening heads which can satisfy this feature.⁸ Now consider the case where the expletive determiner does not have a ~N feature:

(18) there(+D) is [NP a man] in the room

There are two questions to address. First, does (18) correspond to a well-formed derivation? Second, if it does, what interpretation, if any, does it receive? Putting aside the well-formedness question for the moment, we can address the question of interpretation. Longobardi (1994) argues that a D is required to assign argument interpretation to a nominal. If this is so, the nominal in (18) does not receive a semantic interpretation. That is, unless the expletive has a ~N feature (and hence requires what is commonly called an 'associate'), semantic interpretation is blocked. Although this conclusion is sufficient for what follows, it is worth pointing out that (18) may not, in fact, correspond to a well-formed derivation. There is a complex of questions concerning Case that I have bypassed, but which could bear on the well-formedness of (18). Syntactically, I have been treating Case on a DP as atomic, but (at least morphologically) it can spread over the determiner, adjectives, nouns, etc. It is not at all clear what the Case requirements of the various subphrases of a DP are, or what role the determiner plays in this. GB theory had a partial justification for the requirement that arguments bear Case, based on the idea of visibility for θ-role assignment. That justification has completely evaporated in the Minimalist Program. In fact, Case features are present when there is no θ-role assignment, and are erased before semantic interpretation is done. A better understanding of the role of Case in establishing the argument status of nominals, as well as of Case spreading within a nominal, could possibly lead to the conclusion that (18) is not well-formed. The proposal that there is an expletive determiner, in concert with the assumption that Ds can bear a ~N feature, has an effect somewhat like Lasnik's (1995) proposal that expletive there is specified in the lexicon as an affix.
Under either proposal, covert movement to there is forced by the features of the expletive. Both proposals depart significantly from Chomsky's (1986) proposal that movement of the associate to the expletive is driven by a principle of full interpretation. Both are a move towards viewing movement as driven by straightforward (and easily computed) feature satisfaction, and away from allowing movement to be driven by long-term 'goals' like full interpretation. Note that the demands of full interpretation cannot motivate movement; only formal features motivate movement. It does not even appear that full interpretation can be imposed as a well-formedness condition on movement. Implicit in the idea of an 'interface condition' is that only relatively superficial computation is required to verify the condition. The LF interface condition can require, for example, that only features of a certain kind appear. Full interpretation is of a different order: to determine whether something is interpreted, the semantic computation must be carried out more or less to its conclusion. That is not an interface condition; it is a condition on semantic interpretation.
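The two options for the expletive in (17) and (18) can be illustrated with a toy string-rewriting sketch. It assumes, purely for illustration, that a clause is a list of words beginning with `there`, that the caller supplies the index of the NP's head noun, and that no intervening head can check ~N (as stated for (17)). The function `covert_n_to_expletive` is hypothetical; it reproduces only the bracket-free LF notation the text later uses in (21), not any semantic interpretation.

```python
from __future__ import annotations


def covert_n_to_expletive(tokens: list[str], noun_index: int,
                          expletive_has_tilde_n: bool) -> tuple[str, bool]:
    """Toy LF for 'there ... [NP ... N] ...' (hypothetical encoding).

    Returns the LF string and whether the NP gets an argument reading,
    which, on the view adopted in the text, requires a D.
    """
    if tokens[0] != "there":
        raise ValueError("expected a clause beginning with expletive 'there'")
    if not expletive_has_tilde_n:
        # No ~N: nothing forces N-adjunction, and the NP has no D
        # through which it could be interpreted as an argument.
        return " ".join(tokens), False
    noun = tokens[noun_index]
    lf = tokens.copy()
    lf[noun_index] = "t"          # trace of the raised N
    lf[0] = f"there + {noun}"     # N adjoined to the expletive D
    return " ".join(lf), True


# (17): there(+~N,+D) are [NP three women] in the yard
print(covert_n_to_expletive("there are three women in the yard".split(), 3, True))
# ('there + women are three t in the yard', True)

# (18): there(+D) is [NP a man] in the room
print(covert_n_to_expletive("there is a man in the room".split(), 3, False))
# ('there is a man in the room', False)
```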


There are several counter-arguments to Chomsky's original expletive replacement analysis of these constructions which are met by the current 'D support' analysis. Consider one well-known problem, the scope problem:

(19) a. There seem to be three women in the room.
     b. Three women seem to be t in the room.

It has been noted that expletive replacement predicts that the readings of (19a) and (19b) should be the same. But (19b) has a reading roughly along the lines of (20), and (19a) does not.

(20) There are three women who seem to be in the room.

Now consider the LF structure which the 'D support' analysis yields:

(21) there + women seem to be three t in the room

The semantics of such structures remains to be explicated. But it is clear that no restrictions on the common noun set corresponding to the raised noun are outside the scope of seem, so the absence of the reading (20) is expected. Incidentally, note also that the composite head, expletive with adjoined N, does have φ-features, so that the agreement features of Infl and the nominative Case feature of the argument can be checked.

There is a category of existential constructions which have proved problematic for many analyses, but which appear to be compatible with the present analysis:

(22) a. There are the covers of two books on my desk.
     b. There are two books' covers on my desk.

Here, it is clear that the postcopular phrase is a DP, not an NP. Further, the relevant indefinite is not the postcopular phrase itself, but a subphrase. The question this poses for the present analysis is how the N, embedded in the postcopular DP, could move to adjoin to the expletive determiner. Of course, corresponding questions are posed for all of the various analyses that have been advanced. Consider the following paradigm, from Baker (1988); I give only English glosses. Constructions along these lines appear in several languages which make extensive use of N-incorporation.

(23) a. I saw Bill's house.
     b. *I Bill-saw house.
     c. I house-saw Bill.

The possessor cannot be incorporated, so (23b) is ruled out. But the head of the object can be incorporated, as in (23c). This is the Possessor Stranding phenomenon: the head of the object incorporates into the verb, and the possessor receives accusative Case directly from the verb. Presumably, an incorporated N can receive some kind of inherent Case, freeing up structural accusative for assignment to the possessor.

Once stranded, the possessor itself can raise, as in (24):

(24) Bill was house-seen.

This is the Possessor Raising construction. The possessor raises to subject position and receives the Case associated with that position. Incorporation of the N results in a 'domain extension' which allows raising of the possessor. Complex questions of equidistance need to be settled before a serious analysis can be given of exactly how raising the head of the object frees up movement of the possessor. But the paradigm is at least suggestive for the examples in (22a) and (22b). The analog of possessor raising for (22a) would be something like (25):

(25) there + books₂ Infl + [D the + covers₁] [[t t₁] of [two t₂]] on the desk

First, covers raises and adjoins to its associated D. Then this complex raises and adjoins to Infl and gets Case directly, stranding the possessor. The possessor then raises to adjoin to there. There are various questions of Case assignment. The discussion of inherent Case, structural Case, and their interaction which is taken up in the last section of this paper addresses some of these questions.

4 Expletive Insertion

In early transformational grammar, it was assumed that expletive there was not present in deep structure, but was inserted in the course of the derivation of surface structure. Conceptually, that idea is still appealing within the Minimalist Program. If derivations are driven by necessity, with optimal satisfaction of the necessary feature checking always being the option chosen, we might imagine that expletives are only inserted if they are needed to achieve well-formedness. The first obstacle any such proposal faces is to explain why pairs like the following are both grammatical:

(26) a. A man was in the room.
     b. There was a man in the room.

If expletives are only inserted as a last resort to achieve well-formedness, how do we account for the expletive in (26b)? Based on (26a), it appears that well-formedness can be achieved without insertion of an expletive. If the proposal made earlier, that there is a categorial difference between the phrase [a man] in (26a) and in (26b), is correct, this problem is only apparent. Consider a stage in the derivation where the following component has been built:

(27) was [NP a man] in the room

At this point, insertion of there is required to satisfy the EPP. Crucially, we assumed that the EPP feature checks for a D feature.

On the other hand, consider a stage in a derivation where the following has been built:

(28) was [DP a man] in the room

In this case, insertion of an expletive is not required. The DP simply raises to be Case-checked and to satisfy the clausal EPP feature. Implementing this proposal requires a revision of the basic architecture, but it is straightforward. Certain lexical items are excluded from the initial representation. In the interests of limiting the discussion which follows, I will restrict the set of excluded items to minimal lexical items, that is, lexical items whose only intrinsic feature is their category label. This will include there and exclude expletive it, which has φ-features. Along with merger operations and movement operations, insertion operations are also admitted. The three kinds of operations are ranked by 'cost':

(29) 0. Merge
     1. Move
     2. Insert

More costly operations are permitted only if cheaper operations do not yield well-formed derivations. Again, the evaluation is made at each step of the derivation. We can summarize the scheme in two slogans: Avoid Movement and Last Resort Insertion (LRI). Expletive insertion as a last resort has two possible effects. First, in situations in which an expletive must be inserted at some point in order to achieve well-formedness, it delays the insertion until it can no longer be avoided. This could be termed 'late insertion'. In general, it will be the EPP which forces expletive insertion, so late insertion will prevent insertion of the expletive lower in a clause than the head bearing the EPP feature. Second, some structures with expletives that we might otherwise expect to occur will be excluded altogether, since the EPP can be satisfied through movement. Since these effects are more dramatic and the argument less theory-internal, I will restrict myself to discussing this second effect of LRI, the unexpectedly missing expletive constructions. The Indefiniteness Effect constitutes one such missing construction. Since Icelandic allows the subject position to be filled without assigning nominative Case to the subject, which permits some subtle probes of the relevant structures, most of the data will be taken from Icelandic.
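The ranking in (29), evaluated step by step, can be sketched as choosing the cheapest viable operation at each point. The sketch assumes, as a simplification the text does not make, that the set of operations leading to a well-formed derivation at a given step is already known; computing that set is the substantive part and is not modeled here. `Op`, `cheapest`, and `fill_spec_infl` are hypothetical names.

```python
from __future__ import annotations

from enum import IntEnum


class Op(IntEnum):
    """Operation types ranked by 'cost', as in (29)."""
    MERGE = 0
    MOVE = 1
    INSERT = 2


def cheapest(viable: set[Op]) -> Op:
    """Pick the cheapest operation that still yields a well-formed derivation."""
    if not viable:
        raise ValueError("no operation leads to a well-formed derivation")
    return min(viable)


def fill_spec_infl(associate: str, mergeable_d: bool = False) -> Op:
    """Hypothetical single step: Infl's EPP feature needs a D.

    associate: "NP" or "DP", the category of the nominal already built,
               as in (27) and (28).
    mergeable_d: whether an item with a D feature (e.g. expletive it) is
                 still waiting in the initial representation.
    """
    viable = {Op.INSERT}          # inserting 'there' always supplies a D
    if associate == "DP":
        viable.add(Op.MOVE)       # (28): the DP itself can raise
    if mergeable_d:
        viable.add(Op.MERGE)      # Avoid Movement: early merger wins
    return cheapest(viable)


print(fill_spec_infl("NP").name)  # INSERT: (27) forces there-insertion
print(fill_spec_infl("DP").name)  # MOVE: (28) makes insertion a non-option
```

Because the ranking is checked locally, the sketch never compares whole derivations; this mirrors the text's point that LRI, like Avoid Movement, is decided at each step.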

4.1 Quirky Subjects in Icelandic

Icelandic quirky subject constructions are by this time fairly well known.⁹ Consider the Dative-Nominative alternation mentioned above:

(30) a. Bókin var gefin okkur.
        book-the(N) was(3sg) given us(D)
     b. Okkur var gefin bókin.
        us(D) was(3sg) given book-the(N)

There are two main considerations, Case and agreement.¹⁰ Case assignment in (30a) is straightforward. The indirect object gets inherent Case directly, and the direct object moves to [Spec,Infl], where its nominative Case feature is checked. In (30b), the indirect object again receives inherent dative Case directly, but moves to [Spec,Infl] to satisfy Infl's EPP feature. The direct object must have its nominative Case checked covertly: its head adjoins to Infl and Case checking takes place. Now note that verbal agreement in (30b) is not with the subject, but with the nominative object. In Icelandic, and perhaps generally, nominative Case assignment always goes together with number agreement. This is illustrated again in (31b):

(31) a. Hann hjálpaði okkur.
        he helped us
     b. Okkur var hjálpað.
        us(D) was(3sg) helped

There is no nominative Case assignment in (31b), so the verb shows up with 'default agreement', which morphologically coincides with 3rd person singular. It is not a real explanation of the facts, but we can describe them compactly by assuming that {+Finite,+Agr} checks {+Nom,+φ}; that is, the feature pairs check simultaneously. I intend +φ as shorthand for the φ-features of the nominal. If formal features are erased as a result of checking, the agreement features of Infl and the Case feature of the nominal will be deleted simultaneously. This feature-checking algorithm therefore captures the basic properties of nominative Case checking. It ensures that number agreement always accompanies nominative Case checking. It also ensures that Infl can check the nominative Case feature of only one nominal phrase: checking +Nom erases the agreement feature of Infl, and no further +Nom can be checked. Both of these conclusions appear to be descriptively correct; see Sigurðsson (1993) for further discussion. There is one further point concerning default number agreement. If we suppose that there is actually a default agreement feature, +DefAgr, which Infl can bear, as in (31b), how is this feature checked? Presumably, the agreement features of Infl are formal. The simplest solution is to suppose that nominals can also bear a +DefAgr feature, and that the two are checked against each other. This feature must be optional on nominals: assuming that Infl cannot bear multiple agreement features, a quirky subject bearing +DefAgr combined with a nominative object, as in (30b), could not have its agreement feature checked. I assume that the default agreement feature, unlike a φ-feature, is a formal feature which must be erased before LF.
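The checking regime just described, in which {+Finite,+Agr} checks {+Nom,+φ} as a pair and Infl may alternatively bear +DefAgr, can be mimicked with a little state manipulation. This is a minimal sketch assuming that features are strings and that erasure is deletion from a set; `Infl`, `Nominal`, and `check_agreement` are hypothetical names. It shows the two consequences noted above: agreement always accompanies nominative checking, and a single Infl checks at most one +Nom.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Infl:
    finite: bool = True
    agr: str | None = "phi"   # formal agreement feature: "phi", "DefAgr", or None once erased


@dataclass
class Nominal:
    form: str
    feats: set[str] = field(default_factory=set)  # e.g. {"Nom", "phi"} or {"DefAgr"}


def check_agreement(infl: Infl, nom: Nominal) -> bool:
    """Pairwise checking: {+Finite,+Agr} against {+Nom,+phi}, or DefAgr against DefAgr.

    Formal features are erased on checking; phi-features are intrinsic and survive.
    """
    if not infl.finite or infl.agr is None:
        return False
    if infl.agr == "phi" and {"Nom", "phi"} <= nom.feats:
        nom.feats.discard("Nom")      # the nominal's Case feature is erased ...
    elif infl.agr == "DefAgr" and "DefAgr" in nom.feats:
        nom.feats.discard("DefAgr")   # ... or its default agreement feature
    else:
        return False
    infl.agr = None                   # Infl's agreement feature is erased either way
    return True


# (30b): quirky dative subject; the nominative object agrees with the verb.
infl = Infl()
okkur = Nominal("okkur")                   # inherent dative, no Nom to check
bokin = Nominal("bókin", {"Nom", "phi"})
print(check_agreement(infl, okkur), check_agreement(infl, bokin))  # False True

# Only one +Nom per Infl: a second nominative cannot be checked.
print(check_agreement(infl, Nominal("second", {"Nom", "phi"})))    # False

# (31b): no nominative at all; default agreement on Infl and the quirky subject.
print(check_agreement(Infl(agr="DefAgr"), Nominal("okkur", {"DefAgr"})))  # True
```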


4.2 Expletive það in Icelandic

Now let us consider expletive constructions. We first note that Icelandic has a lexical element það which corresponds roughly to English there. It appears in similar contexts, with an 'associate' displaying familiar indefiniteness characteristics:

(32) a. Það er maður í garðinum.
        there is man-a in garden-the
     b. Það var drepinn maður í garðinum.
        there was killed man-a in garden-the

But Icelandic það, unlike English there, also occurs in contexts with no associate:¹¹

(33) a. Það rignt.
        there rained
     b. Það var hlegið að okkur.
        there was laughed at us(D)

We need an explanation for the wider distribution of the expletive in Icelandic. One could pursue the idea that the difference lies with the lexical items which occur in (32) and (33), one having features more akin to the English pronoun it. I will assume, however, that the lexical items are essentially the same; in particular, that það does not occur as a pronoun. The appearance of unassociated það in Icelandic must then be due to independent properties of the language rather than to fundamental morphological differences in the expletive element. What are these differences? As a starting point, consider Sigurðsson's (1993) suggestion that finite Infl in English, as opposed to Icelandic Infl, must assign nominative Case. This certainly excludes English versions of the examples in (33): nominative Case checking depends upon agreement with φ-features, so the absence of φ-features on the expletive blocks nominative Case checking. Note that there is no problem with an associated there in English appearing in what seems to be a position of nominative Case assignment, as in (34):

(34) There were three men in the house.

Here, nominative Case checking does not take place until after N covertly adjoins to the expletive, bringing both the +Nom and the φ-features of the noun into the checking domain of {+Finite,+Agr}. We can do a little better than simply stipulating that there is an Icelandic-English difference in Infl's nominative Case checking properties. Suppose we assume only that English does not have default agreement, in the sense that +DefAgr is not an available feature in English. If finite Infl must have agreement features, this has the desired consequence. In (33), both Infl and the expletive can bear +DefAgr. Because default agreement features are absent in English, English is forced to use a pronominal expletive in contexts where Icelandic can use a minimal expletive.

4.3 The Effects of Last Resort Insertion in Icelandic

First, we review some of the relevant conclusions from the two previous sections:

(35) a. Expletive það is not assigned nominative Case.
     b. If the subject is not assigned nominative Case, nominative Case can be assigned to another argument.
     c. Icelandic finite Infl need not assign nominative Case.

Given these conclusions, we need an account of the following:

(36) a. *Það var maðurinn í garðinum.
         there was man-the in garden-the
     b. *Það var hjálpað okkur.
         there was helped us

LRI (Last Resort Insertion) provides a direct account in both cases. The examples in (36) are blocked by the following:

(37) a. Maðurinn var í garðinum.
        man-the was in garden-the
     b. Okkur var hjálpað.
        us(D) was helped

In both cases, expletive insertion can be avoided. There is one further, particularly revealing example which needs to be added to the evidence for LRI. Icelandic has a raising verb virðast, which is completely parallel to the English raising verb seem. Just like English seem, Icelandic virðast can take an indirect object. In Icelandic, the indirect object appears with inherent dative Case, unlike the PP construction which English requires. As is standard in Icelandic, the inherently Case-marked object (here, the indirect object) can move to subject position and surface as a quirky subject. The following paradigm results:

(38) a. Bækurnar virðast [hafa verið lesnar].
        books-the(N) seem have been read(N)
     b. Mér virðast [bækurnar hafa verið lesnar].
        me(D) seem books-the(N) have been read(N)
        'It seems to me the books have been read.'

In (38b), the quirky subject satisfies the EPP, but the matrix Infl retains its ability to check nominative Case. The head of the DP subject of the embedded clause covertly adjoins to Infl and its +Nom feature is checked. Something must block the following:

(39) *Það virðist [bækurnar hafa verið lesnar].
      there seem books-the(N) have been read(N)

Again, LRI provides a direct explanation: since (38a) is well-formed, (39) is excluded by LRI. There is a similar paradigm with a quirky embedded subject:

(40) a. Okkur virðist [hafa verið hjálpað].
        us(D) seems have been helped
        'We seem to have been helped.'
     b. Mér virðist [okkur hafa verið hjálpað].
        me(D) seems us(D) have been helped
        'It seems to me that we have been helped.'
     c. *Það virðist [okkur hafa verið hjálpað].
         there seems us have been helped
         'It seems that we have been helped.'

The well-formedness of (40a), coupled with LRI, blocks (40c).

4.4 Structural Licensing

The examples with the verb virðast are particularly important because the accounts of the paradigm that I am aware of, Sigurðsson (1991) and Schütze (1994), both resort to stipulative assumptions in order to achieve the desired result. Both attempt to attribute (39) and (40c) to a failure of structural licensing. It is known by now that some form of generic structural Case (or structural licensing) is required for arguments, including arguments which are assigned inherent Case. The evidence is extensive.¹² Consider the following, for example:

(41) a. *Hún harmaði [okkur hafa verið hjálpað].
         she regretted us(D) have been helped
     b. Hún taldi [okkur hafa verið hjálpað].
        she believed us(D) have been helped

Although the dative quirky subject has Case, the ECM verb clearly plays a crucial role in licensing its appearance in the embedded subject position. Sigurðsson and Schütze assume that (39) and (40c) are ruled out in the same way, by a failure of structural licensing. The stumbling block for attributing their status to structural licensing deficiencies is that it would appear, contrary to fact, that (38b) and (40b) would also be ruled out on the same grounds. Sigurðsson, working in a framework of argument licensing via head government, claims that assigning Case to the indirect object transforms virðast into a head governor suitable for argument licensing. Schütze, working in the framework of the Minimalist Program, must make a similar stipulation. He proposes that there are two lexical entries for virðast: one without an indirect object, which is intransitive, and one with an indirect object, which is transitive and therefore able to (exceptionally) license the subject position of the embedded clause. The LRI account is clearly superior.

LRI removes the mystery from (39) and (40c), but it does not explain how the embedded subjects in (38b) and (40b) are licensed. Since an answer to this question would bolster the case for LRI, let me offer a proposal. Suppose that arguments initially must bear a generic structural Case feature, SCase, in addition to whatever particular Case feature, inherent or structural, they bear. SCase is checked by the usual structural Case checkers, but with the difference that agreement is not involved. That is, +Finite and +Transitive check +SCase. If +Finite is an intrinsic feature, nothing in principle prevents +Finite from checking the Case of multiple arguments. However, since +Finite is not a formal feature, an argument cannot be attracted to the checking domain of Infl unless there is some other formal feature of Infl which the argument can satisfy. This explains (41a): the matrix Infl erases its agreement feature in assigning nominative Case to the subject, so it is left with no formal features and cannot attract the embedded quirky subject. Now consider (38b). Here, the embedded subject moves to Infl to satisfy Infl's agreement feature, and both nominative Case and SCase are checked.
In (40b), we can suppose that the quirky embedded subject and the matrix Infl both have default agreement features. A default agreement feature of Infl attracts the embedded subject, and both agreement and SCase are checked. The SCase of the matrix quirky subject is also checked by Infl, in a standard spec-head relation. Note that in this case the matrix subject does not at any point in the derivation bear a default agreement feature. We have assumed that such a feature is a formal feature and must be deleted before LF. If the matrix subject did bear a default agreement feature, there would be no way for it to be checked, since the default agreement feature on Infl is consumed in checking the default agreement feature of the adjoined D. Note also that nothing other than LRI prevents the same licensing from taking place in (39) and (40c). Finally, consider the ECM case (41b). If we make the plausible assumption that the ECM mechanism involves agreement checking, then a +DefAgr on the embedded quirky subject is sufficient to attract it into the checking domain of +Transitive, and the +SCase feature of the embedded quirky subject can be checked.
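The SCase proposal can be checked against (41a), (38b), and (40b) in the same toy style. The sketch assumes, for illustration only, that +Finite is modeled implicitly (it is never erased, so any argument that reaches Infl's checking domain loses its SCase), that checking steps apply in the order listed, and that features irrelevant here, such as the φ-features of the datives, can be omitted. `Arg`, `FiniteInfl`, `spec_head`, and `attract_embedded` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Arg:
    form: str
    # Only the features relevant to this sketch are listed; every argument
    # also starts with generic structural Case (SCase), added below.
    feats: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        self.feats.add("SCase")

    @property
    def licensed(self) -> bool:
        return "SCase" not in self.feats


@dataclass
class FiniteInfl:
    # +Finite is intrinsic (never erased) and may check SCase on several
    # arguments; only this formal agreement feature can drive attraction.
    agr: str | None  # "phi", "DefAgr", or None once erased


def spec_head(infl: FiniteInfl, subj: Arg) -> None:
    """Subject in [Spec,Infl]: +Finite checks SCase; Nom also needs phi-Agr."""
    subj.feats.discard("SCase")
    if infl.agr == "phi" and {"Nom", "phi"} <= subj.feats:
        subj.feats.discard("Nom")
        infl.agr = None


def attract_embedded(infl: FiniteInfl, arg: Arg) -> bool:
    """Covert attraction needs a formal feature of Infl that arg can satisfy."""
    if infl.agr == "phi" and {"Nom", "phi"} <= arg.feats:
        arg.feats.discard("Nom")
    elif infl.agr == "DefAgr" and "DefAgr" in arg.feats:
        arg.feats.discard("DefAgr")
    else:
        return False  # nothing left to drive attraction; SCase stays unchecked
    infl.agr = None
    arg.feats.discard("SCase")  # now in the checking domain of +Finite
    return True


# (41a): the nominative subject uses up Agr, so the embedded dative is stranded.
infl_41a = FiniteInfl(agr="phi")
spec_head(infl_41a, Arg("hún", {"Nom", "phi"}))
okkur_41a = Arg("okkur")
print(attract_embedded(infl_41a, okkur_41a), okkur_41a.licensed)  # False False

# (38b): quirky matrix subject; phi-Agr attracts the embedded nominative.
infl_38b = FiniteInfl(agr="phi")
spec_head(infl_38b, Arg("mér"))
baekurnar = Arg("bækurnar", {"Nom", "phi"})
print(attract_embedded(infl_38b, baekurnar), baekurnar.licensed)  # True True

# (40b): DefAgr on Infl attracts the embedded quirky subject.
infl_40b = FiniteInfl(agr="DefAgr")
mer = Arg("mér")
spec_head(infl_40b, mer)
okkur_40b = Arg("okkur", {"DefAgr"})
print(attract_embedded(infl_40b, okkur_40b), okkur_40b.licensed, mer.licensed)  # True True True
```

In the (41a) run, attraction fails for the reason given in the text: once nominative checking has erased Infl's agreement feature, no formal feature remains to attract the embedded dative, even though +Finite itself could still check its SCase.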

4.5 Pseudo-Passives

Some comment on pseudo-passives is in order. The Icelandic paradigm is:

(42) a. Það var hlegið að honum.
        there was laughed at him(D)
     b. *Honum var hlegið að.
         he(D) was laughed at

This is completely consistent with LRI. Presumably, (42b) is not well-formed because all the Case-related features of the object are satisfied PP-internally, so that it cannot be attracted to Infl. If (42b) were well-formed, it would block (42a) by LRI. It is therefore tempting to attribute the absence of (43b) in English to LRI and the availability of pseudo-passives:

(43) a. We were laughed at.
     b. *There was laughed at us.

But, not surprisingly, what LRI and (43a) rule out is:

(44) *There was laughed at we.

In (44), the object is not assigned Case by the preposition and bears nominative Case. Crucial to the pseudo-passive construction in English is the withdrawal of the Case-assigning properties of the preposition. Expletive insertion is not required, as (43a) shows, so (44) is blocked. An account of (43b) is still required, particularly in the face of the Icelandic (42a). In (43b), the object receives Case from the preposition; in fact, all the Case-related features of the object are satisfied. Infl is therefore left with no way to satisfy its agreement features: the object cannot be attracted to Infl, and the expletive does not have φ-features. English, unlike Icelandic, does not have default agreement features, so the agreement features of Infl must be checked against φ-features. We conclude that the paradigm (43), although consistent with LRI, furnishes no evidence for it.

4.6 The Indefiniteness Effect

All the pieces of the solution have now been worked out. LRI ensures that a nominal appearing in a position from which movement to a subject position filled by an expletive would otherwise be possible must be an NP, not a DP. The expletive must bear a ~N feature; otherwise the NP could not be interpreted as an argument. Earlier, the echo of Lasnik's LF affix idea was noted in the ~N feature of the expletive. Here we can note the echo of Chomsky's idea that movement to the expletive is required for full interpretation. It is not, however, that the expletive must be replaced so that it is not subject to interpretation, but that its associate needs the expletive (its D feature) in order to be interpreted as an argument. The associate of an expletive certainly will not move for EPP reasons, since it does not have a D feature. But it does have φ-features and could possibly move to satisfy formal agreement features of some head. This remains to be investigated, but it must be the cause of the movement of the associate of an expletive, when such movement does occur. Sigurðsson (1991), for example, gives the following Icelandic paradigm.

(45) a. [Þrem bátum] mundi sennilega hafa verið bjargað.
        three boats would probably have been rescued
     b. Það mundi sennilega [þrem bátum] hafa verið bjargað.
        there would probably three boats have been rescued
     c. *Það mundi sennilega hafa [þrem bátum] verið bjargað.
     d. *Það mundi sennilega hafa verið [þrem bátum] bjargað.
     e. Það mundi sennilega hafa verið bjargað [þrem bátum].

In (45a), the nominal is a DP and the structure is a familiar passive. In the expletive constructions, the nominal is an NP. The judgments on (45c-e) are as we expect: the nominal has no reason to move (i.e. no head attracts it), so it can remain in situ (45e) but cannot appear in the intermediate positions of (45c,d). In (45b), however, there must be the possibility of some high intermediate inflectional head which attracts the nominal. I will not speculate further on the details.

5 Concluding Remarks

This paper came to three main conclusions:

1. The Principle of Greed is untenable in its strong form. But a weak version of this principle, incorporating the idea of feature types, is needed to account for Chain Condition effects.
2. If an expletive-associate construction is contrasted with the corresponding construction without the expletive, the corresponding nominals are not identical. The associate in an expletive-associate construction is an NP, not a DP.
3. Minimal expletives are inserted derivationally as a last resort.

These conclusions are to some degree independent of one another. The last conclusion does depend crucially on establishing that (in a theory which includes derivational expletive insertion) the expletive-associate construction and the corresponding non-expletive construction are not in competition. But the precise nature of the feature distinction between the nominals in the two constructions is not important. What is crucial is that the associate has some 'deficit' which precludes it from satisfying the clausal EPP requirement. The conclusion that the expletive-associate relation is a simple D-N relation does, however, explain the indefiniteness effect in a fairly direct way.

Something is responsible for examples like:

(46) *[John to be believed t is guilty] is surprising.

The conclusion arrived at in this paper, in essence, is that John becomes inert to Case-related movement because all its Case-related features are satisfied in the most deeply embedded clause. This is not a major advance over the Chain Condition itself. Aside from the recognition that several different features are Case-related, it is more or less a restatement (in the framework of a feature-driven syntax) of the idea that A-chains terminate when Case assignment is complete. Future work will decide whether a less descriptive and more revealing formulation is possible. One suspects that it is. If the approach in this paper is tenable, the next step would be to establish the notions of 'Case-related feature' ('scope-related feature', etc.) in a less stipulative way. Incorporating a weak version of the Principle of Greed into the notion of a well-formed movement operation was important to the account of expletives, because it allowed EPP to be expressed in terms of simple D-checking. If EPP is D-checking, (46) above must be blocked by some restriction on movement. Other approaches to restricting movement which accomplished the same end would also be compatible with the theory of expletives developed here. It is difficult to see how the movement in (46) could be blocked by optimality considerations.

Notes

* This paper is a substantial revision of my talk at the Berlin Workshop on the Role of Economy Principles in Linguistic Theory (February 1995). I want to thank the faculty and staff of the Max Planck Gesellschaft for their hospitality. Particular thanks are due to Chris Wilder and Hans-Martin Gärtner for their hospitality, for the conception and direction of the conference, and for several useful discussions of the linguistic issues. Most importantly, I want to acknowledge the contribution of my colleague Sam Gutmann. We have had innumerable stimulating discussions of the Minimalist Program.

** A note on the sources of the Icelandic examples: the crucial examples in Sections 5.3 and 5.6 are taken from Sigurðsson (1991). The remaining examples were cobbled together from Falk (1989), Sigurðsson (1994), Thráinsson (1986), Glendening (1961), and Thráinsson (p.c.). Finally, Höskuldur Thráinsson kindly checked them over for me. He should not be held responsible for any errors that may have crept in, particularly since I could not resist a few last-minute changes.

1 Since it is not crucial for anything that follows, I assume for the sake of simplicity that there is a single clausal inflection head, Infl. As far as I know, assuming that various features of Infl are distributed over separate heads does not alter the analysis.

2 When two components merge, one is taken to project and the other is taken to be a maximal projection (i.e. its projection is 'closed off').

3 The shift from Move to Attract therefore incorporates the insight of Ura (1994a,b).

4 The wide gap between such an intuition and a scientific understanding of the complexity issues should not be underestimated. It is not even clear which computations bear a useful resemblance to performance questions. The present framework poses two salient decision problems:
  a. Given an initial representation (a list of lexical items), decide if it has a wfd.
  b. Given a wfd, decide if it is optimal.
It is not clear if either of these decision problems bears any meaningful relation to performance computation.

5 The original idea for weakening the Principle of Greed along these lines is due to Sam Gutmann.

6 See Delsing (1988) for a general discussion of DP structure in the Scandinavian languages.

7 It could be, of course, that pronouns also bear an N feature and that the checking domain of a head includes the head itself. In that case, a ~N feature on a determiner could be satisfied internally.

8 In some sense, this is 'exceptional feature satisfaction.' In general, when a head attracts a head, it attracts a head out of its complement. Often, a head attracts the head of its complement. This is the usual case for satisfaction of the ~N feature of D, as is seen with the Icelandic example above. A careful consideration of c-command, however, shows that it permits N raising to D here as well.

9 See the many references cited in Schütze (1993).

10 Structural licensing will be considered shortly.

11 Note that these examples are direct evidence against the idea that some principle of the theory forces expletive elimination.

12 See, for example, Freidin and Sprouse (1991), Sigurðsson (1991), and Schütze (1994).

References

Baker, M. 1988. Incorporation. Chicago: University of Chicago Press.
Chomsky, N. 1986. Knowledge of Language. New York: Praeger.
Chomsky, N. 1993. A Minimalist Program for Linguistic Theory. In The View from Building 20: Essays in Linguistics in Honor of Sylvain Bromberger, ed. K. Hale and S.J. Keyser. Cambridge, Mass.: MIT Press.
Chomsky, N. 1994. Bare Phrase Structure. MIT Occasional Papers in Linguistics 5. MIT, Cambridge, Mass.
Delsing, L. 1988. The Scandinavian Noun Phrase. Working Papers in Scandinavian Syntax 42: 57-79.
Falk, C. 1989. On the Existential Construction in the Germanic Languages. Working Papers in Scandinavian Syntax 44: 45-59.
Falk, C. 1990. On Double Object Constructions. Working Papers in Scandinavian Syntax 46: 53-100.
Freidin, R. and R. Sprouse. 1991. Lexical Case Phenomena. In Principles and Parameters in Comparative Grammar, ed. R. Freidin. Cambridge, Mass.: MIT Press.
Glendening, P. 1961. Teach Yourself Icelandic. Hodder and Stoughton.
Lasnik, H. 1995. Last Resort. To appear in Proceedings of the First Numazu Conference on Formal Linguistics.
Longobardi, G. 1994. Reference and Proper Names. Linguistic Inquiry 25.4: 609-665.
Schütze, C. 1993. Towards a Minimalist Account of Quirky Case and Licensing in Icelandic. MIT Working Papers in Linguistics 19: 321-375.
Sigurðsson, H. 1991. Icelandic Case-marked PRO and the Licensing of Lexical Arguments. Natural Language & Linguistic Theory 9.2: 327-363.
Sigurðsson, H. 1993. The Case of Quirky Subjects. Working Papers in Scandinavian Syntax 49: 1-26.
Thráinsson, H. 1986. On Auxiliaries, AUX and VPs in Icelandic. In Topics in Scandinavian Syntax, ed. L. Hellan and K. Christensen. Dordrecht: Reidel.
Ura, H. 1994a. On the Economy Condition on Derivation: A Preliminary Sketch. Ms., MIT, Cambridge, Mass.
Ura, H. 1994b. Varieties of Raising and the Feature-based Bare Phrase Structure Theory. MIT Occasional Papers in Linguistics 7. MIT, Cambridge, Mass.

Wh-Scrambling in the Minimalist Framework

Günther Grewendorf and Joachim Sabel

1 Introduction*

Looking at free word order phenomena in scrambling languages from a comparative perspective yields surprising results, because no uniform picture seems to emerge. For example, long scrambling out of finite CPs is possible in Japanese (1) but not in German (2a), where only short scrambling is allowed (cf. (2b)):

(1) [IP sono hon-o_i [IP John-ga [VP Bill-ni [CP Mary-ga t_i motteiru to] itta]]] (koto)
    that book-ACC John-NOM Bill-DAT Mary-NOM have COMP said fact
    'That book, John said to Bill that Mary has.'

(2) a. *daß [IP dieses Buch_i [IP Hans [VP Peter gesagt hat [CP daß Maria t_i besitzt]]]]
        that this book-ACC Hans-NOM Peter-DAT said has that Maria-NOM owns
    b. daß Hans Peter gesagt hat [CP daß [IP dieses Buch_i [IP Maria t_i besitzt]]]
        that Hans-NOM Peter-DAT said has that this book-ACC Maria-NOM owns

Short scrambling has A-properties in Japanese (3) but A'-properties in German (4); i.e. only in Japanese can a scrambled NP act as an A-binder for an anaphor. The (a) examples represent violations of Principle A of the Binding Theory.

(3) a. ?*[[Otagai_i-no sensei]-ga [karera-o_i hihansita]] (koto)
        each other-GEN teacher-NOM they-ACC criticized fact
        'Each other's teacher criticized them.'
    b. ?[Karera-o_i [[otagai_i-no sensei]-ga [t_i hihansita]]] (koto)
        they-ACC each other-GEN teacher-NOM criticized fact
        'Them_i, each other's_i teachers criticized.'
    (Saito 1992:74f)

(4) a. *daß [die Lehrer von sich_i] zweifellos [den Studenten]_i in guter Erinnerung behalten haben
        that [the teachers of himself]-NOM undoubtedly [the student]-ACC in good memory kept have
    b. *daß [den Studenten]_i [die Lehrer von sich_i] zweifellos t_i in guter Erinnerung behalten haben
        that [the student]-ACC [the teachers of himself]-NOM undoubtedly in good memory kept have

German has overt wh-movement, whereas Japanese is a wh-in-situ language (5a). Furthermore, wh-phrases in Japanese freely undergo scrambling (5b) (Saito 1985, Takahashi 1993, among others), in contrast to German (6) (Fanselow 1990):

(5) a. John-ga [Mary-ga nani-o katta ka] sitteiru
        John-NOM Mary-NOM what-ACC bought Q knows
        'John knows what Mary bought.'
    b. Nani-o John-ga [Mary-ga t katta ka] sitteiru
        what-ACC John-NOM Mary-NOM bought Q knows
        'John knows what Mary bought.'

(6) *Wie hat was der Mann gestern t repariert?
     how has what-ACC the man-NOM yesterday repaired
     'How did the man fix what yesterday?'

First, in section 2 we will review the analysis proposed in Grewendorf and Sabel (1995), where it is argued that the above-mentioned differences (1)-(4) between German and Japanese can be traced back to parametric properties of Agr. However, the focus of this paper lies in pointing out further differences between the Agr-systems of Japanese and German which, as we will argue, cause the different properties of wh-movement in the two languages illustrated in (5)-(6). In addition to the parametric variation of properties of Agr, our account makes use of a constraint on adjunction movement. In Grewendorf and Sabel (1994) it is argued that scrambling is not permitted to operate via successive-cyclic adjunction (7c):

(7) a. [WP ZP [WP W [XP X [YP Y t_ZP]]]]
    b. [WP W [XP ZP [XP X [YP Y t_ZP]]]]
    c. *[WP ZP [WP W [XP t'_ZP [XP X [YP Y t_ZP]]]]]

Given this condition and the two derivations in (7a) and (7c), which yield the same word order, we concluded that only (7a) is a possible derivation and (7c) is not, although the intermediate landing site in (7c) is a potential landing site, as (7b) shows. Obviously, this view conflicts with the condition Minimize Chain Links (MCL) (see Chomsky and Lasnik 1993, Saito 1994b), which states that it is impossible to skip potential landing sites.


As argued in Grewendorf and Sabel (1994), the main arguments for not treating scrambling as subject to Minimize Chain Links (MCL) come from the fact that MCL would force a scrambled element to neutralize barriers for movement, leaving unexplained the locality restrictions found with scrambling. Hence we proposed the constraint mentioned above, which rules out the intermediate adjunction strategy as a derivational possibility for neutralizing islandhood. Furthermore, in Sabel (1994) several arguments are presented showing that the 'constraint on adjunction' is in fact a universal (movement) constraint, derivable from economy considerations. It is shown that it holds for X°-movement, Quantifier Raising, and A- and wh-movement as well. This gives rise to a stronger version according to which movement in general may not proceed via intermediate adjunction, i.e. adjunction is a 'dead end' for every kind of movement.

One might ask whether this constraint is falsified by the fact that languages such as Japanese allow wh-scrambling. Given that wh-scrambling is adjunction, LF-movement of a scrambled wh-phrase to a Spec,CP operator position should be blocked. Based on our account of the differences found in (1)-(4), we will show that, contrary to appearances, wh-scrambling does not undermine the constraint on adjunction. We will argue that the Agr-system in Japanese differs from the Agr-system in German insofar as a [+wh]-feature which is base-generated in Infl/Agrs (Rizzi 1990, 1991) may remain there in Japanese and may be checked there by adjunction to AgrsP, i.e. via scrambling as an instance of operator movement. This analysis provides an account of several scope ambiguities with respect to wh-scrambling discussed in Takahashi (1993), and it provides an explanation for the 'additional-wh effect' in Japanese (Saito 1994a). In German, on the other hand, the [+wh]-feature may only be realized in C°; hence it can only be checked via Spec-head agreement, i.e. as a result of wh-movement to Spec,CP. Before going into the analysis of the phenomenon of wh-scrambling, we briefly summarize the basic assumptions of the analysis presented in Grewendorf and Sabel (1995) to account for the different locality effects found with scrambling in German and Japanese (1)-(2) and for the different A-/A'-properties of short scrambling in German and Japanese (3)-(4).

2 Phrase-structural Differences between German and Japanese and their Implications

Chomsky (1994) eliminates X'-Theory in favor of what he calls 'Bare Phrase Structure theory'. Phrase structure building in this theory is carried out by the two operations 'Merge' and 'Move'. Merge is a binary structure-building operation that applies cyclically, building trees from bottom to top. Two terms (constituents) are combined into a complex term (constituent), which has the properties of its head (cf. Chomsky 1994:12ff.). For example, a DP like [DP the book] is the result of merger. The complex term {the, {the, book}} results from merging the terms the and book, where D (the) is the (projecting) head of the complex term DP. It is important in the present context that this theory allows for phrases with multiple specifiers (cf. Chomsky 1994:41ff.), as can be seen from (8a):

(8) a. [XP(+max,-min) Spec2 [X'(-max,-min) Spec1 [X'(-max,-min) X°(-max,+min) Complement]]]
    b. [XP(+max,-min) Spec1 [X'(-max,-min) X°(-max,+min) Complement]]

Given that, in contrast to German, Japanese is not a 'forced agreement' language (Kuroda 1992), i.e. nothing constrains the number of XPs that enter into an agreement relation with Agr, we assume that Agr in Japanese may potentially check L-related features of more than one XP via Spec-head agreement. Hence, due to parameterized properties of Agreement, the sentence structure may contain multiple (Agro- and Agrs-) specifiers in Japanese but not in German. If we assume that case is checked in Spec1 in both Japanese (8a) and German (8b), (short) scrambling in Japanese may target Spec2 of a functional projection, whereas in German it must be adjunction to XP.
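The sketch below is a minimal illustration, under stated assumptions, of how Merge as described above builds labeled terms and why repeated Merge with the same projecting head yields more than one specifier. It is not part of the authors' proposal. The names `Term`, `merge`, and `specifiers` are hypothetical, and the first argument of `merge` is stipulated to be the projecting term. Plain strings stand in for lexical items and for phrases such as TP whose internal structure is irrelevant here.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Term:
    """A complex term {label, {alpha, beta}}; parts[0] is the projecting term (a convention of this sketch)."""
    label: str
    parts: Tuple["Node", "Node"]


# A node is either a lexical item (plain string) or a complex term.
Node = Union[str, Term]


def label_of(node: Node) -> str:
    """A lexical item is its own label; a complex term carries the label of its head."""
    return node if isinstance(node, str) else node.label


def merge(projecting: Node, other: Node) -> Term:
    """Binary Merge: the result has the properties (label) of the projecting term."""
    return Term(label_of(projecting), (projecting, other))


def specifiers(xp: Term) -> List[Node]:
    """Non-projecting sisters above the head-complement level, outermost first."""
    specs: List[Node] = []
    node = xp
    while isinstance(node.parts[0], Term):  # still above the level where the head merged
        specs.append(node.parts[1])
        node = node.parts[0]
    return specs  # node.parts[1] is now the complement of the head


dp = merge("the", "book")
print(dp.label)  # the

# Japanese-type AgrsP, cf. (8a): the same head projects twice more, yielding Spec1 and Spec2.
agrs_p = merge(merge(merge("Agrs", "TP"), "Spec1"), "Spec2")
print(agrs_p.label, specifiers(agrs_p))  # Agrs ['Spec2', 'Spec1']
```

Nothing in `merge` itself limits the number of specifiers. On the account in the text, the German/Japanese difference comes from the parameterized properties of Agr, which this sketch does not model.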

2.1 The A-/A'-properties of Short Scrambling in German and Japanese

Now let us consider how this analysis accounts for the fundamental differences found with respect to scrambling in German and Japanese. Firstly, in (9) (= (3b)) the scrambled object is located in Spec2,AgrsP, i.e. an A- or L-related position from which it can act as an A-binder for the anaphor.

(9) ?[Karera-o_i [[otagai_i-no sensei]-ga [t_i hihansita]]] (koto)
    they-ACC each other-GEN teacher-NOM criticized fact
    'Them_i, each other's_i teachers criticized.'
    (Saito 1992:74f)

In contrast, the scrambled object in (10) (= (4b)) is in an IP-adjoined (broadly L-related) position, which, as we assume, has only A'-properties.

(10) *daß [den Studenten]_i [die Lehrer von sich_i] zweifellos t_i in guter Erinnerung behalten haben
     that [the student]-ACC [the teachers of himself]-NOM undoubtedly in good memory kept have

Hence, the fact that short scrambling has A-properties in Japanese (9) but A'-properties in German (10) follows from the different phrase-structural properties of the two languages, which are due to parametric properties of Agreement.

2.2 Scrambling as Feature-driven Movement

In Chomsky (1993) it is argued that, according to a global economy condition, two (or more) converging derivations have to be compared and only the one involving the fewest (movement) steps is legitimate. Now, if we compare a sentence in which scrambling has taken place with its unscrambled variant, assuming that both derivations converge (i.e. all relevant features are checked, but scrambling is an instance of optional movement in the former derivation), then the question arises why the derivation that involves scrambling is possible at all.¹ Assume that scrambling is triggered by the need to check a scrambling feature Σ and that this feature is associated with Agr-heads (Agro or Agrs). In fact, then, scrambling to IP is scrambling to AgrsP (as we have already assumed in our discussion of (9)-(10)), and scrambling to VP is scrambling to AgroP. The Σ-feature in (9)-(10) is realized on Agrs.² Given Chomsky's (1993) definition of 'checking domain', the Σ-feature may be checked via adjunction to AgrsP in German or via substitution into Spec2 of AgrsP in Japanese.³

Before we can explain the different locality constraints that hold for scrambling in German and Japanese, recall that there has been some discussion of how to analyze long wh-movement in the minimalist framework. Is it derived via one long wh-movement of the wh-phrase to the checking position and insertion of intermediate traces by Form Chain (Chomsky 1993), or, as traditionally assumed, by movement of the wh-phrase through every intermediate Spec,CP position? There is empirical evidence that the latter approach is correct (Collins 1993, 1994; Ferguson and Groat 1994). If we adopt this approach in conjunction with the assumption that movement is triggered solely by feature checking, we are forced to assume that movement through intermediate positions also applies to satisfy feature checking. As the above-mentioned authors suggest, a natural implementation of this is to assume that in the case of wh-movement there is a kind of feature spreading, i.e. that the embedded C°-heads bear defective or quasi-wh-features (wh') that need to be checked:

(11) [CP[+wh] wh-phrase ... [CP[+wh'] t'' ... [CP[+wh'] t' [IP ... ]]]]

We make use of a similar mechanism for scrambling here. Given that the scrambling feature is associated with Agr-heads, in (1)-(2) (= (12)-(13)) it is located in Agrs of both the matrix and the embedded clause, as can be seen from (14):

(12) [AgrsP sono hon-o_i [Agrs' John-ga [VP Bill-ni [CP t'_i Mary-ga t_i motteiru to] itta]]] (koto)
     that book-ACC John-NOM Bill-DAT Mary-NOM have COMP said fact
     'That book, John said to Bill that Mary has.'

(13) *daß [AgrsP dieses Buch_i [AgrsP Hans [VP Peter gesagt hat [CP daß t'_i Maria t_i besitzt]]]]
     that this book-ACC Hans-NOM Peter-DAT said has that Maria-NOM owns
     'That book, Hans said to Peter that Mary has.'

(14) [CP [AgrsP(Σ) [TP [AgroP [VP [CP [AgrsP(Σ') [TP ...]]]]]]]]

The scrambled element has to check both Σ-features. Now, given our assumption that successive adjunction is generally impossible, elements in German may not be long scrambled: a scrambled element that is moved to an adjunction site inside the embedded clause, such as AgrsP in (13), may not move further into the matrix clause. In Japanese, on the other hand, scrambling may proceed in a successive-cyclic manner via the embedded specifier (of AgrsP), as in (12), i.e. not via XP-adjunction. Hence we derive the different locality constraints that hold for long scrambling in German and Japanese from our assumption that Agr in German and Japanese licenses different types of phrase structure.
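The interaction just described, between the constraint on adjunction and the availability of Spec2, can be made concrete with a small toy model. This is a hedged sketch under simplifying assumptions, not the authors' formalism:

- A chain is reduced to a list of landing sites. Each site is either Spec2 of an AgrsP or an AgrsP-adjoined position.
- The names `Link`, `landing_site`, and `long_scrambling_chain` are hypothetical.
- For finite complements, the moved element is taken to be L-related only in the clause it originates in. The adjunct case anticipates example (15b) in section 2.3.

```python
from dataclasses import dataclass
from typing import List

SPEC2 = "Spec2"        # additional specifier of AgrsP (L-related)
ADJOINED = "adjoined"  # XP-adjunction to AgrsP (A'-position)


@dataclass(frozen=True)
class Link:
    """One landing site in a scrambling chain (hypothetical representation)."""
    clause: str
    site: str


def obeys_constraint_on_adjunction(chain: List[Link]) -> bool:
    """Adjunction is a dead end: only the last link may be an adjoined position."""
    return all(link.site != ADJOINED for link in chain[:-1])


def landing_site(multiple_specifiers: bool, l_related: bool) -> str:
    """Use Spec2 if Agr licenses multiple specifiers and the mover is L-related there; else adjoin."""
    return SPEC2 if multiple_specifiers and l_related else ADJOINED


def long_scrambling_chain(multiple_specifiers: bool, is_argument: bool) -> List[Link]:
    """Chain for long scrambling out of a finite clause: check Sigma' first, then Sigma."""
    # Embedded Agrs (Sigma'): an argument is L-related in the clause it originates in.
    embedded = Link("embedded AgrsP", landing_site(multiple_specifiers, is_argument))
    # Matrix Agrs (Sigma): the finite embedded verb does not raise, so nothing is L-related here.
    matrix = Link("matrix AgrsP", landing_site(multiple_specifiers, False))
    return [embedded, matrix]


cases = {
    "Japanese argument, cf. (12)": (True, True),
    "German argument, cf. (13)": (False, True),
    "Japanese adjunct, cf. (15b)": (True, False),
}
for label, (multi, arg) in cases.items():
    chain = long_scrambling_chain(multi, arg)
    verdict = "licit" if obeys_constraint_on_adjunction(chain) else "blocked"
    print(f"{label}: {[link.site for link in chain]} -> {verdict}")
```

Keeping `obeys_constraint_on_adjunction` separate from `landing_site` mirrors the division of labor in the text. The constraint on adjunction stays fixed across languages. Only the choice of landing site varies with the parameterized properties of Agr.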

2.3 The Consequences of L-relatedness

Only arguments may undergo long scrambling out of finite clauses in Japanese. This restriction follows from the fact that Spec,AgrP positions are L-related positions. Hence only L-related elements may use the Spec2 positions of AgrPs as intermediate landing sites in Japanese. This accounts for the fact that adjuncts (15b), in contrast to arguments (12), may not be long scrambled in Japanese (Saito 1985, Nemoto 1993, Bošković and Takahashi 1995):

(15) a. Mary-ga [John-ga riyuu-mo naku sono setu-o sinziteiru to] omotteiru (koto)
        Mary-NOM John-NOM reason-even without that theory-ACC believes that thinks (fact)
        'Mary thinks that John believes in that theory without any reason.'
     b. *Riyuu-mo naku_i Mary-ga [t'_i John-ga t_i sono setu-o sinziteiru to] omotteiru (koto)
        reason-even without Mary-NOM John-NOM that theory-ACC believes that thinks (fact)

The adjunct riyuu-mo naku 'without any reason' in (15) cannot check the embedded Σ-feature (Σ' on Agrs) via movement through the Spec2 position of the embedded AgrsP, because it is not an L-related element. Consequently, the adjunct has to adjoin to the embedded AgrsP (as in the case of argument scrambling in German) to check the Σ-feature, and the constraint on adjunction forbids further movement into the matrix clause. (15b) thus represents an illicit derivation of the kind (7c).

The fact that L-related positions may only be used by L-related elements has further consequences. Long-distance scrambling in Japanese has A'-properties, as noted by Saito (1992). For example, a long scrambled NP cannot bind an anaphor in the matrix clause:

(16) a. *Otagai_i-no sensei-ga [Hanako-ga karera-o_i hihansita to] itta (koto)
        each other-GEN teacher-NOM Hanako-NOM they-ACC criticized COMP said (fact)
     b. *Karera-o_i otagai_i-no sensei-ga [Hanako-ga t_i hihansita to] itta (koto)
        they-ACC each other-GEN teacher-NOM Hanako-NOM criticized COMP said (fact)


The ungrammaticality of (16a) results from a violation of Principle A, as expected. However, in contrast to short scrambling (9), long scrambling of a potential antecedent to a position in front of the anaphor does not produce a grammatical result in (16b). From this we can conclude that long scrambling out of finite clauses has only A'-properties in Japanese. This can be explained along the following lines. An argument that is long scrambled does not count as L-related with respect to the Spec2 positions of Agr-projections in the matrix clause. Hence it may not move into the Spec2 position of the matrix AgrsP in (16b) in order to check the Σ-feature. Consequently, it has to adjoin to AgrsP for feature checking, and, as already pointed out, XP-adjunction creates A'-positions. This is why in (16b) the scrambled element cannot A-bind the anaphor.⁴

This account of the difference between (9) and (16b) seems to rest on the assumption that a scrambled argument is only L-related to the Spec2 positions of the clause from which it originates. Although this generalization is an oversimplification, as will become clear in our discussion of scrambling out of infinitivals, let us assume it to be correct for the moment. Why should this be the case? Following ideas of Holmberg (1985), who argues that object shift is contingent on verb raising, we assume that movement from Spec,AgrP to Spec,AgrP (Agr = Agro or Agrs) is also contingent on (overt or covert) verb raising. Now consider the abstract representation of a derivation with short scrambling of an NP-object to AgrsP:

(17) [AgrsP(Σ) Spec2 [AgrsP Spec1 [TP [AgroP Spec2 [AgroP Spec1 [VP t_Subj Obj ...]]]]]]

Let us assume that verb movement to Agrs applies in Japanese (Koizumi 1995). Given that the verb is located in Agrs, the object may move via Spec1 of AgroP (to check its case) onto Spec2 of AgrsP to check the Σ-feature (let us assume that Spec1 of AgrsP is occupied by the raised subject). This is in accordance with the view that movement from Spec,AgrP to Spec,AgrP (Agr = Agro or Agrs) is contingent on (overt or covert) verb raising. Now let us turn to long scrambling, where we have to consider two cases: long scrambling out of finite clauses, as in (16b), and long scrambling out of infinitivals. In cases of long scrambling out of finite clauses, the embedded verb does not move up to the matrix Agro or Agrs position. Hence the corresponding specifier positions do not count as L-related positions for long scrambling, and a long scrambled argument must move to an XP-adjoined position, as we saw in our analysis of (16b). This is why long scrambling out of finite clauses only shows A'-properties. What about the second case, long scrambling out of infinitivals? There are reasons to assume that the embedded infinitival verb may raise into the finite matrix verb. In Grewendorf and Sabel (1994) it was argued that long scrambling out of infinitivals is possible in German if verb incorporation between the embedded and the matrix verb takes place. For example, versuchen 'try' in (18) is an incorporation verb, whereas behaupten 'claim' (19) is not:

(18) a. daß jemand [PRO die Frau zu heiraten] versuchte
        that someone the woman to marry tried
     b. daß die Frau_i jemand [PRO t_i zu heiraten_j] versuchte_j
        that the woman someone to marry tried
        'Someone tried to marry the woman.'

(19) a. daß jemand [PRO die Frau zu heiraten] behauptete
        that someone the woman to marry claimed
     b. *daß die Frau_i jemand [PRO t_i zu heiraten] behauptete
        that the woman someone to marry claimed
        'Someone claimed to marry the woman.'

Leaving technical details of the analysis aside, it must be mentioned that verb incorporation was assumed to take place at LF. However, coindexation between the finite and the non-finite verb, a mechanism established by Baker (1988) under the name of abstract incorporation or reanalysis, ensured that the infinitival was already transparent in the overt syntax. This incorporation analysis, if extended to Japanese infinitivals, predicts that in long scrambling out of infinitivals, the Spec,AgrP positions in the matrix clause should count as potential landing sites for long scrambled arguments. Hence, long scrambling out of infinitivals should behave like A-movement, because covert movement of the infinitival verb into the matrix clause applies. This prediction is in fact borne out (cf. Saito 1994b, (7)):

(20) a. *John-ga [[otagai_i-no sensei]-ni [PRO karera_i-o homeru yooni] tanonda] (koto)
        John-NOM each other-GEN teacher-to they-ACC praise to asked fact
        'John asked each other's teachers to praise them.'
     b. ?John-ga [karera_i-o [otagai_i-no sensei]-ni [PRO t_i homeru yooni] tanonda] (koto)
        John-NOM they-ACC each other-GEN teacher-to praise to asked fact

     c. ?Karera_i-o John-ga [[otagai_i-no sensei]-ni [PRO t_i homeru yooni] tanonda] (koto)
        they-ACC John-NOM each other-GEN teacher-to praise to asked fact

In (20b) the embedded object is scrambled to the Spec2 position of the matrix AgroP, whereas in (20c) it is located in the Spec2 position of the matrix AgrsP. The Spec2 positions of the matrix Agr-phrases are accessible due to verb raising at LF. Note that (20) makes clear that it is misleading to say that a scrambled argument may only use L-related positions (Spec1 and Spec2 positions of Agr) of the clause from which it originates.

To sum up, a natural explanation can be given for the different locality restrictions on scrambling and for its different A'-/A-movement properties in German and Japanese. It requires adopting some ideas about sentence structure from Chomsky's (1994) theory of 'Bare Phrase Structure' and connecting them with several assumptions about the Agreement-system in both languages. Without going into further differences between German and Japanese which this analysis is able to explain,⁵ we now turn to the phenomenon of wh-scrambling in Japanese on the basis of the present analysis.
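The conditions on L-relatedness discussed in this section can likewise be summarized as a toy decision procedure. It determines the final landing site of a scrambled element and, with it, whether that element can A-bind an anaphor. This is again a hedged sketch and not the authors' formalism. `Scrambling`, `final_site`, and `can_a_bind` are hypothetical names, and (overt or covert) verb raising into the target clause's Agr is reduced to a single boolean. For short scrambling that boolean is taken to be true, following the assumption that the verb raises to Agrs in Japanese (Koizumi 1995).

```python
from typing import NamedTuple


class Scrambling(NamedTuple):
    """Hypothetical description of one scrambling step into a target clause."""
    multiple_specifiers: bool   # Agr licenses Spec2 (Japanese) or not (German)
    is_argument: bool           # arguments are L-related elements; adjuncts are not
    verb_reaches_target: bool   # (overt or covert) verb raising into the target clause's Agr


def final_site(s: Scrambling) -> str:
    """Spec2 if the mover is L-related to the target Agr; otherwise XP-adjunction."""
    l_related = s.is_argument and s.verb_reaches_target
    return "Spec2" if s.multiple_specifiers and l_related else "adjoined"


def can_a_bind(s: Scrambling) -> bool:
    """Spec2 of an Agr projection is an A-position; an XP-adjoined position is an A'-position."""
    return final_site(s) == "Spec2"


examples = {
    "(9)   Japanese, short": Scrambling(True, True, True),
    "(10)  German, short": Scrambling(False, True, True),
    "(16b) Japanese, long out of a finite clause": Scrambling(True, True, False),
    "(20c) Japanese, long out of an infinitival": Scrambling(True, True, True),
}
for label, s in examples.items():
    status = "possible" if can_a_bind(s) else "impossible"
    print(f"{label}: {final_site(s)}, A-binding {status}")
```

The single boolean `verb_reaches_target` is what separates (16b) from (20c) here. That reflects the text's claim that covert raising of the infinitival verb makes the matrix Spec2 positions accessible, while the finite embedded verb does not raise.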

3 Wh-scrambling

The constraint on adjunction discussed in the preceding sections seems to face serious empirical problems in view of the fact that there are languages like Japanese that allow scrambling of w/z-elements. If long scrambling is adjunction, long scrambled whelements should not be able to undergo any further movement at the level of LF. To resolve this problem we will make use of the idea, entertained by Rizzi (1991) on morphological grounds, that INFL can be endowed with the w/z-feature. We assume that in languages like Japanese, a strong [+wh] feature is realized in Agrs, and that it can be checked by XP-adjunction to AgrsP in a position that is broadly L-related to Agrs, whereas the weak [+wh]-feature is in C. On the other hand, in languages like English and German, the strong [+wh] feature may only be realized in C and checked in Spec,CP. A wh-feature that is located in Agrs cannot be checked in a specifier position of AgrsP since a specifier position of AgrsP does not constitute an operator position. We thus need to establish that XP-adjunction to an AgrsP whose head bears a [+wh] feature exhibits the properties of an operator position and as such requires no further movement at LF. To achieve this goal let us consider the following examples (Takahashi 1993): (21) a. John-ga [cp Mary-ga nani-o katta ka] sitteiru J-nom M.nom what acc bought Q knows 'John knows what Mary bought.' b. Nani-oi John-ga [CP Mary-ga t\ katta ka] sitteiru whatacc J-nom M. n o m boughtQ knows In example (21b), the w/z-phrase has been long scrambled out of its scope, indicated by the question marker ka. We have already seen that the target position of scrambling out of finite clauses is adjunction to IP (AgrsP). Since example (21b) represents a declarative sentence with an embedded w/z-question and thus has the same interpretation as (21a), the adjoined w/j-phrase must be in a w/j-operator position of the embedded clause at the level of LF. 
Example (21b) therefore illustrates the fact, pointed out by Saito (1989), that scrambling as A'-movement can be undone at LF. Note that scrambling in (21b) is triggered by Σ-feature checking. After reconstruction, the wh-phrase moves to the embedded Spec,CP in order to check the weak [+wh]-feature in C, as is also the case in (21a). Now compare (21) with (22) (Takahashi 1993):

(22) a. John-wa [CP Mary-ga nani-o tabeta ka] siritagatteiru no?
        J-top M-nom what-acc ate Q wants-to-know Q
        'Does John want to know what Mary ate?' or 'What does John want to know whether Mary ate?'
     b. Nani-o_i John-wa [CP Mary-ga t_i tabeta ka] siritagatteiru no?
        what-acc J-top M-nom ate Q wants-to-know Q

The examples in (22) differ from those in (21) in that they have a question marker in both the embedded clause and the matrix clause. Since the question marker ka is ambiguous between a scope marker for a wh-phrase and a complementizer corresponding to whether in English, sentence (22a) is ambiguous with respect to the scope of the wh-phrase nani-o. As indicated in the translations, it can be either a yes/no question with an embedded wh-question or a wh-question with an embedded whether-question. The interesting fact about (22b) is that long scrambling of the embedded wh-object has the effect that, in contrast to (22a), the wh-phrase can only have matrix scope. Unlike the scrambled wh-phrase in (21b), the scrambled wh-phrase in (22b) evidently cannot be reconstructed at LF. Takahashi (1993) concludes from this observation that long movement of a wh-phrase to the initial position of a clause headed by a [+wh] COMP counts as wh-movement in Japanese. He assumes that the target position of the wh-phrase is Spec,CP in that case, and he attributes the fact that the wh-phrase in (22b) cannot undergo LF-movement (reconstruction) to a constraint (Lasnik and Saito 1992, Epstein 1992) according to which S-structural movement of a wh-phrase to a [+wh] COMP prevents this wh-phrase from undergoing any further movement at LF. There is evidence, however, against the view that in Japanese, movement of a wh-phrase to the initial position of a [+wh] clause is movement to Spec,CP: in Japanese, short (23) and long (24) overt movement of a wh-phrase to the initial position of a [+wh] clause may cooccur with scrambling of a non-wh-phrase into a position to the left of this wh-phrase (examples due to Mamoru Saito, p.c.):

(23) John-ga [Bill-ni nani-o Mary-ga t t watasita ka] siritagatteiru
     J-nom Bill-to what-acc M-nom handed Q want-to-know
     'John wants to know what Mary handed to Bill.'

(24) Tom-ga [Bill-ni nani-o John-ga [Mary-ga t t watasita to] omotteiru ka] siritagatteiru
     T-nom Bill-to what-acc J-nom M-nom handed that think Q want-to-know
     'Tom wants to know what John thinks that Mary handed to Bill.'


Grewendorf & Sabel:

Wh-Scrambling

We therefore assume that Japanese overt (long) wh-movement to the initial position of a [+wh] clause is adjunction to AgrsP. However, we would like to maintain an important aspect of Takahashi's analysis of (21) and (22), namely that in Japanese, overt wh-movement to the initial position of a [+wh] clause prohibits further movement at LF. We therefore restate the relevant constraint so that overt movement of a wh-phrase to a [+wh] operator position prevents the wh-phrase from undergoing any further movement at LF. What we have seen so far is that scrambling of a wh-phrase to the initial position of a [-wh] clause can be undone at LF (21b), whereas the same movement cannot be undone if the clause is headed by a question marker (22b). Given that, apart from this difference, wh-scrambling in Japanese generally obeys the constraints on scrambling developed in the preceding sections, we can assume that in (22b) the long scrambled wh-object adjoins to the matrix AgrsP and that no further LF-movement is required, since it already occupies an operator position. This analysis is not only compatible with the constraint on adjunction; it is also supported by the fact that the scrambled wh-object in (22b) only has matrix scope. An obvious problem for our analysis is raised by examples such as (25) (Takahashi 1993):

(25) John-wa [nani-o_i Mary-ga t_i tabeta ka] siritagatteiru no?
     J-top what-acc M-nom ate Q want-to-know Q
     'Does John want to know what Mary ate?' or 'What does John want to know whether Mary ate?'

As with the examples in (22), in (25) both the embedded clause and the matrix clause are marked as interrogative sentences. However, unlike (22b), the scrambled wh-phrase in (25) can have the wide scope reading as well as the narrow scope reading, as indicated in the translation. This seems to be at variance with the claim that a wh-phrase that has moved to the initial position of a [+wh] clause cannot undergo further movement at LF. Indeed, if the scrambled wh-phrase in (25) were in an operator position, the fact that it takes scope over the matrix clause would not only militate against the generalization that we have argued for in the preceding paragraphs; it would also constitute a clear counterexample to our constraint on adjunction. To solve this problem it is crucial to notice that the wh-phrase in (25) has undergone short scrambling. According to the analysis of scrambling developed in section 1, short scrambling in Japanese is to an A-position. We can therefore assume that the scrambled wh-phrase in (25) is not located in an operator position (adjoined to AgrsP) but occupies Spec2 of the embedded AgrsP. To get from there into an operator position, it can adjoin either to the embedded AgrsP or to the matrix AgrsP (yielding the narrow or the wide scope reading for the wh-phrase), both options being in line with the constraint on adjunction as well as with the generalization derived from (22b). The difference between (22b) and (25) can then be taken to lend further support to our analysis of scrambling in Japanese. However, as was first brought to our attention by Mamoru Saito (p.c.), Japanese wh-scrambling also allows a combination of long scrambling into a [-wh] clause and upward LF-movement into an operator position, which seems to confront our constraint on adjunction with serious empirical problems. A relevant example can be found in Takahashi (1993):

(26) Kimi-wa [CP nani-o_i John-ga [CP Mary-ga t_i tabeta ka] sitteiru to] omotteiru no?
     you-top what-acc J-nom M-nom ate Q know COMP think Q
     'Do you think that John knows what Mary ate?' or 'What do you think that John knows whether Mary ate?'

In (26), the wh-object nani-o is long scrambled, hence adjoined, to a [-wh] clause. As indicated in the translation, this wh-phrase may take scope over the highest clause, which is a [+wh] clause since it contains the question marker no. We must therefore assume that the long scrambled wh-phrase is able to undergo LF-movement to an operator position in the highest clause, which, as we have argued, is again adjunction to AgrsP. The steps of movement that seem to be involved in (26) can thus be schematized as follows:

(27) [CP[+wh] ... [CP[-wh] nani-o_i ... [CP[+wh] ... t_i ... ]]]
     overt movement: from t_i to the initial position of the [-wh] clause
     LF movement: from there into the [+wh] matrix clause

It is obvious that the steps of movement indicated in (27) violate the constraint which states that adjunction must not operate in successive cyclic fashion. A solution to this problem suggests itself once we note that long scrambling of the wh-phrase in (26) is to the initial position of a [-wh] clause. As we have seen in the case of (21b), this sort of scrambling can be undone at LF. We can therefore assume that the wh-phrase in (26) is first reconstructed at LF to its base position and then moved either to the operator position of the lowest clause, yielding the narrow scope reading, or to the operator position of the highest clause, yielding the wide scope reading (cf. also the discussion of example (21)). Before we take a closer look at how exactly LF movement of the wh-phrase proceeds in the case of (26) and how it is triggered, we would like to point out that the solution proposed for (26) receives independent support: it also explains why the wh-phrase takes only the intermediate clause as its scope if the intermediate and the matrix verbs in (26) are replaced with a verb selecting a [-wh] clause and one selecting a [+wh] clause, respectively (cf. Takahashi 1993, fn. 5):

(28) Kimi-wa [nani-o_i John-ga [Mary-ga t_i tabeta to] omotteiru ka] kikimasita ka?
     you-top what-acc J-nom M-nom ate COMP thought Q asked Q
     'Did you ask what John thought that Mary ate?' or 'What did you ask whether John thought that Mary ate?'

Here the selectional properties of the matrix verb kikimasita 'asked' require the intermediate clause to be an interrogative clause. Unlike in (26), the wh-phrase, which is again long scrambled to the initial position of the intermediate clause, has now moved to the initial position of a [+wh] clause and takes scope only in its surface position. The analysis that we suggested for (26) thus permits us to treat (26) on a par with (21b) and (28) on a par with (22b). In other words, unlike in (26), where the wh-phrase can be reconstructed, the wh-phrase in (28) has been long scrambled to an operator position and thus cannot undergo any further movement at LF, in accordance with the constraints found in Lasnik and Saito (1992) and Epstein (1992).
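To make the 'dead end' character of the constraint on adjunction concrete, the contrast between (27) and the reconstruction analysis of (26) can be replayed in a small toy encoding. This is only a sketch under our own assumptions, not part of the formal analysis: the derivational history of a single wh-phrase is reduced to a sequence of hypothetical step labels (SPEC, ADJOIN, RECONSTRUCT), and a derivation is rejected as soon as anything moves on from an adjoined position that LF reconstruction has not first undone. The names Step and obeys_adjunction_constraint are illustrative only.

```python
from enum import Enum


class Step(Enum):
    """Hypothetical labels for the steps a single wh-phrase undergoes."""
    SPEC = "movement to a specifier position (e.g. Spec2 of AgrsP)"
    ADJOIN = "adjunction to a maximal projection (e.g. AgrsP)"
    RECONSTRUCT = "LF reconstruction to the base position"


def obeys_adjunction_constraint(steps: list[Step]) -> bool:
    """True if nothing moves on from an adjoined position.

    Adjunction is a 'dead end': once a phrase has adjoined, any further
    movement step is excluded, unless LF reconstruction first undoes the
    adjunction and returns the phrase to its base position.
    """
    adjoined = False
    for step in steps:
        if step is Step.RECONSTRUCT:
            adjoined = False      # scrambling undone: the phrase is back in situ
        elif adjoined:
            return False          # movement out of an adjoined position
        elif step is Step.ADJOIN:
            adjoined = True
    return True


# (27): overt adjunction to the [-wh] clause, then LF adjunction in the matrix clause.
derivation_27 = [Step.ADJOIN, Step.ADJOIN]
# (26) as reanalyzed: overt adjunction, LF reconstruction, one step to an operator position.
derivation_26 = [Step.ADJOIN, Step.RECONSTRUCT, Step.ADJOIN]
# (25): short scrambling to Spec2 (an A-position), then one adjunction.
derivation_25 = [Step.SPEC, Step.ADJOIN]

for name, steps in [("(27)", derivation_27), ("(26)", derivation_26), ("(25)", derivation_25)]:
    print(name, obeys_adjunction_constraint(steps))
```

Running it prints `(27) False`, `(26) True` and `(25) True`. The checker only tracks whether an adjoined phrase moves on; it deliberately does not model which operator position (an AgrsP-adjoined position or Spec,CP) a given step targets.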

4

Weak vs. Strong [+wh]-features

Let us now return to the question of how long wh-movement, as required for the wide scope reading of (26), proceeds at LF when it originates from the base position to which the wh-phrase has been reconstructed. If the wh-feature is located in Agrs, as we argued above, we would expect this movement to operate in a successive cyclic way through the broadly L-related positions associated with Agrs-heads bearing the wh-feature.⁷ But this kind of movement would be successive cyclic adjunction, which conflicts with the constraint on adjunction. To solve this problem we assume that it is only the strong wh-feature that is located in Agrs in Japanese, whereas the weak wh-feature is associated with C°, as traditionally assumed. We thus ensure that at LF, there is successive cyclic wh-movement through Spec,CP. If we conceive of LF-movement as feature movement along the lines of Chomsky (1995), who assumes that covert operations consist of adjoining a set of features to a head, our analysis amounts to saying that LF wh-movement in Japanese involves the C-system while overt wh-movement involves the Agr-system. The assumption that successive cyclic LF movement of wh-phrases involves the C-system is required on independent grounds. Lasnik and Saito (1992:37) point out that in Japanese (as well as in Chinese) the existence of successive cyclic LF movement is reflected in the fact that a wh-adjunct embedded in a declarative clause can have scope over the matrix clause, as in (29):

(29) Kimi-wa [CP kare-ga naze konakatta to] omotteru no?
     you-top he-nom why came-not COMP think Q
     'Why do you think [that he didn't come]?'

Note that, on the other hand, long scrambling of adjuncts is subject to a different locality restriction, i.e. it is impossible in Japanese (see (15b)). Given our constraint on adjunction, we have to conclude for (29) as well that successive cyclic wh-movement at LF checks a weak wh-feature in C° and thus does not use an adjoined position as an intermediate landing site. There is independent evidence for the assumption that the strong wh-feature is checked in the Agr-system in Japanese, while the weak wh-feature is checked in the C-system. The evidence concerns the different readings of an example such as (30) (Saito 1994a) and is related to the question of whether a strong wh-feature in Agrs and a weak wh-feature in C° may cooccur in one and the same sentence.

(30) Kimi-wa [CP [IP dono hon-o_i [IP dare-ga tosyokan-kara t_i karidasita]] ka] siritai no
     you-top which book-acc who-nom library-from checked-out Q want-to-know Q
     'Q you want to know [Q [which book_i, who checked out t_i from the library]]'

The structural positions of the wh-phrases in (30), an interrogative sentence with an embedded interrogative clause, can be schematized as in (31):

(31) [CP ... [CP [IP wh-acc_i [IP wh-nom ... t_acc ... ]] Q] Q]

According to Saito (1994a), there are four logically possible scope interpretations of the wh-phrases:

(32) a. Both wh-phrases take embedded scope and the matrix clause is a yes/no question.
     b. Both wh-phrases take matrix scope and the embedded clause is a yes/no question.
     c. wh-acc takes matrix scope and wh-nom takes embedded scope.
     d. wh-nom takes matrix scope and wh-acc takes embedded scope.

According to Saito's judgements, the reading in (32a) is allowed and those in (32b) and (32c) are marginally possible, whereas the reading in (32d) is totally impossible. We will first try to show that (32d) could be derived if the strong wh-feature in Agrs and the weak wh-feature in C (whether 'defective' or not) were allowed to cooccur in the embedded clause of (30). In that case, wh-acc would be overtly adjoined to IP (AgrsP) to check the strong wh-feature in Agrs, and wh-nom would covertly move to the specifier of the matrix CP, passing through the embedded Spec,CP, where it checks the 'defective' weak wh-feature in C (see Ferguson and Groat 1994). Provided that only the strong wh-feature in Agrs is present and checked by overt movement of wh-acc, successive cyclic movement of wh-nom would no longer be possible. If we proceed from the assumption that the strong wh-feature in Agrs and the weak wh-feature in C may not cooccur in one and the same clause, we not only account for the ungrammaticality of (32d); we also obtain derivations of the other readings that accord with Saito's judgements. Note that the constraint on the cooccurrence of strong and weak wh-features need not be stipulated. It follows from an economy principle suggested in Chomsky (1995: sect. 5.4), according to which an element α "enters the numeration only if it has an effect on output". To derive reading (32a), we have to assume that the wh-feature of the embedded clause is weak (if it were strong, it would be overtly checked by wh-acc, and wh-nom could not be moved at LF). If the embedded wh-feature is weak, the scrambled wh-acc in (30) must be in Spec2 of the embedded AgrsP, hence in an A-position, to check a Σ-feature. Since we adopt Saito's (1994a) assumption that antecedent government is possible from an A-adjoined position but not from an A'-adjoined position, we have to assume that at LF, wh-nom first adjoins to wh-acc and the wh-complex then moves to the embedded Spec,CP.


As for reading (32b), we again have to assume that wh-acc occupies Spec2 of AgrsP and that there is a weak wh-feature in the embedded as well as in the matrix clause. Again, wh-nom adjoins to wh-acc, but in this case the complex wh-element has to cross a wh-island (created by the embedded wh-complementizer) to reach the matrix Spec,CP. This explains the marginality of (32b). Reading (32c) is derived by assuming, again, that the wh-feature of the embedded clause as well as that of the matrix clause is weak. Then wh-acc must occupy Spec2 of the embedded AgrsP. At LF, wh-nom moves to the embedded Spec,CP and wh-acc moves to the matrix Spec,CP, thereby crossing the wh-island. Since Saito (1994a) rejects the suggestion made in Saito (1987) to account for the different readings of (30) in terms of a 'linear crossing constraint', the problems associated with the interpretation of (30) must still be considered unexplained. The fact that our analysis offers an account of the readings in (32) can be taken as evidence for our assumptions about the distribution of strong and weak wh-features in Japanese.⁸ It is a consequence of our analysis that there are four instances of wh-movement in Japanese:

(33) a. A-movement of wh-phrases to Spec2 in the presence of a Σ-feature.
     b. Adjunction of wh-phrases to AgrsP in a [-wh] clause in the presence of a Σ-feature.
     c. Adjunction of wh-phrases to AgrsP in the presence of a strong wh-feature.
     d. Movement of wh-phrases to Spec,CP in the presence of a weak wh-feature.

In contrast, wh-phrases in German may not bear Σ-features, and Agrs may not bear a [+wh]-feature. Hence wh-scrambling is impossible in this language.⁹ Our analysis therefore implies that the movement of wh-phrases need no longer be conceived of as a special type of movement ('wh-movement') with specific properties. It is rather an instance of movement whose properties follow from independent properties of the language at issue. This view of wh-movement is fully in accord with basic assumptions of the Minimalist Program.
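The way the cooccurrence ban and wh-island crossing combine to yield Saito's judgements on (32) can be summarized in a toy table. This is a sketch under simplifying assumptions of our own: each reading of (30) is reduced to three hypothetical yes/no properties of its derivation as described above, and the status labels "possible", "marginal" and "impossible" stand in for 'allowed', 'marginally possible' and 'totally impossible'. The names Derivation, predicted_status and READINGS are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """Hypothetical summary of what a reading of (30) requires."""
    embedded_strong_agrs: bool   # strong wh-feature in the embedded Agrs
    embedded_weak_c: bool        # (possibly defective) weak wh-feature in the embedded C
    crosses_wh_island: bool      # LF movement crosses the embedded [+wh] clause


def predicted_status(d: Derivation) -> str:
    # Strong (Agrs) and weak (C) wh-features may not cooccur in one clause.
    if d.embedded_strong_agrs and d.embedded_weak_c:
        return "impossible"
    # Crossing the embedded wh-island yields marginality.
    if d.crosses_wh_island:
        return "marginal"
    return "possible"


READINGS = {
    # (32a): weak features; wh-nom adjoins to wh-acc in Spec2, complex moves to embedded Spec,CP.
    "32a": Derivation(False, True, False),
    # (32b): weak features; the complex wh-element crosses the wh-island to matrix Spec,CP.
    "32b": Derivation(False, True, True),
    # (32c): weak features; wh-acc alone moves to matrix Spec,CP across the wh-island.
    "32c": Derivation(False, True, True),
    # (32d): needs a strong feature in embedded Agrs (for wh-acc) together with a
    # defective weak feature in embedded C (for wh-nom); the island value is moot.
    "32d": Derivation(True, True, False),
}

for label, derivation in READINGS.items():
    print(label, predicted_status(derivation))
```

The output pairs 32a with "possible", 32b and 32c with "marginal", and 32d with "impossible", mirroring the judgements reported above; the table itself, of course, only restates the derivations rather than deriving them.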

5

The Additional-wh Effect

In this section, we would like to present one more piece of evidence in favor of the constraint on adjunction that we have tried to defend in the preceding sections. It has to do with restrictions on the so-called 'additional-wh effect' (Saito 1994a). Although Japanese allows multiple wh-questions rather freely, it exhibits an ECP effect in a base-generated configuration where an adjunct wh-phrase like naze 'why' precedes a wh-object, as in (34a). Interestingly, the ECP effect disappears when the wh-object is scrambled into a position preceding the wh-adjunct, as in (34b), or when a third wh-phrase is added, as shown in (34c) (Watanabe 1992, Saito 1994a):

(34) a. *John-ga naze nani-o katta no
        J-nom why what-acc bought Q
     b. John-ga nani-o_i naze t_i katta no
        J-nom what-acc why bought Q
     c. Dare-ga naze nani-o katta no
        who-nom why what-acc bought Q

The effect displayed by (34) is reminiscent of the superiority violation in languages such as English, which can also be 'remedied' by the addition of a third wh-phrase:

(35) a. *What books_i did you persuade who PRO to give t_i Peter?
     b. What books_i did you persuade who PRO to give t_i to whom?

The crucial difference between Japanese and English is that in English, the remedying effect is achieved by adding a wh-phrase in a position lower than the other wh-phrases, while in Japanese it is achieved by adding a wh-phrase in a position higher than the other wh-phrases. Watanabe (1992) therefore descriptively contrasts the English 'Superiority Effect' with the Japanese 'Anti-superiority Effect'. According to the former, a multiple wh-question is well-formed only if there is a wh-phrase that does not c-command the trace of the wh-phrase that is extracted first. According to the latter, a multiple wh-question is well-formed only if there is a wh-phrase that is not c-commanded by the trace of the wh-phrase that is extracted first. In other words, in English multiple wh-questions it is not licit to extract the lowest wh-phrase first, while in Japanese multiple wh-questions it is not licit to extract the highest wh-phrase first. Watanabe (1992) explains the facts in (34) in terms of a 'Principle of Relation Preservation', according to which a relation established at a certain point in the derivation must be maintained throughout. In (34a) the wh-adjunct would have to be moved first in order to ensure antecedent government of its trace. But then the wh-object nani-o, which is adjoined to the wh-operator as shown in (36), would c-command the wh-adjunct at LF, failing to preserve the relation that holds at S-structure, where the latter c-commands the former:¹⁰

(36) [α [NP_i nani-o [NP_i naze_j [NP_j Op_i]]]]

As for (34b), the complex wh-operator in (36) is in line with the Principle of Relation Preservation, since the wh-object c-commands the wh-adjunct at S-structure. Finally, the additional-wh effect, as represented by (34c), is explained as follows. Given that the empty operator must again originate from naze for reasons of antecedent government, the Principle of Relation Preservation would again be violated as far as the LF-relation between naze and nani-o is concerned. However, it would be fulfilled with respect to the relation between naze and the added wh-phrase dare-ga. Watanabe therefore hypothesizes that this principle is fulfilled as long as the wh-phrase coindexed with the empty operator (and hence with the head C) is in a licit relation with one other wh-phrase.

Saito (1994a) adopts Watanabe's account of the difference between (34a) and (34b) but proposes an alternative analysis of the (higher) additional-wh effect. According to this alternative, an offending wh-phrase can be saved by a higher wh-phrase because it may adjoin at LF to this wh-phrase, forming a complex wh-element, which then moves to the Spec,CP position. Since the adjoined wh-phrase (naze in (34c)) is adjoined to a wh-element in an A-position, and since antecedent government is taken to be possible from a position adjoined to an A-position, the trace of the wh-adjunct, which is taken to be adjoined to the wh-subject in the LF-representation of (34c), does not violate the ECP. Saito takes his analysis to be superior to Watanabe's, since the latter fails to account for two clause-boundedness restrictions to which the additional-wh effect is subject. First, the additional-wh effect is not operative when the additional higher wh-element is long scrambled out of a deeper clause. This restriction is illustrated by examples (37) and (38) (Saito 1994a):

(37) a. *naze dare-ga Mary-ni [CP John-ga sono hon-o katta to] itta no
        why who-nom M-dat J-nom that book-acc bought COMP said Q
        'Q who told Mary [that John bought that book] why.'
     b. *Mary-ni_i naze dare-ga t_i [CP John-ga sono hon-o katta to] itta no
        M-dat why who-nom J-nom that book-acc bought COMP said Q
     c. dare-ni_i naze dare-ga t_i [CP John-ga sono hon-o katta to] itta no
        who-dat why who-nom J-nom that book-acc bought COMP said Q

(38) a. *sono hon-o_i naze dare-ga Mary-ni [CP John-ga t_i katta to] itta no
        that book-acc why who-nom M-dat J-nom bought COMP said Q
        'Q who told Mary [that John bought that book] why.'
     b. ?*nani-o_i naze dare-ga Mary-ni [CP John-ga t_i katta to] itta no
        what-acc why who-nom M-dat J-nom bought COMP said Q
        'Q who told Mary [that John bought what] why.'

(37) shows that the wh-adjunct naze in the matrix clause can be rescued by scrambling the indirect wh-object dare-ni of the matrix verb (as opposed to a non-wh indirect object) in front of the wh-adjunct. (38) shows that naze cannot be rescued in this way if the wh-phrase moved in front of naze is scrambled out of a finite clause.¹³ Second, as shown by (39), the additional-wh effect fails to take effect if the added higher wh-phrase is in a different clause from the wh-phrase to be rescued:

(39) a. *Mary-ga [CP John-ga naze nani-o katta to] omotteiru no
        M-nom J-nom why what-acc bought COMP think Q
        'Q Mary thinks [that John bought what why].'
     b. Mary-ga [CP dare-ga naze nani-o katta to] omotteiru no
        M-nom who-nom why what-acc bought COMP think Q
        'Q Mary thinks [that who bought what why].'
     c. ?*dare-ga [CP John-ga naze nani-o katta to] omotteiru no
        who-nom J-nom why what-acc bought COMP think Q
        'Q who thinks [that John bought what why].'

Comparing (39b) and (39c), one can see that the wh-adjunct of the embedded clause can be rescued by adding a wh-subject dare-ga in the embedded clause but not by adding it in the matrix clause.

Let us first consider the examples in (38). Example (38a) can be analyzed on a par with (34a): it violates either the ECP or the Principle of Relation Preservation. As far as (38b) is concerned, we know that the long scrambled embedded object is adjoined to the matrix AgrsP. Since the matrix clause is an interrogative clause, we can conclude that the matrix Agrs bears a strong wh-feature. Hence, the long scrambled wh-object can be taken to occupy an operator position. Further movement of this object, which would independently be barred by the constraint on adjunction, is thus not required in the first place. Since the presence of a weak wh-feature in C is excluded on our assumptions, the wh-adjunct must adjoin to the operator at LF, yielding a superiority-like configuration of the complex operator from which the trace of the wh-adjunct is not antecedent-governed. Our account of (38b) predicts that the long scrambled object nani-o may rescue the wh-adjunct if it originates from a control infinitive rather than from a finite clause. This prediction is in fact borne out, as shown by example (40), taken from Nemoto (1993):

(40) Nani-o_i naze dare-ga Michael-ni [PRO t_i utau yoo(ni)] itta no
     what-acc why who-nom M-dat sing told Q
     'What, why, who told Michael to sing.'

It follows from our analysis of scrambling out of infinitivals that in (40), the long scrambled wh-object occupies the Spec2 position of the matrix AgrsP. Since this position is an A-position, the wh-adjunct naze can be rescued in much the same way as in (34b), by adjoining to the scrambled wh-object at LF. Then nothing prevents the complex wh-element (or its wh-feature, cf. Chomsky 1995) from moving to the operator position at LF. Turning to the second clause-boundedness effect, we first have to determine the nature of the wh-feature of the interrogative clause in (39c). If this feature is strong and located in Agrs of the matrix clause, the wh-subject of the matrix clause occupies a position adjoined to the matrix AgrsP. Since the embedded clause is not an interrogative clause, it cannot contain a weak wh-feature in the C-position. Hence, successive cyclic LF-movement of the wh-adjunct can only be triggered by a defective strong wh-feature borne by the embedded Agrs. But if the wh-adjunct adjoins to the embedded AgrsP, as required by the defective feature in Agrs, it cannot undergo any further adjunction, due to our constraint on adjunction. It is therefore prevented from reaching the wh-subject in the matrix clause. Let us now assume that the wh-feature in the matrix clause is weak and located in the C-position. In that case, the wh-subject of the matrix clause is in situ, and there is a defective weak wh-feature in the embedded C-head. Successive cyclic LF-movement of the embedded wh-adjunct then operates through the embedded Spec,CP position. However, from the embedded Spec,CP the wh-adjunct is not allowed to adjoin to the matrix subject, for independent reasons (Müller and Sternefeld 1993, Grewendorf and Sabel 1994). It would have to move to the matrix Spec,CP position by successive cyclic 'COMP-to-COMP' movement. In that case, however, it is not only not 'rescued' by the matrix wh-subject; it rather finds itself in the same structural configuration with its clausemate nani-o that caused the ungrammaticality of the simple clause (34a).
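Watanabe's (1992) descriptive contrast between the English Superiority Effect and the Japanese Anti-superiority Effect, introduced for (34) and (35) earlier in this section, can be checked mechanically under one strong simplification that is ours, not Watanabe's: the wh-phrases of a clause are listed from highest to lowest, and a phrase (or its trace) is taken to c-command exactly the phrases listed after it. Following the antecedent-government reasoning above, naze is treated as the phrase extracted first in (34). The function names are hypothetical.

```python
def _split(wh_phrases: list[str], extracted_first: str) -> tuple[list[str], list[str]]:
    """Return the wh-phrases above and below the first-extracted one.

    Simplifying assumption: the list is ordered from highest to lowest,
    and a phrase (or its trace) c-commands exactly the phrases after it.
    """
    i = wh_phrases.index(extracted_first)
    return wh_phrases[:i], wh_phrases[i + 1:]


def superiority_ok(wh_phrases: list[str], extracted_first: str) -> bool:
    """English: some wh-phrase must fail to c-command the trace of the
    first-extracted phrase, i.e. sit lower than it."""
    _, lower = _split(wh_phrases, extracted_first)
    return len(lower) > 0


def anti_superiority_ok(wh_phrases: list[str], extracted_first: str) -> bool:
    """Japanese: some wh-phrase must fail to be c-commanded by the trace
    of the first-extracted phrase, i.e. sit higher than it."""
    higher, _ = _split(wh_phrases, extracted_first)
    return len(higher) > 0


# (label, wh-phrases from highest to lowest, phrase extracted first, check)
EXAMPLES = [
    ("(34a)", ["naze", "nani-o"], "naze", anti_superiority_ok),
    ("(34b)", ["nani-o", "naze"], "naze", anti_superiority_ok),
    ("(34c)", ["dare-ga", "naze", "nani-o"], "naze", anti_superiority_ok),
    ("(35a)", ["who", "what books"], "what books", superiority_ok),
    ("(35b)", ["who", "what books", "to whom"], "what books", superiority_ok),
]

for label, phrases, first, check in EXAMPLES:
    print(label, "well-formed" if check(phrases, first) else "ill-formed")
```

Under this encoding (34a) and (35a) come out ill-formed and the other three well-formed, as reported. The height ranking is only a stand-in for c-command in simple clauses; it says nothing about the clause-boundedness restrictions illustrated in (37) to (39).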

6

Conclusion

In this article we have discussed scrambling in German and Japanese on the basis of different properties of the Agr-system in the two languages. We argued that the differences between the A-/A'-properties of scrambling in both languages, as well as the locality restrictions holding for long scrambling out of finite clauses, can be attributed to the fact that different structural positions serve as landing sites for scrambling in the two languages. Drawing on a consequence of 'Bare Phrase Structure Theory' (Chomsky 1994), we argued that, unlike German, where scrambling is XP-adjunction, the Agr-system in Japanese exhibits layered specifiers, which function as A-positions and provide landing sites for scrambling. The discussion of wh-scrambling in Japanese has shown that overt wh-movement in Japanese also involves the Agr-system, in contrast to German, where the [+wh]-feature is located in C. We have argued that wh-scrambling in Japanese may apply overtly to check a strong [+wh]-feature in Agrs, whereas only the weak [+wh]-feature may be located in C in Japanese, triggering covert movement. This analysis accounts for several scope ambiguities found with wh-scrambling in Japanese, as well as for the additional-wh effect. Throughout the paper we have suggested that the constraint on adjunction, according to which adjunction movement is a 'dead end', is a substantive constraint on movement, compatible with the properties of scrambling in German and Japanese.

Notes

We would like to thank the participants of the workshop 'The Role of Economy Principles in Linguistic Theory', especially John Frampton, Hans-Martin Gärtner, Jane Grimshaw, Hubert Haider, Peter Staudacher and Chris Wilder. We also thank Mamoru Saito for helpful discussions. Work for this article was supported by DFG grant # Gr 559/5-1.

1 Fukui (1993) assumes that, in consonance with the head parameter, SVO languages have optional movement to the right whereas SOV languages like Japanese have optional movement (scrambling) to the left, and that neither movement counts for economy. However, what is not predicted by Fukui's analysis is the fact that SVO languages like Polish and Russian also have scrambling, hence optional movement to the left. Whether scrambling is motivated by feature checking or whether it represents genuine 'optional movement' is a difficult question; but if scrambling is a syntactic phenomenon and optional movement is impossible in general, then scrambling should never be possible, for the reasons outlined in the text. Hence, we proceed from the assumption that scrambling is triggered by feature checking.

2 We assume that subjects in German and Japanese move to Spec,IP in the overt syntax. However, nothing really hinges on this question. As far as we can see, our analysis is also compatible with the view that subjects remain in VP in the overt syntax.

3 According to Chomsky (1993), an element α is in the checking domain of a head X if (i) it is in a Spec-head relation with X, (ii) it is in a position adjoined to the head X, (iii) it is adjoined to the maximal projection of X, or (iv) it is adjoined to the specifier of X.

4 This analysis does not necessarily imply that scrambling out of finite clauses can only cross one clause boundary. In fact, examples of 'super-scrambling', where a scrambled element crosses two CP nodes, are attested in Japanese (Takahashi 1993:665, Sakai 1994:308). If these constructions really represent instances of movement, their grammaticality can be attributed to the fact that certain matrix verbs in Japanese license long A-movement out of finite clauses, as argued in Ura (1994). See Grewendorf and Sabel (1995) for further discussion of this topic.

5 For example, the proposed analysis predicts that Japanese, in contrast to German, allows Super-Raising (Ura 1994), and it has implications for the analysis of surprising asymmetries between German and Japanese remnant-movement phenomena as well. It is a well-known fact that scrambling out of adjoined categories is impossible, as can be seen from examples with scrambling out of scrambled CPs in German (Grewendorf and Sabel 1994):

   (i) *daß [den Hund]_j zweifellos [t_j zu füttern]_i keiner versuchte
       that the dog-acc undoubtedly to feed nobody tried

T h e c o r r e s p o n d i n g construction is grammatical in Japanese because short C P - s c r a m b l i n g in Japanese is not X P - a d j u n c t i o n but substitution to a specifier of A g r P (Saito 1994a:226): (ii)

6

[ ¡ p s o n o h o n - o j [ j p John-ga [ c p l A g r s P t C P M a r y - g a i\ katta to]j [ Bill-ga /j itta]] that b o o k a c c J.nom M.nom bought COMP B . n o m said to] omotteiru]] COMP think ' T h a t book, J o h n thinks that [that Mary bought t] Bill said.'

See Grewendorf and Sabel (1995) for an extensive discussion of these matters, as well as for more empirical motivation for treating short scrambling as A-movement in Japanese and as A'-movement in German, in the context of a discussion of parasitic gaps, weak crossover, and anaphor reconstruction effects with scrambling.

Note that the scrambled indirect object Bill-ni in (23) and (24) cannot be taken to occupy a VP- or AgroP-adjoined position in the highest clause, since scrambling out of a finite clause obligatorily targets IP (Saito 1994b, Grewendorf and Sabel 1995). Furthermore, the same ordering of scrambled non-wh- and wh-phrases as in (23) and (24) can be found in non-embedded contexts: (i)

Bill-ni nani-o John-ga [Mary-ga t t watasita to] omotteiru no
    Bill-to what-acc J-nom M-nom handed COMP think Q
    'What does John think that Mary handed to Bill?'


Grewendorf & Sabel: Wh-Scrambling

Recall that, following the analysis in Ferguson and Groat (1994), we assume that successive-cyclic wh-movement is determined by some sort of 'defective' wh-features that are associated with relevant intermediate heads, in much the same way in which non-finite T bears defective Case features in English ECM constructions. A problem for our analysis seems to arise in the presence of a configuration such as (i):

(i) [CP nani_j-o ... [CP ... naze Q] Q]

In (i), the embedded object nani-o is long-scrambled to the initial position of an interrogative clause, and the adjunct naze remains in situ in the embedded clause, which is also an interrogative clause. If the strong wh-feature of the matrix Agrs were present in the embedded Agrs as a 'defective feature' preventing the embedded C from bearing a weak wh-feature, there would be no way for the embedded wh-adjunct to move at LF. To solve this problem, one has to take note of the fact that it is neither necessary nor possible that the strong wh-feature of the matrix clause is realized as a defective feature in the embedded clause, since (i) cannot and need not be derived as successive-cyclic wh-movement of the embedded wh-object. Recall that in Japanese, overt successive wh-movement is not allowed in our account because of the constraint on adjunction. The object in (i) thus must first move to Spec2 of the embedded AgrsP to check a Σ-feature, which is A-movement, and from there undergo A'-movement to its final position in one step. Regardless of this derivation, one may consider the idea that a strong defective wh-feature, should it be possible at all, does not prevent a weak wh-feature from appearing. After all, in the case of overt successive-cyclic wh-movement in a language like English, the defective wh-feature in the C-position of an embedded declarative clause does not make this clause an interrogative sentence.

9 See Grewendorf and Sabel (1995) for further typological predictions of this analysis with respect to multiple-fronting languages like Bulgarian, Czech and Polish.

10 Actually, the structural notion used by Watanabe (1992:106) is the notion of 'seg-command': X seg-commands Y iff X does not dominate Y and every segment that dominates X dominates Y. Watanabe further assumes that in Japanese, an empty wh-operator moves to Spec,CP at S-structure (from the specifier position of a wh-DP); hence at LF, α = CP in (36).
11 We do not want to go into the problem of how a trace can be licensed at an intermediate stage of the derivation from S-structure to LF (see Saito 1994a, section 3.1).

12 Sohn (1994:318) points out that there is also an S-structure effect according to which an adjunct wh-phrase within an island can also avoid violating the ECP when there is a non-wh argument phrase which moves out of that island together with it:

(i) a. *John-wa [[Mary-ga sono hito-o naze uttaeta to iu] uwasa]-o kiita-no
       J-Top M-nom the man-acc why sued rumour-acc heard-Q
       'Why did John hear [the rumour [that Mary sued the man t]]?'
    b. ?(?)sono hito-o_i naze_j John-wa [[Mary-ga t_i t_j uttaeta to iu] uwasa]-o kiita-no
       the man-acc why J-Top M-nom sued rumour-acc heard-Q

13 It should be pointed out that the adjunct naze in (37) must be base-generated in the AgrsP-adjoined position. If the adjunct occupied this position as a result of movement, then our constraint on adjunction would imply that it cannot be 'rescued' by undergoing further adjunction to the rescuing higher wh-phrase. But note that the possibility of base-generating naze in AgrsP-adjoined position seems to create a problem for our account of the ungrammaticality of long adjunct scrambling in examples such as (15b). As pointed out to us by Hans-Martin Gärtner, scrambling of adjuncts out of finite clauses cannot be ruled out by our constraint on adjunction if the adjunct is base-generated in AgrsP-adjoined position, since no successive-cyclic adjunction movement is involved in a structure like (i):

(i) naze_i [AgrsP ... [CP [AgrsP t_i [AgrsP ...]] ...] ...]


Notice, however, that the embedded Agrs head in (i) bears a Σ-feature that needs to be checked. Following Chomsky (1995), we assume that, with the potential exception of expletive insertion, a checking relation cannot be established by the operation Merge. It then follows that (i) does not represent a convergent derivation. On the other hand, in (ii) checking of the Σ-feature has taken place, but this derivation violates the constraint on adjunction:

(ii) naze_i [AgrsP ... [CP [AgrsP t'_i [AgrsP t_i [AgrsP ...]]] ...] ...]

Thus, the observation that adjuncts can be base-generated in AgrsP-adjoined position does not undermine our account of the ungrammaticality of adjunct scrambling out of finite clauses.

14 This account is similar to the one given in Saito (1994a), where it is assumed that long scrambling is A'-movement and that antecedent government is impossible from an A'-adjoined position. In Saito (1994b), a different account is proposed, in which Saito proceeds from the idea that the long-scrambled wh-object must undergo reconstruction, thereby providing no 'rescuer' at LF for the wh-adjunct to adjoin to.

References

Baker, M. 1988. Incorporation: A Theory of Grammatical Function Changing. Chicago: The University of Chicago Press.
Boskovic, Z., and Takahashi, D. 1995. Scrambling and Last Resort. Ms., University of Connecticut and CUNY.
Chomsky, N. 1993. A Minimalist Program for Linguistic Theory. In The View from Building 20, ed. K. Hale and S. J. Keyser, 1-52. Cambridge, Mass.: MIT Press.
Chomsky, N. 1994. Bare Phrase Structure. MIT Occasional Papers in Linguistics 5.
Chomsky, N. 1995. The Minimalist Program. [To appear, MIT Press. (Draft of chapter 4)].
Chomsky, N., and Lasnik, H. 1993. The Theory of Principles and Parameters. In Syntax: An International Handbook of Contemporary Research, ed. J. Jacobs et al., 506-569. Berlin: Walter de Gruyter.
Collins, C. 1993. Topics in Ewe Syntax. Ph.D. Diss., MIT, Cambridge, Mass.
Collins, C. 1994. Merge and Greed. Ms., Cornell University.
Epstein, S. 1992. Derivational Constraints on A' Chain Formation. Linguistic Inquiry 23: 235-259.
Fanselow, G. 1990. Scrambling as NP-movement. In Scrambling and Barriers, ed. G. Grewendorf and W. Sternefeld, 113-140. Amsterdam: Benjamins.
Ferguson, K. S., and Groat, E. M. 1994. Defining Shortest Move. Ms., Harvard University.
Fukui, N. 1993. Parameters and Optionality. Linguistic Inquiry 24: 399-420.
Grewendorf, G., and Sabel, J. 1994. Long Scrambling and Incorporation. Linguistic Inquiry 25: 263-308.
Grewendorf, G., and Sabel, J. 1995. Multiple Specifiers and the Theory of Adjunction: On Scrambling in German and Japanese. Ms., Universität Frankfurt/Main.
Holmberg. 1985. Icelandic Word Order and Binary Branching. Nordic Journal of Linguistics 8: 161-195.
Koizumi, M. 1995. Phrase Structure in Minimalist Syntax. Ph.D. Diss., MIT, Cambridge, Mass.
Kuroda, S.-Y. 1992. Whether We Agree or Not: A Comparative Syntax of English and Japanese. In Japanese Syntax and Semantics, ed. S.-Y. Kuroda, 315-257. Dordrecht: Kluwer.
Lasnik, H., and Saito, M. 1992. Move α. Cambridge, Mass.: MIT Press.
Nemoto, N. 1993. Chains and Case Positions: A Study from Scrambling in Japanese. Diss., University of Connecticut.
Rizzi, L. 1990. Speculations on Verb Second. In Grammar in Progress, ed. J. Mascaró and M. Nespor, 2532. Dordrecht: Foris.
Rizzi, L. 1991. Residual Verb Second and the Wh-Criterion. Technical Reports in Formal and Computational Linguistics 2, Université de Genève.
Sabel, J. 1994. Restrukturierung und Lokalität. Universelle Beschränkungen für Wortstellungsvariationen. Ph.D. Diss., Universität Frankfurt/Main.
Saito, M. 1985. Some Asymmetries in Japanese and Their Theoretical Implications. Ph.D. Diss., MIT, Cambridge, Mass.
Saito, M. 1992. Long Distance Scrambling in Japanese. Journal of East Asian Linguistics 1: 69-118.
Saito, M. 1994a. Additional-wh Effects and the Adjunction Site Theory. Journal of East Asian Linguistics 3: 195-240.
Saito, M. 1994b. Improper Adjunction. In MIT Working Papers 24: Formal Approaches to Japanese Linguistics 1, ed. M. Koizumi and H. Ura, 263-293.
Sakai, H. 1994. Derivational Economy in Long Distance Scrambling. In MIT Working Papers 24: Formal Approaches to Japanese Linguistics 1, ed. M. Koizumi and H. Ura, 295-314.
Sohn, K.-W. 1994. Adjunction to Argument, Free Ride and a Minimalist Program. In MIT Working Papers 24: Formal Approaches to Japanese Linguistics 1, ed. M. Koizumi and H. Ura, 315-334.
Takahashi, D. 1993. Movement of Wh-phrases in Japanese. NLLT 11: 655-678.
Ura, H. 1994. Super-Raising and the Feature-Based X-bar Theory. Ms., MIT, Cambridge, Mass.
Watanabe, A. 1992. WH-in-situ, Subjacency, and Chain Formation. MIT Occasional Papers in Linguistics 2.

Comparing Reference Sets

Wolfgang Sternefeld

A derivational constraint (e.g. subjacency) is a condition that must be met by each operation used in the course of generating a well-formed syntactic structure. A transderivational constraint (e.g. Fewest Steps) is a condition that must be satisfied by a derivation when compared with other derivations. In order to obey a derivational constraint, it is sufficient to look at the successive steps of the derivation in question; for a transderivational constraint to be satisfied, it is necessary to look at a certain set of other hypothetical derivations. I will discuss several possible ways to define this set of hypothetical derivations, the so-called reference set, trying to make the various options compatible with the facts. In previous work (cf. Müller & Sternefeld (1996)), it has been demonstrated that under a certain definition of reference sets, a transderivational constraint like Fewest Steps yields wrong predictions and should be replaced by a derivational constraint, the Principle of Unambiguous Binding. In this paper I will look more closely at alternative definitions of reference sets and at the Shortest Paths Condition as another relevant transderivational concept. I will argue that this condition, too, yields problematic results; moreover, no correct and straightforward choice of the set of hypothetical derivations is readily available. Given the complex nature of transderivational constraints compared to "ordinary" derivational constraints, these results suggest that the transderivational device should be dispensed with in the simplest (i.e. most economical) theory of grammar. I will argue that the relevant data can be explained equally well by derivational constraints which substitute for economy constraints without loss of theoretical or empirical adequacy.

1 The Economy Framework

Within the Minimalist Program, economy constraints play a central role in ruling out ungrammatical derivations. The core of the minimalist theory of economy can be formulated as a general economy condition which subsumes various more specific economy constraints proposed in the literature. This general statement of economy, given in (1) below, presupposes two supplementary definitions. First, there is a small number of more specific explications of economy which define in what ways derivations can differ; these definitions provide us with specific measures that describe which of the different derivations are "the best ones." Second, we cannot compare arbitrary derivations with one another; the derivations to be compared must have something in common. So there must be a definition of competing derivations which says in which sense the derivations to be compared must be alike. Any set of competing derivations is called a reference set, denoted by RS. The set of specific measures will be denoted by $M$. The general economy condition stated in (1) is a constraint of Universal Grammar which says that, given a certain RS, only those of its elements "survive" (i.e. are not ruled out as ungrammatical by considerations of economy) that do best with respect to the metrical properties in $M$. The content of the theory can thus be stated as follows: (1)

General Economy Condition: Given two derivations $D_1$ and $D_2$ in the same reference set RS, $D_1$ is preferred over $D_2$ if and only if $D_1$ fares better than $D_2$ with respect to some metrical measure in $M$.

As indicated above, the set $M$ provides metrics to evaluate derivations or structures generated by the computational system. Consider, for example, the total number of FORM CHAIN operations that occur in a given derivation, or the number of nodes crossed by a movement operation (to be discussed in more detail below). In combination with (1), these two measures yield the two economy constraints known as "Fewest Steps" and "Shortest Paths." For instance, if a derivation $D_1$ contains more chains than $D_2$ and both are in the same RS, then $D_1$ is ruled out as a violation of Fewest Steps, the relevant measure being the number of FORM CHAIN operations a derivation has. If at the same time $D_1$ fared better than $D_2$ with respect to a second (different) measure in $M$, e.g. with respect to Shortest Paths, both derivations would be ruled out; cf. Chomsky (1993, p. 15), who discusses an apparent conflict between two economy constraints (Fewest Steps and Shortest Move, also called "Shortest Link" or "Minimize Chain Links") on the assumption that a violation of only one of these constraints is sufficient to rule out a derivation as ungrammatical. This is expressed more explicitly in the following reformulation of (1): (2)

Corollary: A derivation $D_i$ is ungrammatical if there is a metric $m$ in $M$ and a derivation $D_j$ in the same RS as $D_i$ such that $D_j$ fares better than $D_i$ with respect to $m$.

Note that (2) does not prevent a given RS from containing more than one grammatical derivation. On the other hand, economy does not imply that optimal derivations are automatically grammatical; since in general each derivation is subject to additional constraints, it may happen that the most economical derivation is ruled out by an additional constraint, whereas the remaining less economical derivations satisfy all constraints except economy. In that case the RS in question fails to contain any well-formed derivation at all. Having sketched the general framework, it remains to define and exemplify the notion of a RS, a matter to be dealt with in the next section.
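The condition in (1) and its corollary (2) amount to a filter over a reference set. The following Python sketch is a minimal illustration under stated assumptions: each derivation is reduced to hypothetical numeric scores, one per metric in $M$, where a lower value counts as better; the names Derivation, survivors, fewest_steps and shortest_paths are invented for the example; and only the economy filter is modelled, not the additional constraints that may still rule out a surviving derivation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Derivation:
    name: str
    # Hypothetical metric values; for every metric, lower is better
    # (e.g. number of FORM CHAIN operations, number of nodes crossed).
    scores: Dict[str, int]


def survivors(reference_set: List[Derivation], metrics: List[str]) -> List[Derivation]:
    """Corollary (2): D_i is ruled out if some D_j in the same RS
    fares better than D_i with respect to some metric m in M."""
    def is_beaten(d: Derivation) -> bool:
        return any(
            other.scores[m] < d.scores[m]
            for other in reference_set
            for m in metrics
        )
    return [d for d in reference_set if not is_beaten(d)]


M = ["fewest_steps", "shortest_paths"]

# Conflict: D1 wins on Fewest Steps, D2 wins on Shortest Paths.
d1 = Derivation("D1", {"fewest_steps": 3, "shortest_paths": 5})
d2 = Derivation("D2", {"fewest_steps": 4, "shortest_paths": 2})
print([d.name for d in survivors([d1, d2], M)])  # []

# No conflict: D3 is at least as good as D2 on every metric.
d3 = Derivation("D3", {"fewest_steps": 3, "shortest_paths": 2})
print([d.name for d in survivors([d2, d3], M)])  # ['D3']
```

The first call reproduces the conflict scenario for Fewest Steps and Shortest Paths, in which both derivations are ruled out; the second leaves a single survivor, which would still have to satisfy all non-economy constraints.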

2 Defining Reference Sets

In this section I aim to demonstrate that RSs can be defined in a number of different ways. As the empirical domain of the discussion, I have chosen superiority effects; the relevant metric is the length of a movement path. The following three subsections illustrate three different ways of defining RSs, each of which has a different empirical impact on the derivation of superiority from economy.

2.1 Superiority and Shortest Paths

The following quote is taken from Chomsky (1993, p. 14), where the ungrammaticality of example (9-a-ii) below is analyzed in terms of what has been called the Shortest Paths Condition:

"(9) a. (i) Whom1 did John persuade t1 [to visit whom2]
        (ii) *Whom2 did John persuade whom1 [to visit t2]
[...]
Looking at these phenomena in terms of economy considerations, it is clear that in all the 'bad' cases, some element has failed to make 'the shortest move.' In (9aii) movement of whom2 to [Spec, CP] is longer in a natural sense (definable in terms of c-command) than movement of whom1 to this position ..."

Chomsky does not make the "economy considerations" explicit in a way that would allow for a complete explanation along the lines suggested. Collins (1994a), however, develops a more detailed theory. He first defines the number of nodes in a movement path as the relevant measure; following Pesetsky (1982), a path generated by move-$\alpha$ can most simply be defined as the set of nodes that dominate the trace of $\alpha$ and are c-commanded by $\alpha$. Depending on how much invisible structure we are willing to impose on the sentences in Chomsky's example (9), the path of whom1 contains (at least) the VP headed by persuade, the I' and IP whose specifier is John, and the C' headed by did. It is clear, then, that the path of whom2 must properly contain these nodes; hence, "movement of whom2 to [Spec, CP] is longer in a natural sense ..." Instead of counting the nodes that dominate a trace, one might as well consider the nodes that c-command a trace; this is the set of nodes crossed by movement. For example, if movement goes from $\alpha$ to $\beta$ as in (3), the path of $\alpha$ as defined above consists of $X_2$ and $X_1$, but the nodes crossed by movement are $Y_2$ and $Y_1$. This alternative metric is defined in (4). It can be based on the notion of asymmetric c-command, with $\alpha$ asymmetrically c-commanding $\beta$ if and only if $\alpha$ c-commands $\beta$, but not vice versa.


(4)

Shortest Paths Metric ($=M_{sp}$): If $\alpha$ moves to $\beta$, $M_{sp}(\alpha \rightarrow \beta)$ is the number of categories $Y$ such that

a. $Y$ asymmetrically c-commands $\alpha$, and

b. $\beta$ (asymmetrically) c-commands $Y$.
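The two ways of measuring a movement path, Pesetsky's path and the nodes crossed as in (4), can be compared on a toy tree. The Python sketch below is a minimal illustration under stated assumptions: trees are strictly binary, so the mother of a node is taken as its first branching node; the tree for (9-a) is a hypothetical simplification (no I', no PRO, no empty C head); and all names (Node, tree, m_sp, pesetsky_path) are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)


def tree(label: str, *kids: Node) -> Node:
    """Build a node and set the parent pointers of its daughters."""
    node = Node(label, list(kids))
    for kid in kids:
        kid.parent = node
    return node


def dominates(a: Node, b: Node) -> bool:
    """Proper domination: a is an ancestor of b."""
    p = b.parent
    while p is not None:
        if p is a:
            return True
        p = p.parent
    return False


def c_commands(a: Node, b: Node) -> bool:
    """a c-commands b iff neither dominates the other and the mother of a
    (the first branching node, since these toy trees are binary) dominates b."""
    if a is b or a.parent is None or dominates(a, b) or dominates(b, a):
        return False
    return dominates(a.parent, b)


def asym_c_commands(a: Node, b: Node) -> bool:
    return c_commands(a, b) and not c_commands(b, a)


def nodes(root: Node) -> Iterator[Node]:
    yield root
    for kid in root.children:
        yield from nodes(kid)


def pesetsky_path(root: Node, alpha: Node, beta: Node) -> int:
    """Nodes that dominate the trace position alpha and are c-commanded by beta."""
    return sum(1 for x in nodes(root) if dominates(x, alpha) and c_commands(beta, x))


def m_sp(root: Node, alpha: Node, beta: Node) -> int:
    """(4): categories Y such that Y asymmetrically c-commands alpha
    and beta asymmetrically c-commands Y."""
    return sum(1 for y in nodes(root)
               if asym_c_commands(y, alpha) and asym_c_commands(beta, y))


# Hypothetical, heavily simplified binary tree for (9-a):
# [CP Spec [C' did [IP John [VP persuade [XP whom1 [YP to [ZP visit whom2]]]]]]]
spec, whom1, whom2 = tree("Spec"), tree("whom1"), tree("whom2")
root = tree("CP", spec,
            tree("C'", tree("did"),
                 tree("IP", tree("John"),
                      tree("VP", tree("persuade"),
                           tree("XP", whom1,
                                tree("YP", tree("to"),
                                     tree("ZP", tree("visit"), whom2)))))))

print(m_sp(root, whom1, spec), m_sp(root, whom2, spec))                    # 3 5
print(pesetsky_path(root, whom1, spec), pesetsky_path(root, whom2, spec))  # 4 6
```

On this tree the nodes crossed by moving whom1 are persuade, John and did, and both measures rank movement of whom2 as longer than movement of whom1; the absolute numbers depend entirely on how much structure is assumed.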

According to (4), it is did, John, and persuade themselves that form the path of whom1 in (9-a-i). Although the two ways of defining the length of a movement path yield the same empirical results, I will use (4) as the basis for further modifications below. Having defined the length of a single movement, it remains to define the length of an entire derivation. Intuitively, this is the sum of the lengths of the movement operations or links it contains. To be precise, let me quote Collins' definition of "nodes traversed" (from Collins (1994a, p. 56)): (5)

Nodes traversed: Let $D$ be a derivation, and $\{L_i\}_D$ its links. Let $\pi_i$ be the path associated with $L_i$, and $N_i$ be the cardinality of $\pi_i$, i.e. $N_i = M_{sp}(\pi_i)$. The number of nodes traversed is defined as follows: $M_{sp}(D) = \sum_{L_i \in \{L_i\}_D} N_i$.

(6)

Length of a derivation: $D_2$ is longer than $D_1$ if and only if $D_2$ traverses more nodes than $D_1$, i.e. if and only if $M_{sp}(D_1) < M_{sp}(D_2)$.

(7)

Shortest Paths Condition: If two derivations $D_1$ and $D_2$ are in the same RS and $D_2$ is longer than $D_1$, then $D_1$ is to be preferred over $D_2$.
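Definitions (5) to (7) can be read as a small computation over link lengths. This Python sketch assumes that each derivation has already been reduced to a list of hypothetical $N_i$ values, one per link, and that reference-set membership has been checked separately; the function names are invented for the illustration.

```python
from typing import List


def nodes_traversed(link_lengths: List[int]) -> int:
    """(5): M_sp(D) is the sum of N_i over all links L_i of D."""
    return sum(link_lengths)


def is_longer(d2: List[int], d1: List[int]) -> bool:
    """(6): D2 is longer than D1 iff M_sp(D1) < M_sp(D2)."""
    return nodes_traversed(d1) < nodes_traversed(d2)


def shortest_paths_prefers(d1: List[int], d2: List[int]) -> bool:
    """(7): for D1, D2 in the same RS, D1 is preferred over D2 if D2 is longer.
    Reference-set membership is assumed to have been checked beforehand."""
    return is_longer(d2, d1)


# Hypothetical link lengths (N_i values), one list per derivation.
print(shortest_paths_prefers([3], [5]))     # True
print(shortest_paths_prefers([1, 4], [5]))  # False: both traverse 5 nodes
```

Because (5) sums over links, a derivation with several short links can tie with one containing a single long link; (7) then expresses no preference.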

It is clear that (7) is implied by the general constraint (1), with the relevant metric being defined as $M_{sp}(D)$. It only remains to explain the notion of a RS, which is defined by Chomsky (1995, p. 393) as in (8): (8)

Reference Set (1): Two convergent derivations are in the same RS if and only if they have the same numeration.

A numeration is understood as a complete list of occurrences of the lexical items used by the computational system in the course of a derivation. Accordingly, all superiority violations should result from a wh-phrase having made a longer move than another wh-phrase in an alternative derivation with the same numeration, i.e. the same lexical material. Note that for this analysis to work properly, it seems necessary to assume that two numerations count as identical even if two of their elements differ in the strength of their features. This simply follows from the assumption that the overtly moved item whom1 in (9-a-i) has a strong feature (in the sense of the minimalist theory), whereas the wh-phrase whom2, which remains in situ, must have a weak feature (for otherwise both wh-phrases would have to move). In contrast, (9-a-ii) must exhibit the converse distribution of features. Accordingly, if we want to compare two derivations which differ with respect to whether or not an item moves, it seems best to disregard the strength of the features that trigger movement. However, there is a potential problem here, arising from optional movement. As a representative example, consider wh-movement in French, illustrated in (9): (9)

Optional movement in French:
    a. Tu as vu qui?
       you have seen whom
    b. Qui_i as_j-tu t_j vu t_i?
       whom have-you seen

Given the feature account of movement and the wh-criterion, optionality of wh-movement will arise if the feature of the wh-phrase can choose either a strong or a weak value. But as soon as the difference between weak and strong features is ignored when defining RSs, it seems that the variant without movement will incorrectly block the option of movement. We therefore have to assume that, of the two features relevant for a single movement (namely the feature of the wh-phrase that is checked after movement and the corresponding feature that does the checking), it is the feature of the moved item that is irrelevant for RS. Nonetheless, strong and weak features of heads, insofar as they check movement of other elements, will still matter in the definition of RSs, so that the two derivations, one with and the other without movement, can be put into different RSs. Another example of sloppiness in identity conditions for RSs is provided by the following paradigmatic case of superiority; since movement of what is longer than that of who, the contrast in (10) should be explained straightforwardly by economy: (10)

Subject/object superiority:
    a. Who bought what?
    b. *What did who buy?

However, abstracting away from word order (which is what numerations do) will not leave us with the same morpho-syntactic material, as would be necessary for having the same numeration: (10-b) contains did, and tense is realized on the auxiliary rather than on the verb. We conclude that for the two sentences to be in the same RS, identity of numeration cannot be understood too literally, which forces us to ignore different realizations of tense, "inserted elements," or other items that do not truly constitute lexical categories in the narrow sense of the term (cf. Emonds (1985)).
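The identity conditions on numerations arrived at so far (ignore the feature strength of the moved item, keep that of the checking head, disregard inserted elements and the realization of tense) can be sketched as a comparison key. This is a minimal Python illustration under a hypothetical encoding: each numeration entry is a (lemma, kind, strength) triple, tense is neutralized by listing lemmas only, and the labels, the kind field and the function names are assumptions rather than part of the theory's formalism.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical encoding of a numeration entry: (lemma, kind, strength).
# kind is "lex" (lexical item), "head" (functional head whose feature
# checks movement) or "inserted" (e.g. the do of do-support).
Item = Tuple[str, str, Optional[str]]


def rs_key(numeration: List[Item]) -> Counter:
    """Multiset used when comparing numerations for RS membership."""
    key: Counter = Counter()
    for lemma, kind, strength in numeration:
        if kind == "inserted":
            continue                      # inserted elements are disregarded
        if kind == "head":
            key[(lemma, strength)] += 1   # strength of a checking head still counts
        else:
            key[(lemma, None)] += 1       # strength of the moved item is ignored
    return key


def same_numeration(n1: List[Item], n2: List[Item]) -> bool:
    return rs_key(n1) == rs_key(n2)


# French (9): the variants with and without movement differ in the strength of C.
fr_a = [("tu", "lex", None), ("avoir", "lex", None), ("voir", "lex", None),
        ("qui", "lex", "weak"), ("C", "head", "weak")]
fr_b = [("tu", "lex", None), ("avoir", "lex", None), ("voir", "lex", None),
        ("qui", "lex", "strong"), ("C", "head", "strong")]

# English (10): strengths on the wh-phrases are swapped and (10-b) has do.
en_a = [("who", "lex", "strong"), ("buy", "lex", None),
        ("what", "lex", "weak"), ("C", "head", "strong")]
en_b = [("what", "lex", "strong"), ("do", "inserted", None),
        ("who", "lex", "weak"), ("buy", "lex", None), ("C", "head", "strong")]

print(same_numeration(fr_a, fr_b))  # False: different RSs
print(same_numeration(en_a, en_b))  # True: same RS
```

The French variants land in different RSs, so the variant without movement cannot block the one with movement, while the two English derivations in (10) compete as required.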

2.2 Anti-Superiority Effects: A Revised Definition of Reference Sets

Despite its original appeal, the economy account faces further problems that call for more radical modifications. It has been noticed by Huang (1982, p. 576), Lasnik & Saito (1992, p. 120), and others that the grammatical sentence (12-a) refutes the superiority condition as stated in Chomsky (1973, p. 246): (11)

The Superiority Condition: a. No rule can involve X, Y in the structure ... X...[... Z... WYV...]... where the rule applies ambiguously to Z and Y, and Z is superior to Y. b. The category A is 'superior' to the category B if every major category dominating A dominates B as well but not conversely.

(12)

The counterexample:
    a. Who wonders what1 who2 bought t1?          (S-structure)
    b. [e] who wonders [[e] who2 bought what1]    (D-structure)

Movement of what1 into the lower SpecC position [e] of (12-b) violates superiority, since there is another wh-phrase, who2, that could move into [e] and is closer to [e] than what is. Yet (12-a) does have a grammatical reading, namely the one in which who2 has matrix scope. Therefore, (11) is too strong as it stands. This line of reasoning also carries over to the present explanation of superiority in terms of Shortest Paths. Thus, compare the following two sentences: (13)

Anti-superiority:
    a. Who wonders what1 who2 bought t1?
    b. Who wonders who2 t2 bought what1?

The problem here is that the derivation in (13-b) blocks (13-a) on account of the same considerations that would (legitimately) apply to (10): having identical numerations, the two sentences in (13) are in the same RS, but again, movement of who2 in (13-b) is shorter than movement of what1 in (13-a), and therefore (13-a) is blocked by economy. However, this blocking effect, albeit desirable as far as (10) and Chomsky's examples (9) are concerned, is unwarranted for (13-a). Looking at the examples more closely, it seems that the trouble arises from the ability of one derivation to block another derivation even though the two do not have the same semantic interpretation. Hence, the above data suggest that an additional factor must be taken into account, namely LF. And as far as LF is concerned, it appears that Baker's (1970) ambiguity, illustrated in (14), already lends independent support to a revision of the relevant economy constraint: (14)

LF-movement and Shortest Paths:
    who knows where we bought what
    (LF movement of what to the embedded SpecC of where: shortest move;
     LF movement of what to the matrix SpecC of who: longer, but grammatical)

Given that the scope ambiguity of what is accounted for by LF movement into either the matrix SpecC of who or the embedded SpecC of where, the shorter move should not block the longer one; otherwise, no scope ambiguity could arise. Hence, under classical assumptions about LF movement, comparison of LFs should make the correct prediction, because it implies that the two derivations belong to different RSs. In the light of similar evidence, Kitahara proposed the following reformulation of the relevant economy condition: (15)

Shortest Paths (cf. Kitahara (1993, p. 109)): Given two convergent derivations $D_1$ and $D_2$ with the same LF output, both minimal and containing the same number of steps, $D_1$ blocks $D_2$ if $D_1$'s chains are shorter.

In Kitahara's formulation the Shortest Paths requirement is operative only if the derivations already satisfy other economy constraints, i.e. contain the same number of steps and are 'minimal' (a condition not discussed here; note that (15) is identical with the condition in Chomsky (1993, p. 34), except for the phrase "with the same LF output" added by Kitahara). This particular arrangement of conditions contrasts with the more general theory pursued here which presupposes that the economy conditions are independent from one another. I find no reason to depart from this practice; hence, I simplify and reformulate (15) as (16): (16)

Shortest Paths (revised): Given two convergent derivations $D_1$ and $D_2$ with the same LF, $D_1$ blocks $D_2$ if $D_1$'s chains are shorter.

Accordingly, the appropriate notion of RS, which is required in order to derive (16) from the general economy condition (1), is the following: (17)

Reference Set (2): Two convergent derivations are in the same RS if and only if they have the same numeration and the same LF.

Thus, it would seem that the counterexamples to Chomsky's (1973) superiority condition no longer pose a problem, and indeed, the required distinction that makes (13-a) and (13-b) elements of different RSs can easily be derived from the different scopal positions that result from LF movement of the in-situ wh-phrases. However, alluding to LF movement creates a number of additional problems. First, recall that in all true superiority cases the overtly moved item "substitutes" in SpecC (i.e. generates SpecC), whereas the covertly moved phrase is adjoined to SpecC. Following the convention of Müller & Sternefeld (1994, p. 472ff.), covert adjunction of wh-phrases in English goes to the same position as overt adjunction in certain Slavic languages, hence to the right of SpecC (rather than to its left, as seems to be common practice; cf. Kayne (1994)). Thus, the two competing LFs for Chomsky's examples have the specifiers shown in (18): (18)

Structure of specifiers:
    a. [CP [[whom1] whom2] ...]
    b. [CP [[whom2] whom1] ...]

It is clear, then, that these structures are, strictly speaking, non-identical, although they should belong to the same RS. We must assume, therefore, that by definition the two SpecC positions count as identical with respect to sameness of LF. A more severe problem arises from taking into consideration the length of LF paths. By way of illustration, consider Chomsky's examples, repeated here as (19) for convenience. Making minimal but traditional assumptions with respect to the underlying structure, we see that whom1 in (19-a) crosses two nodes (John and did), and whom2 crosses six positions at LF (to, PRO, t1, persuade, John, and did), which makes a total of eight nodes. (19)

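Shortest Paths metric (S-structure + LF):
    a. Whom1 did John persuade t1 [PRO to visit whom2]
       (overt movement of whom1: $M_{sp}$ = 2; LF movement of whom2: $M_{sp}$ = 6)
    b. *Whom2 did John persuade whom1 [PRO to visit t2]
       (overt movement of whom2: $M_{sp}$ = 6; LF movement of whom1: $M_{sp}$ = 2)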

Conversely, overt movement in (19-b) crosses six positions in our "minimal" representation (words and empty categories), and covert movement crosses only two, again a total of eight positions. Hence, LF movement would level out the distinctions needed for explaining superiority. Now, according to Reinhart (1992), Chomsky (1993), Reinhart (1994), and others, the traditional assumption that movement of wh-in-situ phrases is compulsory at the level of LF has been challenged by semantic considerations. According to Reinhart, wh-words are not interpreted as operators but rather function as bound variables. The relevant binder of these variables is an abstract operator (located in SpecC or adjoined to CP) which roughly corresponds to the abstract Q-morpheme postulated by Baker (1970). This abstract operator simultaneously binds all wh-phrases which have, according to traditional terminology, the same "scope" as the abstract operator. There is no LF movement of wh-phrases. Following this line of reasoning allows us to get rid of the disturbing effect of LF movement in the calculation of the length of movement. On the other hand, lack of LF movement does not in and of itself solve the problem. Defining RSs by LF identity would imply that the two derivations in (19) cannot compete: since they have different surface structures, they also have different LFs (by lack of LF movement); hence, we are deprived of any means to explain superiority.
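Under the traditional view with compulsory LF movement, the arithmetic behind the leveling argument can be made explicit by summing, as in (5), the node counts given for the overt and LF links of (19). Neither total exceeds the other, so (6) ranks neither derivation as longer:

```latex
\begin{aligned}
M_{sp}(\text{19-a}) &= \underbrace{2}_{\text{overt: } whom_1} + \underbrace{6}_{\text{LF: } whom_2} = 8\\
M_{sp}(\text{19-b}) &= \underbrace{6}_{\text{overt: } whom_2} + \underbrace{2}_{\text{LF: } whom_1} = 8
\end{aligned}
```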

2.3 LF vs. Meaning: A Third Definition of Reference Sets

As a reaction to this problem, Kitahara (1993, p. 115) proposes that economy considerations should take into account semantic interpretation rather than LF as such. Since the LFs in (19) coincide with their S-structures (by lack of LF movement), they are different, but "as far as LF interpretation is concerned, these two LF representations [corresponding to (19)] are identical. Taking 'LF output' [...] to mean 'LF interpretation', I assume that [... cases like (19-a) and (19-b)] yield the same LF output precisely because they have the same LF interpretation [...]" In other words, what he proposes in terms of "LF output" is (20): (20)

Reference Set (3): Two convergent derivations are in the same RS if and only if they have the same numeration and the same semantic interpretation.
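The three definitions of reference sets, (8), (17) and (20), differ only in what two derivations must share. As a minimal Python sketch, each definition can be written as a comparison key. The encodings are hypothetical: the numeration is approximated by the multiset of words, and the lf and meaning fields are hand-written stand-in labels rather than computed representations; following the Reinhart-style assumption discussed above, the LFs contain no LF movement.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Derivation:
    name: str
    words: Tuple[str, ...]  # stand-in for the numeration (word order is irrelevant)
    lf: str                 # hypothetical stand-in for the LF representation
    meaning: str            # hypothetical stand-in for the semantic interpretation


def numeration(d: Derivation) -> Counter:
    return Counter(d.words)


def rs1(d: Derivation) -> Counter:
    """(8): same numeration."""
    return numeration(d)


def rs2(d: Derivation) -> Tuple[Counter, str]:
    """(17): same numeration and same LF."""
    return (numeration(d), d.lf)


def rs3(d: Derivation) -> Tuple[Counter, str]:
    """(20): same numeration and same semantic interpretation."""
    return (numeration(d), d.meaning)


def same_rs(key: Callable, d1: Derivation, d2: Derivation) -> bool:
    return key(d1) == key(d2)


# (13): different scope readings for who2.
d13a = Derivation("13-a", ("who", "wonders", "what", "who", "bought"),
                  lf="who wonders what1 who2 bought t1",
                  meaning="who2 takes matrix scope")
d13b = Derivation("13-b", ("who", "wonders", "who", "bought", "what"),
                  lf="who wonders who2 t2 bought what1",
                  meaning="who2 takes embedded scope")

# (19): distinct LFs (no LF movement) but, following Kitahara, one interpretation.
d19a = Derivation("19-a", ("whom", "did", "John", "persuade", "PRO", "to", "visit", "whom"),
                  lf="whom1 did John persuade t1 [PRO to visit whom2]",
                  meaning="M19")  # hypothetical shared label
d19b = Derivation("19-b", ("whom", "did", "John", "persuade", "whom", "PRO", "to", "visit"),
                  lf="whom2 did John persuade whom1 [PRO to visit t2]",
                  meaning="M19")

print(same_rs(rs1, d13a, d13b), same_rs(rs3, d13a, d13b))  # True False
print(same_rs(rs2, d19a, d19b), same_rs(rs3, d19a, d19b))  # False True
```

Under (8), (13-a) and (13-b) compete; under (17), the two derivations in (19) do not. Only (20) separates (13-a) from (13-b) while keeping (19-a) and (19-b) in one RS.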

It now follows that what distinguishes the grammatical anti-superiority data from the true superiority effects is that the two derivations to be considered in the real cases of superiority are in the same RS, whereas the grammatical cases, repeated as (21) for convenience, differ in semantic interpretation, and hence cannot compete with respect to economy considerations. (21)

Anti-superiority effect (=(13)):
    a. Who wonders what1 who2 bought t1?
    b. Who wonders who2 t2 bought what1?

2.4 Problems and Prospects

Summarizing so far, I illustrated three different definitions for RSs: One is Chomsky's original proposal, which led to difficulties with the Shortest Paths Condition when applied to anti-superiority data. These in turn suggested a new definition of RSs in terms of LF identity, also compatible with most of the arguments in Epstein (1992) or Miiller & Sternefeld (1996). As argued above, this redefinition still did not work for the case under discussion, so I (provisionally) adopted Kitahara's proposal to define RSs in terms of identity of meaning; this version is also defended in Fox (1994), to which I return below. Using Kitahara's definition implies that one has to subscribe to a crucial assumption not shared by everyone working in the generative framework, namely, that there is no LF movement of w/z-phrases. Moreover, the very idea of defining RSs in terms of meaning is vulnerable to conceptual objections. Most importantly, a semantic concept has been incorporated into an otherwise purely syntactic theory; this move clearly runs against the so-called autonomy of syntax. Granted that the data in question could alternatively be accounted for in purely syntactic terms (albeit not necessarily within an economy approach; cf. Epstein (1993)), invoking identity of semantic interpretation is a challenge to Occam's razor. Furthermore, the global character of economy constraints becomes more and more unconstrained. Previously, the globality of the theory consisted in the fact that one could evaluate derivations only after it became clear that they converged, i.e. when they were terminated at LF. This is no longer sufficient in a theory that extends to meaning. Here we can compare derivations only after having computed a semantic

Sternefeld: Reference Sets

interpretation. As regards interpretation, however, it is not totally clear precisely how "LF-output" or "meaning" should be understood; cf. e.g. Fox (1994) for some discussion of controversial points of view. In the next section I will show that there is also an empirical problem with referring to meaning. In any case, it seems necessary to discuss more data and more consequences of the Shortest Paths Condition. This will be done in the next three sections.
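The transderivational logic at issue can be made concrete with a toy model. The following is a minimal Python sketch, not Sternefeld's or Chomsky's formalism: a derivation is reduced to a sorted tuple of lexical items, an opaque meaning label and an integer path length, and all names (Derivation, surviving, by_numeration, by_meaning) and numbers are hypothetical. It shows how the choice of RS definition decides whether the two anti-superiority derivations in (21) compete at all.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List, Tuple


@dataclass(frozen=True)
class Derivation:
    label: str
    numeration: Tuple[str, ...]  # toy numeration: sorted multiset of lexical items
    meaning: str                 # opaque stand-in for a semantic interpretation
    path_length: int             # toy Shortest Paths measure (hypothetical units)


def surviving(derivs: List[Derivation],
              rs_key: Callable[[Derivation], Hashable]) -> List[str]:
    """Group derivations into reference sets via rs_key and keep only those
    whose path length is minimal within their own reference set."""
    groups: Dict[Hashable, List[Derivation]] = {}
    for d in derivs:
        groups.setdefault(rs_key(d), []).append(d)
    winners: List[str] = []
    for group in groups.values():
        best = min(d.path_length for d in group)
        winners.extend(d.label for d in group if d.path_length == best)
    return sorted(winners)


def by_numeration(d: Derivation) -> Hashable:
    """RS defined by the numeration alone."""
    return d.numeration


def by_meaning(d: Derivation) -> Hashable:
    """RS defined by numeration plus identity of meaning."""
    return (d.numeration, d.meaning)


# Hypothetical encoding of (21): same lexical items, different interpretations.
items = ("bought", "what", "who", "who", "wonders")
d21a = Derivation("(21a)", items, "meaning-a", path_length=2)  # object to embedded SpecC
d21b = Derivation("(21b)", items, "meaning-b", path_length=1)  # subject to embedded SpecC

print(surviving([d21a, d21b], by_numeration))  # ['(21b)']: (21a) is wrongly blocked
print(surviving([d21a, d21b], by_meaning))     # ['(21a)', '(21b)']: no competition
```

Under the meaning-based key the two derivations fall into different reference sets, which is precisely the property that the discussion above shows to be costly on conceptual grounds.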

3

Scrambling in German and Korean

In order to be precise about meaning, I assume that meaning is to be understood in terms of truth conditions. Accordingly, two sentences differ in meaning if and only if they have different truth conditions. An argument against defining RSs in terms of identity of lexical choice and identity of meaning could then be based on two grammatical derivations that have identical truth conditions but differ with respect to the Shortest Paths metrics. Such sentences can easily be found in languages that exhibit more word order variation than English. Let us first look at German main clauses, which exhibit verb second order as a result of topicalization of any maximal projection (which would remain in situ in subordinate clauses). Since the topicalized item can be reconstructed, topicalization does not influence truth conditions. Now, since topicalization of an object is longer than topicalization of a subject, we must ensure that these derivations are in two different RSs. This in turn is possible only if they have different numerations, which will be the case if there is a certain top-feature on the topicalized item that enters into numerations. Recall, however, that we decided above that the feature of a moved (wh-)phrase should be irrelevant for numerations. This assumption should now be called into question, since it looks as if we were driven into an arbitrary decision as to which features count for calculating numerations and which do not. Let us therefore revise our assumptions about wh-features, assuming now that these features are weak in languages that front at most one wh-phrase, and strong in languages that exhibit overt fronting of all wh-phrases. In English, then, wh-movement is triggered by the strong head of a CP, as before, but the weak feature of the wh-phrases will still count (insignificantly) for the numeration.
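Whether a feature enters the numeration is what decides competition here. The following minimal sketch assumes that a numeration can be modelled as a multiset of word/feature pairs; the helper names numeration and same_reference_set and the lexical items are hypothetical. It compares a subject-topicalized and an object-topicalized German main clause, once ignoring and once counting a top-feature.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Item = Tuple[str, Set[str]]  # (word, features the item bears)


def numeration(items: Iterable[Item], counted: Set[str]) -> Counter:
    """Multiset of (word, counted features); uncounted features are dropped."""
    return Counter((word, frozenset(feats & counted)) for word, feats in items)


def same_reference_set(a: Iterable[Item], b: Iterable[Item], counted: Set[str]) -> bool:
    """Two derivations compete only if their numerations coincide."""
    return numeration(a, counted) == numeration(b, counted)


# 'Keiner hat die Antje geküsst' vs. 'Die Antje hat keiner geküsst' (toy items).
subject_topic = [("keiner", {"top"}), ("hat", set()), ("die Antje", set()), ("geküsst", set())]
object_topic = [("keiner", set()), ("hat", set()), ("die Antje", {"top"}), ("geküsst", set())]

# Ignoring the top-feature: same RS, so the longer object topicalization would be blocked.
print(same_reference_set(subject_topic, object_topic, counted=set()))    # True
# Counting it: different RSs, so both word orders survive.
print(same_reference_set(subject_topic, object_topic, counted={"top"}))  # False
```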
Since features of moved items now play a role again, we can account for the possibility of free topicalization of any maximal projection. Our next step is to look at the interaction between topicalization and scrambling. Whereas topicalization is usually analyzed as movement into SpecC (followed by movement of the verb into C; cf. Thiersch (1978)), scrambling is analyzed as adjunction to IP or VP (or any functional projection between them). A standard proof for the existence of scrambling (and against analyzing a scrambled object as being base-generated in its surface position) is based on the following two assumptions: (22)

Basic tenets:
a. Exactly one maximal projection per clause can be topicalized.
b. A transitive verb alone does not constitute a maximal projection.


Now, the possibility of fronting a transitive verb without its object, as exemplified in (23), is regarded as evidence that topicalization must be preceded by movement (i.e. scrambling) of the object out of its VP (cf. Thiersch (1982), den Besten & Webelhuth (1990)): (23)

Remnant topicalization (1):
[VP t₁ Zu küssen ]₂ hat gestern keiner [NP die Antje ]₁ t₂ versucht.
to kiss has yesterday no-one ART Antje(acc) tried

I assume that the infinitival marker zu in (23) has been incorporated into the verb and that scrambling in (23) is (string-vacuous) adjunction to the VP whose lower segment has undergone topicalization. Now, as pointed out by Müller (1995, p. 89), adjunction of the NP to IP as shown in (24) would be as grammatical as (23). (24)

Remnant topicalization (2):
a. [VP t₁ Zu küssen ]₂ hat [NP die Antje ]₁ gestern keiner t₂ versucht.
   to kiss has ART Antje(acc) yesterday no-one tried
b. [VP t₁ Zu küssen ]₂ hat gestern [NP die Antje ]₁ keiner t₂ versucht.
   to kiss has yesterday ART Antje(acc) no-one tried

Paradoxically, perhaps the most natural, unmarked, or "coherent" variant is (24-a), where the object has moved as close to the topicalized VP as possible. But, as Müller observes, this is also the variant that calls for the longest path. Since there is no detectable difference in numeration or meaning, this derivation is (incorrectly) blocked by the other derivations. Basically the same holds if we reverse the positions of the subject and the adverb: (25)

Remnant topicalization (3):
a. Zu küssen hat die Antje keiner gestern versucht.
b. Zu küssen hat keiner gestern die Antje versucht.
c. Zu küssen hat keiner die Antje gestern versucht.

Here again, the more acceptable variants seem to be the ones with the longer paths. Remnant topicalization in German has been used by Müller (1995) as an argument against defining RSs in terms of numeration alone; here we have seen that the same argument carries over to identity of meaning. One might object, however, that there is an easy way to overcome the problem. Since scrambling in German is local, one might try to revise the Shortest Paths metrics by introducing an appropriate notion of equidistance along the lines proposed in Chomsky (1993). The effect of such a move should be that the paths of scrambling are of equal length. This solution, however, does not work for languages with long distance scrambling, like Korean. The following data (courtesy of Shin-Sook, p.c.) show that the landing site of scrambling can be in a different finite clause. Consequently, relaxing locality to permit Korean scrambling, even on a parametric basis, would empty locality of any content whatsoever, thus ruling out any appeal to equidistance:

(26)

Long distance scrambling in Korean: a. [CP ku ch'aek-ul₁ Minsu-nun [CP Suna-ka [ t₁ Achim geküsst ]]. said who(nom) has(subj) Achim kissed

(51)
a. Wer sagte [TopP Maria₁ habe_Top [IP Achim t₁ geküsst ]]?
   who said Maria(acc) has(subj) Achim kissed
b. *Wer sagte [TopP wen₁ habe_Top [IP Achim t₁ geküsst ]]?
   who said whom(acc) has(subj) Achim kissed

This implies that if we look for alternative derivations that are cheaper in the sense of economy, these must be derivations that use fewer features. In other words, Fewest Steps should prefer derivations with as few features as possible. For this to be feasible, we must assume that at least some features that trigger movement cannot be part of the numerations or the structures that constitute RSs. Given that a wh-feature on a wh-phrase is obligatory, and assuming that topicalization features are optional, this means that we have to find a derivation in the same RS that does without the top feature, i.e. without topicalization. Note in passing that such a surface structure will necessarily differ from the structure to be blocked, a fact which immediately rules out RSs as a possible theory for Fewest Steps. Granted that optional features do not enter into the calculation of RSs, the comparison between structures with and without topicalization is straightforward for (49) but requires some elaboration for the German examples (50) and (51). The main verb sagte may take either a [+WH]-CP or a [-WH]-CP as a complement. Assume that it subcategorizes for a [+WH]-CP in (50-b) and for a [-WH]-CP in (51-b). Now, the S-structures without topicalization are (52-a) and (52-b) respectively: (52)

a. Fritz sagte [CP wer₁ [IP t₁ Achim geküsst habe ]].
   Fritz said who Achim kissed has
b. Wer sagte [CP daß [IP Achim wen geküsst habe ]]?
   who said that Achim whom kissed has

However, these structures differ from the topicalized variants in several respects. Due to the lack of embedded topicalization, there is no verb second movement; in (52-b) there is insertion of a complementizer. The question then arises of whether these structures are similar enough to be in the same RS as their topicalized counterparts. Let us first discuss clausal structure. As has been shown in Müller & Sternefeld (1993), there is reason to believe that sentence structure in the Germanic languages is uniform in that every CP contains a Topic-Phrase. Accordingly, the underlying structures of the above sentences are all identical; they differ only in that in (52) there is an empty topic projection, whereas in (50) and (51) the projection of C is empty. Now, with the structures being identical, we can turn to the more specific considerations of economy, beginning with identity of numeration. Assuming that daß is an "inserted element," the situation parallels the one already discussed with respect to (10), where do-support, like insertion of daß, has to be ignored in the numeration. Moreover, we must, for the sake of the argument, ignore everything that triggers verb second movement. Under these circumstances, RS.y already gives the correct result: In (52-a) S-structure and LF coincide. This is also the LF of (50-b), which is attained by moving the wh-phrase from SpecTop into SpecC and by undoing verb second movement. The LF of (52-b) is (53): (53)

werⱼ wenᵢ [IP tⱼ sagte [CP [IP Achim tᵢ geküsst habe ]]]

This is also the LF of (51-b). The transition from (51-b) to (53) again requires reconstruction of head movement; moreover, the trace of LF movement must be deleted. The transition from (52-b) to (53) requires deletion of the complementizer daß. We have shown, then, that in these cases identical LFs can be reached without topicalization of wh-phrases. Such topicalization would require an additional step ruled out by economy. Note that this argument crucially relies on the assumption that there is real LF movement; without such movement we would have to establish a scope relation, but this would not be sufficient to put the structures to be compared into the same RSs. As noted above, real LF movement is not necessarily inconsistent with the choice of RS,y for English, but rather is so only under the revised Shortest Paths metrics that does not counterbalance the effect of LF movement. On the other hand, this modification cannot work for German, which lacks superiority. It seems, then, that we are trapped in a contradiction. Moreover, our present assumptions beg the question of why topicalization as such, which is optional, is possible in the first place. Granted that truth conditions are identical, the difference between a topicalized and a non-topicalized structure can only be traced back to a difference in the distribution of features, but this is exactly what we cannot allow to enter into the definition of RSs if economy is to be able to play a significant role independent of Greed and convergence. For a language like German, there is no problem, since we have chosen RS//, so that topicalized and non-topicalized structures differ. However, if we assume meaning invariance for topicalization in English, as seems plausible, we are back to our initial problem with RS,« and the role of features in such a theory. To summarize, many of Epstein's arguments are valid only under precisely his presuppositions, namely LF movement and RS/,f. On the other hand, we have Fox's arguments in favor of RS,v and against RS/,/,'. Combining these arguments with our account of superiority leads us into conflicting requirements. Besides further conceptual problems with respect to RS,v/ and Greed, there are additional empirical problems with Fewest Steps that will be discussed in the next section. These data suggest a much simpler theory in terms of derivational rather than transderivational economy, as we will see in the following sections.
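The Fewest Steps reasoning applied to (50) through (53) can be sketched as follows, under loudly simplifying assumptions: each derivation is reduced to an opaque LF label and a count of feature-checking operations, optional features are assumed not to enter the RS, and the labels and counts (including the untopicalized variant of (51-a)) are hypothetical. The last pair illustrates the problem raised above: plain topicalization, as in (51-a), is excluded in exactly the same way as wh-topicalization.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Derivation:
    label: str
    lf: str      # opaque stand-in for the LF representation
    checks: int  # number of feature-checking operations (toy count)


def fewest_steps(derivs: List[Derivation]) -> List[str]:
    """Among derivations with an identical LF, keep those with the fewest checks."""
    best: Dict[str, int] = {}
    for d in derivs:
        best[d.lf] = min(best.get(d.lf, d.checks), d.checks)
    return sorted(d.label for d in derivs if d.checks == best[d.lf])


derivations = [
    Derivation("(50-b)", lf="LF of (52-a)", checks=2),        # wh-phrase topicalized
    Derivation("(52-a)", lf="LF of (52-a)", checks=1),        # no topicalization
    Derivation("(51-b)", lf="(53)", checks=2),                # wh-phrase topicalized
    Derivation("(52-b)", lf="(53)", checks=1),                # no topicalization
    Derivation("(51-a)", lf="LF of (51-a)", checks=1),        # Maria topicalized
    Derivation("(51-a)-notop", lf="LF of (51-a)", checks=0),  # hypothetical untopicalized variant
]

print(fewest_steps(derivations))
# ['(51-a)-notop', '(52-a)', '(52-b)']: the grammatical (51-a) is wrongly excluded too
```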

8

From Transderivational to Derivational Constraints

As shown in Müller & Sternefeld (1996), wh-phrases can, under certain well-defined circumstances, undergo optional S-structure movement in languages like German, Russian, Korean, Iraqi Arabic, and Ancash Quechua. For example, as discussed by Wahba (1992), wh-movement in Iraqi Arabic may front a wh-phrase, as shown in (54-a), or may not apply, leaving the wh-phrase in situ, as shown in (54-b): (54)

a. [CP meno₁ Mona raadat [CP t₁'' tijbir Su'ad [CP t₁' tisa'ad t₁ ]]]?
   whom Mona wanted to force Su'ad to help
b. [CP - Mona raadat [CP - tijbir Su'ad [CP - tisa'ad meno₁ ]]]?
   Mona wanted to force Su'ad to help whom

In (54-a) wh-movement of meno ('whom') applies in successive-cyclic fashion via two embedded [-wh] SpecC positions of infinitival clauses to the matrix [+wh] SpecC position. In (54-b) overt wh-movement does not take place at all. As discussed with respect to French, one might account for this situation by stipulating a strong checking feature in the highest C in (54-a) and a weak checking feature (or no checking feature) for (54-b). But now consider (55): (55)

a. [ that [IP John likes who₂ ]]?
b. Who₁ t₁ said [CP (t₁') that John came t₁ ]]?

Thus, the question arises of whether the derivation in (50) can still be ruled out in the revised Economy approach introduced in the previous section, which dispenses with the concept of Form Chain. If we consider more closely the two movement operations applying to the wh-item worüber₁ in (50), it turns out that it can. Recall that the first operation in (50) ("Wh-Extraction, Part I") is adjunction to VP, i.e., given the above assumptions, scrambling. Thus, what is involved here is actually a combination of scrambling to VP and wh-movement to SpecC applying to the same category. Therefore, it does not come as a surprise that the present system excludes (50) as involving improper movement, in more or less the same way as it derives the prohibition against wh-scrambling (cf. subsection 4.4). Again, there are two 'sub'-derivations to be considered: One possibility is that a [scr] feature is absent in the numeration; then, intermediate scrambling of the wh-phrase violates Last Resort, but Fewest Steps is respected. Alternatively, a [scr] feature shows up in the numeration; then, intermediate scrambling fulfills Last Resort, but the derivation is ruled out by Fewest Steps because there are competing derivations that generate the same LF output with one feature-checking operation fewer, viz. the ones in (45) and (47).27 Thus, the only relevant difference between illicit chain interleaving as in (50) and illicit wh-scrambling as in (37-a) is that scrambling of the wh-phrase is undone by overt wh-movement in the first case, and by covert wh-movement in the second.28
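The dilemma between Last Resort and Fewest Steps can be made explicit in a small sketch. It rests on assumptions not stated in the text: each landing site checks exactly one feature (VP-adjunction checks [scr], SpecC checks [+wh]), cost is one checking operation per step, and the names CHECKED_AT and verdict are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical map: landing site -> the feature checked there.
CHECKED_AT: Dict[str, str] = {"VP-Adj": "scr", "SpecC": "wh"}


def verdict(steps: List[str], features: Set[str], competitor_cost: int) -> str:
    """Evaluate one chain first against Last Resort, then against Fewest Steps."""
    # Last Resort: every step must check a feature that the moved item bears.
    if any(CHECKED_AT[site] not in features for site in steps):
        return "violates Last Resort"
    # Fewest Steps: one checking operation per step (toy cost).
    if len(steps) > competitor_cost:
        return "blocked by Fewest Steps"
    return "ok"


interleaving = ["VP-Adj", "SpecC"]  # scrambling to VP, then wh-movement, as in (50)
direct = ["SpecC"]                  # competitor with the same LF, cf. (45) and (47)

print(verdict(interleaving, {"wh"}, competitor_cost=1))         # violates Last Resort
print(verdict(interleaving, {"wh", "scr"}, competitor_cost=1))  # blocked by Fewest Steps
print(verdict(direct, {"wh"}, competitor_cost=1))               # ok
```

Either way the interleaved chain is excluded. The same Last Resort clause also rejects long-distance scrambling via SpecC of the kind discussed in subsection 5.3.2: a chain ["SpecC", "VP-Adj"] for an item bearing only [scr] fails at its first step, because SpecC checks [+wh].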

5.3

Ambiguous Binding

5.3.1

The PUB

In Müller & Sternefeld (1993), a Principle of Unambiguous Binding (PUB) is introduced as a general prohibition against improper movement. This principle is defined as follows: (52)

Principle of Unambiguous Binding (PUB): A variable that is α-bound must be β-free in the domain of the head of its chain (where α and β refer to different types of positions).

Basically, the PUB has the effect that after A-bar movement to a certain kind of position, subsequent movement (applying to the same item) must go to a position of the same type. The types of A-bar position that are recognized include the following: SpecC (the landing site of wh-movement at S-structure in languages like German and English), SpecC-Adj (a landing site for wh-movement at LF if SpecC is already filled), SpecTop (the landing site of topicalization in the Germanic languages), and XP-Adj (the landing site of scrambling, where X is subject to parameterization and includes the verb and the functional heads of its extended projection in German). Here are some illicit combinations of movement types that are shown to involve PUB violations (i.e., occurrences of ambiguous binding) in Müller & Sternefeld (1993; 1995): (53)

a. SpecTop → SpecC-Adj: Wh-Topicalization
b. XP-Adj → SpecC-Adj: Wh-Scrambling
c. SpecC → SpecC-Adj: Wh-Movement at LF from a [+wh] position
d. SpecC → SpecI: Super-Raising
e. XP-Adj → SpecC: Chain Interleaving

I have already given ungrammatical sentences that instantiate these restrictions in the preceding sections. It is here that the Greek letters and arrows come into play that accompany many of the earlier examples (recall note 16); non-identical Greek letters indicate PUB violations. As we have seen, all these cases of improper movement are

Müller: Optional Movement

also ruled out by the conspiracy of Fewest Steps (cf. (32)) and Last Resort (cf. (33)) in the revised Economy approach. In contrast, the following list contains legitimate combinations of movement that are permitted by the PUB. (54)

a. SpecC → SpecC: Successive-cyclic Wh-Movement, Partial Wh-Movement
b. SpecV → SpecI: Raising
c. SpecI → SpecI: Raising
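The paraphrase of the PUB given above (after A-bar movement to one type of position, later movement must target the same type) can be checked mechanically against (53) and (54). This is a sketch of that paraphrase only, not of the α/β-binding definition in (52); treating SpecV and SpecI as one A-position type, and the names A_POSITIONS, position_type and violates_pub, are assumptions made for illustration.

```python
from typing import List

A_POSITIONS = {"SpecV", "SpecI"}  # assumption: argument positions form a single type


def position_type(pos: str) -> str:
    """A-positions share one type; each A-bar landing site is its own type."""
    return "A" if pos in A_POSITIONS else pos


def violates_pub(chain: List[str]) -> bool:
    """True if some link leaves an A-bar position for a position of another type."""
    for lower, higher in zip(chain, chain[1:]):
        if lower not in A_POSITIONS and position_type(higher) != position_type(lower):
            return True
    return False


illicit = {  # (53)
    "Wh-Topicalization": ["SpecTop", "SpecC-Adj"],
    "Wh-Scrambling": ["XP-Adj", "SpecC-Adj"],
    "Wh-Movement at LF from a [+wh] position": ["SpecC", "SpecC-Adj"],
    "Super-Raising": ["SpecC", "SpecI"],
    "Chain Interleaving": ["XP-Adj", "SpecC"],
}
licit = {  # (54)
    "Successive-cyclic Wh-Movement": ["SpecC", "SpecC"],
    "Raising (SpecV to SpecI)": ["SpecV", "SpecI"],
    "Raising (SpecI to SpecI)": ["SpecI", "SpecI"],
}

print(all(violates_pub(c) for c in illicit.values()),
      not any(violates_pub(c) for c in licit.values()))  # True True
```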

Again, as shown above, these combinations are systematically allowed under the revised Economy approach. Clearly, there is a redundancy, and the question arises of whether a constraint like the PUB that blocks improper movement can be dispensed with completely, with all its effects now being derived from the interaction of Economy constraints. Indeed, by and large I think that this is the case. In what follows, I consider two more cases where the PUB has been invoked, and show that they are also ruled out by Last Resort and Fewest Steps.

5.3.2

Two Further Cases of Ambiguous Binding

The first case involves the well-known prohibition against long-distance scrambling from finite clauses in German. A relevant example is given in (55): (55)

*daß Antje Hygrometer₁ sagt [CP t₁' daß niemand t₁ mag ]
 that Antje hygrometers says that no-one likes

If movement does not apply successive-cyclically, via the embedded SpecC position, the MLC is violated. But what rules out long-distance scrambling via SpecC? As argued in Müller & Sternefeld (1993), this combination of movement operations violates the PUB: movement to SpecC is followed by adjunction to VP, and ambiguous binding of the initial variable results. Interestingly, however, successive-cyclic long-distance scrambling as in (55) is also ruled out by Economy, in more or less the same way as successive-cyclic super-raising (cf. (43)): The NP Hygrometer₁ in (55) has a [scr] feature, but no [+wh] feature. Hence, intermediate substitution in SpecC violates Last Resort, since this position is a typical checking position for the feature [+wh], but not for the feature [scr]. Thus, the PUB is no longer needed to derive the prohibition against long-distance scrambling in German.29 The second case of ambiguous binding that I would like to discuss here concerns an asymmetry between wh-movement and topicalization in the case of wh-islands in German. As shown by (56-a) vs. (56-b), wh-islands are strict for wh-movement (even of arguments) in German, whereas topicalization of an argument from a wh-island typically has an intermediate status (cf. Fanselow (1987) and Bayer (1990)). (56)

a. *[CP Was₁ C [TopP - weißt du nicht [CP wie₂ [TopP (t₁') man t₂ t₁ repariert ]]]]?
   what(acc) know you not how one fixes
   'What don't you know how to fix?'


b. ??[CP - C [TopP Radios₁ weiß ich nicht [CP wie₂ [TopP t₁' man t₂ t₁ repariert ]]]]
   radios(acc) know I not how one fixes
   'As for radios, I don't know how to fix them.'

It is argued in Müller & Sternefeld (1993) that the key to a solution of this problem is that SpecTop is an escape hatch, in the sense that wh-islands in German can be circumvented only if extraction takes place via this position. Suppose that something along these lines is correct, and that it can be made to follow from the theory of locality. With this in mind, consider now schematic structures of the sentences in (56) under the assumption that extraction takes place in a successive-cyclic manner, via the embedded SpecTop position. (57) a. *[- [scr] y [Case]

Based on (61), I would like to suggest that movement to a typical checking position for a feature 7 is possible only if there is no unchecked lower-ranked feature 3)): [...,+HT, right]

But the crucial question, now, is how we identify the relevant head for (17). This is why (17) has to be parameterized. For English, as the parameter is set in (18), the head must be in terminal position of its constituent, and this position is to the right; thus book is selected as the relevant VP asterisk. Given all these assumptions, the derivation in (16) goes through, giving the right result for English. It is not a trivial matter to define the parameters for a 'mixed' language like Dutch. If we define it as left-headed, we will correctly get the stress of (15b), since the leftmost VP stress (boek) will be projected. But in an intransitive sentence, the leftmost stress will be on the subject, which may then incorrectly get the main stress. Cinque's insight is that, in fact, no parametrization of the stress rule is needed. Apart from the empirical problems of such parametrization, it is nothing more than an unneeded duplication of the mechanism that independently governs word order variation in syntax. Assuming that we independently need to know the direction of recursion in a language, the same (and better) results will be obtained by applying one universal stress rule, starting with the most embedded constituent of the sentence. The basic idea is as follows: let us assume that the first cycle of the stress rule is the most deeply embedded stress, i.e. a category containing only one (word-level) stress. The stress rule then needs no mention of heads or their order, and it can be stated, with a slight simplification, as in (19). As far as I can see, the rest follows with no further assumptions. I should mention that I am not fully loyal to Cinque's actual execution. He assumes more machinery than I do here, though I think I capture his intuition correctly. Nothing here hinges on this being the case, and if my presentation is mistaken, one can go back to Cinque's precise formulation.5

Let us see how the derivation of the stress of (15a) follows. (19)

Generalized stress rule: Locate the stress (asterisk) of line N on line N + 1.

(20)
                          [ Max [ read [ the book ] ] ]
a. line 1 (= word line):  [  *  [  *   [      *   ] ] ]
b. line 2 (NP cycle):     [     [      [      *   ] ] ]
c. line 3 (VP cycle):     [     [             *     ] ]
d. line 4 (IP cycle):     [                   *       ]

Let us assume that the most deeply embedded constituent is the object (a point I return to directly). The first cycle-line, (20b), is then the NP (or N). Since there is only one stress

Reinhart: Interface Economy

for this cycle in the previous line, it is this stress which projects to the present line. From then on, there are no more options, and each cycle projects this same stress. Thus, the gist of the analysis is that the main stress of the sentence will always be on its most embedded constituent, namely, on the node we started stress-processing with. Of course, everything depends now on the correct identification of the most embedded node. Specifically, the problem arises in the case of sisters (both carrying stress). Cinque argues that the answer lies in the order of recursion. Given two sisters, the most embedded one is that occurring on the recursive side of the tree. At first sight, this may seem like begging the question, but Cinque's point is that the order of recursion, or whatever determines word order, is a problem independent of stress, the answer to which is the goal of current syntax. Once the answer is found, the stress pattern should follow. Thus, in a right branching language like English, in the VO structure, in (21), the most embedded node is the object. In a left-branching language, like Dutch (in this relevant structure), it is again the object. (21)

Asymmetry of sisters
English:    V'           Dutch:    V'
           /  \                  /  \
          V    O                 O    V
               *                 *

Zubizarreta (1994) argues that, in fact, it is not correct to talk about just order of recursion here, and depth of embedding is determined by head-complement relations. With this assumed, the Dutch (15b), repeated in (22) is derived as in (23). (22)

(dat) ik het boek las
 that I the book read

(23)
                  [ ik [ [ het boek ] las ] ]
a. Word stress:   [ *  [ [      *   ]  *  ] ]
b. NP cycle:      [    [ [      *   ]     ] ]
c. VP cycle:      [    [        *         ] ]
d. IP cycle:      [             *           ]

The intransitive case appears unproblematic at this stage: given a sentence like (dat) ik las 'I read', the first cycle assigns stress to V (or to VP; nothing hinges on this here). Since the VP and the subject are not sisters, the issue of embedding does not arise, and it is clear where stress-processing starts. Hence, the main stress will fall on the verb. More problematic are structures where the subject (or another adjunct or specifier) is a complex constituent, containing more embedding than the VP. The main stress still falls in this case on the deepest constituent of the VP, and the question is how this happens. Cinque assumes that the subject constitutes a cycle of its own. In this, he follows Halle and Vergnaud, who noted, independently of this problem, that the subject always gets secondary stress (higher than non-stressed nodes in the VP). The issue, then, becomes that of how to merge two cycles, each carrying its own main stress. For this purpose, Cinque defines the notions of major and minor paths of embedding. The main stress always falls on the major path, but when a minor path joins it, the minor path gets a secondary stress (one asterisk). Zubizarreta (1994) offers a different formulation of this merging, sensitive to the complement/adjunct distinction, but for our purpose here these details are not crucial. Cinque argues that his stress rule applies directly to syntactic constituents and that no notions like a phonological or prosodic phrase are needed. The question of what the relevant constituents for phrasal stress are has been a subject of much debate. Cinque's line contrasts with the view developed by Selkirk (1984), where the rule applies to phonological phrases, which are related, but not isomorphic, to syntactic constituents. If Cinque's analysis can be maintained, it is clearly advantageous, being the more minimal one. In any case, Zubizarreta (1994) points out that Cinque's analysis can also be stated to apply to phonological constituents.6
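The ingredients assembled so far (one universal rule (19), depth of embedding read off head-complement relations as in Zubizarreta's version, and the subject as a separate cycle) can be illustrated with a small sketch. It is a simplification under stated assumptions: a phrase is a hypothetical Phrase record with a head, a complement and a specifier; main stress simply follows the complement path; and the secondary stress on the minor path of a complex subject is not modelled. Word order is a separate flag that the stress computation never consults, which mirrors Cinque's argument against parameterizing the rule.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Phrase:
    head: str                    # head word ("" for an empty functional head)
    comp: Optional[Node] = None  # complement: the recursive side
    spec: Optional[Node] = None  # specifier/subject: a separate (minor) cycle
    head_first: bool = True      # word order only; the stress rule ignores it


Node = Union[str, Phrase]


def main_stress(node: Node) -> str:
    """Follow the head-complement path to the most embedded word; each cycle
    projects the stress of the cycle below it."""
    while isinstance(node, Phrase):
        if node.comp is None:
            return node.head
        node = node.comp
    return node


def linearize(node: Node) -> List[str]:
    """Spell out words; head_first decides only the order of head and complement."""
    if isinstance(node, str):
        return [node]
    head = [node.head] if node.head else []
    comp = linearize(node.comp) if node.comp is not None else []
    spec = linearize(node.spec) if node.spec is not None else []
    return spec + (head + comp if node.head_first else comp + head)


english = Phrase("", spec="Max", comp=Phrase("read", comp=Phrase("book", spec="the")))
dutch = Phrase("", spec="ik",
               comp=Phrase("las", comp=Phrase("boek", spec="het"), head_first=False))
intransitive = Phrase("", spec="ik", comp=Phrase("las"))
heavy_subject = Phrase("", spec=Phrase("man", spec="the", comp=Phrase("from", comp="Rome")),
                       comp=Phrase("read", comp=Phrase("book", spec="the")))

for tree in (english, dutch, intransitive, heavy_subject):
    print(" ".join(linearize(tree)), "->", main_stress(tree))
# Max read the book -> book
# ik het boek las -> boek
# ik las -> las
# the man from Rome read the book -> book
```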

2.2

Main Stress and Focus

2.2.1 The analysis of sentence stress outlined so far is independent of any discourse considerations: it is impossible to utter a sentence with no prominent stress, so the PF rule we examined, (19), determines where this stress will fall. The main stress of the sentence, which is assigned by this rule, is just a particular instance of stress assignment which is needed independently (e.g. for units smaller than a sentence). However, sentence accent interfaces with the theory of discourse via the notion of focus. Focus, which is roughly viewed as the most informative part of an utterance, is usually identified by prominent stress. The gist of Cinque's proposal is that the set of possible (neutral) foci in a sentence is determined by its main stress, i.e. by the same rule of phrasal stress. I return directly to how precisely this works. On this issue of the relations between main sentence stress and focus, there exist two conflicting positions: the one that Cinque returns to is that possible focus selections are restricted by an independent PF stress rule, and the other is that there is no such thing as a (neutral) PF stress, and the main stress of the sentence is determined solely by its relations to discourse, i.e. by focus. Cinque surveys common counterarguments to the position he defends and concludes that discourse considerations may at times interfere with the results of the phrase-stress rule, assigning a different stress-prominence. But he assumes that the two types of prominence can be distinguished. For him, the relevant distinction is that between sentence grammar and discourse grammar. The latter can change the output of the computational system: if in a given context it is appropriate to use as a focus a constituent which was not assigned the main stress by 'sentence grammar', 'discourse grammar' assigns an additional stress to this constituent, or destresses the original prominent stress.
Zubizarreta (1994) develops this line and argues that the relevant distinction is that between a neutral focus and a marked one. Neutral focus intonation is often characterized as the intonation under which a sentence could be uttered 'out of the blue', namely, the whole sentence is asserted (as 'new') and none of its constituents need to be pre-assumed in the context (no 'presupposition'). Zubizarreta argues, then, that what Cinque's stress rule determines is the neutral focus intonation of a sentence. When a sentence with this intonation is uttered 'out of the blue', the full sentence can be viewed as the focus phrase.7 But the central point of Cinque's and Zubizarreta's analysis is that, under the same neutral-focus intonation, a sentence can also be used with only one of its constituents as the focus (and the rest pre-assumed). Crucially, the full set of the possible (neutral) focus constituents of the sentence is determined by the same rule of phrasal stress. Cinque's generalization is given in (24). (24)

The focus of IP is a(ny) constituent containing the main stress of IP, as determined by the stress rule. (This is Cinque's 'sentence grammar' focus, and Zubizarreta's 'neutral focus'.)

To see what the set of possible foci allowed by (24) is, let us look at sentence (25), whose main (neutral) stress falls, as predicted, on the object, a desk. This stress is determined cyclically, as we saw, by assigning each new cycle the main stress of the previous one. There are three cycles, the NP, the VP, and the IP, and each of them has the same main stress. Each of them, then, can be said to carry the main stress of the sentence. (25)

               [ My neighbor [ is building [ a desk ] ] ]
a. NP cycle:   [             [             [    *   ] ] ]
b. VP cycle:   [             [                  *     ] ]
c. IP cycle:   [                                *       ]

The focus generalization (24) now determines that each of these constituents can serve as the focus. This means that with this main stress, the sentence can be uttered in contexts in which it is appropriate for any of these three constituents to serve as the focus. This is illustrated in (26). The notation I will use throughout is bold-face to mark the word which carries the main stress, and underlining for the constituent which is the focus selected in the given context. (26)

a. (What's this noise?) - My neighbor is building a desk.
b. (What's your neighbor doing?) - My neighbor is building a desk.
c. (What's your neighbor building?) - My neighbor is building a desk.
d. (Has your neighbor bought a desk already?) # - My neighbor is building a desk.
e. (Who is building a desk?) # - My neighbor is building a desk.

In (26a) we have an instance of an 'out of the blue' context. Here the option (25c) is selected in the answer, with the whole IP as focus. (26b,c) illustrate contexts for the selection of (25b,a), respectively. The crucial point is that in all three contexts precisely the same main stress is used. But the same main stress cannot be used in (26d,e). In (26d), the context determines that the relevant focus should be only the verb. But the verb is not one of the constituents that (24) selects as possible foci for this structure, since it does not itself carry the main stress. The same is true for (26e), where the context forces the selection of the subject my neighbor as the focus. As Cinque notes, his analysis goes back, in its essence, to the view of focus in Chomsky (1971). Another way to check the prediction that any of the constituents dominating the main (neutral) stress can serve as focus is to check the set of possible substitutions. For example, in the context of the yes/no question in (27), modelled after Chomsky's example, the different answers correspond to different selections of focus in the question. The focus constituent in each answer, which is underlined, substitutes for one of the possible foci in the question, namely one of the constituents dominating the main stress of the question. (27)

Are you [looking for [a passenger with [a red [shirt]]]]?
a. No, I am looking for a passenger with a red tie.
b. No, I am looking for a passenger with a coat.
c. No, I am looking for a member of the crew.
d. No, I am just wandering around.

2.2.2 Although Cinque may not have put it in precisely this way, I would like to elaborate a little on the picture which, I think, underlies this line of analysis. At the interface, sentences must be fitted to the context and purpose of use. One of the means of relating sentences to discourse is focus, and the computational system should provide sufficient means to identify it. Syntacticians have often addressed this need by encoding focus syntactically: either by movement (QR), or by attaching a focus feature to nodes in the syntax, or both, attaching a focus feature in order to license movement (which, interestingly, some view as more minimal than doing just one of the two). While certainly possible, this does not take us very far in addressing any of the problems discussed here, since we still need to know, first, what the restrictions on possible focus selections are, and, next, which focus selection is appropriate for which discourse. 8 I will instead pursue a line suggested in Reinhart (1981) for the analysis of topics. 9 Each derivation is associated not with an actual focus, but with a set of possible foci, namely, a set of constituents that can serve as the focus of the derivation in a given context. This set is determined by the computational system at the stage where both the syntactic tree and stress are visible; that is, focus selection applies either to a PF structure or to a pair <PF, LF> of sound and configurational structure. 10 The focus generalization (24) can then be stated as the definition of the focus set associated with each derivation, as in (28a). (This is a first approximation of the focus-set definition; more details are discussed in Reinhart (1995), part 3.) For the basic structures, SVO in English and SOV in Dutch or German, the focus set defined by (28a) is as given in (28b).

(28) a. Focus set
The focus set of a derivation D comprises all and only subtrees (constituents) which contain the main stress of D.
b. [IP S [VP V O]] / [IP S [VP O V]]
Focus set: {IP, VP, O}
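Definition (28a) can be read as a simple tree search: collect every constituent that dominates the terminal carrying main stress. The following is a minimal Python sketch of that reading, not an implementation from the literature. It assumes a hypothetical `Node` class, uses the abstract labels of (28b) in place of real words, and marks the stressed terminal by hand rather than deriving it from a stress rule.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A constituent: a category label plus daughters (hypothetical encoding)."""
    label: str
    children: list[Node] = field(default_factory=list)
    stressed: bool = False  # True only on the terminal bearing main stress

    def contains_stress(self) -> bool:
        # A constituent contains the main stress if it bears it itself
        # or if any of its daughters contains it.
        return self.stressed or any(c.contains_stress() for c in self.children)


def focus_set(root: Node) -> list[str]:
    """Labels of all and only the constituents containing main stress, as in (28a)."""
    result: list[str] = []

    def walk(node: Node) -> None:
        if node.contains_stress():
            result.append(node.label)
        for child in node.children:
            walk(child)

    walk(root)
    return result


# (28b), English SVO: [IP S [VP V O]], neutral stress on O
svo = Node("IP", [Node("S"), Node("VP", [Node("V"), Node("O", stressed=True)])])
# (28b), Dutch/German SOV: [IP S [VP O V]], neutral stress on O
sov = Node("IP", [Node("S"), Node("VP", [Node("O", stressed=True), Node("V")])])

print(focus_set(svo))  # ['IP', 'VP', 'O']
print(focus_set(sov))  # ['IP', 'VP', 'O']
```

On this encoding, S and V never appear in the focus set of either structure, which corresponds to the unavailability of (26d,e) with neutral stress.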

At the interface, one member of the focus set is selected as the actual focus of the sentence. At this stage, it is up to discourse conditions, rather than syntax, to determine whether a derivation with a given focus is appropriate for a given context. If no member of the focus set can be used as focus in the given context, the derivation is unusable in that context. The basic idea, then, is that the main stress assigned by PF enables a sentence to be used in a variety of contexts, since it permits a large set of possible foci, from which the context can select the appropriate one. Nevertheless, there may be contexts requiring a constituent not in this set to serve as the focus. E.g. the constituents not included in the focus set of either structure in (28b) are V and S. This means that sentences leaving PF with the standard main stress cannot be used with their subject or verb as the (only) focus. That this is indeed so was witnessed by the inappropriateness of (26d,e), repeated in (30a) and (31a). For such contexts, stress relocation operations (which is what Cinque labelled 'discourse-rule') have to apply. We may state this, for the moment, as the stress shift procedure (29); I will return to some more details in section 3.2. (29)

Stress shift Relocate the main stress on a constituent you want to focus.

With stress shift applied, the sentence can be used in the contexts of (30) and (31), as illustrated by the answers (30b) and (31b).

(30) (Has your neighbor bought a desk already?)
a. # - My neighbor is building a desk.
b. - My neighbor is building a desk.

(31) (Who is building a desk?)
a. # - My neighbor is building a desk.
b. - My neighbor is building a desk.
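The effect of (29) on the available foci can be illustrated by treating stress shift, purely for illustration, as nothing more than moving the main-stress mark to another terminal and recomputing the focus set by (28a). This is a simplifying assumption: it ignores the distinction between destressing and stress strengthening taken up in section 3.2, and the tuple encoding and function names below are hypothetical.

```python
from __future__ import annotations

# Hypothetical encoding: a constituent is (label, [daughters]);
# a terminal is a constituent with no daughters.


def terminals(tree: tuple) -> set[str]:
    """Collect the terminal labels a constituent dominates."""
    label, children = tree
    if not children:
        return {label}
    return set().union(*(terminals(child) for child in children))


def focus_set(tree: tuple, stressed: str) -> list[str]:
    """All and only constituents containing the stressed terminal, top-down (28a)."""
    label, children = tree
    found = [label] if stressed in terminals(tree) else []
    for child in children:
        found.extend(focus_set(child, stressed))
    return found


# "My neighbor is building a desk": [IP S [VP V O]]
ip = ("IP", [("S", []), ("VP", [("V", []), ("O", [])])])

neutral = focus_set(ip, stressed="O")  # nuclear stress on the object
shift_v = focus_set(ip, stressed="V")  # (30b): stress relocated to the verb
shift_s = focus_set(ip, stressed="S")  # (31b): stress relocated to the subject

print(neutral)  # ['IP', 'VP', 'O']
print(shift_v)  # ['IP', 'VP', 'V']
print(shift_s)  # ['IP', 'S']
```

Only after the shift do V and S become available as foci, matching the contrast between the a- and b-answers in (30) and (31).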

Under this analysis, then, the focus use in (30b) and (31b) is viewed as marked, since it is obtained by a special operation undoing the results of sentence stress. This, in a way, is the heart of the analysis, and the center of the debate concerning focus and sentence stress. The idea that a systematic distinction can be drawn between marked and neutral stress at the sentence level has often been challenged (the alternative view being that stress at this level is determined by focus, and not conversely). The crucial question is whether an appropriate definition, and further supporting evidence, can be found for this distinction. This issue is addressed in Reinhart (1995). Let us assume, for now, that this distinction can be maintained, and look at some of its consequences.


On this view, using marked stress is costly and uneconomical, since it involves an additional operation. We would expect it to be used only for a good reason, namely, when there is no other way to express the intended focus relations. English, with its rather restricted word order, does not have many choices here. But languages with more word order options may find ways to express more focus structures with neutral stress. Cinque compared the following sentences in English and Italian:

(32) a. Johnson died.
b. Johnson died.

(33) a. Johnson è morto.
b. È morto Johnson.
c. # Johnson è morto.

In English, to create a focus structure with focus on the subject, one must use the marked stress rule, to yield (32b). In Italian, there is the option of raising the subject, as in (33a), or not, as in (33b). In the first case, neutral stress falls on the verb, as the most embedded constituent. In the second, it falls on the subject. Thus, Italian allows both focus structures of the English (32) to be expressed with no appeal to marked stress. Next, Cinque observes that the use of marked stress on the subject, as in (33c), is inappropriate (even in the right context, which he provides, following Schmerling). This is because there is no reason for this option: it gives us nothing that could not be obtained by an alternative derivation with neutral stress. Another way to describe the inappropriateness of (33c) is that the function of subject raising is precisely to exempt the subject from the focus role which the main stress forces on it in its embedded position. (This is consistent with the observation, analyzed in depth in Pinto (1994), that when the subject is D-linked, its movement is strongly preferred; D-linked constituents are not particularly happy foci.) Hence it appears self-defeating to then apply a special marked operation to give the stress back to the subject. I will return to the type of economy calculation which underlies this informal description in section 4. This idea is taken much further in Zubizarreta's pioneering research on the relation of focus and movement. Her generalization, based on an extensive study of Romance, is that movement out of VP may be due to phonological reasons, namely, to change the stress pattern, and hence the focus structure, of a sentence. With this assumed, let us go back to the analysis of object-scrambling in Dutch.

3 A Focus Account for Object-Scrambling

3.1 The Focus Set of Scrambled Structures

As we saw already in the discussion of (22), when V' in Dutch contains V and O (or another complement), the main sentence stress falls on O (in the standard SOV order). When it contains only V, it falls on V. We may note that the predictions of Cinque's analysis were confirmed independently in Gussenhoven's (1984) study of Dutch stress. Let us see one further illustration, originally noted by Gussenhoven. 11

(34) dat ik [V' op een bankje [V' wacht]]
that I on a bench wait
'that I am waiting on a bench'

(35) dat ik [V' op een bankje wacht]
that I on a bench wait
'that I am waiting for a bench'

When the stress falls on the verb, as in (34), the PP has only the locative adjunct reading. This follows, since only if the PP is an adjunct does the verb become the most embedded constituent, so that it can get the neutral stress. In (35), where the stress falls on the PP, the most readily available interpretation is the one in which the PP is a complement. This is so because under this interpretation it is the most deeply embedded constituent, so it gets neutral stress. 12 Turning now to scrambling structures, the scrambled object is not in a complement position, but higher than V'. Hence, the most embedded node in this structure is the verb, just as in (34), or in sentences with intransitive verbs. Indeed, we see below that the scrambled sentences in (36b) and (37b) have a different stress pattern from their nonscrambled counterparts in (36a) and (37a), with neutral sentence stress shifting to the verb. It does not matter, in this regard, whether the object is definite, as in (36), or indefinite, as in (37). What determines the stress pattern is the fact that the object is not a sister of V in the scrambled version. 13

(36) a. dat ik gisteren het boek las
b. dat ik het boek gisteren las
'that I read the book yesterday'

(37) a. dat ik altijd een brief verscheur
b. dat ik een brief altijd verscheur
'that I always tear up a letter'

Let us now see what difference in focus options is entailed by this stress system, given Cinque's line of focus analysis. The stress in both structures, summarized in (38), is neutral stress. Hence, it determines the focus set in the way we observed in the previous section.

(38) a. [IP S [VP ADV [VP O V]]] (main stress on O)
Focus set: {IP, VP, O}
b. [IP S [VP O [VP ADV V]]] (main stress on V)
Focus set: {IP, VP, V}

The difference is that in the 'base' structure (38a), the object is included in the focus set, but the verb alone is not. In (38b), on the other hand, the verb is in the set, but the object is not. It follows, then, that a major reason to prefer a scrambled structure over the nonscrambled one could be to allow V to be the focus, which it cannot be otherwise. As we saw, in English the only way to obtain this result is to apply the marked stress rule, which shifts the stress to V. Let us examine this again in (39). (39)

Editor: Any progress on the book we sent you for review? Reviewer: I read the book yesterday, and I will review the book (/it) tomorrow.

In the given context, the appropriate focus of the reviewer's answer is the verb. But the verb is not in the focus set obtained by neutral stress. Hence stress shift applies, giving extra stress to the verb. In Dutch, the same result can be obtained with scrambling, as seen in the translation of the reviewer's reply in (40a). (40)

(Hoe gaat het met de review van Jan's boek?)
how goes it with the review of Jan's book
a. - Ik heb het boek gisteren gelezen.
b. # - Ik heb gisteren het boek gelezen.

In the unscrambled version of this sentence, (40b), stress falls on the object; hence the verb alone is not in the focus set, and the sentence cannot be used in this context. But in the scrambled version (40a), neutral stress falls on the verb. Hence, the focus set also includes the option of the verb alone being the focus, and the sentence can be used in this context with no appeal to the marked stress rule. As we shall see, the option of applying stress shift rather than scrambling (i.e. using the word order of (40b), with the stress shifted to the verb as in English) is strongly dispreferred in this case. With this, we may return to the contrastiveness generalization observed by de Hoop (1992), repeated here: (9)

A descriptive generalization In Dutch, scrambling of the object yields the same semantic effect as the predicates with stressed verbs in English. (dH:165)

This descriptive generalization now follows from the analysis of stress and focus, and it is what we have just seen: scrambling in Dutch does a similar job to the one the marked stress rule does in English. Contrastiveness is not necessarily involved here, so I have omitted the reference to it that appears in the original formulation of (9). We will see directly that even as stated here, the generalization is not fully precise: scrambling and stressing of the verb via the marked stress rule do not always have the same semantic effects. However, it is correct for cases of the type we have examined so far. Let us see how these cases follow from the different focus sets defined for scrambled and non-scrambled structures in (38).
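The comparison of (38a) and (38b) can be checked mechanically. In the sketch below, which assumes a hypothetical tuple encoding with the adverb adjoined to VP (hence two VP segments), main stress is placed by hand as the text describes (on O in the base order, on V in the scrambled order), and a derivation counts as usable in a context only if the focus that context requires is in its focus set. Labels are collected into a set, so the two VP segments collapse into a single member.

```python
from __future__ import annotations

# Hypothetical encoding: a constituent is (label, [daughters]);
# a terminal is a constituent with no daughters.


def terminals(tree: tuple) -> set[str]:
    """Collect the terminal labels dominated by a constituent."""
    label, children = tree
    if not children:
        return {label}
    return set().union(*(terminals(child) for child in children))


def focus_set(tree: tuple, stressed: str) -> set[str]:
    """Labels of all and only the constituents containing the stressed terminal (28a)."""
    label, children = tree
    found = {label} if stressed in terminals(tree) else set()
    for child in children:
        found |= focus_set(child, stressed)
    return found


def usable(tree: tuple, stressed: str, required_focus: str) -> bool:
    """A derivation fits a context only if the required focus is in its focus set."""
    return required_focus in focus_set(tree, stressed)


# (38a) base order ADV O V: [IP S [VP ADV [VP O V]]], neutral stress on O
base = ("IP", [("S", []), ("VP", [("ADV", []), ("VP", [("O", []), ("V", [])])])])
# (38b) scrambled order O ADV V: [IP S [VP O [VP ADV V]]], neutral stress on V
scrambled = ("IP", [("S", []), ("VP", [("O", []), ("VP", [("ADV", []), ("V", [])])])])

for name, tree, stress in [("(38a)", base, "O"), ("(38b)", scrambled, "V")]:
    print(name, sorted(focus_set(tree, stress)),
          "verb focus:", usable(tree, stress, "V"),
          "object focus:", usable(tree, stress, "O"))
```

The verb-focus context of (40) and (41) thus selects the scrambled order, while the object-focus context of (42) selects the base order.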


We saw in section 1 that there are cases where scrambling of a definite NP seems obligatory, and others where it is not allowed. (6), repeated in (41), is an example of the first. Under its neutral, unmarked intonation, the main stress of the unscrambled version (41a) falls on the object, hence its focus set is (38a). However, the context here signals the verb as the focus. Since this focus construal is not in the focus set, we get a mismatch between the stress and the focus needed in this context. In the scrambled version (41b), main stress falls on the verb, hence the focus set is (38b), and the verb can be the focus, as required.

(41) a. *Ik heb gisteren het boek gelezen en niet verscheurd.
I have yesterday the book read and not torn up
b. Ik heb het boek gisteren gelezen en niet verscheurd.

(42) a. *Ik heb de krant nog niet gelezen, maar ik heb het boek al wel gelezen.
I have the paper not yet read, but I have the book already read
b. Ik heb nog niet de krant gelezen, maar ik heb al wel het boek gelezen.

In (7), repeated in (42), by contrast, the context selects the object as the focus. If we opt for the scrambled version here, as in (42a), the construal with the object alone as focus is not in the focus set, as we saw in (38b). Hence scrambling is disallowed in this context. In the non-scrambled version (42b), neutral sentence stress falls on the object. Hence, among the focus options in the focus set (38a), we find the one with the object alone as focus, which is appropriate here. More generally, we expect scrambling to be impossible when the verb is an unlikely candidate to be stressed or to serve as the focus. While in (42) this was precluded by the context (which happens to be contrastive in this specific example), there may be other, context-independent, reasons to avoid stressed verbs. Recall de Hoop's (8a,b), repeated in (43), which motivated her generalization (9).

(43) a. omdat ik altijd een kat heb
because I always a cat have
b. *omdat ik een kat altijd heb
(dH:163, ex. (72))

The verb have is a light verb that requires a very special context to serve as focus. The scrambling in (43b) puts the main stress on it. With no special context, the sentence is as weird as its English counterpart (12) with stress on the verb (because I always have a cat). Although I have focused so far on the contrast between the verb and the object as foci, this does not entail that scrambling in Dutch is allowed only when we want the verb alone to serve as focus. As we saw in (38), the focus set of scrambled structures is not restricted to the verb. In fact, the same is true for stress shift in English VPs. To see more precisely what motivates scrambling in Dutch, let us examine the operation of stress shift in more detail. This will reveal that, in fact, scrambling only intersects, but is


not identical, with the full range of the output of English stress shifts, and will also enable us to return to the issue of definiteness effects in scrambling.

3.2 Stress Shifts

Cinque (1993) and Zubizarreta (1994) argue that stress shift in fact involves two distinct operations, which can operate independently of each other (i.e. both may apply to a given derivation, or only one of them). One is destressing of a stressed element; the other is strengthening the stress of an element which does not bear the main stress. In the case of stress shift in the VP (from the object to the verb), it may be difficult for the untrained ear to distinguish the two, since in both one hears a stronger stress on the verb than would be assigned by the nuclear stress rule. But Zubizarreta surveys in detail actual phonetic analyses of the two patterns. The most obvious instance of the first procedure is anaphoric destressing, which applies when an NP (or another constituent) denotes an entity previously mentioned in the discourse. 14 This is often the case with definite NPs, but it is most noticeable with pronouns. With definite NPs, whether the NP is anaphoric depends on the previous context, but pronouns are mainly used anaphorically; hence, they are almost obligatorily destressed. Consequently, the stress of the verb becomes the prominent stress in the VP, as illustrated in (44). In the case of it, it is virtually impossible to find contexts where it is not destressed.

(44) a. * Max saw her / it.
b. Max saw her / it.

The other stress shift procedure assigns an extra stress to the verb, without directly destressing the object. The result is that the object carries less stress than the verb, but some secondary stress can still be traced on it. This is easiest to note in cases like the following.

(45) a. I am waiting for someone.
b. I have to eat something.

The object here is certainly not anaphoric. But it is devoid of any specific content, so it is an unlikely focus on its own.
Although I am not aware of any discussion of such cases, they appear related to the contrast Bolinger (1972) found between the sentences in (46) (quoted by Zubizarreta and Cinque). In (46a), the candidate for neutral stress does not merit focus status because it is semantically 'light', or uninformative.

(46) a. I have a point to make.
b. I have a point to emphasize.

In such cases, the verb's stress is strengthened, but the object still carries traces of the stress assigned to it by the sentence stress rule, i.e. it carries secondary stress. Most of the examples of stressed verbs with indefinite objects cited in focus studies fall under this type.


A systematic explication of the effects each of these has on the focus structure is provided in Williams (1995). He argues (based on a detailed analysis of more elaborate examples) that the second type creates a new focus, but does not eliminate the previous focus structure. Typically, in such cases, the 'presupposition' part itself contains a focus and a presupposition; namely, there is a subordinate focus. Anaphoric destressing, on the other hand, can be viewed as independent of the focus requirements of the context. 15 This is a procedure necessary to enable anaphora resolution. By destressing constituents whose antecedents are accessible in discourse, the speaker enables the hearer to relate new expressions to existing discourse entities. But obviously, the anaphoric status of expressions may have an effect on their focus structure. I should mention that destressing is not restricted to anaphoric expressions. Another function of destressing is to signal the scope of adverbs of quantification (like always, often, or sometimes). Typically, only destressed elements can serve as the restrictive term of such operators. But I will not discuss these cases here.

3.3 Scrambling and Definiteness

In English, both stress shift procedures have, inside the VP, the effect of a stronger stress on the verb. Hence, it is easy to confuse them. 16 But in Dutch, which allows the scrambling option, the two are distinguishable. The scrambled object is not in a position to be assigned any stress by the nuclear stress rule. Hence, it can be used only if it is appropriate for the object to be fully destressed. Notably, a pronoun object must scramble in Dutch, as in (47). (I will return in section 4 to the question why the stress shift option of English is not available for (47a).)

(47) a. * Ik heb gisteren het gelezen.
b. Ik heb het gisteren gelezen.
I have it yesterday read

But those cases of English where stress strengthening still leaves traces of the original stress on the object cannot be captured by scrambling in Dutch, because the scrambled object does not get any stress. Thus, in cases like (45), Dutch too has to resort to a stress strengthening operation, as in (48), rather than to scrambling.

(48) a. Have you eaten anything already?
Heb je al iets gegeten?
*Heb je iets al gegeten?
b. Have you seen anybody here?
Heb je hier iemand gezien?

As a further example of the difference between the option of scrambling and that of verb strengthening, we can look at the case of (30), repeated here. As we saw, the focus set provided by the neutral stress in (30a) does not contain the focus construal appropriate for the context. Hence, some stress shift operation must apply. The noun desk is not


anaphoric in this context (the context does not establish a desk-entity that we keep referring to; had it done so, a pronoun or a definite NP would have been used). Hence, anaphoric destressing cannot apply, but stress strengthening of the verb, allowing it to be the focus, can apply, as in (30b). (30)

(Has your neighbor bought a desk already?)
a. # - My neighbor is building a desk.
b. - My neighbor is building a desk.

(49) (Heeft je buurman al een buro gekocht?)
has your neighbor already a desk bought
a. # - No, hij heeft in de tussentijd een buro getimmerd.
no, he has in the meanwhile a desk built
b. - No, hij heeft in de tussentijd een buro getimmerd.
c. # - No, hij heeft een buro in de tussentijd getimmerd.

In Dutch, in the same context, the neutral stress in (49a) is also inappropriate, for the same reason as in (30a). The scrambling option (49c) requires full destressing of the object, which, just as in English, is impossible here. Hence, the only option left is to use the same verb strengthening in (49b) as in English. 17 We may now turn to the definiteness issue that motivated the study of scrambling. De Hoop's generalization (5), repeated here, captures the fact that it is much easier for definite (or d-linked) NPs to scramble than for indefinite ones. (5)

de Hoop's generalization:

Only strong NPs can 'scramble'.

But to derive this result, heavy syntactic machinery had to be assumed. Our question was whether the same result could follow without this machinery. Given what we just saw, scrambling is appropriate only in a context which enables full destressing of the object. The most typical context allowing that is that of anaphoric NPs, and, typically, definite but not indefinite NPs can be anaphoric. 18

4 The Concept of Markedness: Focus and Economy

Cinque's view of focus is striking in its simplicity and elegance. If it can be maintained, then focus is, essentially, a PF issue. Independent considerations of the computational system determine that stress must be assigned to a sentence. At the interface, this property of sentences is used to facilitate communication, using stress as focus. As we saw, this was, essentially, the view of focus in Chomsky (1971). We should note, however, that the analysis is based on a revival of the concept of markedness, i.e. the idea that a distinction can be drawn between the neutral procedure of


sentence stress, and other procedures which are marked. This distinction has been challenged extensively. It was repeatedly argued against the nuclear stress rule, or against Chomsky's (1971) focus analysis, that in the appropriate context main stress can fall anywhere, with effects hardly distinguishable from those of the neutral stress. This was particularly emphasized by Selkirk (1984). The crucial problem here is the same as has been observed in the case of QR and quantifier scope, namely, whether any content can be given to the concept of markedness. If it is just as easy to construct examples with 'marked' stress as with neutral stress, and there is no obvious way to distinguish them, we run into the danger of vacuity: having a theory which excludes nothing. The facts that follow from its rules are labelled 'neutral', and everything else 'marked'. (This type of theory is always true, regardless of what its rules are.) A more realistic conclusion appears to be that there is no sentence-level generalization governing the selection of possible foci, and any expression can be a focus, subject only to discourse appropriateness. This, in fact, seems to have been the winning hypothesis for years, until Cinque reopened the issue. Possibly, this is also the reason why Chomsky (1976) departed from his earlier view and took the position that focus scope is determined just by QR. Any constituent permitted to raise by QR can, thus, serve as focus. However, I argue in Reinhart (1995) that it is a mistake to hunt for evidence of markedness in the realm of direct intuitions. A marked derivation is a derivation violating economy. When this is done with no reason, the result is visibly awkward. But if using the uneconomical derivation is decisively the only way to satisfy a certain interface need, the result sounds perfectly fine, and it is only indirectly that we can see that it is nevertheless marked, or uneconomical.
(In the case of QR, Fox (1994) provides ellipsis evidence for QR not taking place when it is not needed for interpretation.) In the case of focus, we already have at our disposal a way to test the markedness hypothesis, when we look across languages. One of the findings of Cinque and, mainly, Zubizarreta is that if a language has the means to get a certain focus structure without applying the marked stress rule (say, by choosing an alternative permissible derivation), then applying the rule yields visibly bad results in that language. The parallel case in a language like English, with very limited word-order options, may sound perfectly fine, with no visible evidence for markedness. One example was mentioned already in (32) and (33), repeated here.

(32) a. Johnson died.
b. Johnson died.

(33) a. Johnson è morto.
b. È morto Johnson.
c. # Johnson è morto.

As observed by Cinque, the Italian (33c) sounds incomparably more awkward than its English counterpart (32b), although the marked stress rule has applied equally in both. This is so since in Italian the same focus needs could be satisfied by the structure (33b), with no application of the uneconomical operation, whereas in English there is no other way to turn the subject into the focus. We may now note that the same is true for the structures we examined in Dutch. Whenever scrambling can apply, deriving the same focus set using the marked stress shift


instead is visibly marked: it sounds ungrammatical. This can be witnessed, first, in the case of pronouns.

(51) a. I have seen him yesterday. (marked stress)
b. #Ik heb gisteren hem gezien. (marked stress)
c. Ik heb hem gisteren gezien. (neutral stress)

While destressing of the pronoun in the English (51a) is completely natural, the same is hardly possible in Dutch, as (51b) shows. Scrambling should be used instead, as in (51c). With het ('it'), scrambling is strictly the only option. But this is more generally true whenever the object is anaphoric and hence should be destressed. Obviously, the mechanism for destressing a definite NP, as well as that for destressing pronouns, exists in Dutch, and it applies, e.g., when there is no intervening adverb or PP, hence no scrambling option. But if the derivation enables a scrambling choice, then opting for destressing instead is noticeably odd. This, in fact, is true for all the examples I discussed in section 3. Let us examine this with the case of (39) and (40), repeated. (39)

Editor: Any progress on the book we sent you for review? Reviewer: I read the book yesterday, and I will review the book (/it) tomorrow.

(40) (Hoe gaat het met de review van Jan's boek?)
how goes it with the review of Jan's book
a. - Ik heb het boek gisteren gelezen.
b. # - Ik heb gisteren het boek gelezen.

(52) # - Ik heb gisteren het boek gelezen.

In this context, the book is clearly anaphoric. English has only the option of stress shift here, as in (39). As we saw already, Dutch also has the option of scrambling, in (40a). (The neutral stress pattern in the unscrambled version (40b) is inappropriate in Dutch, as in English, because its focus set does not contain the focus relevant for this context. See the discussion in section 3.) But we may note now that this is not just an option, but an obligatory choice. Applying the same stress shift as in English yields the highly marked result in (52). The case of Dutch is particularly interesting for the economy view since, following Neeleman (1994), scrambling cannot be viewed as a costly choice. Scrambled structures differ from non-scrambled ones only in the adjunction site of the adverb, and adjunction is a free operation in the derivation. There can be no economy difference related to where we choose to place the adverb. Hence, the choice here is between applying stress shift, which is an optional, hence uneconomical, operation, and not applying it. Given that applying it does not satisfy any interface need that could not have been met without it (by adjoining the adverb differently), it is ruled out.
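The informal economy reasoning of this section can be rendered as a toy selection procedure. This is only a sketch under explicit assumptions: the `Derivation` record and the `admissible` function are hypothetical, the focus sets are copied from (38) and from the discussion of (32), (33) and (39), and the stress-shifted derivations are assumed to make the shifted-to constituent available as focus. The rule encoded is the one stated in the text: a marked derivation is admissible only if no unmarked candidate satisfies the same interface need.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    """Hypothetical record of a candidate derivation for one sentence."""
    name: str
    focus_set: frozenset[str]  # constituents available as focus, per (28a)
    marked: bool               # True if a marked operation (stress shift) applied


def admissible(candidates: list[Derivation], focus: str) -> list[str]:
    """Return the names of the derivations allowed for the required focus.

    A derivation must have the required focus in its focus set; a marked
    derivation is allowed only if no unmarked derivation qualifies.
    """
    fitting = [d for d in candidates if focus in d.focus_set]
    unmarked = [d for d in fitting if not d.marked]
    return [d.name for d in (unmarked or fitting)]


# English (39): the context requires focus on the verb; no scrambling option.
english = [
    Derivation("neutral stress on the object", frozenset({"IP", "VP", "O"}), False),
    Derivation("(39) stress shift to V", frozenset({"IP", "VP", "V"}), True),
]

# Dutch (40)/(52): same context, but scrambling is available and costs nothing.
dutch = [
    Derivation("(40b) unscrambled, neutral stress", frozenset({"IP", "VP", "O"}), False),
    Derivation("(40a) scrambled, neutral stress", frozenset({"IP", "VP", "V"}), False),
    Derivation("(52) unscrambled, stress shift to V", frozenset({"IP", "VP", "V"}), True),
]

# Italian (33): the context requires focus on the subject (S).
italian = [
    Derivation("(33a) subject raised, neutral stress", frozenset({"IP", "VP", "V"}), False),
    Derivation("(33b) subject in situ, neutral stress", frozenset({"IP", "VP", "S"}), False),
    Derivation("(33c) subject raised, stress shift to S", frozenset({"IP", "S"}), True),
]

print(admissible(english, "V"))  # ['(39) stress shift to V']
print(admissible(dutch, "V"))    # ['(40a) scrambled, neutral stress']
print(admissible(italian, "S"))  # ['(33b) subject in situ, neutral stress']
```

On this rendering, the Italian (33c) and the Dutch (52) are filtered out because an unmarked competitor provides the same focus, whereas the English (39) survives because it has no such competitor.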


Notes 1 2

3

The analysis of the Dutch scrambling in this part is based on work together with Ad Neeleman. I would also like to thank Hubert Haider, Helen de Hoop and Eric Reuland for many helpful comments. "Special . . . processes of a poorly understood sort may apply in the generation of sentences, marking certain items . . . as bearing specific expressive or contrastive features that will shift the intonation center . . ." (Chomsky, 1971:199) De Hoop attempts to derive the descriptive generalization (9) from a broader principle. For this she introduces the following theoretical account, and a new principle, the POC: (i)

The theoretical account: "If an object receives a strong reading, predication needs to be contrastive."; "This principle holds more generally for all NPs of type " (i.e. generalized quantifiers) (dH:165).

(ii)

The Principle of contrastiveness (POC): "A strong NP needs a contrastive predicate. (dH:168, stated formally in (80), p. 166)."

While the descriptive generalization in (9) is important and correct, it is not easy to understand the intuition underlying the POC, from which it is supposed to be derived. As stated, it appears to wrongly entail that, universally, we cannot find a strong NP with a non-contrastive verb or predicate; e.g. that (iii-a) and (iii-c) are ill formed, and only (iii-b) is allowed in English: (iii)

4 5

6

7 8

a. Max read every book. (normal stress on book) b. Max R E A D every book. c. Max has already read every book.

However, de Hoop is using the term 'contrastive' here not in the familiar sense of contrastive stress. The notion is taken to be semantic, rather than phonological. Hence, a contrastive predicate need not be realized by contrastive stress. Rather, the relevant notion is having a set of alternatives. Capturing this idea in a precise way may be tricky, but since I will offer an alternative account for (9), there is no need to examine the details here. That Zubizarreta's approach may be useful for the analysis of scrambling in Dutch was proposed also, briefly, in Delfitto & D'hulst (1994). Cinque's stress rule (10), (p.244) still includes the formulation in (3) (p.241), which assumes heads. It includes also an additional requirement that an asterisk on line N must correspond to an asterisk on line N - l . In his actual analysis, he starts with the next XP cycle (e.g. VP), just like H&V. But curiously, he omits the requirement that the cycle contains at least two asterisks, and he adds that "this simplification is crucial to obtain the correct results" (footnote 7, p.244). Indeed, this omission enables the analysis to work also without the previous assumptions, which is why I think this is what he actually intended. In any case, I do not think that there is anything at stake here apart from whether the machinery can be reduced. And I assume that the way I present Cinque's analysis is precisely equivalent, empirically, to his. Zubizarreta points out (footnote 14) that the question whether the prosodic phrase is determined semantically or syntactically does not have much empirical content. Selkirk argues that the intonational phrase must form a sense unit, where two constituents constitute a sense unit if they stand in a modifier-of-head or an argument-of-head relation. But the notions assumed in this definition: modifier, argument and head, are anyway syntactic notions. 
This is assumed under different wordings since Chomsky (1971) and Jackendoff (1972), but has recently gained more attention in work by Vallduvi, Engdahl, and Herman Hendriks. Once these questions are answered, encoding a focus feature in the syntax is one possible implementation. This, e.g., is the specific implementation chosen by Zubizarreta, who states the focus rule as a restriction on nodes marked +F(ocus). To deal with the problems I address here, this is unnecessary. I leave open here the question whether there are other reasons to assume that +/-F is a syntactically encoded feature, as Zubizarreta argues.

9 I argued there that each sentence is associated with a set of possible pragmatic assertions (PPA-set). The set is determined within the syntax, but discourse selection procedures determine which of these options, if any, is appropriate to a given context. I proposed there an algorithm only for determining the set of possible topic-predicate relations, but obviously, the full set of PPAs should also contain the possible foci of a derivation.

10 To get more precise about this description, we need to know more about the product of spell-out (namely, about the nature of PF in the pair <PF, LF>). Recall that Cinque assumes that, at least as far as stress is concerned, it can be determined directly on syntactic structures, with no need to construct additional phonological structure. This is clearly the most minimal approach, and thus the starting hypothesis that we would like to maintain, unless confronted with massive empirical evidence to the contrary. Still, this leaves us with two possible views of what PF is. One is that it is just a sound string, the product of all spell-out procedures. The other is that, just like LF, it is the full syntactic tree, derived up to the stage of spell-out, representing also further steps in the derivation required by spell-out operations like stress, erasure of features, and other phonological processes. If the second is the correct view, then we may say that the focus rule applies solely at PF, namely, it associates a set of possible foci with each PF.

11 This was pointed out to me by Ad Neeleman.

12 The analysis of adjunct stress in Cinque's framework is still incomplete. It has been noted (also by him) that an adjunct PP often appears to carry main stress. (So, for many speakers, (35) can easily be understood also under the adjunct construal.) This issue is also discussed in Zubizarreta (1994).

13 Zubizarreta (1994) surveys an unpublished paper by Truckenbrodt (1993), who found essentially the same pattern in German, including contrasts like (35).

14 In fact, anaphoricity, or previous mention, is not a sufficient condition for this type of destressing. Rather, stress here is governed by the accessibility of the antecedent, as defined in Ariel's (1990) analysis of anaphora resolution.

15 Williams would not state it this way. For him, the whole issue of focus is an instance of anaphora. But this is nevertheless a possible way to construe his findings.
16 For this reason, I argued, mistakenly, in Reinhart (1995) that these are, in fact, instances of the same stress shift procedure.

17 In Reinhart (1995) my argument was based on a wrong judgment of these sentences. I thank Helen de Hoop for correcting me on that.

18 The only residue is the case of generics, which de Hoop defines as 'strong'. Although I did not discuss these here, I believe that the issue is not genericity, but the scope of adverbs of quantification. As I mentioned, allowing an NP to be in the restrictive term of such an operator is another motivation for destressing.

References
Ariel, M. 1990. Accessing Noun Phrase Antecedents. London and New York: Routledge.
Bolinger, D. 1972. Accent is Predictable (if you're a mind-reader). Language 48: 633-644.
Chomsky, N. 1971. Deep Structure, Surface Structure and Semantic Interpretation. In D. Steinberg and L. Jakobovits (eds.), Semantics: An Interdisciplinary Reader in Philosophy, Linguistics and Psychology. Cambridge: Cambridge University Press.
Chomsky, N. 1976. Conditions on Rules of Grammar. Linguistic Analysis 2.4. Reprinted in N. Chomsky, Essays on Form and Interpretation. Amsterdam: North Holland, 1977.
Chomsky, N. and M. Halle. 1968. The Sound Pattern of English. New York: Harper and Row.
Cinque, G. 1993. A Null Theory of Phrase and Compound Stress. Linguistic Inquiry 24.2: 239-298.
Delfitto and D'hulst. 1994. Beyond the Mapping Hypothesis. Ms., University of Utrecht.
Diesing, M. 1992. Indefinites. Cambridge, Mass.: MIT Press.
Enç, M. 1991. The Semantics of Specificity. Linguistic Inquiry 22: 1-25.
Fox, D. 1994. Economy, Scope and Semantic Interpretation: Evidence from VP-ellipsis. Ms., MIT.
Golan, V. 1993. Node Crossing Economy, Superiority and D-linking. Ms., Tel Aviv University.
Gussenhoven, C. 1984. On the Grammar and Semantics of Sentence Accents in Dutch. Dordrecht: Foris.
Halle, M. and J.-R. Vergnaud. 1987. An Essay on Stress. Cambridge, Mass.: MIT Press.
de Hoop, H. 1992. Case Configuration and NP Interpretation. PhD dissertation, Groningen.
Jackendoff, R. 1972. Semantic Interpretation in Generative Grammar. Cambridge, Mass.: MIT Press.
Keenan, E. and L. Faltz. 1978. Logical Types for Natural Language. UCLA Occasional Papers in Syntax 3, UCLA.
Ladd, D. R. 1980. The Structure of Intonational Meaning. Bloomington: Indiana University Press.
Neeleman, A. 1994. Complex Predicates. PhD dissertation, Utrecht University, OTS.
Pinto, M. 1994. Subjects in Italian: Distribution and Interpretation. Ms., University of Utrecht, OTS.
Reinhart, T. 1976. The Syntactic Domain of Anaphora. PhD dissertation, MIT, Cambridge, Mass.
Reinhart, T. 1981. Pragmatics and Linguistics: An Analysis of Sentence Topics. Philosophica 27.1. Also distributed by Indiana University Linguistics Club, Bloomington, Indiana.
Reinhart, T. 1983. Anaphora and Semantic Interpretation. London: Croom Helm.
Reinhart, T. 1995. Interface Strategies. OTS Working Papers in Linguistics (to appear in MIT Press).
Rooth, M. 1992. A Theory of Focus Interpretation. Natural Language Semantics 1: 75-116.
Selkirk, E. 1984. Phonology and Syntax: The Relation between Sound and Structure. Cambridge, Mass.: MIT Press.
Vallduvi, E. 1990. The Informational Component. PhD dissertation, University of Pennsylvania.
Williams, E. 1995. Blocking and Anaphora. Ms., Princeton University (to appear in Linguistic Inquiry).
Zubizarreta, M. L. 1994. Word Order, Prosody, and Focus. Ms., University of Southern California.

Formal and Substantive Elegance in the Minimalist Program* (On the Emergence of Some Linguistic Forms)
Juan Uriagereka

1 General Considerations

The surprising fact about Minimalism, in my view, is not that we seek economy, but that we actually find it. Biological evolution, to begin with, does not explain it, if seen in the realistic light that Gould (1991:59-60) provides:

Those characteristics [such as vision] that we share with other closely related species are most likely to be conventional adaptations. But attributes unique to our species are likely to be exaptations.1 ... As an obvious prime candidate, consider ... human language. The adaptationist and Darwinian tradition has long advocated a gradualistic continuationism ... Noam Chomsky, on the other hand, has long advocated a position corresponding to the claim that language is an exaptation of brain structure. ... The traits that Chomsky (1986) attributes to language—universality of the generative grammar, lack of ontogeny, ... highly peculiar and decidedly nonoptimal structure, formal analogy to other attributes, including our unique numerical faculty with its concept of discrete infinity—fit far more easily with an exaptive, rather than an adaptive, explanation. [My emphasis.]

Of course, one must be careful about what is meant by 'non-optimal structure'. The structure of language is not functionally optimal, as garden paths show for parsing structure, and as effability considerations (the grammar allows us to say less than we otherwise could) show for producing structure. Lightfoot (1995) stresses this aspect of Gould's view, in the familiar spirit of Chomsky's work. Then again, the issue arises as to whether the structure of language is non-optimal as well, as the prevailing rhetoric of the eighties presumed. The view at that time was that finding non-optimal structures is an excellent way of defending the specificity of the linguistic system as a biological exaptation (hence as a natural, independent phenomenon of mind in the strongest sense, with little or no connection to communication processes and all that). However, Chomsky (1986) already showed that the linguistic practice was far removed from this rhetoric. Thus, the working details of this piece of research showed an example of optimality in syntax, as exemplified by the notion of Movement 'as a last resort'. The piece also assumed categorial 'symmetry', a research program which had by then been developed into X'-theory. And in fact the book made significant use of the research strategy of eliminating redundancy within the model, a trait of Chomskyan linguistics which makes its practice
closer to that of the physical sciences, rather than to the exercise of evolutionary biology. In the nineties, Chomsky explicitly asks '...to what extent [standard] principles themselves can be reduced to deeper and natural properties of computation. To what extent, that is, is language "perfect", relying on natural optimality conditions and very simple relations?' If the empirical answer is: 'to some extent', (realistic) biological evolution invoking exaptations (see fn. 1) will have nothing to say about it. In class lectures (Spring 1995), Chomsky partly addresses this issue when comparing linguistics to another special science, chemistry, rather than to standard biology. Chemistry faces the question of how unstructured matter assumes organized form; linguistics deals with how unstructured features are organized into syntagms and paradigms. The set of principles that is responsible for determining linguistic structure is, to some extent, comparable to the set of principles with parallel chemical effects. And although this is admittedly pushing it to a point which Chomsky might only very speculatively ever raise, it is imaginable that there is a deep level at which these sets of principles, which determine certain pockets of regularity, can be related in ways that go beyond the metaphorical. Recent work on molecular biology (where, for instance, Garcia Bellido (1994) literally talks about 'phonology and inflection in genetics', 'molecular sentences', or 'syntactic analyses') suggests that the connection might not be totally impossible, or implausible. Be that as it may, it is good to keep different sorts of economy in perspective. On the one hand, analyses in terms of what we may call static elegance have been very much part of the tradition of linguistics, and helped determine 'best theories' or 'learnable languages'. 
But on the other hand, Minimalism makes fundamental use of the notion of what we may call dynamic elegance, when deciding, on the basis of least action, among alternative derivations that converge. Fukui (to appear) argues that this is an instance of the classic problem of the calculus of variations, as is found in various areas of mechanics, electromagnetism, quantum physics, and so forth (see Stevens (1995)). Given various paths that, say, a ray of light may take from an initial point i to a final point f, the path that light actually takes is determined in terms of it being the one involving least action. Likewise, given various derivational paths that may be invoked when going from a point i in a derivation to its final point f, the path the derivation takes is determined in terms of its fewest computational steps. This behavior of the linguistic system does not follow from its static elegance (bare output conditions). All competing derivations converge, and in that sense are equally elegant. In fact, there is a global character to dynamic elegance that we do not witness in static elegance. This is a consideration arising in calculus problems, where the underlying issue is the examination of a set of variations and a way of deciding on the adequacy of the solutions they provide.2 While there is a univocal solution to the problem 'Is this structure grammatical?' (yes or no), there is no univocal solution to the problem 'Is the derivation of this structure grammatical?'. Whether one solution is valid depends on whether some other solution is better; a solution that is bad for one structural context may be the best we have for another.3 The questions we face are not new. The substantive properties of water molecules, for instance, tell us little about the macroscopic behavior of a water creek, and the whirlpools of turbulence it forms. Examples of this sort abound, involving global properties of systems of some sort.
To use a fashionable term, these are emergent patterns that, for some reason, obtain of complex systems—in unclear circumstances. There is much talk
these days about systems being comparable at some level, the behavior of some helping in the understanding of the behavior of others (see Meinzer (1994)). I am not really sure what this means, though, particularly if we are supposed to be talking (at least in some central instances) about systems whose behavior is predictably unpredictable. More to the point, I have not seen any proposal explaining any linguistic principle in terms of any so-called principles of self-organization. Even Fukui's interesting proposal that linguistic derivations involve the calculus of variations falls short of establishing what the relevant Lagrangian would be.4 Nonetheless, whether questions pertaining to the dynamic elegance of the system will be expressible in familiar terms is something we cannot determine a priori. Consider, in that respect, Chomsky's treatment of the Last Resort Condition (LRC) as a derivational requirement. A derivation that does not meet the LRC when generating a chain is impossible, and it is cancelled at the point of violation, without ever reaching a convergent or even a crashed LF. One may ask why the LRC should hold of derivations, and Chomsky (1995) speculates that conditions of this sort reduce the range of alternatives from which a solution must be calculated. The system does not even include in the reference set of derivations that compete for optimality those which are cancelled somewhere along the way because they failed to meet the LRC. This reduction of possible variations of the system recalls Haken's (1983) Slaving Principle. Unstable modes in a system influence and determine stable modes, which are thereby eliminated at certain thresholds.5 A new structure emerges which results from unstable modes serving as ordering devices, which partially determine the behavior of a system as a whole. We may thus think of the LRC as ordering a region within the macroscopic behavior of a derivation.
That is, at any given stage t in a derivation there is a potentially very large derivational horizon h_t to contemplate, if the system proceeds according to standard derivational laws. This derivational horizon is dynamic, in that the system could in principle proceed forward in various ways. But suppose that certain regions in the derivational horizon (those corresponding to domains of morphological checking) are less stable than others. In particular, let it be true that a strong feature is a 'virus' that the computational system must eliminate, immunizing the derivation from it as fast as possible.6 There is a sense in which this area within the derivational horizon provides a point of instability (as derivational complexity increases), which according to the Slaving Principle should enslave other competing modes that lead to other horizons. That is, while derivations that move to 'immunize' an intruder feature are attracted to a point of instability, derivations whose movement is 'idle' (or exists just to reach an interpretation) are not attracted to a slaving vortex. Thus, the derivational system does not even try derivations that conform to less unstable modes, any more than, given a particular vortex within a water creek, turbulence will have a high probability of emerging (as fluid velocity increases) anywhere other than within the confines determined by that very vortex. Frankly, I am offering this speculation not so much as an explanation for the LRC, but as an example of the sort of approach that we may attempt. It has advantages and disadvantages. The advantages are that, invoking this sort of line, we cannot be accused of gradualistic continuism ('the LRC holds of derivational systems because, in this way, they are easier to use in speech production and processing, thereby giving us an argument that elegance within language is a result of an adaptation') or of Platonism ('the LRC holds as a defining property of derivations because the nature of these processes is purely formal, and that is just how they work, thereby giving us an argument that language is a mathematical object'). But the disadvantages should be obvious. First, we run the risk of applying the latest complexity tool to whatever condition we may want to derive; we might even be tempted to bend the linguistic theory beyond recognition, for it to fit with the outside principle. However, it would be surprising if traditional linguistic principles all of a sudden started falling into place within the principles now emerging in the science of complexity. More likely, we will find nothing but similarities to use more or less metaphorically, probably because many things resemble many other things out there. Second, we may be escaping some form of Platonism to fall into yet another form, albeit at a larger scale ('reality is a mathematical object'). Why the LRC should follow from the Slaving Principle is something that should worry us. Fortunately, here we are in the same boat as everyone else within the special sciences, and perhaps we should not worry about this any more or less than biologists should worry about whether the Slaving Principle played any role in the emergence of life, say. So I suppose the moral is the usual one: we have to keep doing our dirty work, assuming that the end of syntax is nowhere nearer than the end of chemistry or biology is. However, doing work now involves two different, legitimate jobs. One is the usual analytic one, perhaps revamped with new tools. The other is a bit more abstract. Suppose we find phenomenon P, and characterize it in terms of conditions x, y, z. Have we found x, y, z properties of the linguistic system?
The sorts of issues raised in this section allow for the possibility that phenomenon P may be an emergent property, like the spiral pattern in a snail shell, which arises dynamically from various systemic interactions. We may even have ways of coherently characterizing the properties of phenomenon P directly, by way of some function using x, y, z as variables, just as we can calculate the snail shell as a given logarithmic function. And yet, Minimalism invites the suspicion that phenomenon P may not, in itself, instantiate any axiomatic property of the linguistic system, any more than a logarithmic function per se instantiates any axiomatic property of growth in the biological system.8 And to make life interesting, telling gold from glitter won't be nice or easy... In what follows, I raise this sort of question with respect to the phenomenon of Obviation, which I believe is partly emergent. It involves two structural relations, and one (perhaps more) interpretive relation. Let us call the latter Relation R. As for the former, full referring expressions (names and definite descriptions) involve only command: α must be obviative with respect to β only if β commands α. In contrast, pronominal elements involve command and locality. So this is perhaps a candidate for phenomenon P above, obeying conditions x (R), y (command), and z (locality). These sorts of conditions are articulated into a theoretical whole in various places that I return to, but most notably in Chomsky (1981) and (1986). I want to propose, instead, that each of the relations involved in obviation, and the corresponding structural correlates that they involve, follow from different properties of the language faculty. If so, seeking a unified theory of obviation is akin to seeking a theory of shell patterning.
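As a toy illustration of the two notions of dynamic economy discussed in this section (least-action comparison among convergent derivations, and the LRC as a cancellation that keeps derivations out of the reference set), consider the following Python sketch. Everything in it is a simplifying assumption rather than the chapter's formalism. A derivation is reduced to a tuple of hypothetical `Step` records, its cost is simply its number of steps, and an LRC violation is modelled as a movement step that eliminates no strong feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    op: str                              # hypothetical encoding: "merge" or "move"
    checks_strong_feature: bool = False  # does a movement eliminate a strong feature?


@dataclass(frozen=True)
class Derivation:
    name: str
    steps: tuple
    converges: bool  # convergence at the interfaces, simply taken as given here


def violates_lrc(d):
    """An 'idle' movement (one that checks no strong feature) cancels the derivation."""
    return any(s.op == "move" and not s.checks_strong_feature for s in d.steps)


def reference_set(candidates):
    """Cancelled derivations never enter the competition at all."""
    return [d for d in candidates if not violates_lrc(d)]


def optimal(candidates):
    """Least action: among convergent survivors, keep those with the fewest steps."""
    convergent = [d for d in reference_set(candidates) if d.converges]
    if not convergent:
        return []
    fewest = min(len(d.steps) for d in convergent)
    return [d for d in convergent if len(d.steps) == fewest]


merge_step = Step("merge")
checking_move = Step("move", checks_strong_feature=True)
idle_move = Step("move")

d1 = Derivation("d1", (merge_step, merge_step, checking_move), True)
d2 = Derivation("d2", (merge_step, idle_move), True)  # shortest, but cancelled by the LRC
d3 = Derivation("d3", (merge_step, merge_step, checking_move, checking_move), True)

print([d.name for d in optimal([d1, d2, d3])])  # ['d1']
print([d.name for d in optimal([d3])])          # ['d3']
```

The second call shows the global character of the comparison. d3 loses to d1 when both compete, yet it is the best available solution in a reference set from which d1 is absent.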

2 Command Paths and Multiple Spell-out

Let me start by summarizing the proposal in Uriagereka (to appear), where it is shown that command paths emerging in the PF and LF components are a result of multiple Spell-out. This is important in and of itself: Spell-out being just a rule (not a level), it should be able to apply anywhere, anytime. But the proposal is of relevance to us now because it makes command relations natural emergent properties of the system, and thus not something that has to be re-stated at the PF or the LF components, even if they play a role there as well. To begin with, Uriagereka (to appear) deduces Kayne's LCA:9

(1) α precedes β iff: (a) α commands β; or (b) δ commands β and δ dominates α.

Throughout, I assume that command is a direct reflex of merger (M), adapted from Epstein (1995):

(2) α commands β iff α merges with δ, δ reflexively dominating β.10

Then command is the only relation which is derivationally defined (via M) for a set of heads within a derivational block. Compare (3a) and (3b). The boxes in (3) define 'derivational blocks' or monotonic applications of M.11 Given any two categories X and Y within a derivational block, either X and Y are merged, or otherwise X is merged with (the projection of) a category which Y has merged with, etc.:

(3) a. {α, {{α, {α, β}}, δ}}
       (daughters: δ and {α, {α, β}}; the latter dominates α and β)
    b. {α, {{δ, {δ, μ}}, {α, {α, β}}}}
       (daughters: {δ, {δ, μ}} and {α, {α, β}}; the latter dominates α and β)
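The definition of command in (2) and the contrast between (3a) and (3b) can be checked mechanically. The sketch below is a minimal Python rendering under assumptions that are not part of the text: heads are plain strings, and a merged object {label, {a, b}} is a hypothetical `Phrase` record. `command_pairs` collects every pair licensed by (2), and `head_command` restricts that set to word-level units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    """Hypothetical encoding of {label, {a, b}}: the result of merging a and b."""
    label: str
    parts: tuple


def merge(a, b, label):
    """Merge two syntactic objects; heads are plain strings."""
    return Phrase(label, (a, b))


def reflexively_dominated(x):
    """Yield x and every constituent that x dominates."""
    yield x
    if isinstance(x, Phrase):
        for part in x.parts:
            yield from reflexively_dominated(part)


def command_pairs(root):
    """(2): a commands b iff a merged with some d that reflexively dominates b."""
    pairs = set()
    for node in reflexively_dominated(root):
        if isinstance(node, Phrase):
            a, d = node.parts
            pairs |= {(a, b) for b in reflexively_dominated(d)}
            pairs |= {(d, b) for b in reflexively_dominated(a)}
    return pairs


def head_command(root):
    """Restrict command to word-level units (heads)."""
    return {(x, y) for x, y in command_pairs(root)
            if isinstance(x, str) and isinstance(y, str)}


ab = merge("α", "β", "α")
tree_3a = merge(ab, "δ", "α")                    # (3a): δ merges with {α, {α, β}}
tree_3b = merge(merge("δ", "μ", "δ"), ab, "α")   # (3b): complex {δ, {δ, μ}} merges in

print(sorted(head_command(tree_3a)))
print(sorted(head_command(tree_3b)))
```

Restricted to heads, (3a) relates every pair of heads by command. In (3b), by contrast, neither δ nor μ stands in a command relation with α or β; only the phrase {δ, {δ, μ}} commands α and β.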

Call the object that results from monotonic applications of M a 'command unit'.12 Command units emerge within a system with usual assumptions. To deduce the base step of the LCA in (1a), note that it encodes two formal requirements and one substantive requirement. The latter is that unordered structural relations map to precedence ones. Chomsky (1995) plausibly assumes that this follows from bare output conditions on PF (contra Kayne (1994)). The first formal requirement expressed through (1a) is that the sequence of PF heads should be structured in terms of already existing structural relations; the second is that the PF sequence should map specifically to familiar precedence relations.13 Neither requirement follows from the substantive assumption, but they are optimal routes to take. First, command is the only deduced structural relation that holds of heads which are structured within 'command units'; if the LCA attempts a mapping from already existing relations among heads, only command could be relevant without any additional cost.14 Second, mapping the command sequence <α, β, δ, ...> to a sequence of PF timing slots in a bijective fashion is optimal: for a two-dimensional space, mapping the first sequence to the x axis and the second sequence to the y axis, the relevant function is the trivial x = y; alternatives to x = y involve more operational symbols.15 Therefore the base step of (1) follows from piggybacking on M's derivational history, given dynamic economy. Uriagereka (to appear) then proceeds to deduce the induction step in (1b) in a less direct fashion, since there is no trivial way of deducing the fact that domination is involved in this step. The proposal is that (1b) is a result of applying Spell-out multiply, each time according to the base step in (1a). Assuming an LC Theorem (the base in (1a) being deduced), only command units can be linearized by the application of a transformational procedure L at Spell-out. L is an operation L(c) = p, mapping command units c to intermediate PF sequences p (see fn. 15), and removing phrasal boundaries from c representations:

(4) {α, {α, {β, {β, {μ, {μ, δ}}}}}}  →  {α, <α, β, μ, δ>}
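The effect of L on a single command unit, as in (4), can be sketched in Python. This is an illustrative sketch, not the author's formulation, and it rests on several assumptions. Inside a command unit, each merger is taken to combine a word-level unit with at most one complex constituent. Heads are read off in command order, and the i-th head is mapped to the i-th timing slot (the trivial x = y bijection). When two heads merge directly, the sketch simply assumes that the projecting head comes first. The names `command_sequence` and `L` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    """Hypothetical encoding of {label, {a, b}}; heads are plain strings."""
    label: str
    parts: tuple


def merge(a, b, label):
    return Phrase(label, (a, b))


def is_word(x):
    return not isinstance(x, Phrase)


def command_sequence(c):
    """Read heads off a command unit, each commanding head before what it commands."""
    if is_word(c):
        return [c]
    a, b = c.parts
    if is_word(a) and is_word(b):
        # Two heads merged directly command each other; assume the projecting one comes first.
        return [a, b] if a == c.label else [b, a]
    if is_word(a):
        return [a] + command_sequence(b)
    if is_word(b):
        return [b] + command_sequence(a)
    raise ValueError("two complex constituents: not a single command unit")


def L(c):
    """L(c) = p: a labeled word sequence; timing slot i holds the i-th head (x = y)."""
    return (c.label, tuple(command_sequence(c)))


# (4): {α, {α, {β, {β, {μ, {μ, δ}}}}}}
unit = merge("α", merge("β", merge("μ", "δ", "μ"), "β"), "α")
print(L(unit))  # ('α', ('α', 'β', 'μ', 'δ'))
```

An object built by merging two complex constituents, like (3b), is rejected with a `ValueError`. This is the point at which multiple Spell-out becomes necessary.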

The output of L is a labeled word sequence, with no constituent structure, and hence not a phrase-marker. It is still a labeled object, though, and in that sense akin to a frozen expression whose internal structure is inaccessible to the computational system, but which can nonetheless be part of a phrase-marker. We can then proceed to merge it further, which will yield a new command unit for L to linearize. This is multiple Spell-out, which makes (1b) true, but derived. A complex phrase-marker involving several derivational blocks must be spelled out prior to M, or the derivation will crash at PF (once (1b) has no axiomatic status). But note, crucially, that partially spelled-out objects must be mapped to LF prior to the phrasal flattening induced by L. Thus generalized merger (of the sort in (3b)) has to produce two objects: a linearized one for PF, and a hierarchically unordered one for LF. This entails a dynamically bifurcated model, of the sort sketched by Jackendoff (1972) and Lasnik (1972), (1976, appendix), and recently revamped by Lebeaux (1991). The PF and LF components are not mapped in a single static step, but through a cascade of derivational events that end up converging (or not) on final representations. This property of the system is crucial, given that a dynamic bifurcated model must have incremental characteristics (so as to assemble partial structures until final representations are achieved). The ultimate assembling operation has to be substitution, as in Chomsky (1995: chapter 3). The spelled-out object serves the purpose of Chomsky's designated substitution target ∅, for it has a label (allowing substitution of the right sort), but no phrasal structure. Structure is gained only by substituting into the partial p and l representations, already 'stored' in the LF/PF components upon previous instances of Spell-out:

(5) a. Standard merger and Spell-out of first derivational block:
       {δ, {δ, μ}}  →L  {δ, <δ, μ>}
       α + β  →  {α, {α, β}}
       {δ, ...} + {α, {α, β}}  →  {α, {{δ, ...}, {α, {α, β}}}}  →L  {α, <{δ, ...}, α, β>}
    Generalized merger of first command unit and second command unit:
       Substitution S(p1, p2): {δ, <δ, μ>} into {α, <{δ, ...}, α, β>}
       Substitution S(l1, l2): {δ, {δ, μ}} into {α, {{δ, ...}, {α, {α, β}}}}
    Final PF result: {α, <δ, μ, α, β>}
    Final LF result: {α, {{δ, {δ, μ}}, {α, {α, β}}}}
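The cascade in (5) can be simulated by adding a frozen, spelled-out unit to the linearization idea. In this hypothetical Python sketch, `spell_out` returns two objects: a flat labeled sequence standing in for a partial p representation, and the untouched structure standing in for a partial l representation. The functions `substitute_pf` and `substitute_lf` play the role of S(p1, p2) and S(l1, l2). The encoding is a simplification made for illustration, and so is the assumption that a spelled-out unit behaves like a word inside the next command unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    label: str
    parts: tuple


@dataclass(frozen=True)
class SpelledOut:
    """A linearized unit: it keeps its label, but its words are invisible to merge."""
    label: str
    words: tuple


def merge(a, b, label):
    return Phrase(label, (a, b))


def is_word(x):
    return not isinstance(x, Phrase)  # heads and spelled-out units alike


def command_sequence(c):
    if is_word(c):
        return [c]
    a, b = c.parts
    if is_word(a) and is_word(b):
        return [a, b] if a == c.label else [b, a]
    if is_word(a):
        return [a] + command_sequence(b)
    if is_word(b):
        return [b] + command_sequence(a)
    raise ValueError("not a command unit: spell out one constituent first")


def spell_out(c):
    """Return (p, l): a flat labeled sequence for PF and the unchanged structure for LF."""
    return SpelledOut(c.label, tuple(command_sequence(c))), c


def substitute_pf(p, unit):
    """S(p1, p2): replace the placeholder unit in p by the words it stands for."""
    words = []
    for w in p.words:
        words.extend(unit.words if w == unit else (w,))
    return SpelledOut(p.label, tuple(words))


def substitute_lf(tree, unit, structure):
    """S(l1, l2): put the stored hierarchical structure back where the unit sits."""
    if tree == unit:
        return structure
    if isinstance(tree, Phrase):
        return Phrase(tree.label, tuple(substitute_lf(t, unit, structure) for t in tree.parts))
    return tree


spec = merge("δ", "μ", "δ")                  # first derivational block
p1, l1 = spell_out(spec)
main = merge(p1, merge("α", "β", "α"), "α")  # second block treats p1 as a unit
p2, l2 = spell_out(main)

final_pf = substitute_pf(p2, p1)
final_lf = substitute_lf(l2, p1, l1)
print(final_pf.label, final_pf.words)  # α ('δ', 'μ', 'α', 'β')
```

In this sketch, spelling out {α, {{δ, {δ, μ}}, {α, {α, β}}}} in a single step raises an error. This mirrors the claim that a phrase-marker with several derivational blocks must be spelled out piecemeal.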

Note that PF processes must happen within intermediate p representations, or they will not be constructive processes. For instance, determination of a focus projection (as in Cinque (1993)), or comparable phonological processes of the sort discussed in Bresnan (1971), must take place prior to the integration of sequences into a final phonetic representation. Afterwards, the representation will be in the interpretive A/P components. We can in principle talk about representational units within this phonetic object, but these have to be justified on the basis of substantive bare (A/P) properties. It is unclear from what such property a focus projection, say, could follow. In contrast, a focus projection path is a natural object to expect in partial representations p; in fact, Cinque's characterization is essentially in terms of command (right) branches, as the dynamic bifurcated model correctly predicts.16 Comparable issues arise for the LF branch. Just as PF processes must happen within intermediate p representations, LF processes must happen within intermediate l representations, or they will not be constructive processes. Then if only partial representations l can be the locus of LF constructive (i.e. formal) processes, it follows that only command units could be the locus of LF universals.17 Again, we could in principle talk about representational units within the final semantic representation; but these would need substantive (I/C) justification.

In sum, command is indirectly relevant for LF (or PF) units because these are dynamically mapped from partial representations where only command holds among word-level units. If this proposal is correct, it does not just happen to be the case that LF dependencies are sensitive to command (as opposed to other imaginable relations); matters could not have been otherwise. Relation R, the substantive relation involved in obviation, instantiates a sub-case of this general property of LF. Of course, we could define other relations, but whatever step we take in this direction has serious consequences, given these (rather crucial) assumptions:

(6) Assumption about the Inclusiveness of the LF Component:
    All LF representations are the expression of lexical relations.

(7) Assumption about the Uniformity of the LF Component:
    All LF processes are observable in some form prior to Spell-out.

Given (6) and (7), simply stipulating a specifically covert relation like 'is-in-the-same-phrase-marker-as' will be non-inclusive, since this hypothetical relation would be determining representations which are not the expression of lexical features. Suppose, then, that we want to deduce the relation in question from some constructive procedure. Saying that it is a procedure specific to the LF component would be non-uniform, since the procedure in question would not be observable in the overt component. This means the only possibility left is that 'is-in-the-same-phrase-marker-as' follows from the pre-Spell-out procedures. If the architecture I have discussed is anywhere near right, such procedures never yield a global relation of the sort we are considering; rather, they are limited to command relations ('is-in-the-same-derivational-block-as' dependencies). I hasten to add that the semantic component, after LF, may in fact be able to invoke such global relations as 'is-in-the-same-phrase-marker-as'. But, ultimately, that is a matter of performance: whatever goes on after the levels of LF and PF which interface with the sensory-motor and intentional-conceptual components. Given the present architecture, if we find that phenomenon P is sensitive to a structural relation that cannot be deduced from syntactic procedures (in the narrow sense invoked here), this is an immediate indication that P is not narrowly syntactic. That is possibly the case with Weak Crossover, of the sort illustrated in (8):

(8) a. His son likes everyone.
    b. Who does his son like?
    c. His son likes a man.

There is a characteristic interpretive limitation in sentences of this sort: the pronoun cannot co-vary with everyone, who, or a man (in an indefinite reading). Crucially, though, there is no command relation between the pronoun and any of these elements, and hence, by hypothesis, whatever interpretive relation is relevant in these instances cannot be narrowly syntactic (i.e., it takes place after LF).19 Conversely, that P is best characterized by a relation which can be deduced on syntactic grounds makes it possibly, but not necessarily, syntactic. For instance, in light of (9), we may assert (10) (see May (1977)):

(9) a. John is believed by his mother to be a genius.
    b. His mother believes that John is a genius.
    c. Everyone is believed by someone to be a genius.
    d. Someone believes that everyone is a genius.

(10) If quantifier α commands quantifier β at LF, then α has scope over β.

Now, even if (10) indeed obtains, it may be possible for it to result from post-LF conditions.20 This is all to say that the fact that command defines obviation does not in itself make obviation an immediate part of syntax. I will argue that at least local obviation is syntactic, but the point cannot be conceded a priori.
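Two command-based observations in this section can be checked on simplified trees: the absence of command between the pronoun and the quantifier in (8a), and the correlation stated in (10). The bracketings below are hypothetical stand-ins, with no tense, no quantifier raising, and determiners and predicates folded into single words. The Python code only restates the descriptive correlation. As the text stresses, nothing in it shows that (10) is narrowly syntactic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    label: str
    parts: tuple


def merge(a, b, label):
    return Phrase(label, (a, b))


def reflexively_dominated(x):
    yield x
    if isinstance(x, Phrase):
        for part in x.parts:
            yield from reflexively_dominated(part)


def command_pairs(root):
    """(2): a commands b iff a merged with some d that reflexively dominates b."""
    pairs = set()
    for node in reflexively_dominated(root):
        if isinstance(node, Phrase):
            a, d = node.parts
            pairs |= {(a, b) for b in reflexively_dominated(d)}
            pairs |= {(d, b) for b in reflexively_dominated(a)}
    return pairs


def scope_pairs(root, quantifiers):
    """(10): if quantifier a commands quantifier b at LF, a has scope over b."""
    return {(a, b) for a, b in command_pairs(root)
            if a in quantifiers and b in quantifiers}


# (8a) His son likes everyone. (simplified, hypothetical bracketing)
s8a = merge(merge("his", "son", "son"),
            merge("likes", "everyone", "likes"), "likes")
cp8a = command_pairs(s8a)
print(("his", "everyone") in cp8a, ("everyone", "his") in cp8a)  # False False

# (9d) Someone believes that everyone is a genius. (simplified)
embedded = merge("everyone", "is-a-genius", "is-a-genius")
s9d = merge("someone",
            merge("believes", merge("that", embedded, "that"), "believes"),
            "believes")
print(scope_pairs(s9d, {"someone", "everyone"}))  # {('someone', 'everyone')}
```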

3 Locality Matters

Consider next locality. The first obvious question is why given elements should care about how far away other elements are. For relations involving movement, we have an answer: The Minimal Link Condition. This poses the same sorts of theoretical questions that the Last Resort Condition did, in terms of whether its violation leads to a derivational cancellation or rather a crash, and either way, why locality should be part of the phenomenon of grammar. It is worth stressing that, when we say locality in Minimalism, we mean something quite specific:

(11) ... MinD(X) [α ... [β ... [δ ... MinD(Y) [γ ... [ ...

Given a command unit where MinD(X) = {α, β, ...} and MinD(Y) = {γ, ...}, δ is closer to the elements in MinD(X) than the elements in MinD(Y) are, but (i) the elements in MinD(X) are not closer to each other than δ is, and (ii) none of the elements in MinD(Y) is closer to δ than any other element in MinD(Y). Distance is relevant only within command units, which we expect given the present architecture. But locality voids distance: elements within the same minimal domain are equally far as targets, or equally close as sources, of movement to or from an element that is trying to move or be moved. Assuming the state of affairs in (11),21 I am concerned with the fact that elements standing in a 'super-local' relation of lexical (L) relatedness do not interfere with one another for distance purposes. Curiously, various phenomena, listed in (13), seem to be sensitive to the notion of L-relatedness defined in (12):

(12) α and β are L-related iff a projection of α is in the minimal domain of β.

(13) a. Absence of distance (as in (11)).
     b. Head movement.
     c. Word formation (as in Hale and Keyser (1994)).
     d. Theta-role assignment (as in Chomsky (1995)).
     e. A-movement (as in Chomsky (1995)).
     f. Distribution of Case features (see below).
     g. Local anaphoric binding (see below).
     h. Local obviation (see below).
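Definition (12) amounts to a lookup over minimal domains. The Python sketch below is only an illustration under assumptions introduced here: the node labels (O, S, V, v, DP_O, DP_S, VP) and the `projections` and `min_domain` tables are hypothetical encodings, and the minimal domains are simply stipulated rather than computed from a structure.

```python
def l_related(alpha: str, beta: str, projections: dict, min_domain: dict) -> bool:
    """(12), sketched: alpha and beta are L-related iff some projection of
    alpha is a member of the minimal domain of beta."""
    return bool(projections.get(alpha, set()) & min_domain.get(beta, set()))


# Hypothetical configuration loosely modeled on (13d): the internal argument O
# projects DP_O, which sits in V's minimal domain; the subject S projects DP_S,
# which sits in the minimal domain of v (after V raises to the v shell).
projections = {"O": {"O", "DP_O"}, "S": {"S", "DP_S"}}
min_domain = {"V": {"DP_O"}, "v": {"DP_S", "VP"}}

print(l_related("O", "V", projections, min_domain))  # internal argument and V
print(l_related("S", "V", projections, min_domain))  # subject and V
print(l_related("S", "v", projections, min_domain))  # subject and v
```

Note that, as stated in (12), the relation is directional: it asks whether a projection of the first element sits in the minimal domain of the second.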

(13b) obeys the Head Movement Constraint.22 (13c) (words cannot be derivationally formed without L-relatedness of all their constituents to a pivotal head) adds to (13b) a limit on successive head movement: it can apply only twice and still create a word. (13d) holds for theta-role assignment (verbs are L-related to their internal argument(s) and, after the verb raises to the v shell, to the specifier of the v-projection). As for (13e), consider (14). While (14a) involves A-movements which never cross domains of L-relatedness (O moves within the domain of V's L-relatedness; S, within the domain of v's L-relatedness), the movement of O in (14b) is not within the domain of L-relatedness of any category, and it is impossible. This should also bear on the locality implied in the distribution of Case features (13f), although there is the separate issue of what it means to have uninterpretable Case features that moved determiner phrases must check (see section 5). In turn, (13g) can be illustrated as in (15) (to ensure local binding, I use the Spanish clitic se):

(15) a. Él se destruyó.
        he self destroyed
     b. *Él se destruyó fotos.
        he self destroyed photos
        ('he destroyed photos of himself')

Finally, local obviation (13h) arises in the same contexts:

(16) a. He destroyed him.
     b. He destroyed photos of him.

Whereas (16a) is obviative (the reference of he and him differs), (16b) is not. To the extent that all of the phenomena in (13) involve L-relatedness, we may think of this notion as determining a dimension of some sort, within which certain structural properties are conserved.23 There are plausible candidates for conservation in most of these examples: (13a) conserves locality itself; (13b), the symmetry of head relations; (13c), lexical integrity; (13d), lexical dependencies; (13e), (A-)chain integrity. The remaining three, though, are less obvious. (13g) might be a sub-case of (13b) (as in the literature partly summarized in Cole and Sung (1994)) or, essentially, a sub-case of (13d) (as in Reinhart and Reuland (1993); though see section 7). Why (13f) should conserve Case relations is hard to understand if we do not know what Case is. And as for (13h), we are essentially clueless. In what follows, I try to relate the mystery in (13h) to that in (13f), by way of a detour.


4 Clitic Combinations

(17) Possible argument clitic combinations:24
     (me = I.sg, nos = I.pl, te = II.sg, os = II.pl; le(s) = III(pl) dative, lo(s) = III(pl) accusative;
     Dat-Acc: first clitic dative, second accusative; Acc-Dat: the reverse)
     a. Dat-Acc: *me me/nos, *nos me/nos, *me te/os, *nos te/os
        Acc-Dat: *me me/nos, *nos me/nos, *me te/os, *nos te/os
     b. Dat-Acc: *te me/nos, *os me/nos, *te te/os, *os te/os
        Acc-Dat: *te me/nos, *os me/nos, *te te/os, *os te/os
     c. Dat-Acc: *le(s) lo(s)
        Acc-Dat: *lo(s) le(s)
     d. Dat-Acc: me lo(s), nos lo(s), te lo(s), os lo(s)
        Acc-Dat: *me le(s), *nos le(s), *te le(s), *os le(s)
     e. Dat-Acc: *le(s) me, *le(s) nos, *le(s) te, *le(s) os
        Acc-Dat: *lo(s) me, *lo(s) nos, *lo(s) te, *lo(s) os

The Spanish paradigm in (17) is idealized from Perlmutter (1971) (see more generally Wanner (1987)). (The basic sentence is always he showed y to z (in the mirror), and the judgements are my own.) The first surprising fact about this paradigm is that so few clitic combinations are possible. Some ungrammatical combinations may follow from the distinction argued for in Uriagereka (1994) between [+s] and [-s] special clitics, I/II clitics being [+s], and III clitics, definite determiners in nature, being [-s]. Assuming that analysis, combinations of the form <[-s], [+s]> are impossible in a language like Spanish as a result of syntactic restrictions on movement and clitic placement, irrelevant now. That eliminates (17e). In turn, Uriagereka (1988: 369ff.) provided an account of why in Spanish the clitic ordering <Accusative, Dative> is not possible (along the lines of why *I gave the book him is out in English). Supposing this too, the paradigm would reduce to (18) below. And surely there are ways to reduce some of the examples in (18) to violations of Chomsky's Binding Condition B. However, the reduced paradigm in (18) shows that this cannot be the whole story:


(18) a. *me me/nos, *nos me/nos, *me te/os, *nos te/os
     b. *te me/nos, *os me/nos, *te te/os, *os te/os
     c. *le(s) lo(s)
     d. me lo(s), nos lo(s), te lo(s), os lo(s)
     (first clitic Dat, second Acc; me = I.sg, nos = I.pl, te = II.sg, os = II.pl, le(s)/lo(s) = III(pl))

In the examples that combine a I clitic with a II clitic, the reference of the two clitics is necessarily different, given their lexical interpretation; so while a binding-theoretic account exists for the rest of the paradigm, it clearly does not generalize. To try a different take on these problems, consider restrictions as in (19):

(19) a. the lions / *the lionses
     b. the yankees / *the yankeeses
     c. the scissors / *the scissorses
     d. the bus / the buses

Why can lionses not denote a plurality of pluralities of lions? Why can yankeeses not denote the plurality of teams which carry the name and/or description (the) yankees? Or why is scissorses not the way to denote two or more simple objects? Apparently morphology is to blame: when a plurality marker appears in lions, yankees, and scissors (unlike in bus), a second one is not tolerated. This restriction is known to hold only for inflections, not for derivational affixes:

(20) a. re-attach, re-reattach, re-rereattach...
     b. sub-optimal, sub-suboptimal, sub-subsuboptimal...
     c. transformation, transformationalize, transformationalization...

It is not at first clear what Minimalism can say about (19). Certainly, an inflectional feature must be checked (and erased if uninterpretable) in the course of the derivation, unlike a derivational feature. Assuming this, though, consider the Spanish paradigm in (21), where (21b) is analogous to (19a) but more conspicuous. First, note that in (21a) one of the two plural features must be uninterpretable. The data in (21c) and (21d), which do not involve plurality markers on the nominals, suggest that the interpretable feature is the one in the determiner.25 Hence we assume that la 'the' has the interpretable feature against which the plurality feature of leonas 'lions' is checked:

(21) a. [[[la]s] [[leona]s]]
        the [+pl] lion [+pl]
     b. *[[[la]s] [[[leona]s]as]]
        the [+pl] lion [+pl] [+pl]
     c. la Ridruejo, Sartorius, Habsburg...
        'the person who is a member of the Ridruejo, Sartorius, and Habsburg families'
     d. las Ridruejo, Sartorius, Habsburg...
        'the persons who are all members of the Ridruejo, Sartorius, and Habsburg families' or 'the persons who are each a Ridruejo, a Sartorius, and a Habsburg'

The question is why the checking in (21b) cannot proceed. Note that the problem is not that we have TWO uninterpretable features to check against the: the's feature is interpretable, and hence does not erase upon checking an uninterpretable feature that moves to its checking domain. Nor is the problem that the most deeply embedded [+pl] feature would not be in the checking domain of the interpretable feature, particularly if at LF each uninterpretable feature can move (together in a feature bundle or separately) to the relevant target. Likewise, it does not seem that one of the features is a closer source for attraction than the other: both are within the same local domain. To avoid this impasse, consider the definition of Minimal Domain:

(22) Definition of Minimal Domain:
     For α a feature-matrix or a head X, and CH a chain (α, t) or (the trivial chain) α:
     i) MAX(α) is the smallest maximal projection dominating α.
     ii) The domain D(CH) of CH is the set of features dominated by MAX(α) that are distinct from and do not contain α or t.
     iii) The minimal domain MIN(D(CH)) of CH is the smallest subset K of D(CH) such that for any x belonging to D(CH), some y belonging to K dominates x.
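Clause (iii) of (22) can be computed directly once dominance is given. The following Python sketch assumes, hypothetically, that dominance is supplied as a table mapping each node to the full set of nodes it properly dominates, and that D(CH) has already been fixed by clauses (i) and (ii); the node labels are illustrative only.

```python
def minimal_domain(domain: set, dominates: dict) -> set:
    """Sketch of clause (iii) of (22).

    `dominates[y]` is assumed to be the full set of nodes that y properly
    dominates (already closed under transitivity). Treating dominance as
    reflexive, the smallest subset K of the domain such that every member is
    dominated by some member of K is the set of members that no other member
    dominates.
    """
    return {
        x for x in domain
        if not any(x in dominates.get(y, set()) for y in domain if y != x)
    }


# Hypothetical domain: two constituents, A and B, plus material they contain.
dominates = {"A": {"a1"}, "B": {"b1", "b2"}, "b1": {"b2"}}
D = {"A", "a1", "B", "b1", "b2"}
print(sorted(minimal_domain(D, dominates)))
```

Here a1, b1, and b2 drop out because A or B dominates them, leaving A and B as the minimal domain.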

There is a crucial modification in (22) to the standard notion of minimal domain: it is stated over features (feature-matrices), not only over categories. We must grant this extension in the current version of Minimalism because otherwise matrices of formal features which move by themselves at LF will never be part of a minimal domain. The extension is trivial: checking is about features, not about categories, so checking domains should be too.26 However, now consider the following fundamental question: how does the grammar know one feature from another? Two identical features should be taken to be only one feature once they are lumped into a set, and a minimal domain is nothing but a set. This is to say that if the two [+pl] features in (21b) reach the same checking domain (which is a subset of a minimal domain), they will be identified as only one feature in that set, and hence only one checking will be possible. This leads to a direct crash, perhaps even to a derivational cancellation.27

A similar analysis can be given for the bad examples in (18), if we are able to show that the crucial feature that motivates clitic movement is identical in two different clitics which, following Chomsky (1995), seek checking within the checking-domain set of the same hosting head. For all of the (a) and (b) examples in (18), we can say that the relevant feature is [+s], whose substantive content is 'pertaining-to-the-pragmatic-axis' (that is, I and II). For the (c) examples, the relevant feature is [-s], whose substantive content is that of a definite article. Of course, the clitics do differ in phonological shape, but this is irrelevant: PF features never enter into syntactic computations. It may also be suggested that even if all III clitics have the same substantive character, they differ in the uninterpretable feature of Case. Below, I propose that it is because arguments in the same checking domain are in fact non-distinct that they must be marked with Case.28 Finally, why are I and II not distinguished? The grammar encodes the fact that these are speech-oriented clitics, which I code through the feature [+s]. What I do not think is that the grammar codes differences between I and II, even if pragmatics does.29 If these ideas are on the right track, then, not surprisingly, when a combination of [-s] and [+s] features is at issue, the result is perfect, as in (18d): each feature is indeed identified as distinct in the checking domain. One may be tempted to favor a semantic analysis, but the examples in (23), corresponding to the bad sequences in (18a)/(18b), make it inadvisable.30

(23) a. Me mostró a mí/nosotros.
        me showed to me/us
        'He showed me to me/us'
        Te mostró a mí/nosotros.
        you showed to me/us
        'He showed you to me/us'
        Nos mostró a mí/nosotros.
        us showed to me/us
        'He showed us to me/us'
        Os mostró a mí/nosotros.
        you.pl showed to me/us
        'He showed you guys to me/us'
     b. Me mostró a ti/vosotros.
        me showed to you/you.pl
        'He showed me to you/you guys'
        Te mostró a ti/vosotros.
        you showed to you/you.pl
        'He showed you to you/you guys'
        Nos mostró a ti/vosotros.
        us showed to you/you.pl
        'He showed us to you/you guys'
        Os mostró a ti/vosotros.
        you.pl showed to you/you.pl
        'He showed you guys to you/you guys'

The fact that some (as opposed to none) of these combinations arise with full pronouns indicates that there is no semantic problem with (18). But the fact that all of the examples are grammatical is even more interesting: it suggests that a Condition B treatment of some of the ungrammatical examples in (18) is on the wrong track. This is welcome, because otherwise we would be ruling out some of those examples twice: through Condition B, and through a failure in checking. One other example is significant. Compare (18c), repeated as (24a), to (24b):

(24) a. *Le lo mostró.
        him him showed
     b. Se lo mostró.
        se him showed
        'He showed him to him'

This example usually gets what I believe to be a misleading analysis: it is glossed as (24a), on the claim that se in (24b) is just a phonological variant of le. But why le should get a phonological variant in just this instance is never discussed. Likewise, the surprising fact in (25d) usually goes unnoticed:

(25) a. Aquí se encarcela hasta a uno mismo.
        here se(impersonal) jail even to one same
        'Here, one even jails oneself'
     b. *Aquí se se encarcela.
        here se(impersonal) se(reflexive) jail
     c. Aquí se envía dinero a los familiares.
        here se send money to the relatives
        'Here, one sends money to one's relatives'
     d. *Aquí se se lo envía.
        here se(impersonal) se('DATIVE') it send

Bouchard (1984) discussed paradigms in which incompatibilities of the sort in (25b) arise: impersonal se cannot co-occur with reflexive se. This has led various authors to treat all instances of se as being the same se (see Raposo and Uriagereka (forthcoming) for discussion and references); suppose this is correct. Whatever indirect-object se is, it cannot co-occur with impersonal se (25d). Thus it looks as if this 'dative' se is se after all.31 If so, we immediately see why (24b) is grammatical, assuming (see section 6) that se, unlike le, does not have a feature [-s], and is thus compatible with the [-s] lo.32 Setting aside until section 8 how (24b) comes to mean something similar to what (24a) would mean, (24b) is as good a structure as the grammar of Spanish generates when trying to express the meaning implied in (24a). It is plausible that other Romance languages avoid the sequence le lo in other ways, for instance by creating merged clitics like the Galician-Portuguese llo, a single clitic coding the meaning of lle and (l)o, or the Italian glielo, which collapses le and lo. Why the language would need to do this is what matters for our purposes: two clitics with the same [-s] feature hosted in the same head lead to an LF crash.
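The set-theoretic core of this section can be illustrated with a short Python sketch. It is a deliberately crude model under assumptions introduced here for illustration: each clitic contributes a single checking-relevant feature ([+s] for I/II, [-s] for III), se contributes none (as assumed for (24b)), and all clitics in a combination are hosted by the same head, so their features form one set. The names `CHECKING_FEATURE`, `checking_domain`, and `checks_converge` are hypothetical.

```python
# Hypothetical feature table: I/II clitics are [+s]; III clitics are [-s];
# se is assumed to carry no [-s] feature (see the discussion of (24b)).
CHECKING_FEATURE = {
    "me": "+s", "te": "+s", "nos": "+s", "os": "+s",
    "le": "-s", "les": "-s", "lo": "-s", "los": "-s",
    "se": None,
}


def checking_domain(clitics: list) -> set:
    """Features of clitics hosted by one head, collected into a set.

    Being a set, it keeps only one copy of any repeated feature."""
    return {CHECKING_FEATURE[c] for c in clitics if CHECKING_FEATURE[c] is not None}


def checks_converge(clitics: list) -> bool:
    """True if each feature-bearing clitic can be checked separately,
    i.e. no two of them collapse into a single member of the set."""
    bearing = [c for c in clitics if CHECKING_FEATURE[c] is not None]
    return len(checking_domain(clitics)) == len(bearing)


for combo in (["me", "te"], ["le", "lo"], ["te", "lo"], ["se", "lo"]):
    print(combo, checks_converge(combo))
```

On this toy model, me te and le lo fail (as in (18a) and (18c)), while te lo and se lo succeed (as in (18d) and (24b)). The Acc-Dat ordering and the <[-s], [+s]> restriction discussed above are not modelled.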

5 Uninterpretable Features

Recall now the English paradigm in (26)-(30) (where ! marks obviation):

(26) a. *I like me      b. ... you     c. ... him     d. ... us      e. ... them
(27) a. You like me     b. *... you    c. ... him     d. ... us      e. ... them
(28) a. He likes me     b. ... you     c. !... him    d. ... us      e. ... them
(29) a. We like me      b. ... you     c. ... him     d. *... us     e. ... them
(30) a. They like me    b. ... you     c. ... him     d. ... us      e. !... them

If the approach in the previous section is correct for (18), can we get it to handle these familiar facts?33 In Chomsky's (1995) system, the Case features of a direct object move to the domain of T at LF, where they are checked against the sublabel V of T (which is created upon the independent movement of V's features to T).34 The point is that at some stage in the derivation the Case features of the subject are in the checking domain of T (its specifier), and later on the Case features of the object are in the checking domain of T (adjoined to it).35 How do we keep track of those features, if they are identical and included in a set? We may argue that the grammar does not get 'fooled' because the object Case features are [+accusative], while the subject Case features are [-accusative]. Yet there is something peculiar in this reasoning. Why are there different Case features for elements within the same domain? Because we need to distinguish subject Case checking from object Case checking. And why do we need to distinguish those? Because both subjects and objects have Case features! In a nutshell, we are positing a distinct Case feature with the sole purpose of making an object or a subject move to a configuration that gets rid of it. The reason that Chomsky (1995) considers Case an uninterpretable feature is empirical. Argumental movement by LF would be forced even if the Case features were interpretable, so long as the elements that host the Case feature at LF themselves have an uninterpretable feature. However, consider (31):

(31) *The man seems that t is smart.

The man moves to check the matrix D feature in T, after having checked and erased Case in the embedded clause. The only thing that goes wrong in (31) is that the matrix 'Case-hosting' feature is not checked, because the regular Case feature in the man has been erased downstairs; but that holds only if Case is uninterpretable, for otherwise Case could be checked several times. These mechanics work, but they are unsatisfactory. The real question was and still is why there should be uninterpretable Case features in the grammar. Note that the existence of uninterpretable features per se is not particularly troubling. Take, for instance, uninterpretable D features of various sorts. They code a dependency between an interpretable D feature in a determiner or Tense and some associate. Why there should be such long-distance dependencies is an issue, but once this is assumed, it is perhaps even natural that we should code them through this particular device. What is harder to see is what is at issue in Case checking, since both the source and the target of the movement involve uninterpretable features. Is this an irreducible quirk?

Consider Case features in some detail. Unlike intrinsic features (like D or Wh-), Case features are relational. A D feature, for instance, has a value '+' which needs no further validation, and allows the D feature to appear in certain checking domains. Case is different. If a Case feature has the value 'accusative', that value needs to be checked, specifically, against a 'Case-hosting' feature 'I-check-accusative'. In fact, Chomsky argues that checking an 'accusative' feature against a feature 'I-check-nominative' leads to a feature mismatch, and an immediate derivational cancellation. But the very notion of 'feature mismatch' presupposes not just feature checking, but also feature matching. Why can we not just say that, in the same way that some feature in T attracts, specifically, a D feature (and not a Wh- one, say), some other feature in T attracts, specifically, an accusative feature (and not a nominative one)? We cannot, because that denies the phenomenon of Case altogether. Accusative and nominative are not features, but values of a particular sort of feature, even if these values are relationally determined, through matching. It must be emphasized that matching is a grammatical process, unlike checking, which is just a non-unified phenomenon. Under certain checking configurations (given domains as in (22)), features may stand in checking relations. When they do, certain things may happen; for instance, erasure ensues if a given feature is uninterpretable.
Matching, in contrast, is a derivational process that sanctions certain values of features; if this process fails, the derivation is cancelled. No derivation is cancelled for a failure in checking; there are no such failures: what fails is the resulting representation. It is thus important to understand what happens when matching succeeds:

(32) Checking Convention
     When a relational feature [R-F] is attracted to match a feature [F-R], the FF-bag containing the attracted feature is R-marked.

A simple look at an actual morphological paradigm shows that (32) is accurate:

(33) Spanish pronominal clitics: their features and their morphology

     Features          Acc.               Dat.
     sg.   I           me                 me
           II          te                 te
           III         lo/la              le
     pl.   I           nos                nos
           II          os                 os
           III         los/las            les

     Morphology        Acc.               Dat.
     sg.   I/II        e                  e
           III         o (ms.), a (fm.)   e
     pl.   I/II        o                  o
           III         o (ms.), a (fm.)   e


The key here is this: there is no unified morphological realization for an accusative or a dative feature. For I/II-sg., it is e. For I/II-pl., it is o. For III-sg./pl., it is o for the accusative and e for the dative. Paradigms of this sort are quite common, and tell us something simple and well known: we need complex features. We must allow for features like [I, sg., acc] or [III, sg., ms., acc].36 Or, differently put, the value accusative or dative is a value for a whole bag of formal features, as the Checking Convention predicts. But even if (33) indicates that things are this way, why should they be so? I suggest that the Checking Convention allows featural distinction for FF-bags which typically end up in the same set. That is, assume that all person, number, and gender features are interpretable, in accordance with Chomsky's (1995) analysis. Now imagine a situation (within a given minimal domain) where the features in each argument have the same values. This should be very common once we set aside Case: most statements about the world involve III subjects and III objects. It is then natural for grammars to have a purely formal device to mark FF-bags for distinctness. At that point, no issue ever arises, regardless of specific values.
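The distinctness argument can be made concrete with Python's hashable frozensets, which behave exactly like set members in the sense relevant here. This is a sketch under an encoding introduced purely for illustration: an FF-bag is a frozenset of (attribute, value) pairs, and R-marking under (32) is modelled, hypothetically, as adding a ("case", value) pair to the whole bag.

```python
def ff_bag(person: str, number: str, gender: str) -> frozenset:
    """Hypothetical encoding of an FF-bag as a frozenset of (attribute, value) pairs."""
    return frozenset({("person", person), ("number", number), ("gender", gender)})


def r_mark(bag: frozenset, case_value: str) -> frozenset:
    """Checking Convention (32), sketched: a successful match marks the whole bag."""
    return bag | {("case", case_value)}


subject = ff_bag("III", "sg", "ms")
obj = ff_bag("III", "sg", "ms")

# Before Case marking, the two bags are identical, so a set retains only one.
unmarked_domain = {subject, obj}
# After matching, each bag carries its own value, and both remain as members.
marked_domain = {r_mark(subject, "nominative"), r_mark(obj, "accusative")}

print(len(unmarked_domain), len(marked_domain))
```

The two III.sg.ms bags collapse into one member until each is marked with its own Case value, which is the purely formal distinctness device the paragraph above describes.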

6 Obviation Revisited

The question is then what effect this grammatical fact has on speaker interpretation, a part of performance within Minimalism. We have a feature 'I-check-accusative' associated with v, and it has the effect of erasing the Case feature in him, and correspondingly 'accusative'-marking the FF-bag that contained the erased Case feature. I suggest that we bite the bullet and relate the mystery of Case to the mystery of local obviation, in the following way: 'accusative'-marked FF-bags are disjoint from 'nominative'-marked FF-bags.37 Chomsky (1995) assumes, essentially, that FF-bags carry referential features, and in that respect it is not surprising that bags marked for distinctness should be interpreted differently as well. I assume with Postal (1966) that pronouns are hidden definite descriptions; him has the import of the one. With Higginbotham (1988), I take definite descriptions to invoke a context variable whose range is confined by the speaker. Then the one is, more accurately, roughly the one I have in mind. In sum, (34a) has the informal logical form in (34b):

(34) a. He likes him.
     b. [the x: one(x) & X(x)] [the y: one(y) & Y(y)] x likes y

And the key factor is this: can the context variables X and Y have the same value? If not, the two unique ones invoked will come out distinct: one is a unique one salient in context X, the other a unique one salient in context Y. On the other hand, if X = Y, then the two ones will be the same. My specific claim, then, can be technically expressed as follows:

(35) Transparency Condition
     In the absence of a more specific indication to proceed otherwise, where FF-bags α and β are grammatically distinct, the speaker confines the range of α's context variable differently from the range of β's variable.
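The default character of (35) can be illustrated with a small Python sketch. It is only an illustration under assumptions introduced here: FF-bags are reduced to integer ids, context variables to string labels, and the "more specific indication" to an optional pointer to an earlier bag. The names `Argument` and `assign_contexts` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Argument:
    form: str
    bag_id: int                            # distinct FF-bags get distinct ids
    same_context_as: Optional[int] = None  # explicit sameness indication, if any


def assign_contexts(args: list) -> dict:
    """Sketch of (35) as a default: each grammatically distinct FF-bag gets its
    own context variable unless a more specific indication says to reuse the
    variable of an earlier bag (which must already have been assigned)."""
    contexts = {}
    for arg in args:
        if arg.same_context_as is not None:
            contexts[arg.bag_id] = contexts[arg.same_context_as]
        else:
            contexts[arg.bag_id] = f"C{arg.bag_id}"
    return contexts


# (34) 'He likes him': two Case-distinct bags and no indication of sameness.
obviative = assign_contexts([Argument("he", 1), Argument("him", 2)])
# A bag carrying an explicit sameness marker reuses the earlier bag's variable.
anaphoric = assign_contexts([Argument("he", 1), Argument("himself", 2, same_context_as=1)])
print(obviative, anaphoric)
```

The design choice is simply that distinctness is the default and sameness must be explicitly requested, mirroring the "in the absence of a more specific indication" clause.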

I do not claim that (35) follows from anything, only that it is more natural than traditional obviation conditions, starting with Lasnik's (1976) Disjoint Reference Rule. Why should pronouns have to be locally obviative? Within Minimalism, that is a fair question to ask. (35) answers it thus: because they are grammatically distinct, the most transparent mapping to their semantics is also in terms of semantic distinctness. I emphasize also that, from this point of view, the phenomenon of obviation is taken to be speaker-dependent, thus invoking a mode of presentation from a perspective. This, though, is not crucial to my proposal, which is more modest than understanding the detailed implications of the obvious mapping in (34): I simply want to know why local obviation is there to start with. If I am right, it is the semantic price of Case.

Consider next some potential problems that the rest of the examples in (26)-(30) may raise. First, Case marking also distinguishes you and he in he likes you, or you and I in I like you, raising the question of why this should be so if these are already distinct pronouns. (36) poses a related question:

(36) a. He has arrived.
     b. *He to arrive was a mistake.

While he does not have to be distinguished from any other argument here, to think that it should not involve Case (when it does otherwise) is to force the grammar into two sorts of paradigms for arguments: Case-marked and Case-less ones. This is more costly than having a single paradigm where all noun phrases are Case-marked. We do predict, however, that when a single argument is at stake (unaccusative constructions), grammars should not ascribe much significance to what form of Case is employed. This is what we find: where English employs a subject Case in (36a), Basque employs an object Case in this same situation. Mixed paradigms also abound.38 Similarly, we may say that the you/he and you/I combinations, and others, are still Case-marked because that is the simplest decision for the grammar to take, and it has no semantic consequences.39 Of course, the sequences <I, I>, <II, II>, or <III, III> in (26)-(30) now follow. The Case mark will force either ungrammatical or uninterpretable results for the first two sequences (depending on whether checking proceeds), given the fact that II cannot be disjoint from II, or I from I. As for III, the feature will yield grammatical and interpretable results, albeit obviative ones. Why local domains where Case is checked should correlate with the domains where local obviation obtains is also entirely expected: they are aspects of the same phenomenon. This means, first of all, that (37) and similar examples should face the same sorts of problems as the paradigm we have just seen:

(37) a. !A friend of mine likes a friend of mine.
     b. *The best friend of mine likes the best friend of mine.
     c. !John likes John.
     d. !John likes him.
     e. !He likes John.

Certainly, all of these expressions are obviative, and (37b) is either ungrammatical or uninterpretable, given that we are forcing two expressions that single out the same unique individual not to co-refer. As for why the last three examples should be obviative, the matter is trivial if only formal features are relevant in the syntactic computation, as Chomsky assumes.40 Now we must also concern ourselves with why the Case marker could not save all the examples in (18), with the exception of combinations of the same personal feature, which are uninterpretable under disjointness. I deal with this below.

7 Two Types of Anaphora

But first, we must deal with coreference. Given my analysis, coreference should be the marked case for arguments within the same local domain. Typical local anaphoric instances support this intuition, given Pica's (1987) insight that they are bimorphemic. A statement of self V-ing is generally possible only if the second argument carries an added morpheme like self or same (see Safir (1992)). We may then say that the syntax distinguishes each argument in the way we saw above, by Case-marking. In turn, the semantics is now dealing with two formal facts with an apparently contradictory interpretive import: Case differentiation normally entails difference, while the anaphoric element instantiates some sort of statement about sameness. This need not be a problem, though we must separate two instances. Consider the expression of anaphoricity by way of a regular pronoun followed by a morpheme expressing contextual sameness, as in the Danish (38):41

(38) a. Peter fortalte Anne om hende selv / *ham selv.
        Peter told Anne about her self / him self
     b. [the x: Peter(x) & X(x)] [the y: Anne(y) & Y(y)] [the z: one(z) & same-as-before(z)] x told y about z

I take names to pose no particular problem as definite descriptions, once they involve a context variable,42 so Peter is akin to the Peter I have in mind. In turn, I take selv to determine sameness of contextual confinement. When attached to hende (= the one), selv makes the context variable predicated of the variable of the definite description the same context variable as a previously introduced one. This is not just any variable, though, but the closest one;43 as a result, (38a) is ungrammatical when the pronoun ham (which would normally hook up to Peter) is introduced. This is because the context variable closest to ham is that of Anne, and confining the range of ham in terms of Anne leads to an absurdity (assuming she is not a transvestite).
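The "closest variable" reasoning can be sketched in Python. This is a toy model under assumptions introduced here for illustration: closeness is approximated as linear recency (the last previously introduced argument), and the absurdity is detected only as a gender mismatch. The names `Referent`, `selv_antecedent`, and `selv_ok` are hypothetical.

```python
from typing import NamedTuple


class Referent(NamedTuple):
    name: str
    gender: str  # assumption: gender is what makes the 'absurd' reading detectable


def selv_antecedent(previous: list) -> Referent:
    """selv reuses the context variable of the closest previously introduced
    argument, approximated here as the most recently introduced one."""
    return previous[-1]


def selv_ok(pronoun_gender: str, previous: list) -> bool:
    """Toy check: pronoun + selv is coherent only if its gender fits the closest referent."""
    return selv_antecedent(previous).gender == pronoun_gender


# (38a): Peter fortalte Anne om hende selv / *ham selv
context = [Referent("Peter", "m"), Referent("Anne", "f")]
print(selv_ok("f", context), selv_ok("m", context))
```

With Anne as the closest referent, hende selv is coherent and ham selv is not, matching the judgements in (38a).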


Intuitively, the Transparency Condition would, by default, make all of x, y, and z in (38) different variables. However, there is in this sentence a more specific, explicit indication to the effect that y and z have the same value, since they pick out a salient, unique individual in two identical contexts. While the puzzle above is solved by declaring the Transparency Condition a default interpretive strategy (and treating pronominal anaphors as logophors, in the spirit of Reinhart and Reuland (1993)), instances of local anaphors involve a different process. Compare the Danish (39a) to the Galician-Portuguese (39b):

(39) a. Peter fortalte sig selv om Anne.
        Peter told SIG same about Anne
     b. Pedro dixose algunha cousa a si propio sobre de Ana.
        Pedro told-SE something to SI same about of Ana
        'P. told himself (something) about A.'

I take Lebeaux's (1983) insight, particularly as pursued by Chomsky (1986), to be simply that (39b) is pretty much what (39a)'s LF should look like. If so, some feature from within the Danish sig selv must raise at LF to wherever it is that se is in Galician-Portuguese.44 But what is the relation between se and the double si (and therefore between whatever moves covertly in Danish and sig)? The Galician-Portuguese propio, literally 'own', gives us a clue, if we take it to invoke 'possession' as in Szabolcsi (1981) and Kayne (1994), or, more specifically, as extended by Hornstein, Rosen, and Uriagereka (1995); see (40a). I propose in this light (and much in the spirit of Pica (1987)) that the (se, si) relation invokes something like 'x's own persona' (40b):

(40) a. [John be+in(=have) [XP t [t [AgrP a head Agr [SC t t]]]]]
     b. [... CLITIC ... [XP DOUBLE [t [AgrP pro Agr [SC t t]]]]]

(40b) is intended to cover all instances of clitic doubling (modifying an analysis presented in Uriagereka (1994)), not just anaphoric clitic doubling. This is desirable on two other grounds. First, it unifies the paradigm in (41):

(41) a. Le levanté a él la mano.
        him raised.1sg to him the hand
        'I raised his hand'
        [ ... le ... [XP a él [ t [AgrP [la mano] Agr [SC t t ]]]]]
     b. Lo levanté a él (mismo).
        him raised.1sg to him same
        'I raised him (himself)'
        [ ... lo ... [XP a él [ t [AgrP [(mismo) [pro]] Agr [SC t t ]]]]]
     c. Se levantó a sí (mismo).
        se raised.3sg to si same
        'He raised himself'
        [ ... se ... [XP a sí [ t [AgrP [(mismo) [pro]] Agr [SC t t ]]]]]

(41a) is an inalienable possessive relation; the possessor clitic raises, leaving behind the small-clause predicate la mano 'the hand'. Identically, in (41b), a normal transitive construction involving clitic doubling, the 'possessor/whole' clitic raises, leaving behind the small-clause predicate pro. I take pro to be an individual classifier, of the sort Muromatsu (1995) proposes for East Asian languages; we can detect the presence of pro by optionally adding the adverbial mismo 'same', as argued in Torrego (to appear). Finally, (41c) is almost identical to (41b), except that the head of XP is the anaphoric clitic se. The second reason the proposal above is desirable is that it explicitly makes us treat (e.g. Romance) special clitics rather differently from (e.g. Germanic) regular pronouns (cf. (18) vs. (26)-(30)). Following Chomsky (1995), I take regular pronouns to be both heads and maximal projections; if so, they will not involve the elaborate structures in (41). I think that precisely those structures account for why so few clitic combinations are possible in (18), and perhaps also for why special clitics must be overtly displaced. Because clitics are non-trivial functional heads, as in (41), they are morphologically weak, in fact rather more so than the idealized picture in (33) implies. Thus me, te, nos, os can be accusative or dative in all dialects; le(s) can be third person or formal second person in all dialects, and in Latin-American dialects any second person (particularly when plural); in most Peninsular dialects there is no distinction between accusatives and datives (they can all be of the le or of the la type); in most informal registers, le can be singular or plural; in Andean dialects lo can be any third person (number and gender irrelevant); and in various sub-standard dialects los is used for first person plural.
Regular pronouns, in contrast, are paradigmatically distinct, which we may associate with their being whole phrases, and thus morphologically robust enough to support stress, emphasis, heavy syllabic distinctions, and so forth. In the spirit of Corver and Delfitto (1993), I propose that, given their morphological defectiveness, arguments headed by special clitics are not safely distinguished by the Case-checking mechanism. More particularly, I propose that the Checking Convention is inoperative for special clitics, simply because the FF-bag of the clitic is incapable of hosting a Case-mark. I should emphasize that I am not saying that the clitic does not check Case; rather, that it is not appropriately Case-marked (as per (32)) as a consequence of this checking. In a nutshell, if the clitic does not engage in a further process, the relevant FF-bag will not be distinguished from other FF-bags in the same checking-domain-set. We may see overt cliticization as a process aimed at explicitly marking, through syntactic positioning, certain distinct relations. For example, a strong clitic in Spanish comes before a weak clitic, making the two distinct.46 At the same time, any combination (regardless of number, gender, and morphological case) of two weak or two strong clitics leads to undistinguished FF-bags, and a subsequent LF crash. We can then return to the covert relation that is relevant in the Danish (37a): by hypothesis, one in which sig is subject to [selv[pro]], and the features of a null se raise to the domain of Peter. I should say that standard se has peculiar placement properties (see fn. 47), which may relate to the usual assumption in the literature, forcefully expressed in Burzio (to appear), that se is in essence an expletive element, with no features other than categorial ones (it is neither [+s] nor [-s], and thus behaves like neither of the clitics thus defined; besides, se is unspecified for gender, person, number, and Case). Ideally, matters of PF and LF interpretability decide where se can successfully be placed. For instance, as the clitic that it is, it should group with other clitics at PF, albeit as a wild card. In turn, the fact that it is expletive in character may relate to its interpretive possibilities, as follows.
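The distinctness reasoning of the last two paragraphs can be pictured with a small Python sketch. It is only an illustration under simplifying assumptions that are ours, not the author's: each Spanish argument clitic is reduced to its [s] specification ("+" for the strong first/second-person forms, "-" for the weak third-person forms, None for se); Case plays no role, since clitics are taken to be unable to host a Case-mark; and the Spanish order (se first, then [+s], then [-s]) is taken from fn. 46. The names S_VALUE, SLOT, distinguishable, and linearize are hypothetical.

```python
from typing import Optional

# Hypothetical encoding: a clitic's FF-bag is reduced to its [s] value.
# "+" = strong (1st/2nd person), "-" = weak (3rd person), None = se (unspecified).
S_VALUE: dict[str, Optional[str]] = {
    "me": "+", "te": "+", "nos": "+", "os": "+",
    "lo": "-", "la": "-", "los": "-", "las": "-", "le": "-", "les": "-",
    "se": None,
}

# Spanish template (fn. 46): se first, then [+s], then [-s].
SLOT: dict[Optional[str], int] = {None: 0, "+": 1, "-": 2}


def distinguishable(cluster: list[str]) -> bool:
    """True if no two clitics in one checking-domain-set share an [s] value.

    Assumption: clitics cannot host a Case-mark, so their [s] value (and hence
    their template position) is the only thing that can tell them apart.
    """
    values = [S_VALUE[c] for c in cluster]
    return len(values) == len(set(values))


def linearize(cluster: list[str]) -> Optional[list[str]]:
    """Return the PF order of a clitic cluster, or None to signal an LF crash."""
    if not distinguishable(cluster):
        return None  # e.g. two weak or two strong clitics: undistinguished FF-bags
    return sorted(cluster, key=lambda c: SLOT[S_VALUE[c]])


print(linearize(["lo", "se"]))  # ['se', 'lo']
print(linearize(["lo", "me"]))  # ['me', 'lo']
print(linearize(["le", "lo"]))  # None
```

Sorting by a single slot value mirrors the idea that syntactic positioning is what marks the FF-bags as distinct; the sketch says nothing about the other placements of se mentioned in fn. 46 (Italian, Friulian, archaic Italian).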

8 Some Thoughts on 'Sameness'

I set aside the ultimate semantic import of the relation between sig and pro, assuming (to judge from (41b)) some form of identity, with the rough content of 'his person'. The issue is the relation between the null se heading the complex anaphoric construction and its antecedent. By hypothesis, se cannot be marked for distinctness in terms of Case. Thus, although it is distinguished from [+s] and [-s] clitics in that it is a clitic with no [s] value, it is harder to see how it could be syntactically distinguished from a non-clitic expression whose features end up in the same checking-domain-set. So suppose the grammar in fact does not tell se apart from such a non-clitic expression. This leads to no LF crash if se only has categorial features, and hence no uninterpretable feature to erase.47 In turn, the expression that se 'collapses' with can be seen as its antecedent, with the semantics in (42) for the examples in (39):

(42) [the x: Peter(x) & X(x)] [the y: Anne(y) & Y(y)] x told [x's person] about y

The fact that the grammar takes se to be the same item as the element it shares a checking domain with entails, ideally, the formation of an extended chain to which two roles were assigned configurationally before movement. It must be stressed that although (42) comes out as anaphoric as (38) does, the two achieve this interpretive result in radically different ways. In (42), anaphoricity is expressed through a single operator binding two variables, and a single chain receiving two roles. In (38), there are two separate chains, each involving its own operator, which nonetheless come out 'anaphoric' (or logophoric) because of a lexical particle demanding contextual sameness. The sort of analysis delineated above trivially extends to languages like Basque, where anaphoric relations are expressed as in (43):

(43) a. Jonek bere burua ikusi du.
        Jon-S his head-the-Ø seen has
        'John has seen himself.'
     b. [the x: Jon(x) & X(x)] x saw [x's head]

Obviously, the relation here between bere 'his' and burua is not the same as the one between sig and [[pro]selv]. Yet this is arguably a low-level lexical fact, assuming that 'head' is the relevant classifier in Basque for persons. What is again crucial is that the null clitic se relates to its antecedent by being syntactically indistinguishable from it, leading to a fused chain. I stress that nothing is wrong, a priori, with fused chains receiving two roles. Of course, the question is whether or not we should have a Thematic Criterion, a matter that Chomsky (1995) debates but does not resolve. Certainly, nothing that I have said allows (44a) (meaning John used himself):

(44) a. *[ John T [ t v [ used t ]]]
     b. [ John [ clitic-T [ t v [ used [ ... t ... ]]]]]

Note that the first movement violates the Last Resort condition (Jairo Nunes, p.c.). What allows the similar structure in (44b) is that, by hypothesis, the clitic placement does not violate Last Resort. In general, fused chains should be possible only when Last Resort is appropriately satisfied.48 Explicitly:

(45) Chain Fusion Situation
     Let α and β be different chains. If α's head is non-distinct from β's head within a given checking domain, and α's tail commands β's tail, then α and β fuse into an integrated chain δ, subsuming the properties of α and β.
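Because (45) reads almost like a procedure, a short Python sketch may help make its three conditions explicit. Everything in it is an illustrative assumption rather than part of the proposal: non-distinctness of heads is reduced to sharing the same [s] specification (so se, with none, matches a non-clitic subject such as Juan but not [-s] lo), the command relation is supplied by hand as a set of (x, y) pairs, and the position labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chain:
    name: str
    s_value: Optional[str]      # [s] on the head: "+", "-", or None (unspecified/non-clitic)
    positions: tuple[str, ...]  # hypothetical position labels, head first, tail last

    @property
    def tail(self) -> str:
        return self.positions[-1]


def fuse(alpha: Chain, beta: Chain, same_domain: bool,
         commands: set[tuple[str, str]]) -> Optional[Chain]:
    """Return the fused chain delta of (45), or None if fusion is not possible."""
    if not same_domain:
        return None
    if alpha.s_value != beta.s_value:  # heads are distinct: no fusion
        return None
    if (alpha.tail, beta.tail) not in commands:  # alpha's tail must command beta's tail
        return None
    # delta subsumes the positions (and so the roles) of both chains.
    return Chain(f"{alpha.name}+{beta.name}", alpha.s_value,
                 alpha.positions + beta.positions)


juan = Chain("Juan", None, ("Spec-T", "Spec-v"))
se = Chain("se", None, ("T", "XP"))
lo = Chain("lo", "-", ("T'", "XP'"))

print(fuse(juan, se, True, {("Spec-v", "XP")}) is not None)  # True: a (48a)-style fused chain
print(fuse(juan, se, True, set()) is None)                   # True: no command relation
print(fuse(lo, se, True, {("XP'", "XP")}) is None)           # True: lo is [-s], se is not
```

The sketch checks only the literal conditions of (45); it does not model Last Resort, which the text treats as an independent precondition on fused chains.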

I take (45) to be the source of true anaphoric coreference, which should be limited to se (and similar elements) 'collapsing' within a checking domain.49 It is now easy to see that nothing in the properties of se makes it specifically anaphoric. Consider (46) (see Raposo and Uriagereka (to appear)):

(46) pro se mataron
     se killed.3pl
     'They killed themselves/each other.'

Aside from group readings, this has three transitive, distributive readings. It can have the rough import of 'they unintentionally caused their death'. It also has two causative readings: a reciprocal one, and the anaphoric interpretation discussed above. Of course, readings like these are disambiguated by means of 'doubles', as they are in English with the overt elements themselves and each other. What I am saying, however, is that the 'doubles' are not crucial to the dependency in the way that se is, inasmuch as se induces chain fusion. (47a) provides an argument that only chain fusion induces anaphoricity:

(47) a. *Juan se ha sido encomendado t a sí mismo.
        Juan se has been entrusted to si same
     b. Juan le ha sido encomendado t a él mismo.
        Juan him has been entrusted to him same
        'Juan has been entrusted to himself.'

Rizzi (1986) discusses an Italian example similar to the Spanish (47a) to argue for his Local Binding Condition. He suggests that what goes wrong with (47a) is that Juan is trying to form a chain over se, which has the same index as Juan. Surprisingly, however, the Spanish (47b) is quite good. Here, we have replaced se with the indirect object le. Generally, that should give the indirect object an obviative reference with respect to Juan, but we have loaded the dice by adding a logophoric double a él mismo, 'to him same', which forces the reference of le to be that of Juan. If we only had indices to establish referential links, (47b) would fall into Rizzi's trap and be predicted ungrammatical, contrary to fact. However, what I have said above provides a different account. Compare:

(48) a. [TP Juan se [YP t [ levantó [XP ... t ... ] ] ] ]
        'J. raised himself'
     b. *[TP Juan se [ ha sido [ encomendado [v] [YP t [ [XP ... t ... ] [ t t ] ] ] ] ] ]

A fused chain arises in (48a), since all the relevant elements stand in a command relation. In contrast, a fused chain does not arise in (48b), since the third and fourth elements do not command each other. 50 Nonetheless, the featureless se cannot be told apart from the element it shares a checking-domain-set with (technically, the D features of Juan). This is a paradox: se cannot head its chain, and cannot be in a fused chain. The result is ungrammatical. Compare finally (48b) to the grammatical (24b), repeated now as (49a). In this instance, we should note, a reflexive reading is actually possible (49b); however, a reading is also possible whereby se is taken to be some indirect object, a matter noted in section 4 which we left pending until now.

(49) a. Juan se lo mostró.
        Juan se it showed
        'Juan showed it to him.' / 'Juan showed it to himself.'
     b. [TP Juan se lo [ [mostró [v]] [VP t [ [XP ... t ... ] [ t t ] ] ] ] ]   (se considered together with Juan)
     c. [TP Juan se lo [ [mostró [v]] [VP t [ [XP ... t ... ] [ t t ] ] ] ] ]   (se considered together with lo)

When se is considered within the same checking domain as Juan, (49b) is as straightforward as (48a) (the fused chain succeeds in this instance, because command obtains throughout the members of the chain). In turn, (49c) involves considering se within the same checking domain as lo. In this instance, a fused chain cannot be formed because, just as we saw for (48b), neither the third nor the fourth element commands the other. However, se can be told apart from lo, if what I have said in section 4 is true: se is a different clitic from lo, in that the latter is [-s], while se is not specified for this feature. Whereas this property of se does not make it distinguishable from the D features of a regular subject like Juan, it does make it distinguishable from lo, la, etc. Therefore, se does not collapse with lo in (49c), and it can indeed form its own separate chain. It will not be an anaphoric chain, but this is in fact desirable: we want the reading of the se chain to be one invoking a third person.51

9 A Word on Long-distance Obviation

One can think of other potential problems for my approach, but by handling anaphoricity I have tried to show that there are reasonable ways of keeping the system to the sort of bare picture that Minimalism expects. While I cannot extend the discussion here to the many other instances that immediately come to mind, I will say a word about the fact that my analysis does not predict the obviation in (50), and in many similar instances:

(50) ! He thinks that John is a smart guy.

Now, consider an analysis along the lines of Reinhart (1995) (p.51):

The coreference generalization... is that two expressions in a given LF, D, cannot corefer if, at the translation to semantic representations, we discover that an alternative LF, D', exists where one of these is a variable bound by the other, and the two LFs have equivalent interpretations. I.e. D' blocks coreference in D, unless they are semantically distinct.
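The comparison described in the quoted generalization can be pictured as a test of semantic equivalence between two LFs. The Python sketch below is a toy model of our own, not Reinhart's formalism: an interpretation is a function from a small, exhaustively enumerated set of hypothetical situations (who thinks whom is a smart guy) to truth values, and two LFs count as equivalent if they agree in every situation. The second pair, built with a hypothetical 'only' operator that does not appear in the text, is included merely to show what 'semantically distinct' LFs look like under these assumptions.

```python
from itertools import combinations

PEOPLE = ("john", "bill")
# Every possible fact of the form (thinker, person judged a smart guy).
PAIRS = [(a, b) for a in PEOPLE for b in PEOPLE]


def situations():
    """Enumerate all toy situations: every subset of the possible facts."""
    for r in range(len(PAIRS) + 1):
        for facts in combinations(PAIRS, r):
            yield frozenset(facts)


def thinks_smart(s, x, y):
    """True if, in situation s, x thinks that y is a smart guy."""
    return (x, y) in s


def equivalent(d, d_prime):
    """Equivalent interpretations agree in every toy situation."""
    return all(d(s) == d_prime(s) for s in situations())


def coreference_blocked(d, d_prime):
    """A bound-variable alternative D' blocks coreference in D unless they differ."""
    return d_prime is not None and equivalent(d, d_prime)


def d(s):
    # (50) with 'he' and 'John' coreferent: John thinks John is a smart guy.
    return thinks_smart(s, "john", "john")


def d_prime(s):
    # Alternative LF: 'he' (= John) binds a variable in the embedded subject position.
    return (lambda x: thinks_smart(s, x, x))("john")


def only(x, pred):
    """Hypothetical 'only x': pred holds of x and of nobody else."""
    return pred(x) and all(not pred(y) for y in PEOPLE if y != x)


def d_only(s):
    return only("john", lambda y: thinks_smart(s, y, "john"))


def d_only_prime(s):
    return only("john", lambda y: thinks_smart(s, y, y))


print(coreference_blocked(d, d_prime))            # True
print(coreference_blocked(d_only, d_only_prime))  # False
```

In the first pair the two LFs are true in exactly the same situations, so coreference is blocked; in the second, a situation in which Bill also thinks himself smart separates them, so coreference survives.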

Reinhart lets structures like (50) converge; her economy analysis of an example of this sort is thus strictly unrelated to any of the derivational concerns raised here. Her intuition is that variable binding is a more economical means of identifying referential identity, provided that assigning reference requires relating an expression to the set of entities in the discourse. This is post-LF economy in the performance systems, about which I have nothing to say.52 Suppose we were to accept this description of (50): should it extend to all instances discussed here? While this is a fair question, the point cannot be conceded a priori, given the sort of issue this paper is raising: a pattern of formal behavior does not immediately demand a unified, minimalist explanation. More concretely: what do (50) and the examples in (26)-(30) have in common? The fact that command matters for all of them is not significant, since (given the model of grammar argued for in section 2) command is necessary for any LF process. Locality, in fact, is not common to these examples: it obtains in the (26)-(30) paradigm, but not in (50). The only significant commonality, then, is disjointness of reference, Relation R. However, as I have noted, it is plausible that the grammar only codes sameness and, by default, difference. If so, how surprising is it really that we find two unrelated phenomena which have in common one of these two effects? To demand commonality between long-distance and short-distance obviation would be like demanding commonality between long-distance and short-distance anaphora, which Reinhart and Reuland (1993:658-660), correctly in my view, are careful to distinguish. Once this is taken as an empirical question, are there in fact any advantages to keeping the local and the long-distance phenomena separate? I think there may be a couple.
First, non-local relations are immediately suspect when attempting a standard syntactic analysis: they should not exist in derivational syntax. Then again, one may try to argue that what I have called local obviation is not a phenomenon of grammar either. While I do not have any special reasons to want obviation in the grammar, the analysis I have provided here gives a simple way of dealing with it when it is local. My treatment follows trivially from an independent property of the system: the fact that it makes use of sets that we call checking domains, which are there in some form whether or not we agree on the rest. As I have tried to argue, obviation is just one among several possible results of the marking for distinctness involved in tagging, specifically, different FF-bags within a checking-domain-set. Making these minimalist assumptions, we were able to account for the facts in (18), which involve FF-sameness but not obviation; in turn, we were forced to look into the matter of anaphoricity, which in itself has interesting consequences: a simple distinction between logophors and anaphors, and an account of Rizzi's Local Binding Condition effects in terms of a mechanism of chain fusion; finally, the present account gave us a motivation for what the mysterious uninterpretable Case features are. If we insist that whatever underlies long-distance obviation should also predict the short-distance facts, we will lose all of these rather natural results.

10 Concluding Remarks

As I see it, true LF properties are the result of constructive processes. I have not said this before, but now I can be loud and clear: the dynamic model that I have summarized in section 2 is in effect saying that LF and PF do not exist as levels, although they certainly exist as components (the locus of Full Interpretation in a radically derivational system). If LF and PF are not levels, there should not be any formal relations obtaining there, other than the ones that the derivation provides by way of its mechanisms of constructing structure. From this perspective, anything long-distance should be post-LF. In turn, perhaps everything short-distance is LF, in the sense that a constructive process of grammar is being invoked. In this respect, local obviation looks hopeless at first: why would the grammar 'bother' to code such a weird requirement? As a matter of fact, the very existence of obviation conditions is not a bad argument for Gould's position on language: prima facie, obviation hinders communication, in that it limits the class of possible thoughts that can be expressed by way of grammatical combinations. Nonetheless, this is a point about language function, not language structure. Structurally, it is still possible that obviation has a minimalist place in the grammar. I have suggested that this place is a mere reflex of something deeper: the matter of deciding on identical or different structures, given set-theoretic dependency constructs. The grammar tags symbols for distinctness and by default assumes they are also semantically different, in some way relating to mode of presentation or intended reference. In sum, the Minimalist Program forces us to ponder the nature of principles themselves, over and above the nature of phenomena. The latter are obviously important.
Yet without a program to guide us into reflecting on what principles we are proposing (it may be stressed: what properties we are ascribing to the human mind), we may find ourselves redescribing the structure of something like a snail shell, without learning anything deeper about what is behind that very pretty pattern.

Notes

* This is a version of a talk delivered at The Role of Economy Principles in Linguistic Theory, Max Planck Institute, Berlin. I wish to thank the organizers of the conference for their invitation and their useful editorial comments, and the audience for their very helpful comments. I also thank my students and colleagues at College Park for their cooperation when sorting out some of these ideas in my Spring seminar on Minimalism. I am indebted to Elena Herburger, Norbert Hornstein, David Lightfoot, and Jairo Nunes for their comments on a draft. Usual disclaimers apply. This research was partly financed by a Summer Research grant from UMD at College Park.

1 Exaptation: 'Evolutionary theory lacks a term for a crucial concept—a feature, now useful to an organism, that did not arise as an adaptation for its present role, but was subsequently coopted for its current function. I call such features "exaptations" and show that they are neither rare nor arcane, but dominant features of evolution [serving] as a centerpiece for grasping the origin and meaning of brain size in human evolution.' Gould (1991:abstract).

2 This is not to say, of course, that procedures to integrate, for instance, go through each possible variation. That is a matter of implementation.

3 For instance (i), which is ungrammatical as part of (ii) (because (iii) is a better solution), but grammatical as part of (iv) (because there is no better, convergent alternative). See Chomsky (1995, chapter 4).

(i) [a man to be t here]
(ii) *[there was believed [a man to be t here]]
(iii) [there was believed [t to be a man here]]
(iv) [I believe [a man to be t here]]

4 In standard problems in dynamics, we can define a quantity, called a Lagrangian L, which ranges over velocity and position and equals the kinetic energy minus the potential energy. This quantity can be used to rewrite Newton's law by way of the Euler-Lagrange equation, which describes a particle's motion in one dimension (an idealized version of the problem, which involves more than one particle; see on these matters Stevens (1995:27-39 and 59-68)):

$$\frac{d}{dt}\,\frac{\partial L(x,\dot{x})}{\partial \dot{x}} - \frac{\partial L(x,\dot{x})}{\partial x} = 0$$

The particle path that satisfies the Euler-Lagrange equation makes the functional A (the action) stationary, in simple cases a minimum:

$$A[x(t)] = \int_{t_i}^{t_f} L(x,\dot{x})\,dt$$
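As a worked illustration of the footnote's claim that the Lagrangian recovers Newton's law, assume the standard one-particle form with a mass m and a potential V(x) (both symbols are our additions, not the author's). Substituting into the Euler-Lagrange equation above gives F = m times acceleration:

```latex
\begin{aligned}
L(x,\dot{x}) &= \tfrac{1}{2}\, m\,\dot{x}^{2} - V(x), \\
\frac{\partial L}{\partial \dot{x}} &= m\,\dot{x}, \qquad
\frac{\partial L}{\partial x} = -V'(x), \\
\frac{d}{dt}\bigl(m\,\dot{x}\bigr) - \bigl(-V'(x)\bigr) &= 0
\;\Longrightarrow\; m\,\ddot{x} = -V'(x) = F(x).
\end{aligned}
```

With V = 0 this reduces to a zero acceleration, the straight-line path that the action integral singles out for a free particle.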

5 This corresponds to the procedure of 'adiabatic elimination' of fast-relaxing variables. The procedure is used, for instance, in reducing degrees of freedom within a probabilistic equation (see Meinzer 1994:66 ff.).

6 Uriagereka (forthcoming, Chapter 6) offers a speculation as to what it means for a feature to be 'viral' (as assumed in Chomsky (1995)).

7 This is not meant metaphorically.

8 The Slaving Principle is surely a clearer determinant factor in the behavior of turbulence than in linguistic examples.

9 Why the growth of the snail shell proceeds the way it does is a complex matter involving genetic information and epigenetic processes of various sorts. For an early, extremely insightful view on this matter, see Thompson (1945, Chapter VI).

10 This version of the Linear Correspondence Axiom is discussed in Uriagereka (forthcoming, Chapter 3), and is adapted to fit a bare phrase-structure theory.

11 Domination can be defined in terms of set-inclusion of constituent elements within terms, as in Nunes and Thompson (forthcoming).

12 (3b) cannot be monotonically assembled into a unitary phrase-marker, given a 'bare' X'-theory; instead, the system allows the merger of previously assembled structures by way of generalized transformations.

13 A command unit roughly corresponds to one of Kayne's (1984) 'unambiguous paths', appropriately adapted to the present system. As Dave Peugh and Mike Dillinger independently point out, command units can be generated in terms of Markovian systems, and are thus iterative (not recursive) structures. In the present system, recursion is obtained through a generalized transformation. In Kayne's terms, the correspondence is of the following sort (for A an abstract root node,: a sequence of ordered terminals, and a sequence of time slots in the A/P components):

(i) A > Ab > Abc > Abcd → Abcd ... >
    t1   t2   t3    t4     t5

What we must determine is why this correspondence obtains, as opposed to other possible mappings (an issue first raised by Samuel D. Epstein, as far as I know).

14 Otherwise, one would have to define a new structural relation, deduce it, and show it to be a better alternative to command; I do not see what that could be.

15 Generally: y = f(x) (for f a variety of procedures). For instance: y1 is mapped to the x value three times removed from 0; y2 is mapped to the x value prior to x1; y3 is mapped to the x value three times removed from x2; and so on. (i) converges (the hierarchical ordering is appropriately mapped to some sequence of PF slots), but it is not a simpler realization of y = f(x) than y = x.

16 More generally, the proposal has empirical consequences whenever we find structures that do not depend on merger (such as discourse representations or paratactic dependencies involving adjuncts). See Hoffman (1995) on this.

17 The LF component being internal, here we can be bold and talk not just of structural properties, but in fact of universal structural properties.

18 The flattened word-like object that results from L has to be understood as a word-level unit. Following a suggestion by Jairo Nunes, Uriagereka (to appear) deduces from this the impossibility of relating subject/adjunct internal structure to the rest of the phrase-marker (Huang's (1982) CED effects).

19 If a proposal first noted in Chomsky (1964), and attributed to Klima, is on the right track, perhaps a discourse representation line can be pursued for these examples. Klima's suggestion was that Wh-expressions hide an indefinite predicate of existence; who in (8b) is akin to which x and exists x. This essentially indefinite predicate of existence should be sensitive to Wasow's (1972) Novelty Condition, with the pronoun his introducing a more familiar expression and thereby inducing an odd interpretation. The same can be said about everyone in (8a), if this element too contains a hidden indefinite one, as its morphology indicates.

20 I do not make any commitments, however, as to whether this is the case.

21 Just as the Last Resort Condition reduces computational complexity in derivations, so too does the Minimal Link Condition, if derivations that violate it are cancelled. I do not see how this condition might follow from something like the Slaving Principle, but it might conceivably relate to other conditions on systems imposing locality and predicting field and 'domino' effects.

22 In this guise: A head α moves to a head β only if α and β are L-related.

23 We are talking about structural properties which are conserved across derivational processes. It should be remembered, though, that quantity conservation laws in physics have helped in the understanding of particle families, and predicted new particles that were later discovered.

24 These are argument clitics. In contrast, sentences such as (i) are possible:

(i) Te me vas a resfriar.
    you me go to get.a.cold
    'You're going to get a cold on me.'

However, me is not an argument of resfriar 'get a cold', but is something more akin to an argument of the assertive predicate introducing the speaker's perspective. An analysis of this and related cases would take me too far afield.

25 This is welcome: the nominal carrying an uninterpretable feature must then raise to check the number feature in the determiner (see Longobardi (1994)). Had the feature in the nominal been interpretable, this raising would be unmotivated. We would then have to say that in Spanish, unlike in English, the determiner has a strong feature for the nominal to check, which would lead to rather different LFs for the two languages. Interestingly, Jairo Nunes points out that several variants of Brazilian Portuguese overtly encode number only in determiners.

26 Nunes and Thompson (forthcoming) modify the notion 'dominates' so as to have it hold of features. In current class lectures (Fall 1995), Chomsky abandons the concept of 'checking domain' altogether in favor of a theory of sub-labels. This is in the spirit of everything I have to say here, where it is crucially features, and not categories, that matter for various syntactic purposes. See also Nunes (1995) for much related discussion.

27 See Halle and Marantz (1994:129 ff.) for a recent treatment of why forms like *oxens are not attested. This case is slightly different from the one I am discussing, *lionses, in that the latter involves two obviously identical plurals. It may be significant that infants do produce *oxens but never *lionses.

28 This idea was suggested in Uriagereka (1988:54).

29 That is, while the grammar codes the presence of a context variable (in essence, [+s] is a contextual feature), it does not assign a value to it (I or II), any more than it assigns values to other context variables.

30 These were noted in Uriagereka (1994, section 4).

31 As Viola Miglio points out, (25d) contrasts with the perfect Italian (i):

(i) Qui glielo si invia.
    'Here one sends it to one's relatives.'

Thus, (25d) cannot be out for semantic reasons. In turn, (ii) (provided by Jairo Nunes to illustrate a phenomenon which Eduardo Raposo also notes) indicates that the impossibility is not merely phonological either: while sentences involving the sequence se se are impossible in Portuguese, (ii) is perfect:

(ii) Se se morrer de amor...
     if se(impersonal) would.die of love
     'If one were to die of love...'

Crucially, though, the first se here is not a pronoun, but a complementizer.

32 I am not implying with this that se has a [+s] feature; see below.

33 I am abstracting away from the exact status of (29a). What follows is a version of an analysis I attempted in Uriagereka (1988) but was not able to put together. It owes much to the seminar on binding and Case taught by Luigi Burzio at College Park. Although I do not specifically follow his approach to these matters, it has greatly influenced my way of looking at the problem. See Burzio (1994), (to appear).

34 That is, the direct object features do not directly move to v, particularly because in many languages involving V movement, this would have to imply incorporation onto a trace, plausibly barred by Chomsky under the view that chains are integral objects whose parts cannot be transformationally targeted.

35 In the version of the theory being explored by Chomsky in class lectures (Fall 1995), only heads are targeted for featural movement, specifiers being involved as a morphological side-effect. The checking domain then reduces to the sub-labels of a head: the set of dependents of this head which associate via adjunction. Everything else said here remains unchanged, provided that the relevant domain of checking (technically, not a 'checking domain') is a set.

36 Mathematically, this does not go into orders of complexity different from those involved in standard features; matrices are needed either way.

37 The intuition is to relate local obviation to switch-reference phenomena, of the sort extensively studied by Finer (1985).

38 This parameter is otherwise extremely hard to motivate, if Case is an uninterpretable feature pertaining, ultimately, to the covert component.

39 The Mojave example in (i) (attributed by Lasnik (1990) to Langdon and Munro (1979)) suggests that this view is correct, in light of what is said in fn. 36:

(i) ʔinyec pap ʔ-akxi:e-m Judy-c saiyi:-k
    I.sg potato I-peel-DS Judy-subj fry-Tense
    'After I peeled the potatoes, Judy fried them'

A s Lasnik observes: the whole switch reference system is still exhibited even with I and II pronouns. While this is surprising from a semantic point of view, it is natural from the purely formal perspective that I have just discussed. Regardless of this, the issue is moot if only D features make it to the same checking domain of T that the formal features of the pronoun do. This is particularly so if we assume that names, just as any other arguments, are headed by D, which is what gets to be in the checking domain of T. See Longobardi (1994), w h o builds on the essentials of Higginbotham (1988). T h e Danish data are courtesy of Sten Vikner, to whom I am indebted for an insightful discussion of these issues. See for a full presentation Vikner (1985). These ideas go back to Burge (1973), who took name rigidity to follow from an implicit demonstrative. Higginbotham (1988) shows that it is best to capture Burge's insight in terms of second-order context variables, as in the text. Why this should be so is, in and of itself, interesting, but I have nothing deep to say about it, other than it fits well with the rest of the system. See Cole and Sung (1994) for this sort of analysis. Uriagereka (1988:chapter 4) presented an analysis along these lines as well, with empty operator movement to Infl = Tense. This specific analysis is more in the spirit of what I have to say immediately below, since I do not think it is either sig or selv that moves. For some reason that I do not understand the adverbial mismo is obligatory. Templatic conditions of this sort are known to be relevant in various areas of morphology, across languages. Schematically, for Romance [+s] tends to come before [-s] (notorious reversals exist in Aragonese and Old Leonese); in turn, the unspecified [s] clitic (se) is a bit of a wild card. In Spanish, for instance, it comes first, before strong and weak clitics. 
In Italian, in contrast, it comes after weak clitics (see fn. 33), but before locative clitics (Wanner (1987)). In Friulian, it comes as a verbal prefix (see Kayne (1991:664) for an analysis), and in archaic Italian, as a verbal suffix (Kayne (1991:663)).
For discussion of how this affects the Case system, see Raposo and Uriagereka (forthcoming), where it is argued that structures involving se may involve Case reversal situations, as expected.
None of the arguments that Chomsky gives for the Thematic Criterion carry through. For instance, it is said that without a Thematic Criterion, (ia) should outrank (ib), by involving one transformation less:

(i) a. [John T [v [used Bill]]]
    b. [John T [t v [used Bill]]]

However, at the point of moving John, the sentences involve two different partial numerations, and are thus not even comparable for optimality purposes. Chomsky also wants to prevent (ii) in Thematic terms:

(ii) I believe [t to be a great man]

But as John Frampton (p.c.) points out, it is not obvious how a great man receives Case if the sort of believe that allows raising (selecting for the relevant sort of infinitival) is essentially unaccusative.
49 These same mechanics can be extended to (expletive, argument) pairs, without needing to stipulate that the former are morphemically related to the latter; all that matters is that the associate's features end up in the same checking-domain set as the expletive features, as argued in Chomsky (1995).
50 In (47b), I am assuming a simplified version of Larson's (1988) analysis.

Uriagereka: Formal and Substantive Elegance
51 About this reading, I have nothing to add to what is said in Raposo and Uriagereka (forthcoming). I should note, however, that our interpretation of indefinite se creates an apparent problem, since the interpretation of se in dative sites (47c) is not necessarily indefinite. I suspect this relates to another fact about dative clitics which is discussed in Uriagereka (1994): their double can be definite or indefinite, which is peculiar (the double of accusative clitics cannot be indefinite). Arguably, then, the interpretation of dative clitics is simply unspecified for definiteness.
52 Reinhart's proposal is in many respects rather different in spirit from Chomsky's. She believes that 'interface economy ... determines the shape of the numeration: ... it is at this stage of choosing the "stone blocks" that speakers pay attention to what it is they want to say.' (p. 49) In contrast, Chomsky asserts that 'there is ... no meaningful question as to why one numeration is formed rather than another ... That would be like asking that a theory of some formal operation on integers (say, addition) explain why some integers are added together rather than others ... Or that a theory of the mechanisms of vision or motor coordination explain why someone chooses to look at a sunset or reach for a banana. The problem of choice of action is real, and largely mysterious, but does not arise within the narrow study of mechanisms.' (p. 237)

References
Bouchard, D. 1984. On the Content of Empty Categories. Dordrecht: Foris.
Bresnan, J. 1971. Sentence Stress and Syntactic Transformations. Language 47: 257-281.
Burge, T. 1973. Reference and Proper Names. Journal of Philosophy 71: 205-223.
Burzio, L. 1994. Anatomy of a Generalization. Ms., Johns Hopkins University.
Burzio, L. To appear. The Role of the Antecedent in Anaphoric Relations. In Current Issues in Comparative Grammar, ed. R. Freidin. Dordrecht: Kluwer.
Chomsky, N. 1964. Current Issues in Syntactic Theory. The Hague: Mouton.
Chomsky, N. 1981. Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, N. 1986. Knowledge of Language. New York: Praeger.
Chomsky, N. 1995. The Minimalist Program. Cambridge, Mass.: MIT Press.
Cinque, G. 1993. A Null Theory of Phrase and Compound Stress. Linguistic Inquiry 24.2: 239-297.
Cole, P., and L. Sung. 1994. Head Movement and Long-Distance Reflexives. Linguistic Inquiry 25.3: 355-406.
Corver, N., and D. Delfitto. 1993. Feature Asymmetry and the Nature of Pronoun Movement. Paper presented at the GLOW Colloquium, Lund, 1993.
Epstein, S. 1995. Un-Principled Syntax and the Derivation of Syntactic Relations. Ms., Harvard.
Finer, D. 1985. The Syntax of Switch Reference. Linguistic Inquiry 16.1: 35-55.
Fukui, N. To appear. On the Nature of Economy in Language. Ms., U.C. Irvine. To appear in Cognitive Studies.
Garcia Bellido, A. 1994. Towards a Genetic Grammar. Paper presented at the Real Academia de Ciencias Exactas, Fisicas, y Naturales, Madrid.
Gould, S. 1991. Exaptation: A Crucial Tool for Evolutionary Psychology. Journal of Social Issues 47.3: 43-65.
Haken, H. 1983. Synergetics. An Introduction. Berlin: Springer.
Hale, K., and S.J. Keyser. 1994. On Argument Structure and the Lexical Expression of Syntactic Relations. In The View from Building 20, ed. K. Hale and S.J. Keyser, 53-109. Cambridge, Mass.: MIT Press.
Halle, M., and A. Marantz. 1994. Distributed Morphology and the Pieces of Inflection. In The View from Building 20, ed. K. Hale and S.J. Keyser, 116-176. Cambridge, Mass.: MIT Press.
Higginbotham, J. 1988. Contexts, Models, and Meanings. In Mental Representations. The Interface between Language and Reality, ed. R. Kempson. Cambridge: Cambridge University Press.
Hoffman, J. 1995. Syntactic and Paratactic Word-order Effects. Ph.D. thesis, University of Maryland.
Hornstein, N., S. Rosen, and J. Uriagereka. 1995. Integral Predication. To appear in Proceedings of WCCFL XIV.
Huang, J. 1982. Logical Relations in Chinese and the Theory of Grammar. Ph.D. thesis, MIT, Cambridge, Mass.
Jackendoff, R. 1972. Semantic Interpretation in Generative Grammar. Cambridge, Mass.: MIT Press.
Kayne, R. 1984. Connectedness and Binary Branching. Dordrecht: Foris.
Kayne, R. 1991. Romance Clitics, Verb Movement, and PRO. Linguistic Inquiry 22.4: 647-686.
Kayne, R. 1994. The Antisymmetry of Syntax. Cambridge, Mass.: MIT Press.
Langdon, M., and P. Muro. 1979. Subjects and Switch Reference in Yuman. Folia Linguistica 13.
Larson, R. 1988. On the Double Object Construction. Linguistic Inquiry 19.3: 335-391.
Lasnik, H. 1972. Analyses of Negation in English. Ph.D. thesis, MIT, Cambridge, Mass.
Lasnik, H. 1976. Remarks on Coreference. Linguistic Analysis 2: 1-22.
Lasnik, H. 1990. Pronouns and Non-coreference. Paper presented at the Princeton Conference on Linguistic and Philosophical Approaches to Anaphora.
Lebeaux, D. 1983. A Distributional Difference Between Reciprocals and Reflexives. Linguistic Inquiry 14.4: 723-730.
Lebeaux, D. 1991. Relative Clauses, Licensing, and the Nature of the Derivation. In Perspectives on Phrase Structure (Syntax and Semantics 25), ed. S. Rothstein, 209-239. San Diego: Academic Press.
Lightfoot, D. 1995. The Evolution of Language: Adaptationism or The Spandrels of San Marcos? Paper presented at Developments in Evolutionary Biology, Instituto di Arte e Scienzia, Venice.
Longobardi, G. 1994. Reference and Proper Names. Linguistic Inquiry 25.4: 609-666.
May, R. 1977. The Grammar of Quantification. Ph.D. thesis, MIT, Cambridge, Mass.
Mainzer, K. 1994. Thinking in Complexity. Berlin: Springer.
Muromatsu, K. 1995. The Classifier as a Primitive: Individuation, Referability, and Argumenthood. Paper presented at GLOW, Tromsö.
Nunes, J. 1995. The Copy Theory of Movement and Linearization of Chains in the Minimalist Program. Ph.D. thesis, University of Maryland.
Nunes, J., and E. Thompson. Forthcoming. Formal Appendix to Uriagereka. [Forthcoming].
Perlmutter, D. 1971. Deep and Surface Constraints in Syntax. New York: Holt, Rinehart, and Winston.
Pica, P. 1987. On the Nature of the Reflexivization Cycle. In Proceedings of NELS 17, Vol. 2, 483-499.
Postal, P. 1966. On So-called "Pronouns" in English. In Report of the 17th Annual Round Table Meeting on Linguistics and Language Studies, ed. F.P. Dineen. Georgetown University Press.
Raposo, E., and J. Uriagereka. Forthcoming. Indefinite se. To appear in NLLT.
Reinhart, T. 1995. Interface Strategies. OTS Working Papers, Utrecht.
Reinhart, T., and E. Reuland. 1993. Reflexivity. Linguistic Inquiry 24: 657-720.
Rizzi, L. 1986. On Chain Formation. In The Grammar of Pronominal Clitics (Syntax and Semantics 19), ed. H. Borer, 65-95. San Diego: Academic Press.
Safir, K. 1992. Implied Non-coreference and the Pattern of Anaphora. Linguistics and Philosophy 15: 1-52.
Stevens, Ch. 1995. The Six Core Theories of Modern Physics. Cambridge, Mass.: MIT Press.
Szabolcsi, A. 1981. The Possessor that Ran Away from Home. The Linguistic Review 3: 89-102.
Thompson, D. 1945. On Growth and Form. Re-edited in 1992. Cambridge.
Uriagereka, J. 1988. On Government. Ph.D. thesis, University of Connecticut.
Uriagereka, J. 1994. Aspects of Clitic Placement in Western Romance. Linguistic Inquiry 25.1: 79-123.
Uriagereka, J. Forthcoming. Rhyme & Reason, a Minimalist Dialogue. Cambridge, Mass.: MIT Press.
Uriagereka, J. To appear. Multiple Spell Out. In ed. M. Browning (TTBA).
Vikner, S. 1985. Parameters of Binder and of Binding Category in Danish. Working Papers in Scandinavian Syntax 23, University of Trondheim.
Wanner, D. 1987. The Development of Romance Clitics from Latin to Old Romance. Berlin: Mouton De Gruyter.
Wasow, T. 1972. Anaphoric Relations in English. Ph.D. thesis, MIT, Cambridge, Mass.

Economy in Syntax is Projective Economy

Hubert Haider

1 Claims

The following claims will be discussed and defended:
1. UG is a complex cognitive capacity of symbol processing recruited for representational, that is, projective robustness. It enables the learner to assemble the knowledge system called core grammar. The system of representations and principles of core grammar is the recursive solution of the projection problem for a given natural language L, that is, the function from one-dimensional expressions (= a string of terminals of L) to an at least two-dimensional expression (= the grammatical structure of the string). The core grammar determines the projection of a syntactic structure onto a given string of terminals of L. The solution of the projection problem is THE criterion of empirical adequacy for grammar theory: it is the algorithm that maps strings of L onto wellformed structures.
2. In general, a projective system with economy principles promises to be empirically more adequate than a derivational system with its economy principles. A projective system is a model which conceives of the grammar as an algorithm that projects a grammatically wellformed structure onto a string of terminals. In other words, a successful projective system is the algorithm for the projection problem. The constraints of grammar are constraints on representations. A derivational system, as for instance the Minimalist Program, models core grammar as a derivation algorithm: constraints of grammar are viewed as the reflex of constraints on derivational operations (Chomsky 1994:391).
3. The economy principles of the Minimalist Program (fewest steps, shortest move) have to be enriched with axiomatic constraints to cut down the redundant derivational power to the empirically adequate limits. A projective system offers a direct approach without resort to restrictions that are alien to the principal design of the system of grammar.
4. Optionality, despite its ubiquity a stumbling-block for derivational economy, is consistent with a projective system if it is a consequence of underspecification.

2 Derivational or Representational Economy?

Concepts of economy rest conceptually on a notion of limited resources and a selection device. If the selection device selects the most resource saving options out of a pool of
alternative options, the selected set is justly characterized in terms of economy. The selection device is a bookkeeper that compares the costs of alternative options and chooses the most cost-saving one. What is the limited and therefore costly resource in grammar? According to Chomsky (1992:6), linguistic expressions are to be characterized as optimal realizations of the PF-LF interface conditions, where optimality is determined by UG as economy conditions on derivations. To be optimal, a derivation must satisfy all the principles of derivational economy. One such principle¹ selects the shortest derivation: the number of steps in a derivation is subject to a principle of economy, namely the requirement that shorter derivations are preferred over longer ones (cf. Chomsky 1991:426). If this is indeed an economy effect, it cannot be, and in fact is not, attributed to a limited computation capacity for the length of derivations. If there were such a limit, a speaker should be unable to master a sentence whose derivation involves more than a given number of steps. This limiting number would by far exceed the number of steps necessary for the more or less economical derivation of a simple sentence. Economy considerations of the kind used in the Minimalist Program do not, of course, distinguish between simple and complex sentences, so there is no reason to seek an absolute limit on the number of steps. If the generative capacity is not strictly limited in its resources, that is, in the number of applications of its generative devices, it does not embody the selection device responsible for the economy effects. In this case, the economy principle "minimize the number of derivational steps" is axiomatic, and therefore both unexplained and unexpected.
The Minimalist Program takes economy to be an element of UG and illustrates its effects with reference to grammatical constraints in languages like English, that is, with reference to a given core grammar of a language. UG, in this view, determines the core grammar of a given language L as the derivational minimum in the set of alternative grammars for the language L. If the UG-facilitated choice of a core grammar for L is determined by economy principles on derivational complexity, this does not necessarily entail that the core grammar of L contains economy principles. UG simply selects the most economical core grammar. Once it is chosen and fixed, the derivational machinery will be at one's disposal. It is important to be more precise on this question. UG enables the learner to acquire a knowledge system referred to as core grammar. The core grammar is the knowledge basis for language perception (viz. parsing) and production (viz. generation). If we conceive of the core grammar as the cognitively compiled system of grammar, and of language acquisition as the process of compiling the grammar, the core grammar need not contain economy principles at all. The core grammar of L is the system of principles and parameters that amounts to the most economical grammar modulo UG for the given language L. If, however, the Minimalist Program envisages economy principles as constitutive principles of the core grammar, this is a different claim with different empirical consequences. A core grammar with economy principles is the grammar whose principal wellformedness constraint should be the derivational minimum for L consistent with the principles of UG: the grammatical derivation of an expression E should be the most economical derivation in the reference set of E. In the MP, economy is a technical notion (cf. Chomsky 1994:432). It is a function on triples consisting of a numeration (= a set of pairs ⟨l, n⟩, where l is a lexical item and n
is a counting index; cf. Chomsky 1994:393), a stage of the derivation, and a set of alternative convergent continuations. Economy picks the derivationally shortest continuation. The expression E is the spell-out of the winning derivation. E is wellformed if there is a derivation D that is the convergent minimum for the numeration N of E, with E as its spell-out. Let us consider an illustrative case. If UG were to guarantee that the core grammar of English is but the absolute derivational minimum for PF-LF mapping, (1a) should be the only grammatical sentence out of (1). All other sentences in (1) involve more movement or structure-building steps before spell-out. An absolutely economical system should prefer a grammar that characterizes (1a) as grammatical and (1c) as ungrammatical, because (1a) converges at the same LF as (1c,d) with fewer lexical choices, less structure building, and less movement. Obviously, English is not economical in the absolute sense.

(1) a. Out of which barn raced a horse?
    b. *Out of which barn did race a horse?
    c. Out of which barn did there race a horse?
    d. Which barn did a horse race out of?
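The notion of economy just defined, a function from a numeration, a stage of the derivation, and a set of convergent continuations to the shortest continuation, can be made concrete in a few lines. The following Python sketch is purely illustrative. The `Continuation` type, the function name `economy`, and the step labels are hypothetical encodings, not part of the formal apparatus of the Minimalist Program. The sketch also assumes that derivational length is simply the number of listed steps, and its toy reference set is not an analysis of (1).

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Hypothetical encoding: a numeration is a set of (lexical item, counting index)
# pairs; a continuation is the list of operations still to apply from the
# current stage, plus a flag saying whether the result converges.
Numeration = FrozenSet[Tuple[str, int]]


@dataclass(frozen=True)
class Continuation:
    steps: Tuple[str, ...]  # e.g. ("Merge", "Move")
    converges: bool         # does the completed derivation converge at PF and LF?


def economy(numeration: Numeration,
            stage: Tuple[str, ...],
            continuations: List[Continuation]) -> List[Continuation]:
    """Return the convergent continuation(s) of `stage` with the fewest steps.

    `numeration` and `stage` fix the reference set; they are passed in only to
    make explicit that the comparison is relative to them, not absolute.
    """
    convergent = [c for c in continuations if c.converges]
    if not convergent:
        return []  # no convergent continuation: the derivation crashes
    fewest = min(len(c.steps) for c in convergent)
    return [c for c in convergent if len(c.steps) == fewest]


# Illustrative reference set; the step labels are placeholders.
N: Numeration = frozenset({("which", 1), ("barn", 1), ("out of", 1),
                           ("a", 1), ("horse", 1), ("race", 1)})
options = [
    Continuation(("Merge", "Move"), True),          # convergent, two steps
    Continuation(("Merge", "Move", "Move"), True),  # convergent, but longer
    Continuation(("Move",), False),                 # shortest, but crashes
]
best = economy(N, ("Merge",), options)
print(best)  # only the two-step convergent continuation survives
```

Ties are returned together rather than broken arbitrarily, which keeps the sketch neutral on how equally short convergent derivations should be treated.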

But English is not economical in the relative, that is, numeration-based, sense either. If do-insertion is a syntactic device (cf. Chomsky 1991:427), the derivation of (1a) is more economical than that of (1d). Since (1d) is grammatical in English, the economy principles must be relativized to a given choice of lexical items, that is, to the numeration: if do and there are elements of the lexical choice, (1c,d) are minimal convergent derivations; if not, (1a) is. But why should do be chosen at all? Obviously, the choice of do in the derivation (1b) does not result in a convergent derivation. In derivational terms, it is difficult to factor out why (1d) is, while (1b) is not, admitted under the regime of derivational economy. It is sufficient for the present purpose to conclude that the specific notion of derivational economy embodied in the Minimalist Program must be a property of the core grammar. Hence, derivational economy is a primary factor of grammatical wellformedness. If this is so, it is not immediately evident what the limited resource could be that enforces strict economy. Let us now change the perspective. The Minimalist Program, and in fact much of the generative theory of grammar, is framed in the perspective of structure generation, that is, in a speaker-oriented perspective. But generation is not the cognitively predominant aspect of the human language faculty. The receptive processing capacity, that is, the hearer-oriented perspective, is the primary aspect, for several reasons. First, crossindividually robust processability is the precondition for the emergence of grammar in an evolutionary perspective. Second, the acquisition of grammar depends on processed input. Third, processability is subject to a strictly limited resource, namely processing time. UG is a system of cognitive routines recruited for language processing in the human species.
Language processing refers to the computation of the data structures of natural languages in the process of language acquisition as well as in the interaction between grammar and the systems that control actual language usage. If UG is 'wired-in', that is,
a neurobiologically determined capacity, its processing potential can be described as a computation capacity for specific data structures and relations defined on these structures. The core grammar of a language is a knowledge system that consists of the UG-compatible, that is, UG-processable, and language-specific structures and relations. In terms of a cognitive processing capacity, the grammar provides the interface for mapping one-dimensionally structured PF-expressions onto multi-dimensionally structured LF-representations. LF-representations are hierarchically organized structures, hence at least two-dimensional. The grammar of a language has to provide suitable data structures for the effective processing of the mapping functions. The grammar of a language is the knowledge base for the cognitive computation capacity tailored to the transmission of box-in-box structures through a serial interface. Its criterion of success in the evolutionary perspective is the crossindividual stability of the processing function: the structures admitted by a given grammar must be processable, and each individual that acquires the grammar must arrive at the same structure for a given expression. A crossindividually stable and effective generation capacity for syntactic structures would be useless without a crossindividually stable and effective receptive processing capacity; the corresponding grammar could not be acquired. The receptive processing capacity is the filter: what cannot be decoded cannot be used in the encoding system. From a cognitive point of view, the derivational approach focuses on the generation capacity. A grammatical expression is characterized as the result of a convergent derivation. A projective grammar focuses on receptive processability. A linguistic expression E is grammatical if a convergent structure can be projected onto the string of terminals of E.
In sum, the primacy of the hearer-oriented perspective over the speaker-oriented one favors a projective approach. Considerations of the logical problem of acquisition lead to the same result. If the child is unable to process a given syntactic structure type in the input, it will be unable to detect the relevant parameter for the core grammar system. Again, the receptive processing capacity is the filter. Therefore, the adequate model of grammar is a model based on the receptive capacities. Finally, and most importantly, the receptive capacity is subject to an inherently limited resource, namely time. A given utterance must be processable within a limited time span because the subsequent utterance will follow, and so on. Obviously, it is only the receptive capacity that is subject to a limited resource, not the generation capacity: coding is in principle free to consume as much time as necessary. From this point of view, it is evident that UG should favor structures that lend themselves to effective and rapid processing, given the cognitive capacities for string-to-structure mapping. Decoding projects a hierarchical structure onto an array of elementary expressions. Since this is the primary aspect of the human language processing capacity, the theory of grammar should capture this aspect directly. Modelling the grammar in terms of a generative capacity suggests a decoding-by-encoding model: competence is modelled as a derivational, generative capacity. An expression E is characterized as wellformed by computing a convergent derivation whose spell-out is E. The adequate model, however, seems to be an encoding-by-decoding model: an expression E is wellformed if there is a convergent projection that covers E. Actual generation is monitored and controlled by a projection system. In a sense, the speaker is the first hearer.

Decoding and encoding, or generation, refer in this context to the flow of information between the knowledge system 'core grammar of L', that is, competence, and the systems that control actual language use, that is, performance. The problem addressed in the Chomskyan question "What is the structure of the grammar?" is directly connected with the question "How is the grammar put to use?". The grammar is to provide optimal data structures for actual usage. This implies that UG is the system of cognitive routines that guarantees this result, that is, that yields grammars determining optimal data structures for actual usage. This is so for trivial reasons: UG (and consequently core grammar) is the ensemble of symbol computation capacities that proves successful for this purpose. The grammars of natural languages comprise just the class of data structures that are easily processed by the wired-in processing routines. The assumption that structures of grammar could persist in natural languages if they were hardly tractable is highly implausible. To put it simply, grammar and usage systems must conspire. At present, this is not obvious for derivational systems (cf. Chomsky 1991:448). Effective and fast processing is the filter criterion. Cognitive routines and data structures determine each other: the data structures determined by the grammar, and a fortiori by the cognitive routines behind core grammar that support processing, that is, by UG, meet the same criterion, namely effective and fast processability, both in acquisition and in usage. It ought to be a tautology in grammar theory that UG does not sabotage processing and that language processing cannot ignore UG. The primacy of receptive processing capacities calls for a representational rather than a derivational characterization of syntactic structures.
Despite the fact that recent theorizing centers exclusively on derivational concepts, there are good reasons to conclude that a cognitively adequate model of grammar is not derivational. The derivational approach is constructive in terms of steps in a derivation. Wellformedness is characterized as a property of the derivation, not of the representation. A given string S is a wellformed expression of the language L if there is a wellformed derivation, that is, a construction process, whose output is S. The cognitively adequate model of grammar is a system that determines, for a given string, the minimally convergent syntactic structure of that string. The string is mapped onto its structural representation. Let us call this a projective system. The constructive steps in such a system are structure projection and chain-forming operations, that is, coindexation of projected empty categories with their structural antecedents. Economy in this respect is representational economy: don't project more structure onto a string than is required for a convergent structure assignment. A projective grammar must be economical just because the processes that the grammar supports, that is, the parsing processes, are subject to a limited resource, namely limited processing time. What matters is not an absolute limit but the relative uniformity of the time span across individuals.
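Representational economy, in the sense of the maxim just stated, can be sketched as a choice among candidate structures for one and the same string. The following Python code is a minimal sketch under explicit assumptions. The tree encoding is hypothetical: a terminal is a string and a projected node is a list of daughters. The cost is simply the number of projected nodes. The convergence test is passed in as a predicate, and `toy_converges` is a placeholder that makes no claim about what convergence actually involves.

```python
from typing import Callable, List, Union

# Hypothetical tree encoding: a terminal is a str; a projected node is a list
# of daughters (terminals or further nodes).
Tree = Union[str, list]


def count_nodes(tree: Tree) -> int:
    """Number of projected (nonterminal) nodes in `tree`; terminals cost nothing."""
    if isinstance(tree, str):
        return 0
    return 1 + sum(count_nodes(daughter) for daughter in tree)


def project(candidates: List[Tree],
            converges: Callable[[Tree], bool]) -> List[Tree]:
    """Keep the convergent structure(s) that project the fewest nodes."""
    convergent = [t for t in candidates if converges(t)]
    if not convergent:
        return []  # no convergent structure can be projected onto the string
    least = min(count_nodes(t) for t in convergent)
    return [t for t in convergent if count_nodes(t) == least]


def toy_converges(tree: Tree) -> bool:
    """Placeholder convergence test: the root must dominate at least one node."""
    return isinstance(tree, list) and any(isinstance(d, list) for d in tree)


# Three candidate structures for the same string "he left".
candidates = [
    ["he", "left"],      # 1 node, fails the toy convergence test
    ["he", ["left"]],    # 2 nodes, convergent
    ["he", [["left"]]],  # 3 nodes, convergent but with a superfluous layer
]
print(project(candidates, toy_converges))  # [['he', ['left']]]
```

The cost here is a property of the output structure, not of a sequence of operations. This reflects the difference between the representational and the derivational views in the text.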

3 Locality and the Cycle

In a strictly derivational system, locality constraints have to be framed in derivational terms. Local movement is dictated by the axiom of shortest move (Chomsky 1992:21).
Local antecedent-gap relations are guaranteed by cyclic application. It will be argued in this section that locality effects of this type are inherent properties of a projective system. A derivational system, however, does not provide an inherent source for these effects. In fact, locality constraints have to be superimposed in order to cut down the inherent power of the derivational system. Locality constraints like the cycle are unexplained axioms in a derivational model. First of all, the axiom of shortest move is in conflict with the economy principle that favors derivations with fewer steps: a derivation with long movement will have fewer steps than a competing derivation with shorter steps. Chomsky (1992:21) introduces a representational solution for this apparent dilemma. He assumes that the basic transformational operation is not move-alpha but form-chain, and that it applies in a single step. What this amounts to is chain formation after movement. Of course, this requires a syntactic level of representation for the output of movement. In G&B terms, this is the level of S-structure. In the MP, there is only one syntactic level of representation, namely LF. Properties that relate to a genuine S-structure cannot be expressed in the MP anymore, because the point of spell-out does not define a level. Moreover, LF-movement obliterates pre-LF movement, so S-structure conditions on chain formation cannot be formulated in the MP (cf. Chomsky 1994:391). In section 5.2 it will be argued that there are S-structure constraints. In a projective grammar, S-structure is the only syntactic level of representation. It is the interface level between phonetic form (PF) and semantic form (SF), that is, the model-theoretic representation. Grammar defines the syntactic structure (S-structure), which is mapped onto PF and onto SF by phonological and semantic principles, respectively.
Chain formation is a projective operation: in a given structural projection, chain links are coindexed. It seems that in the Minimalist Program, the assumption of chain formation independent of movement renders movement before spell-out redundant. The system of structure-building is in principle free to build structures with empty categories. Chain formation introduces the antecedent-gap dependencies necessary for LF-computations. Features are checked on the representation at which chain formation applies. In a projective model, chain formation is the counterpart of movement in a derivational system. A syntactic structure is projected onto a given string according to principles of projection. Obligatory base positions as well as functional positions have to be projected. If there is no terminal element for a base position, an empty category is projected. This empty category must be fully interpretable. So it is either an empty pronominal element or a category that has to be bound by an antecedent. Chain formation applied to (2b), for instance, introduces the antecedent-gap coindexation (2c).

(2) a. He seems to be likely to win.
    b. [He seems [e [to be likely [e [to [e win ]]]]]]
    c. [Heᵢ seems [eᵢ [to be likely [eᵢ [to [eᵢ win ]]]]]]
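The coindexation step in (2) can be sketched as a single pass over the projected structure, with no movement operation involved. This Python code is a minimal sketch, not the author's formalism. The tree encoding and the names `form_chains` and `render` are hypothetical, and indices are written with an underscore (`He_i`) rather than as subscripts. The main simplifying assumption is that, in a left-to-right, depth-first walk, the most recently encountered antecedent stands in for the closest c-commanding antecedent.

```python
from typing import Iterator, Set, Union

# Hypothetical encoding: a terminal is a str, a constituent is a list.
Tree = Union[str, list]


def render(tree: Tree) -> str:
    """Print a tree in the bracket style of example (2)."""
    if isinstance(tree, str):
        return tree
    return "[" + " ".join(render(d) for d in tree) + "]"


def form_chains(tree: Tree, antecedents: Set[str], empty: str = "e") -> Tree:
    """Coindex every projected empty category with the closest preceding antecedent.

    Simplifying assumption: in a left-to-right, depth-first walk, the most
    recent antecedent stands in for the closest c-commanding one, and each
    empty category it binds becomes a link of the same chain.
    """
    indices: Iterator[str] = iter("ijklmn")  # enough for a toy example
    current = None  # index of the chain currently being built

    def walk(node: Tree) -> Tree:
        nonlocal current
        if isinstance(node, list):
            return [walk(d) for d in node]
        if node in antecedents:
            current = next(indices)     # start a new chain
            return f"{node}_{current}"
        if node == empty and current is not None:
            return f"{node}_{current}"  # add a link to the current chain
        return node  # other terminals (and unbound empty categories) unchanged

    return walk(tree)


# (2b): the structure projected onto "He seems to be likely to win".
s2b = ["He", "seems", ["e", ["to", "be", "likely", ["e", ["to", ["e", "win"]]]]]]
print(render(form_chains(s2b, {"He"})))
# [He_i seems [e_i [to be likely [e_i [to [e_i win]]]]]]
```

An empty category with no preceding antecedent is left unindexed. On the account in the text, such a category would have to be interpreted as an empty pronominal element.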

The projective system ends up with the same result but without a movement operation. Structure-building or structure-projection plus chain-formation is sufficient. If movement is considered to be an essential derivational part of the system, there must
exist strong arguments in favor of it. This does not seem to be the case, however. Strong evidence for a movement approach would come from shunting operations that amount to non-cyclic application of movement. But this is exactly what we do not find in natural languages. Movement systems, unlike projective representational systems, should in principle invite non-cyclic shunting operations: if there is a single pathway for movement (e.g. only a single escape hatch for non-local movement), the array of elements after movement should reflect the order of application of movement operations. Assume for the sake of illustration that (3a) is well-formed. In this case the steps in the derivation can be read off from the resulting landing sites. First, what must be moved to the inner and then to the outer Spec,CP position. Subsequently, how is moved to the inner Spec position. Consequently, the movement system should prove more flexible than the projective system, because the freedom of choice between alternative movement operations should lead to alternative movement results. Contrary to this expectation, however, the movement system has to be cut down to the rigidity of a representational system. This is the effect of the constraint of cyclic application, as the following well-known example of a wh-island violation illustrates.

(3) a. *Whatᵢ did he wonder [howⱼ I fixed eᵢ eⱼ]?
    b. Move-alpha 1: Whatᵢ did he wonder [I fixed eᵢ how]?
    c. Move-alpha 2: Whatᵢ did he wonder [howⱼ I fixed eᵢ eⱼ]?

Chomsky (1992:32) rules out the non-cyclic step in the derivation of (3c) subsequent to (3b) by invoking an extension requirement: substitution operations always extend their targets (cf. Kitahara (1995:61) for a detailed reconstruction). This requirement is empirically adequate but unexplained and against the spirit of a derivational model. It is exactly this kind of choice between first moving how or first moving what that is available in a derivational approach.
The extension requirement is necessary for eliminating an inherent possibility of a derivational system. The brute fact that (3a) is ungrammatical forces the elimination of the non-cyclic derivation. A truly derivational grammar should offer the shunting operation that would lead to (3a): short movement should not block long-distance movement, because long-distance movement should be applicable before short movement. In addition, the extension requirement is a projective, not a derivational, requirement: each constructive step should extend the already projected structure. Similar considerations apply to subject island violations and to extractions out of adjoined positions or spec-positions. The non-cyclic derivation leads to unacceptable results and must be eliminated. Again, this empirical fact is unexpected in the derivational system. The fact that the constraint of cyclic application of move-alpha must be added to the system as an axiom shows that the derivational system provides the inherent possibility of non-cyclic derivations.

(4) a. *Whoᵢ did you say that [pictures of eᵢ]ⱼ were stolen eⱼ?
    b. *I wonder whoᵢ [pictures of eᵢ]ⱼ he bought eⱼ.
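The extension requirement can be stated as a check on the order of substitutions. This Python code is a minimal sketch under a strong simplification: a substitution counts as extending its target just in case the target is the root of the structure built so far. The `Step` record, its node labels, and the function name `extension_violations` are hypothetical. The step orders follow the prose description of the derivations of (3a).

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    description: str
    target: str  # node whose specifier is filled by this substitution
    root: str    # root node of the structure built so far


def extension_violations(derivation: List[Step]) -> List[Step]:
    """Return the substitutions that do not extend their target.

    Simplification: a substitution extends its target just in case the target
    is the current root, i.e. the step adds structure at the top.
    """
    return [step for step in derivation if step.target != step.root]


# Order of steps described in the text for the non-cyclic derivation of (3a).
non_cyclic = [
    Step("what to Spec of embedded CP", "CP-embedded", "CP-embedded"),
    Step("what to Spec of matrix CP", "CP-matrix", "CP-matrix"),
    Step("how to Spec of embedded CP", "CP-embedded", "CP-matrix"),
]
# Cyclic order: how fills the embedded Spec before the matrix CP is built.
cyclic = [
    Step("how to Spec of embedded CP", "CP-embedded", "CP-embedded"),
    Step("what to Spec of matrix CP", "CP-matrix", "CP-matrix"),
]

for name, derivation in [("non-cyclic", non_cyclic), ("cyclic", cyclic)]:
    bad = extension_violations(derivation)
    print(name, [s.description for s in bad])
```

Only the late substitution of how into the embedded Spec is flagged. The cyclic order passes this check but still yields the wh-island violation, about which the check says nothing.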

Haider: Projective

212

Economy

For the sentences in (4), there is a non-cyclic derivation which does not violate any constraint. First, the wh-element is moved to the root Spec, and subsequently the phrase that contains the trace of the moved wh-element is moved. Browning (1991:545) proposed a representational account: A subjacency requirement on chain links at S-structure rules out chain links that are not 0-subjacent. Kitahara (1995:51) remarks that Browning's account cannot be adopted in the Minimalist Program, since S-structure has been eliminated there as a level of representation. He tries instead to capture the cycle by redefining the basic unit of derivation. He (1995:58,71) subsumes three operations (structure building, movement as substitution, and deletion of the copy) as suboperations of a general operation target alpha. These suboperations count as one rule application as long as they apply in immediate succession, and as two or three applications otherwise. In the cyclic derivation of (3a), target alpha builds the embedded CP and substitutes how into its Spec in one application. In the non-cyclic derivation one more application is necessary: First, the embedded CP node is built, then the object is moved, and then the Spec of the embedded CP is filled. The movement of the object intervenes between two suboperations of target alpha that have the embedded C' as their target. Hence this counts as two distinct applications. Therefore, the cyclic application is chosen as the shorter derivation. It is deviant, of course, because of the familiar wh-island violation. Kitahara's attempt to eliminate the extension requirement that guarantees cyclic rule application, by subsuming its effects under an independently motivated economy principle, is questionable on conceptual as well as empirical grounds. First, if target alpha is to be more than a reduction by definition, it must be shown to be a macro-routine of UG.
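Kitahara's counting can be sketched in the same toy setting. The function below is a simplified reading assumed here for illustration, not a definition taken from Kitahara (1995): a derivation is a list of (suboperation, target) pairs, and consecutive suboperations with the same target, in the order build, substitute, delete, count as one application of target alpha; any intervening step starts a new application. The labels CP1 (embedded) and CP2 (matrix) are hypothetical, and intermediate landing sites are omitted.

```python
from typing import List, Tuple

# Rank of each suboperation of target alpha within one application.
ORDER = {"build": 0, "substitute": 1, "delete": 2}

def count_applications(subops: List[Tuple[str, str]]) -> int:
    """Count target-alpha applications in a list of (kind, target) pairs."""
    count = 0
    prev = None  # the previous (kind, target) pair, if any
    for kind, target in subops:
        if kind not in ORDER:
            raise ValueError(f"unknown suboperation: {kind!r}")
        # A step continues the current application only if it immediately
        # follows a lower-ranked suboperation on the same target.
        continues = (
            prev is not None
            and prev[1] == target
            and ORDER[kind] > ORDER[prev[0]]
        )
        if not continues:
            count += 1
        prev = (kind, target)
    return count

# Cyclic: how fills Spec,CP1 right after CP1 is built.
cyclic = [("build", "CP1"), ("substitute", "CP1"),
          ("build", "CP2"), ("substitute", "CP2")]
# Non-cyclic: what-movement to CP2 intervenes before how fills Spec,CP1.
non_cyclic = [("build", "CP1"), ("build", "CP2"),
              ("substitute", "CP2"), ("substitute", "CP1")]

derivations = {"cyclic": cyclic, "non-cyclic": non_cyclic}
for name, steps in derivations.items():
    print(name, count_applications(steps))
# Economy: choose the derivation with the fewest applications.
print(min(derivations, key=lambda n: count_applications(derivations[n])))
```

The cyclic list comes out at two applications and the non-cyclic one at three, so the minimizing choice in the last line selects the cyclic derivation. The toy says nothing about why that derivation is then deviant; that is left to the wh-island condition.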
Once this macro-routine is called up, its subroutines become available. The economy measure only counts the calls of the macro-routine. It is not self-evident, however, that structure building, with or without subsequent movement of an element to the newly created position, and deletion of copies constitute a single complex routine of the derivational system. If target alpha is a complex routine, there should be derivations in which one application of target alpha uses its full potential, that is, all its subroutines. This will never happen, however, because the definition contains an inherent disjunction:

(5) Target alpha (Kitahara 1995:75)
    a. Build a new phrase structure X immediately dominating the category alpha.
    b. Substitute a category Y for a newly created empty ∅ external to alpha.
    c. Delete alpha.

Application of (5a) alone amounts to structure building without movement. The combination of (5a) and (5b) yields structure building plus movement. (5c), however, cannot be combined with (5a) or (5b): If alpha is the category to be deleted by (5c), this category is normally not identical to the category that is the target of structure building by (5a,b).¹ Hence (5c) is an autonomous step, independent of (5a,b), and should not be treated as part of (5). Deletion would be an integral part of (5) if (5c) were replaced by (6). In this case, structure building plus movement plus deletion of the copy at the trace position would form a conceptually coherent macro-routine.

(6) Delete the copy of Y
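The revised macro-routine, with (6) in place of (5c), can be sketched on simple trees. This is a hypothetical illustration, not Kitahara's definition: Node, target_alpha, the trace symbol t and the sample tree modeled on (3b) are all assumed names, and the copy deletion simply replaces every leaf identical to Y inside alpha, which presupposes that Y occurs there exactly once. What the sketch shows is that all three steps are keyed to the same Y, which is what makes (5a), (5b) and (6) a coherent unit.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Node:
    label: str
    children: List["Tree"] = field(default_factory=list)

Tree = Union[str, Node]  # leaves are plain strings
EMPTY = "∅"              # newly created empty position
TRACE = "t"              # what a deleted copy leaves behind

def contains(tree: Tree, y: str) -> bool:
    """True if some leaf of tree is identical to y."""
    if isinstance(tree, str):
        return tree == y
    return any(contains(child, y) for child in tree.children)

def delete_copy(tree: Tree, y: str) -> Tree:
    """(6): replace every leaf identical to y by a trace (new tree)."""
    if isinstance(tree, str):
        return TRACE if tree == y else tree
    return Node(tree.label, [delete_copy(child, y) for child in tree.children])

def target_alpha(alpha: Node, x_label: str, y: str) -> Node:
    """(5a) + (5b) + (6) as one macro-routine, all keyed to the same y."""
    if not contains(alpha, y):
        raise ValueError(f"{y!r} does not occur inside alpha")
    # (5a) build X immediately dominating alpha, with a new empty position
    x = Node(x_label, [EMPTY, alpha])
    # (5b) substitute y for the empty position external to alpha
    x.children[0] = y
    # (6) delete the copy of y inside alpha
    x.children[1] = delete_copy(alpha, y)
    return x

# Embedded C' of (3b), with what still in situ.
alpha = Node("C'", ["C", Node("IP", ["I", Node("VP", ["fixed", "what", "how"])])])
print(target_alpha(alpha, "CP", "how"))
```

A single call builds the embedded CP, moves how into its Spec and leaves a trace in the VP; the original alpha is not modified, because the copy deletion returns a new tree.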

If (5c) is replaced by (6), Kitahara's account conflicts with the empirical evidence. In a derivational system with a spell-out operation as the point of demarcation between movement with PF effects and LF movement without PF effects, optional movement is a potential problem. Kitahara (1995:72-74) claims that target alpha correctly accounts for the optionality of object shift in Icelandic (Thrainsson 1993).
(7) a.

b.

Risanir [ a t U j [ekki [