English Pages 410 [416] Year 2000
Competition in Syntax
Studies in Generative Grammar 49
Editors
Harry van der Hulst, Jan Koster, Henk van Riemsdijk
Mouton de Gruyter Berlin · New York
Competition in Syntax
Edited by
Gereon Müller Wolfgang Sternefeld
Mouton de Gruyter Berlin · New York
2001
Mouton de Gruyter (formerly Mouton, The Hague) is a Division of Walter de Gruyter GmbH & Co. KG, Berlin.
The series Studies in Generative Grammar was formerly published by Foris Publications Holland.
Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.
Die Deutsche Bibliothek – Cataloging-in-Publication Data
Competition in syntax / ed. by Gereon Müller ; Wolfgang Sternefeld. Berlin ; New York : Mouton de Gruyter, 2001 (Studies in generative grammar ; 49) ISBN 3-11-016945-2
© Copyright 2000 by Walter de Gruyter GmbH & Co. KG, D-10785 Berlin. All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Printing: Werner Hildebrand, Berlin. Binding: Lüderitz & Bauer, Berlin. Printed in Germany.
Contents

The Rise of Competition in Syntax: A Synopsis
Gereon Müller & Wolfgang Sternefeld

Let's Phrase It! Focus, Word Order, and Prosodic Phrasing in German Double Object Constructions
Daniel Büring

Remarks on the Economy of Pronunciation
Gisbert Fanselow & Damir Cavar

On the Integration of Cumulative Effects into Optimality Theory
Silke Fischer

Quantifier Scope in German and Cyclic Optimization
Fabian Heck

Experimental Evidence for Constraint Competition in Gapping Constructions
Frank Keller

Word Order Variation: Competition or Co-Operation?
Jürgen Lenerz

OT Accounts of Optionality: A Comparison of Global Ties and Neutralization
Tanja Schmid

The Interpretation of Object Shift and Optimality Theory
Sten Vikner

Case Conflict in German Free Relative Constructions: An Optimality Theoretic Treatment
Ralf Vogel

The Optimal Linking of Arguments: The Case of English Psych Verbs
Anja Wanner

Index of OT-Constraints

Index of Subjects
The Rise of Competition in Syntax: A Synopsis Gereon Müller & Wolfgang Sternefeld
1 Local vs. Competition-Based Approaches
Syntactic theories differ with respect to how they determine the wellformedness or illformedness of a given sentence Sᵢ in a given language. One possibility is that the decision of whether Sᵢ is grammatical or not can be made by exclusively considering properties of Sᵢ; properties of other sentences Sⱼ, Sₖ, ... are irrelevant. Another possibility is that properties of other sentences Sⱼ, Sₖ, ... do play a role in deciding whether Sᵢ is grammatical or not, in addition to Sᵢ's own properties. The first possibility, which we may call a local approach, can arguably be viewed as the standard one; this strategy is pursued in, e.g., most versions of government and binding theory (principles and parameters theory), head-driven phrase structure grammar, lexical-functional grammar (until recently), and in certain versions of the minimalist program. The second possibility presupposes a competition of sentences; hence, we will refer to it as a competition-based approach. This strategy is the one that this book is about; it is chosen in certain versions of the minimalist program (in particular, in earlier manifestations), in theories that incorporate the Blocking Principle (the Elsewhere Condition), and, last but not least, in optimality-theoretic syntax. In what follows, we will illustrate fundamental differences between and points of convergence among local and competition-based approaches by considering government and binding theory (section 2) and its development into the minimalist program (section 3), blocking syntax (section 4), and optimality-theoretic syntax (section 5).
2 Government and Binding Theory
Chomsky's (1981, 1986a,b) theory of government and binding is a typical instance of a local approach. In this theory, syntactic objects are viewed as (D-structure, S-structure, LF) triples which are created by phrase-structure rules (X-bar theory) and the transformational rule Move α. Syntactic constraints can take various forms. First, they can be representational filters in the sense that they apply at one or more of the three levels D-structure, S-structure and/or LF (cf. the principles of θ-theory, the binding theory principles, or the Empty Category Principle (ECP)). Alternatively, constraints can be derivational, which implies that they do not apply at any specific level, but restrict the movement operation itself (cf. the Subjacency Condition). Finally, government and binding theory envisages the possibility that syntactic constraints can be global (in Lakoff's 1971 terminology). Global constraints are neither representational nor derivational; rather, they relate non-adjacent representations in a complex derivation. In the case at hand, the Projection Principle (see Chomsky 1981:38) is a global constraint that relates the three levels of representation by demanding that the subcategorization properties of lexical items be respected from one level to the next. Still, all these types of constraints, including the global one, can be checked by exclusively looking at a given syntactic object Sᵢ.¹ Properties of other syntactic objects are irrelevant. Thus, all these types of constraints (representational, derivational, and global) can be viewed as local in this sense.
2.1 Representational Constraints
To see this, consider first the effects of representational constraints like the binding principles A and B, which are given here in a simplified form.

(1) a. Principle A: An anaphor is bound in its binding domain.
    b. Principle B: A pronominal is free in its binding domain.
By assuming that these principles apply at S-structure, we can derive that (2-a,c) (more precisely, the (D-structure, S-structure, LF) triples of which (2-a,c) are simplified S-structure representations) are grammatical, whereas (2-b) and (2-d) are ungrammatical, due to violations of Principle A and Principle B, respectively. Crucially, however, the determination of grammaticality of the four sentences proceeds on a local basis. Thus, the illformedness of (2-d) is not related to the wellformedness of (2-a), and the illformedness of (2-b) is not related to the wellformedness of (2-c), even though the two strategies - pronominalization and reflexivization - seem to be in complementary distribution for the most part in English.

(2) a. John₁ likes himself₁
    b. *John₁ thinks that Mary likes himself₁
    c. John₁ thinks that Mary likes him₁
    d. *John₁ likes him₁
Another representational constraint in government and binding theory is the ECP, which is assumed to apply to LF representations only (see Chomsky 1986a, Lasnik & Saito 1992).

(3) ECP ("Empty Category Principle"): Every trace must be marked [+γ].
A trace is marked [+γ] if it is properly (antecedent or lexically) governed, and [−γ] if it is not properly governed. Once assigned, a γ-feature is irreversible. By assumption, the complementizer that blocks antecedent-government of a subject trace. Consequently, t₁ is marked [−γ] in (4-a), which exhibits a familiar complementizer-trace effect, and [+γ] in (4-b). Even though these features are assigned at S-structure, the actual ECP violation in (4-a) takes place at LF. There is a reason why the ECP cannot be assumed to hold at S-structure already: If it did, (4-c) would also be ill formed, due to [−γ]-marking of the intermediate subject trace t′₁. Being an argument trace in an A-bar position, this intermediate trace is freely deletable on the way to LF (an option that does not exist for t₁ in (4-a,b,c)). With it goes the [−γ]-marking, and (4-c) is correctly predicted to be well formed despite a lack of proper government at S-structure - the ECP is respected at LF.

(4) a. *Who₁ do you think [CP t′₁([+γ]) that [IP t₁([−γ]) will leave ]] ?
    b. Who₁ do you think [CP t′₁([+γ]) – [IP t₁([+γ]) will leave ]] ?
    c. Who₁ do you think [CP t″₁([+γ]) that Mary said [CP t′₁([−γ]) – [IP t₁([+γ]) will leave ]]] ?
It is again worth noting that the account of the illformedness of (4-a) in no way relies on the fact that (4-b) is well formed - the grammaticality status of the two sentences is determined on a local basis.
2.2 Derivational Constraints
Basically the same situation arises with constraints like the Subjacency Condition in (5).

(5) Subjacency Condition: Movement must not cross two bounding nodes.
The standard assumption in government and binding theory is that Subjacency is a derivational constraint on (overt) movement, rather than a representational constraint applying at S-structure. Theory-internal motivation for this assumption comes from the consideration of the data in (6) (cf. Sternefeld 1991).

(6) a. Who₁ do [IP you think [CP t′₁ that [IP Mary loves t₁ ]]] ?
    b. Who₁ do [IP you believe [IP John to be in love with t₁ ]] ?
IP is a bounding node for movement. The two wh-movement steps in (6-a) cross one IP each, and given that no other bounding node intervenes, the wellformedness of this sentence is compatible with both a derivational and a representational interpretation of the Subjacency Condition. The case is different in the exceptional Case marking context in (6-b). As it stands, t₁ is separated from its antecedent who₁ by two IP bounding nodes at S-structure, since there is no intermediate trace t′₁. Thus, under a representational interpretation of the Subjacency Condition, we would in fact wrongly expect (6-b) to be ungrammatical. No such problem arises under a derivational interpretation, though. On this latter view, (6-b) is derived from a D-structure representation with a CP shell present in the embedded clause. Wh-movement can precede CP deletion (as required for exceptional Case marking of the embedded subject NP), and the Subjacency Condition is correctly predicted to be satisfied by the movement operation itself. This is shown in (7), which outlines the relevant part of the derivation of (6-b).

(7) a. – do [IP you believe [CP – C [IP John to be in love with who₁ ]]]
    b. – do [IP you believe [CP who₁ C [IP John to be in love with t₁ ]]]
    c. who₁ do [IP you believe [CP t′₁ C [IP John to be in love with t₁ ]]]
    d. who₁ do [IP you believe [IP John to be in love with t₁ ]]
Among other phenomena, the Subjacency Condition derives Complex Noun Phrase Condition (CNPC) islands. The contrast between (8-a) and (8-b) follows, given that wh-movement crosses two bounding nodes in (8-b), but not in (8-a). Again, the fact that the movement strategy seems to be in complementary distribution with a resumptive pronoun strategy is not theoretically reflected by relating the illformedness of resumptive pronouns as in (8-c) to the wellformedness of movement in (8-a), and the (relative) wellformedness of resumptive pronouns in (8-d) to the illformedness of movement in (8-b).² Independently of what exactly an account of the contrast in (8-c,d) looks like in government and binding theory, it seems that it must rely on a local constraint that is violated if a resumptive pronoun has an antecedent that is too close.³

(8) a. the man [CP who(m)₁ I saw t₁ ]
    b. *the man [CP who(m)₁ [IP I don't believe [NP the claim [CP t′₁ that anyone saw t₁ ]]]]
    c. *the man [CP who(m)₁ I saw him₁ ]
    d. ?the man [CP who(m)₁ [IP I don't believe [NP the claim [CP that anyone saw him₁ ]]]]
2.3 Global Constraints
Finally, the Projection Principle can always be checked by looking at the properties of a given sentence, and does not necessitate a consideration of other sentences. The Projection Principle is given in a simplified form in (9).

(9) Projection Principle:
    a. If α selects β in γ as a lexical property, then α selects β in γ at some level Lᵢ.
    b. If α selects β in γ at level Lᵢ, then α selects β in γ at level Lⱼ.
To find out whether a given sentence Sᵢ respects the Projection Principle, we have to take into account the sentence's representations at D-structure, S-structure, and LF. This way, the presence of traces of overt movement in argument positions can be enforced at S-structure and at LF, among other things. Clearly, such a global constraint is quite complex; however, it does not yet rely on the notion of competition - properties of other sentences (i.e., other (D-structure, S-structure, LF) triples) are irrelevant for determining whether a given sentence violates or respects the Projection Principle.
2.4 An Exception: The Avoid Pronoun Principle
To sum up so far, government and binding theory has different types of constraints, with varying complexity, but all of them are local in the above sense; i.e., they do not involve competition. Interestingly, though, there is a notable exception: the Avoid Pronoun Principle of Chomsky (1981). The empty pronominal PRO and lexical pronouns come close to being in complementary distribution: As stated in the PRO-theorem, PRO is confined to positions that are ungoverned (e.g., the subject position of control infinitives). In contrast, overt pronouns typically show up in positions that are governed. This is so because overt pronouns must be assigned Case, and Case is normally assigned under government only. However, there is one position in which government and binding theory permits both an empty pronominal PRO (because the position is ungoverned) and an overt pronoun (because Case can be assigned without government by a special Case assignment rule). This is the subject position of English gerunds. Consider first the possibility of PRO in this position:

(10) a. John₁ would much prefer [ PRO₁ going to the movie ]
     b. *John₁ would much prefer [ PRO₂/arb going to the movie ]

As shown by the contrast between (10-a) and (10-b), PRO must be co-indexed with the matrix subject here; it cannot bear a different index or be interpreted arbitrarily. Obligatory control follows from the control rule in (11) (see Manzini 1983).⁴

(11) Control Rule: If PRO is minimally dominated by a declarative clausal object α, then it must be controlled by an antecedent within the minimal CP dominating α.

Next consider overt pronouns in English gerunds:

(12) a. *John₁ would much prefer [ his₁ going to the movie ]
     b. John₁ would much prefer [ his₂ going to the movie ]
     c. John₁ would much prefer [ his₁ book ]

(12-b) shows that an overt pronoun is possible in the subject position of English gerunds. However, an overt pronoun cannot be co-indexed with the matrix subject, in striking contrast to PRO; cf. (12-a). The illformedness of (12-a) does not follow from any of the relevant local constraints in government and binding theory. In particular, (12-c) strongly suggests that Principle B of the binding theory is not violated in (12-a) (the pronoun his occupies SpecNP in both cases, according to Chomsky's 1981 assumptions), and the wellformedness of (12-b) proves that Case can be assigned to his in (12-a). In view of this situation, Chomsky (1981:65) suggests that the illformedness of (12-a) is not to be traced back to a violation of some local constraint. Rather, it should be related to the wellformedness of (10-a): The two sentences compete, and (12-a) is ungrammatical because (10-a) is grammatical. This idea is implemented by adopting the Avoid Pronoun Principle in (13).

(13) Avoid Pronoun Principle: Lexical pronouns are blocked by empty pronouns if possible.
This implies that the grammaticality of a sentence Sᵢ with an overt pronoun cannot be checked on a local basis anymore: To find out whether such a sentence Sᵢ is grammatical or not, a minimally different sentence Sⱼ has to be considered in which the overt pronoun is replaced by PRO. If Sⱼ is grammatical, Sᵢ is ungrammatical by the Avoid Pronoun Principle; this effect occurs in the case of (10-a) vs. (12-a). If, on the other hand, Sⱼ is ungrammatical (e.g., because PRO violates the PRO-theorem, or because PRO has an index that leads to a violation of the Control Rule), Sᵢ can be grammatical (provided that no local constraints of grammar are violated); this situation arises in the case of (10-b) (which violates the local Control Rule) vs. (12-b). The question arises of what type of constraint the Avoid Pronoun Principle is. Clearly, it does not belong to any of the three types of local constraints discussed above: To find out whether a given sentence Sᵢ respects or violates this constraint, it does not suffice to consider properties of Sᵢ alone. Rather, the properties of another sentence Sⱼ must be considered. For the present purposes, let us refer to constraints like the Avoid Pronoun Principle that rely on a competition of sentences as translocal constraints.⁵ It seems that the Avoid Pronoun Principle has been widely accepted in government and binding theory. However, conceptually it has arguably always been an alien element in that syntactic approach. In line with this, it has never been fully clear whether it should best be viewed as a purely syntactic constraint, or indeed as a pragmatic constraint derivable from, e.g., Gricean maxims.⁶ Such a question does not arise with the use of translocal constraints in the minimalist program.
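The competition logic behind the Avoid Pronoun Principle can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: the class Candidate, the function names, and the reduction of all local constraints to a single check modeled on the Control Rule (11) are hypothetical and not part of the original proposal. The sketch merely shows that the verdict for a sentence with an overt pronoun depends on the verdict for its PRO counterpart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Toy gerund sentence: 'John would much prefer [ X going to the movie ]'."""
    subject: str     # "PRO" or "pronoun" (hypothetical labels)
    coindexed: bool  # True if X bears the index of the matrix subject


def respects_local_constraints(c: Candidate) -> bool:
    """Assumption: the only local constraint modeled is the Control Rule (11)."""
    if c.subject == "PRO":
        return c.coindexed  # PRO must be controlled by the matrix subject
    return True  # overt pronoun: Case is available, Principle B is not violated


def pro_counterpart(c: Candidate) -> Candidate:
    """The minimally different sentence in which the subject is PRO."""
    return Candidate("PRO", c.coindexed)


def grammatical(c: Candidate) -> bool:
    if not respects_local_constraints(c):
        return False
    # Translocal step: an overt pronoun is blocked if its PRO counterpart
    # respects the local constraints.
    if c.subject == "pronoun" and respects_local_constraints(pro_counterpart(c)):
        return False
    return True


examples = {
    "(10-a)": Candidate("PRO", True),
    "(10-b)": Candidate("PRO", False),
    "(12-a)": Candidate("pronoun", True),
    "(12-b)": Candidate("pronoun", False),
}
for label, cand in examples.items():
    print(label, "grammatical" if grammatical(cand) else "ungrammatical")
```

Note that grammatical() has to inspect a second candidate for pronoun sentences, whereas respects_local_constraints() never looks beyond its argument; this is the local vs. translocal distinction in miniature.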
3 The Minimalist Program
Translocal constraints are employed in various versions of the minimalist program that are developed in Chomsky (1991, 1993, 1995, 1998). In general, translocal constraints can be viewed as selection instructions: Out of a given set of syntactic objects, one (or more) is chosen according to a metric that is specified by the constraint. The set of syntactic objects that participate in the competition is called the reference set. The minimalist program dispenses with D-structure and S-structure, retaining only the interface levels LF and PF as levels of representation. Consequently, representational constraints that apply at D-structure or S-structure are dispensed with (as well as global constraints). The remaining local constraints are either derivational, or they apply at LF or PF ("bare output conditions"). With respect to the issue of competition, this derivational orientation of the minimalist program has an immediate consequence. The competing syntactic objects are derivations (not representations or n-tuples of levels of representations). In line with this, translocal constraints in the minimalist program are often called transderivational constraints. A common property of most (if not all) translocal constraints in the minimalist program is that they can be conceived of as economy constraints in some sense; i.e., a translocal constraint chooses the most economical derivation in the reference set, according to some metric of economy. This certainly holds for the first translocal constraint suggested in the minimalist program, the Fewest Steps condition.
3.1 Fewest Steps
3.1.1 V-to-I Movement in Chomsky (1991)
Chomsky (1991) is concerned with deriving the difference between French and English with respect to V-to-I movement (cf. Pollock 1989): French has overt V-to-I movement of finite verbs; English does not have such movement (except for auxiliaries). This is shown in (14).

(14) a. Jean embrasse₁ souvent [VP t₁ Marie ]
     b. *Jean souvent [VP embrasse₁ Marie ]
     c. *John kisses₁ often [VP t₁ Mary ]
     d. John often [VP kisses₁ Mary ]
As a first step towards accounting for these data, Chomsky assumes that French has "strong" I nodes, whereas English has "weak" I nodes. This distinction becomes important for the following local (derivational) constraint:

(15) Strength of I: Strong I tolerates adjunction of all Vs; weak I tolerates adjunction only of "light" Vs (auxiliaries).
This excludes (14-c) in English: Overt V-to-I movement violates Strength of I. In contrast, overt V-to-I movement in (14-a) in French does not violate this constraint. Still, something extra needs to be said about (14-b) in French, which vacuously fulfills Strength of I, just like (14-d) in English does. Thus, the question is: Why is overt V-to-I raising obligatory if it is possible? Chomsky's (1991) background assumption is that inflection is base-generated in I. If V does not raise to I, I must lower to V in overt syntax, so as to fulfill another local constraint, the Stray Affix Filter, which prohibits inflectional affixes that are not attached to a verbal host. The crucial idea now is that overt I lowering creates an unbound trace that must be undone by LF via covert raising of the whole V-I complex to the position of the trace of I. The derivations underlying (14-a) and (14-b) in French are given in (16) and (17), respectively.

(16) a. Jean I₂ souvent [VP embrasse₁ Marie ]
     b. Jean [I₂ embrasse₁-I₂ ] souvent [VP t₁ Marie ]        (overt raising)

(17) a. Jean I₂ souvent [VP embrasse₁ Marie ]
     b. Jean t₂ souvent [VP [V₁ embrasse₁-I₂ ] Marie ]        (overt lowering)
     c. Jean [V₁ embrasse₁-I₂ ] souvent [VP t₁ Marie ]        (covert raising)
The second derivation has more movement steps than the first one, and it is therefore filtered out as uneconomical by the translocal economy constraint Fewest Steps, which can be formulated as follows:

(18) Fewest Steps: If two derivations D₁ and D₂ are in the same reference set and D₁ involves fewer operations than D₂, then D₁ is to be preferred over D₂.
A definition of reference set that works for the approach in Chomsky (1991) is (19); here, the numeration is the set of all lexical items (including functional heads) that are used in a derivation.⁷

(19) Reference Set: Two derivations D₁ and D₂ are in the same reference set iff they (i) have the same numeration and (ii) respect all local constraints.
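The interplay of Fewest Steps (18) and the reference set definition (19) can be sketched in Python. This is a minimal illustration, not an implementation of Chomsky's system: the class Derivation, its fields, and the function names are hypothetical, the numeration is modeled as a plain set of strings, and whether a derivation respects the local constraints is simply stipulated as a Boolean. The data encode the French derivations (16) and (17).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    label: str
    numeration: frozenset        # lexical items used, including functional heads
    operations: tuple            # movement steps, in order
    respects_local: bool = True  # stipulated verdict of the local constraints


def same_reference_set(d1: Derivation, d2: Derivation) -> bool:
    """Definition (19): same numeration, and both respect all local constraints."""
    return (d1.numeration == d2.numeration
            and d1.respects_local and d2.respects_local)


def chosen_by_fewest_steps(d: Derivation, candidates: list) -> bool:
    """True if d is locally well formed and no competitor has fewer operations."""
    if not d.respects_local:
        return False
    return not any(
        same_reference_set(d, other)
        and len(other.operations) < len(d.operations)
        for other in candidates
    )


numeration = frozenset({"Jean", "I", "souvent", "embrasse", "Marie"})
d16 = Derivation("(16)", numeration, ("overt raising",))
d17 = Derivation("(17)", numeration, ("overt lowering", "covert raising"))

candidates = [d16, d17]
for d in candidates:
    print(d.label, chosen_by_fewest_steps(d, candidates))
```

Under these assumptions, (16) is chosen and (17) is filtered out, since both share a numeration and (17) needs one operation more.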
The qualification in (i) ensures that, e.g., (20-b) cannot accidentally block (20-a) even though it involves fewer syntactic operations. Furthermore, the statement in (ii) produces the welcome consequence that (20-c) cannot accidentally block (20-a) even though it involves fewer syntactic operations by leaving the wh-phrase and the auxiliary in situ - (20-c) violates a local constraint like the Wh-Criterion, which requires a wh-phrase to move to SpecC[+wh] overtly in English.

(20) a. What₁ have₂ you t₂ seen t₁ ?
     b. You have seen a car
     c. *You have₂ seen what₁ ?

This particular application of a translocal constraint in the minimalist program is not generally accepted anymore. Still, it can serve as an illustration of certain recurring properties and problems of translocal constraints, and indeed of competition-based approaches in general. First, as Chomsky (1991:433) observes, this system "tends to eliminate the possibility of optionality in derivation. Choice points will be allowable only if the resulting derivations are all minimal in cost ... This may well be too strong a conclusion, raising a problem for the entire approach." As an example, consider optional topicalization in English. (21-a) and (21-b) are both well formed, even though (21-b) invariably involves one movement operation more than (21-a).

(21) a. Mary gave a book to John₁
     b. To John₁ Mary gave a book t₁
The solution suggested by Chomsky (1991:433) is that certain movement operations (like, we can assume, topicalization in English) might be "assigned to some other component of the language system, perhaps a 'stylistic' component of the mapping... to PF." Movement operations of this type might then be exempt from the Fewest Steps constraint. Alternatively, we might revise the definition of reference set appropriately, such that the two derivations in (21) do not compete anymore. For instance, we might add the requirement that competing derivations must have identical LF representations, as in (22). Assuming that sentences which differ with respect to whether topicalization has applied must have different LFs, this would yield the desired result.

(22) Reference Set (revised): Two derivations D₁ and D₂ are in the same reference set iff they (i) have the same numeration and the same LF, and (ii) respect all local constraints.
Second, a potential conceptual problem arises. Translocal economy constraints increase complexity: To find out whether a given sentence is grammatical, it does not suffice to look at internal properties of the sentence (Does it violate a local constraint?); rather, the properties of other sentences have to be taken into account as well (Does the sentence have the most economical derivation in the reference set?). Chomsky (1991:448) remarks that this might suggest that "language design as such appears to be in many respects 'dysfunctional,' yielding properties that are not well adapted to the functions language is called upon to perform."⁸ Third, Chomsky (1991) discusses successive-cyclic movement. It is standardly assumed that long-distance wh-movement of adjunct wh-phrases must be successive-cyclic; otherwise, a locality constraint (like the ECP) will be fatally violated, as with, e.g., wh-island configurations; cf. (23).

(23) *How₁ do you wonder [CP whether to fix the car t₁ ] ?
Chomsky (1991) observes that successive-cyclic movement creates a potential problem for the Fewest Steps condition: Successive-cyclic movement as in (24-a) should always be blocked by one-swoop movement as in (24-b).

(24) a. How₁ do you think [CP t″₁ that John said [CP t′₁ that Bill fixed the car t₁ ]] ?
     b. *How₁ do you think [CP – that John said [CP – that Bill fixed the car t₁ ]] ?
Note, though, that no particular problem arises under the notion of reference set in (19) or (22). According to these definitions, only those derivations can compete that respect all local constraints of grammar, i.e., that are otherwise well formed. By hypothesis, the derivation that generates the surface representation (24-b) violates a locality constraint; hence, it cannot compete with the derivation that generates (24-a), and (24-a) is chosen by Fewest Steps because there is no competing derivation that would be more economical.
But what if we were to dispense with clause (ii) in the definition of reference set, or if clause (ii) were weakened in such a way that some derivations violating local constraints could compete after all? (As we will see below, there is some evidence for this latter option.) Then, the derivations generating (24-a) and (24-b) might compete, and the problem of accounting for successive-cyclic movement under Fewest Steps would persist. How, then, can we permit successive-cyclic movement in (24)? Chomsky (1993) advances the following solution: "Operations" as they are relevant for Fewest Steps do not simply involve applications of Move α as such. Rather, a more complex process of chain formation that (a) moves some item to its target position and (b) automatically inserts intermediate traces in appropriate positions counts as a single operation for the purposes of Fewest Steps:⁹

(25) Form Chain: Move α to its target position and freely insert intermediate traces in appropriate positions.
On this view, "successive-cyclic" movement is no more costly from the perspective of Fewest Steps than one-swoop movement. Furthermore, the initial evidence concerning the French case of overt I-lowering followed by covert V-raising remains unaffected: A succession of movement operations involving a single item can only be reanalyzed as one instance of Form Chain (one operation for the purposes of Fewest Steps) if there is no other operation that intervenes; but in the case at hand, the operation spell-out that creates the branching in the derivation to PF and LF must intervene between overt lowering and covert raising in the derivation in (17). Hence, (17) still involves two applications of Form Chain.

3.1.2 Wh-Topicalization in Epstein (1992)
Another application of the Fewest Steps condition is the account of the ban on wh-topicalization in Epstein (1992). As noted above, topicalization is in principle optional in English; cf. (21). For many speakers, topicalization is also optionally possible in contexts like (26), where the target position is in an embedded clause and the matrix clause involves short wh-movement. Given the qualification that competing derivations must have identical LFs, this poses no problem for Fewest Steps.

(26) a. Who₁ t₁ said [CP that [IP Mary gave a book to John₂ ]] ?
     b. Who₁ t₁ said [CP that to John₂ [IP Mary gave a book t₂ ]] ?
Interestingly, embedded topicalization becomes impossible when the item that is topicalized is a wh-phrase; cf. (27-b). Note that the embedded wh-phrase may stay in situ in overt syntax, giving rise to a multiple question interpretation; cf. (27-a).

(27) a. Who₁ t₁ said [CP that [IP Mary gave a book to whom₂ ]] ?
     b. *Who₁ t₁ said [CP that to whom₂ [IP Mary gave a book t₂ ]] ?

Epstein proposes deriving the ban on wh-topicalization in (27-b) from the Fewest Steps condition. The derivations D₁ (generating (27-a)) and D₂ (generating (27-b)) are in the same reference set. Assuming that all wh-phrases must be in the domain of a SpecC[+wh] at LF, they both end up with the LF representation in (28):

(28) Who₁ to whom₂ t₁ said [CP that [IP Mary gave a book t₂ ]] ?

D₁ reaches this LF by applying one (covert) instance of wh-movement to the embedded object to whom₂. Note that there is only one movement operation in this case, either because LF movement of arguments does not have to be successive cyclic, or because successive-cyclic covert movement can be analyzed as one instance of Form Chain. D₂, on the other hand, reaches the same LF by applying two instances of wh-movement to the embedded object to whom₂ (viz., one overtly and one covertly - given the intervening spell-out operation, these two movement operations cannot be reanalyzed as one instance of Form Chain). Hence, D₁ blocks D₂ via Fewest Steps. As shown in Müller & Sternefeld (1996), the same kind of analysis may be given for the ban on wh-scrambling in German, which is illustrated in (29).

(29) a. Warum₁ hat der Fritz was₂ gelesen ?
        why has ART Fritz what read
     b. *Warum₁ hat was₂ der Fritz t₂ gelesen ?
        why has what ART Fritz read
However, it is also shown in Müller & Sternefeld (1996) that the Fewest Steps approach to the ban on optional movement of wh-phrases which is later undone by further, covert operations is not entirely unproblematic, and may necessitate additional assumptions. For one thing, German exhibits the same ban on wh-topicalization as English:

(30) a. Wer₁ sagte t₁ [CP daß Maria wem₂ ein Buch gegeben hat₃ ] ?
        who said that Maria whom a book given has
     b. *Wer₁ sagte t₁ [CP wem₂ hat₃ Maria t₂ ein Buch gegeben t₃ ] ?
        who said whom has Maria a book given

This strongly suggests an identical account in terms of Fewest Steps. However, since German topicalization always requires V/2 movement, and since V/2 movement is incompatible with the presence of a complementizer in German, the derivations generating (30-b) and (30-a) do not share an identical numeration, and we would wrongly expect no competition to arise.¹⁰ Thus, to accommodate this evidence, it seems as though the definition of reference set must be revised, as in (31) - given complementizer deletion at LF, (30-a,b) can be assumed to be identical at this level.

(31) Reference Set (second revision): Two derivations D₁ and D₂ are in the same reference set iff they (i) have the same LF, and (ii) respect all local constraints.
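How the choice among the definitions (19), (22), and (31) changes the outcome of Fewest Steps can be made concrete with a short Python sketch. Everything in it is an illustrative assumption rather than part of the original analysis: the class Derivation, the key functions, the string labels standing in for LF representations, and the step counts, which only track the operations affecting the topicalized phrase and ignore, e.g., V/2 movement and matrix wh-movement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    label: str
    numeration: frozenset
    lf: str                      # placeholder label for the LF representation
    steps: int                   # simplified operation count (assumption)
    respects_local: bool = True


def key_19(d):  # (19): same numeration
    return d.numeration


def key_22(d):  # (22): same numeration and same LF
    return (d.numeration, d.lf)


def key_31(d):  # (31): same LF
    return d.lf


def survivors(derivations, key):
    """Labels of derivations that are minimal in steps within their reference set."""
    eligible = [d for d in derivations if d.respects_local]
    result = []
    for d in eligible:
        rivals = [o for o in eligible if key(o) == key(d)]  # includes d itself
        if d.steps == min(o.steps for o in rivals):
            result.append(d.label)
    return result


# English topicalization (21): same numeration, LFs assumed to differ.
eng = frozenset({"Mary", "gave", "a book", "to John"})
english = [Derivation("(21-a)", eng, "LF without topicalization", 0),
           Derivation("(21-b)", eng, "LF with topicalization", 1)]

# German wh-topicalization (30): numerations differ (daß), LF assumed identical.
ger = frozenset({"wer", "sagte", "Maria", "wem", "ein Buch", "gegeben", "hat"})
german = [Derivation("(30-a)", ger | {"daß"}, "LF of (30)", 1),
          Derivation("(30-b)", ger, "LF of (30)", 2)]

print(survivors(english, key_19), survivors(english, key_22))
print(survivors(german, key_22), survivors(german, key_31))
```

Under these assumptions, (19) wrongly blocks (21-b) while (22) lets both English variants survive; conversely, (22) fails to make the German derivations compete, whereas (31) lets (30-a) block (30-b).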
Moreover, it turns out that there are several well-formed constructions attested in the world's languages in which wh-phrases can in fact undergo optional overt fronting to a non-target position. In Müller & Sternefeld (1996), we discuss evidence from partial wh-movement, wh-imperatives, and wh-reconstruction. For the present purposes, the example of optional partial wh-movement to a SpecC[−wh] position in Ancash Quechua may suffice (cf. Cole 1982). (32-a) shows that wh-phrases may be fronted to a SpecC[+wh] target position in overt syntax in Ancash Quechua; (32-d) shows that wh-phrases may also stay in situ in overt syntax, raising (by assumption) to the SpecC[+wh] position in covert syntax. Interestingly, (32-b) and (32-c) are also possible. Here, the wh-phrases raise to an intermediate SpecC[−wh] overtly. Given that this implies an additional wh-movement operation at LF, Epstein's (1992) Fewest Steps approach should rule out these cases.

(32) a. [CP Ima-ta-taq₁ (qam) kreinki [CP t″₁ Maria muna-nqa-n-ta [CP t′₁ José t₁ ranti-na-n-ta ]]] ?
        what-acc you believe Maria want-nom-3-acc José buy-nom-3-acc
     b. [CP – (Qam) kreinki [CP ima-ta-ta₁ Maria muna-nqa-n-ta [CP t′₁ José t₁ ranti-na-n-ta ]]] ?
     c. [CP – (Qam) kreinki [CP – Maria muna-nqa-n-ta [CP ima-ta-ta₁ José t₁ ranti-na-n-ta ]]] ?
     d. [CP – (Qam) kreinki [CP – Maria muna-nqa-n-ta [CP – José ima-ta-ta₁ ranti-na-n-ta ]]] ?
The conclusion drawn in Müller & Sternefeld (1996) in view of well-formed constructions like this one is that reference sets should be significantly reduced in size by assuming that identity of surface structure (rather than identity of LF) matters in the definition of reference sets, as in (33).
(33) Reference Set (third revision): Two derivations D1 and D2 are in the same reference set iff they (i) have the same surface structure, and (ii) respect all local constraints.
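The effect of the successive revisions of "reference set" can be illustrated with a small sketch. It is not part of the original analysis: the class `Derivation`, the function `reference_sets`, and the string labels for numerations, LFs, and surface structures are hypothetical stand-ins. The only facts taken from the text are that the derivations of (30-a) and (30-b) differ in numeration and in surface structure but can be assumed to share an LF.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    name: str
    numeration: str       # hypothetical label for the lexical array
    lf: str               # hypothetical label for the LF representation
    surface: str          # hypothetical label for the surface structure
    respects_local: bool  # True if no local constraint is violated


def reference_sets(derivations, key):
    """Group derivations that respect all local constraints by one identity criterion."""
    groups = defaultdict(list)
    for d in derivations:
        if d.respects_local:
            groups[getattr(d, key)].append(d.name)
    return sorted(sorted(names) for names in groups.values())


# Assumed encoding of the German pair in (30): the numerations differ
# (complementizer vs. none), the LFs are identical, the surface structures differ.
d30a = Derivation("30-a", numeration="with-dass", lf="LF-1", surface="S-a", respects_local=True)
d30b = Derivation("30-b", numeration="without-dass", lf="LF-1", surface="S-b", respects_local=True)

for key in ("numeration", "lf", "surface"):
    print(key, reference_sets([d30a, d30b], key))
```

Under these assumptions, only the LF-based definition in (31) places the two derivations in one reference set; the numeration-based and the surface-based definitions keep them apart, which is why the move to (33) gives up the Fewest Steps account of (30-b).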
This way, partial wh-movement is permitted, but it is clear that much (in fact, most) of the original evidence in favor of Fewest Steps is lost: Thus, on this view, neither French V-in situ, nor English (or German) wh-topicalization can be ruled out by Fewest Steps anymore. As noted in Sternefeld (1997), this situation might be viewed as indicative of a general problem with translocal constraints: A significant reduction of competition in reference sets may be empirically desirable so as to account for cases of optionality (as in partial wh-movement constructions); but as an unwanted side effect, it also threatens to undermine the notion of translocal economy itself: Many ill-formed derivations that could be ruled out by translocal constraints will now survive because the more economical derivation is not part of the same reference set anymore. Finding a suitable definition of reference set that is weak enough to permit optionality and strong enough to actually do some work is one of the fundamental concerns of all versions of the minimalist program that employ the notion of competition.
3.1.3 Freezing in Collins (1994)
Evidence for yet another definition of reference sets comes from Collins' (1994) account of freezing effects with A-movement in English. As shown in (34-a,b), subject NPs are islands for extraction in English, whereas object NPs permit extraction (with certain types of verbs). In the present context, the interesting case is that of subject NPs that originate in object position, as in the case of passivization. As can be seen in (34-c), such derived subject NPs are also islands.
(34) a. Who1 did John take [NP a picture of t1 ] ?
     b. *Who1 is [NP a picture of t1 ] on sale ?
     c. *Who1 was [NP2 a picture of t1 ] taken t2 by John ?
In a derivational approach, it must be shown that any derivation of (34-c) leads to illformedness. In one derivation, D1, NP raising to subject position applies before wh-extraction from NP takes place.
(35) a. [CP — was [IP — taken [NP2 a picture of who1 ] by John ]]
     b. [CP — was [IP [NP2 a picture of who1 ] taken t2 by John ]]
     c. *[CP who1 was [IP [NP2 a picture of t1 ] taken t2 by John ]]
Here, extraction of who1 from NP2 (which is in subject position already, hence non-L-marked, and therefore a barrier) violates a local constraint like the CED.11
(36) CED ("Condition on Extraction Domain"): Movement must not cross a barrier.
In another derivation, D2, wh-movement precedes NP raising:
(37) a. [CP — was [IP — taken [NP2 a picture of who1 ] by John ]]
     b. [CP who1 was [IP — taken [NP2 a picture of t1 ] by John ]]
     c. [CP who1 was [IP [NP2 a picture of t1 ] taken t2 by John ]]
This derivation violates another local constraint, the Strict Cycle Condition in (38). The reason is that NP raising targets the subject position. The subject position is included in the CP domain, which has already been affected by wh-movement to SpecC earlier in the derivation.
(38) SCC ("Strict Cycle Condition"): No movement operation may target a landing site that is included in a domain that has already been affected by movement earlier in the derivation.
So far, translocal constraints are not needed in an account of the illformedness of (34-c). The Fewest Steps condition does become relevant, though, when we consider a third derivation, D3. This derivation proceeds by what Collins (1994) calls chain interleaving. First, the wh-phrase who1 is extracted from the object NP2 while NP2 is still in situ (i.e., transparent for extraction); who1 adjoins to VP. Second, NP raising to the subject position takes place. Finally, who1 moves from its intermediate position to SpecC; see (39).
(39) a. [CP — was [IP — taken [VP [NP2 a picture of who1 ] by John ]]]
     b. [CP — was [IP — taken [VP who1 [VP [NP2 a picture of t1 ] by John ]]]]
     c. [CP — was [IP [NP2 a picture of t1 ] taken [VP who1 [VP t2 by John ]]]]
     d. [CP who1 was [IP [NP2 a picture of t1 ] taken [VP t'1 [VP t2 by John ]]]]
D1 violates the CED; D2 violates the SCC. D3 violates neither of these local constraints. However, D3 is blocked by D1 and D2 via Fewest Steps: Other things being equal, D3 needs three movement steps where D1 and D2 make do with two movement steps.12
This approach has an important consequence for the definition of reference sets. The three derivations D1, D2, and D3 yield the same surface string, which is ill formed. Thus, the more economical derivations that block D3 via Fewest Steps are not well-formed derivations, as in the applications of Fewest Steps discussed above, but rather ill-formed derivations that violate local constraints, viz., the CED and the SCC. This reasoning implies that reference sets can in fact not be defined as assumed so far, by requiring that only those derivations can compete that satisfy all local constraints - in the case at hand, D1 and D2 violate local constraints. Still, we cannot simply drop this requirement in the definition of reference sets; otherwise, all instances of movement would invariably be blocked in favor of in-situ derivations by Fewest Steps, and syntactic derivations would be fairly trivial. It seems that what is needed in view of this conflicting evidence is a relativized notion of local constraint satisfaction. In this context, the idea of convergence of derivations introduced in Chomsky (1993) becomes relevant: Only those derivations that converge can compete with respect to translocal constraints. Essentially, whereas all violations of local constraints lead to ungrammaticality, only a subset of violations of local constraints also leads to non-convergence. Ungrammatical derivations that converge may then still be used to block other derivations as ungrammatical, as in the freezing construction discussed by Collins (1994). It is an empirical issue how convergence is to be defined.
As a rule of thumb, and for the present purposes, we can say that a violation of those constraints that trigger movement (like the Wh-Criterion, the Extended Projection Principle (EPP), which triggers subject raising, or whatever constraint optionally triggers topicalization) leads to non-convergence, whereas a violation of constraints like the CED and the SCC permits convergence of a derivation.13 Under these assumptions, the notion of reference set needed for the approach in Collins (1994) can be defined as in (40). Note that the analysis is compatible with assuming that either numerations, or surface structures, or LF representations, or any combination of these determines the competition; hence, a commitment to one of these options is not necessary in the case at hand.
(40) Reference Set (fourth revision): Two derivations D1 and D2 are in the same reference set iff they (i) have the same numeration/surface structure/LF, and (ii) converge.
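The interplay of convergence, Fewest Steps, and local constraints in the freezing case can be sketched as follows. This is only an illustration under stated assumptions: the names `Derivation`, `fewest_steps_survivors`, and `grammatical` are hypothetical, and the step counts (two, two, three) and the violated constraints are simply those reported for D1, D2, and D3 in the discussion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    name: str
    steps: int               # number of movement operations
    converges: bool          # only convergent derivations compete, cf. (40)
    local_violations: tuple  # local constraints violated (these do not block convergence)


def fewest_steps_survivors(derivations):
    """Among the convergent derivations, keep those with the fewest movement steps."""
    competitors = [d for d in derivations if d.converges]
    if not competitors:
        return []
    fewest = min(d.steps for d in competitors)
    return [d for d in competitors if d.steps == fewest]


def grammatical(derivations):
    """A derivation is grammatical if it survives Fewest Steps and violates no local constraint."""
    return [d.name for d in fewest_steps_survivors(derivations) if not d.local_violations]


# The three derivations of (34-c) as described in the text.
D1 = Derivation("D1", steps=2, converges=True, local_violations=("CED",))
D2 = Derivation("D2", steps=2, converges=True, local_violations=("SCC",))
D3 = Derivation("D3", steps=3, converges=True, local_violations=())

print([d.name for d in fewest_steps_survivors([D1, D2, D3])])  # ['D1', 'D2']
print(grammatical([D1, D2, D3]))                               # []
print(grammatical([D3]))                                       # ['D3']
```

The last call mimics the earlier definitions, under which D1 and D2 would be excluded from the competition because they violate local constraints; D3 would then wrongly survive. With convergence as the entry condition, no derivation of (34-c) is grammatical.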
To end the discussion of Fewest Steps, we would like to emphasize that there is no inherent reason why the notion of an "operation" that is mentioned in the Fewest Steps condition should be confined to movement. Indeed, Chomsky & Lasnik (1993) argue that the deletion of intermediate traces in the LF component (which is argued to be an option with arguments and impossible with adjuncts in Chomsky 1986a, Lasnik & Saito 1992, and related work) is also regulated by the Fewest Steps condition. There have been many more applications of the Fewest Steps condition in the minimalist program (see, e.g., the Fewest Steps account of the ban on semantically vacuous quantifier raising in Fox 1995), but these may suffice for the time being. 14 Let us now consider the translocal economy constraint Shortest Paths.
3.2 Shortest Paths
The Shortest Paths condition can be defined as follows (cf. Chomsky 1993, 1995):
(41) Shortest Paths: If two derivations D1 and D2 are in the same reference set and the movement paths of D1 are shorter than the movement paths of D2, then D1 is to be preferred over D2.
Various applications of this condition have been suggested in minimalist syntax. Perhaps the most striking one concerns the derivation of superiority effects.
3.2.1 Superiority Effects in Chomsky (1993) and Kitahara (1993)
Superiority effects in English are illustrated by the examples in (42).
(42) a. I wonder [CP who1 C [IP t1 bought what2 ]]
     b. *I wonder [CP what2 C [IP who1 bought t2 ]]
     c. Whom1 did John persuade t1 [CP to visit whom2 ] ?
     d. *Whom2 did John persuade whom1 [CP t'2 to visit t2 ] ?
The Superiority Condition proposed by Chomsky (1973) demands that in cases where there are two (or more) wh-phrases that could in principle be moved to a given SpecC[+wh] position, only the highest wh-phrase can undergo such wh-movement overtly, i.e., the one that asymmetrically c-commands the other(s). This condition is respected in (42-a,c). The examples in (42-b,d) are ungrammatical because the highest wh-phrase in the clause has failed to undergo overt movement to SpecC; rather, the lower wh-object has moved to SpecC. As indicated in Chomsky (1993) and argued extensively in Kitahara (1993), superiority effects can systematically be accounted for by the translocal condition Shortest Paths. For instance, the movement path from t1 to whom1 in (42-c) is shorter than the movement path from t2 to whom2 in (42-d). Hence, given that the two derivations D1 and D2 generating (42-c) and (42-d), respectively, compete (which would follow from most definitions of reference sets envisaged above), D1 blocks D2 as ungrammatical by Shortest Paths.
Or does it? Recall that at least some of the evidence for Fewest Steps (the ban on V-in situ in French, the ban on wh-topicalization and wh-scrambling in English and German) has relied on the assumption that covert movement counts in the same way that overt movement does. But assuming that LF movement also counts for the Shortest Paths condition leads straightforwardly into a dilemma: In the case at hand, the derivation that has the shorter overt wh-movement path invariably has the longer covert wh-movement path, and it seems that by LF, both derivations have wh-movement paths of equal length. Hence, ceteris paribus, both should be well formed. In view of this, several steps can be taken.
First, one can assume that there is in fact no covert wh-movement of any kind; this makes it possible to maintain the Shortest Paths account of superiority phenomena without qualification, but is incompatible with the Fewest Steps applications sketched above. Second, one might explicitly distinguish between Fewest Steps and Shortest Paths in this respect: Whereas Fewest Steps compares whole derivations, Shortest Paths compares only the overt parts of derivations. A third possibility is developed in Sternefeld (1997). The underlying intuition of this account is that LF movement from a position α to another position β has a chance to be shorter than overt movement from α to β. The central observation is that the issue of whether a Shortest Paths account of superiority effects is incompatible with covert wh-movement is highly dependent on how path length is defined. If (movement) path length is determined by considering the number of nodes crossed by a movement operation, there is indeed a problem. But suppose now that path length is determined by considering the number of complete chains that are crossed. Now LF movement of whom2 in (42-c) will create a shorter path than overt movement of whom2 in (42-d), even though whom2 originates in the same position and targets the same landing site. The reason is that covert movement of whom2 in (42-c) crosses t1, which is part of a chain, but not a complete chain, whereas overt movement of whom2 in (42-d) crosses whom1, which is a complete chain at this point of the derivation. Given that whom1 crosses the same number of complete chains in the course of overt movement in D1 (generating (42-c)) and in the course of covert movement in D2 (generating (42-d)), the small difference pertaining to movement of whom2 becomes decisive, and D1 successfully blocks D2 as ungrammatical via Shortest Paths, even under the assumption that covert wh-movement exists and is relevant for the Shortest Paths condition.
Another interesting issue raised by the Shortest Paths account of superiority effects is posed by what one might call "LF-optionality." Sentences like (43) have two possible readings (see Baker 1970) that correspond to two different LF representations, given LF movement of wh-in situ elements.
(43) Who1 t1 wonders [CP where2 we bought what3 t2 ] ?
     a. who1 what3 t1 wonders [CP where2 we bought t3 t2 ]
        Answer: John wonders where we bought the books, Mary wonders where we bought the records, etc.
     b. who1 t1 wonders [CP where2 what3 we bought t3 t2 ]
        Answer: John wonders where we bought what, Mary wonders where we bought what, etc.
Given that all wh-in situ phrases must undergo movement to a SpecC[+wh] position at LF, D2 (creating (43-b)) should block D1 (creating (43-a)) because D1's paths are longer. Again, there are several possible solutions.15 As before, one might stipulate that covert wh-movement either does not exist, or does not count with respect to the Shortest Paths condition. Alternatively, this evidence could be viewed as a further argument that reference sets are defined in such a way that competing derivations must have identical LF representations.
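The dilemma that arises for superiority once covert movement counts, and the stipulation that it does not count, can be made concrete in a small sketch. Everything numerical here is an assumption: the constants `SHORT` and `LONG` are arbitrary placeholder path lengths, summing path lengths per derivation is one possible reading of (41), and the names `Move`, `total_path_length`, and `shortest_paths` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Move:
    item: str
    length: int  # hypothetical path length, e.g. number of nodes crossed
    overt: bool  # True for overt movement, False for LF movement


def total_path_length(moves, count_covert=True):
    """Sum the path lengths, optionally ignoring covert (LF) movement."""
    return sum(m.length for m in moves if m.overt or count_covert)


def shortest_paths(d1, d2, count_covert=True):
    """Return 'D1' or 'D2' for the derivation with the shorter paths, or None for a tie."""
    l1 = total_path_length(d1, count_covert)
    l2 = total_path_length(d2, count_covert)
    if l1 < l2:
        return "D1"
    if l2 < l1:
        return "D2"
    return None


SHORT, LONG = 2, 5  # placeholder values; only SHORT < LONG matters

# D1 generates (42-c): whom1 moves overtly (short path), whom2 covertly (long path).
D1 = [Move("whom1", SHORT, overt=True), Move("whom2", LONG, overt=False)]
# D2 generates (42-d): whom2 moves overtly (long path), whom1 covertly (short path).
D2 = [Move("whom2", LONG, overt=True), Move("whom1", SHORT, overt=False)]

print(shortest_paths(D1, D2, count_covert=True))   # None
print(shortest_paths(D1, D2, count_covert=False))  # D1
```

With node-style lengths and covert movement included, the two derivations tie and nothing blocks (42-d); restricting the comparison to overt movement restores the superiority effect. A chain-based measure of the kind attributed to Sternefeld (1997) would instead assign the covert move of whom2 a smaller value than its overt counterpart, which this sketch does not attempt to model.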
3.2.2 Yo-Yo Movement in Collins (1994)
The term yo-yo movement characterizes a combination of lowering and raising operations affecting a single item in the course of a derivation, or even within the overt part of a derivation. Derivations employing yo-yo movement are identified as problematic in Chomsky (1986a) (the main observation being attributed to Andy Barss), but envisaged as legitimate possibilities in Lasnik & Saito (1992). Collins (1994) shows that the availability of yo-yo movement would make a wrong prediction for the West African language Ewe, and attempts to derive a ban on yo-yo movement from the Shortest Paths condition. Ewe is among the languages that show reflexes of successive-cyclic wh-movement in the C domain. The reflex of successive cyclicity concerns the morphological form of the 3.Pers.Sing. subject pronoun in the canonical subject position. The regular form of the pronoun is é; cf. (44-a). The regular pronoun é can be replaced by wo in cases of long-distance extraction (focus movement, in the case at hand); cf. (44-b).
(44) a. Kofi gblö [CP be é/*wo ƒo Kösi ]
        Kofi said that he hit Kösi
     b. Kofi1 ε me gblö [CP (t'1) be é/wo ƒo t1 ]
        Kofi Foc I said that he hit
Collins assumes as the correct underlying generalization that é is replaced by wo if and only if the local SpecC position is filled. Accordingly, we can postulate that the apparent optionality of wo in (44-b) is due to an option for long-distance A-bar movement of arguments in Ewe to apply either successive-cyclically, via SpecC (in which case wo is obligatory), or in one swoop (in which case wo is impossible). It does not come as a surprise from a pre-theoretical point of view that A-bar movement that originates in the matrix clause and targets a SpecC position there does not trigger the morphological change in the embedded subject position:
(45) Kofi1 ε me gblö na t1 [CP be é/*wo ƒo Kösi ]
     Kofi Foc I said to that he hit Kösi
Still, to ensure that wo is impossible in (45), a derivation like (46) that employs yo-yo movement must be ruled out. In this derivation D1, Kofi1 is first lowered to the embedded SpecC position, licensing wo in the subject position there, and then raised to the target SpecC position in the matrix clause.16
(46) a. Foc [IP I [VP said [PP to Kofi1 ] [CP that [IP he hit Kösi ]]]]
     b. Foc [IP I [VP said [PP to t1 ] [CP Kofi1 that [IP he hit Kösi ]]]]
     c. Kofi1 Foc [IP I [VP said [PP to t1 ] [CP t'1 that [IP he hit Kösi ]]]]
There are various possibilities to exclude such a derivation (relying, e.g., on versions of the SCC or versions of the Proper Binding Condition). Still, Collins (1994) observes that D1 in (46) is blocked by the derivation D2 in (47) via Shortest Paths. D2 proceeds without intermediate lowering.17
(47) a. Foc [IP I [VP said [PP to Kofi1 ] [CP that [IP he hit Kösi ]]]]
     b. Kofi1 Foc [IP I [VP said [PP to t1 ] [CP that [IP he hit Kösi ]]]]
A final remark is due concerning the notion of reference set presupposed by this analysis. Since "he" is a wo in D1 and an é in D2, it is clear that this difference must not suffice to create different reference sets. This can be accomplished in a number of ways, e.g., by defining reference sets with respect to a level of representation at which the difference in pronoun shape is invisible (possibly LF), or by explicitly stipulating in the definition of reference set that minor differences like the one at hand do not suffice to create different competitions. It is this latter strategy that is also pursued by Nakamura (1998) in his approach to wh-movement in Tagalog.
3.2.3 Tagalog Wh-Movement in Nakamura (1998)
A generalization underlying wh-movement in the Austronesian language Tagalog is that only the highest A-position of a given clause (the subject position) is accessible for wh-movement; Nakamura (1998) assumes this to be SpecT (or SpecI). In constructions in which an agent NP occupies the highest A-position (the so-called Agent Topic (AT) constructions), this NP can be wh-moved; an NP bearing a different Theta-role that shows up in an object position cannot undergo wh-movement; cf. (48).18
(48) a. [CP Sino1 ang [XP t'1 b-um-ili [VP t1 tV ng damit2 ]]] ?
        who Ang bought-AT dress-inh
        'Who is the one that bought the dress?'
     b. *[CP Ano2 ang [XP si Juan1 b-um-ili [VP t1 tV t2 ]]] ?
        what Ang Juan-abs bought-AT
        'What is the thing that Juan bought?'
A different marking on the verb triggers the so-called Theme Topic (TT) construction. Here, the theme NP occupies the structural subject position SpecT; and indeed, only the theme NP can undergo wh-movement; cf.:
(49) a. *[CP Sino1 ang [XP damit2 b-in-ili [VP t1 tV t2 ]]] ?
        who Ang dress-abs bought-TT
        'Who is the one that bought the dress?'
     b. [CP Ano2 ang [XP t'2 b-in-ili [VP ni Juan tV t2 ]]] ?
        what Ang bought-TT Juan-erg
        'What is the thing that Juan bought?'
Nakamura's (1998) basic idea is that the derivations generating (48-a) and (49-a) compete, as do the derivations generating (48-b) and (49-b). The derivations underlying (48-a) and (49-b) can then block their respective competitors as ungrammatical because of the Shortest Paths constraint. To see this, consider the case of wh-movement of the theme NP in (48-b) and (49-b). The movement path from the VP-internal object position to the SpecC target position in (48-b) is longer than the path from the subject position SpecT to SpecC in (49-b). Consequently, the Shortest Paths condition guarantees that the derivation generating (49-b) blocks the derivation generating (48-b) as ungrammatical. An analogous account is available for the agent wh-movement case in (48-a) vs. (49-a).
As Nakamura observes, this analysis raises two further potential problems. First, we have to ensure that derivations can compete even though they do not have identical lexical material - the Agent Topic and the Theme Topic constructions clearly differ in lexical make-up. Nakamura accomplishes this by replacing the notion of "identical numeration" in the definition of reference set with the more liberal notion of "non-distinct numeration;" the latter is defined in such a way that two numerations that only differ with respect to functional features do not count as distinct. (Clearly, this raises some nontrivial questions for other languages in which competitions of the type that Nakamura postulates seem unwanted.) Second, the derivation that generates, e.g., (49-b) may minimize the wh-path in comparison with the derivation that generates (48-b), but it increases path lengths in the A-domain.
It is not quite clear how problematic this is; in the case presently under consideration, the A-chain formed in (49-b) by theme raising is only minimally longer than the A-chain formed in (48-b) by agent raising, whereas the wh-chain formed in (49-b) is much shorter than the wh-chain formed in (48-b). There would be even less of a problem for the agent wh-extraction case in (48-a) and (49-a). In any event, Nakamura (1998) replaces the notion of "movement paths" in the definition of the Shortest Paths condition with the more specific notion of "comparable chain links." This yields the effect that, e.g., the derivation generating (49-b) blocks the derivation generating (48-b) just because the former derivation's wh-chain links are shorter than the latter derivation's comparable wh-chain links, irrespective of the length of other chain links created by A-movement, V raising, etc.
3.2.4 Freezing in Chomsky (1995)
Recall the English freezing construction in (34-c), which is repeated here:
(50) *Who1 was [NP2 a picture of t1 ] taken t2 by John ?
Above, we have considered three derivations. If NP raising precedes wh-movement, the CED is violated. If wh-movement precedes NP raising, the SCC is violated; the CED and the SCC are both local constraints. Finally, if chain interleaving occurs, this derivation can be excluded by invoking the Fewest Steps condition (cf. Collins 1994). Chomsky (1995) suggests that translocal economy constraints might play an even bigger role in accounting for the illformedness of (50). The idea is that the second derivation can be excluded without recourse to something like the SCC; the Shortest Paths condition can do this just as well. Consider again the two derivations that are compatible with Fewest Steps:
(51) a. (i) [CP — was [IP — taken [NP2 a picture of who1 ] by John ]]
        (ii) [CP — was [IP [NP2 a picture of who1 ] taken t2 by John ]]
        (iii) *[CP who1 was [IP [NP2 a picture of t1 ] taken t2 by John ]]
     b. (i) [CP — was [IP — taken [NP2 a picture of who1 ] by John ]]
        (ii) [CP who1 was [IP — taken [NP2 a picture of t1 ] by John ]]
        (iii) [CP who1 was [IP [NP2 a picture of t1 ] taken t2 by John ]]
Chomsky's (1995:328) suggestion reads as follows: "Passive [i.e., NP raising] is the same in both [derivations]; wh-movement is 'longer' in the illicit one in an obvious sense, object being more remote from SpecC than subject in terms of number of XPs crossed. The distinction might be captured by a proper theory of economy of derivation." In other words: D2 in (51-b), which does not violate a local constraint (the SCC, by assumption, being irrelevant or dispensable), is blocked as ungrammatical via Shortest Paths by the more economical D1 in (51-a), which does violate a local constraint (the CED) but converges.
3.3 Procrastinate
Chomsky (1993, 1995) assumes the following local condition as a trigger for overt movement.
(52) Feature Condition:
     a. Strong features must be checked in overt syntax.
     b. Weak features must be checked by LF.
As in the approach in Chomsky (1991), French I features are classified as "strong," English I features as "weak." However, whereas the constraint Strength of I (cf. (15)) in the 1991 analysis states that strong I tolerates V raising, the Feature Condition forces V raising to a strong I. Another difference between the two analyses is that the 1993 model does not require I lowering in the syntax anymore if V does not overtly raise to I. Looking back at the French/English paradigm in (14), we can now see that the original problem has disappeared: (53-a) respects the Feature Condition, (53-b) does not, and there is no need to invoke a translocal economy constraint to exclude (53-b).
(53) a. Jean embrasse1 souvent [VP t1 Marie ]
     b. *Jean souvent [VP embrasse1 Marie ]
     c. *John kisses1 often [VP t1 Mary ]
     d. John often [VP kisses1 Mary ]
However, a new problem has appeared: Assuming LF V-to-I movement in (53-d), this derivation respects the Feature Condition; but it seems that (53-c), with overt V-to-I movement, does so as well. This derivation was ruled out by Strength of I in Chomsky's (1991) approach, but with this condition gone, something else must be said. More generally, a condition is needed that guarantees that overt movement is possible only if it is forced by the Feature Condition, i.e., in the presence of strong features. This is achieved by the Procrastinate condition, which is explicitly formulated as a translocal constraint in Marantz (1995:357).
(54) Procrastinate: If two derivations D1 and D2 are in the same reference set, and D1 differs from D2 in that an item α is moved covertly in D1 and overtly in D2, then D1 is to be preferred over D2.
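As a rough illustration of how (54) applies to the paradigm in (53), consider the following sketch. It assumes, in line with (40), that only convergent derivations compete, and that a derivation in which a strong feature remains unchecked in overt syntax fails to converge. The names `Derivation` and `procrastinate_prefers` and the encoding of V-to-I movement as the item "V" are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Derivation:
    name: str
    overt_moves: frozenset   # items moved in overt syntax
    covert_moves: frozenset  # items moved at LF
    converges: bool


def procrastinate_prefers(d1, d2):
    """True if d1 is preferred over d2: both converge (and are assumed to share a
    reference set), and some item moves covertly in d1 but overtly in d2."""
    if not (d1.converges and d2.converges):
        return False
    return bool(d1.covert_moves & d2.overt_moves)


# English (weak I): overt and covert V-to-I movement both converge.
d53c = Derivation("53-c", frozenset({"V"}), frozenset(), converges=True)
d53d = Derivation("53-d", frozenset(), frozenset({"V"}), converges=True)

# French (strong I): leaving V in situ overtly violates the Feature Condition,
# which is assumed here to cause non-convergence.
d53a = Derivation("53-a", frozenset({"V"}), frozenset(), converges=True)
d53b = Derivation("53-b", frozenset(), frozenset({"V"}), converges=False)

print(procrastinate_prefers(d53d, d53c))  # True
print(procrastinate_prefers(d53b, d53a))  # False
```

On this encoding, (53-d) blocks (53-c) in English, whereas the French derivation (53-b) never enters the competition and therefore cannot block (53-a).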
Procrastinate blocks (53-c) in favor of (53-d), which delays V-to-I movement to the LF component. However, (53-b) does not block (53-a), assuming that only those derivations can compete that converge, and that (53-b) does not converge because of its Feature Condition violation. The current status of Procrastinate is somewhat unclear. There have been attempts to dispense with this translocal constraint, either by deriving its effects from local constraints, or by reducing it to the Fewest Steps condition (cf. Kitahara 1997).19
3.4 Merge before Move
Chomsky (1995, 1998) assumes that syntactic structures are created by alternating operations of structure-building (Merge) and movement (Move). At any given stage of the derivation, the situation can arise that it must be decided whether the next step is a Merge or a Move operation. The following translocal condition settles the issue by preferring Merge to Move if both are possible as such; the specific formulation is based on Frampton & Gutman (1999).
(55) Merge before Move: Suppose that two derivations D1 and D2 are in the same reference set and respect all local constraints, and $D_1 = (\Sigma_0, \ldots, \Sigma_n, \Sigma_{n+1}, \ldots, \Sigma_k)$ and $D_2 = (\Sigma_0, \ldots, \Sigma_n, \Sigma'_{n+1}, \ldots, \Sigma_l)$. Then D1 is to be preferred over D2 if $\Sigma_n \to \Sigma_{n+1}$ is an instance of Merge and $\Sigma_n \to \Sigma'_{n+1}$ is an instance of Move.
Evidence for this condition comes from expletive constructions in English. Consider the data in (56-a,b).
(56) a. There1 seems [IP t1 to be [PP someone2 in the room ]]
     b. *There1 seems [IP someone2 to be [PP t2 in the room ]]
Given the predicate-internal subject hypothesis, someone is first merged in the SpecP position. When the derivation reaches the embedded IP domain, the Extended Projection Principle (EPP) becomes active; this local constraint requires filling of SpecI by either Merge or Move. Assuming that the expletive there is part of the numeration at this stage of the derivation, two possibilities arise: Either there is merged in SpecI (and subsequently raised to the matrix SpecI position), as in (56-a), or someone is moved to SpecI (and there is merged later in the derivation, directly in the matrix SpecI position), as in (56-b). These two derivations involve an identical numeration, and they both respect all local constraints. In this case, Merge before Move tells us to choose the derivation underlying (56-a), in which the embedded SpecI is filled by Merge, and dispense with the derivation that generates (56-b), in which it is filled by Move. Given that identity of numeration is a prerequisite for competition, (57) is correctly predicted to be possible - if there is no there present in the numeration, there is no competing derivation here that could be preferred by Merge before Move.
(57) Someone1 seems [IP t'1 to be t1 in the room ]
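The comparison demanded by Merge before Move can be sketched by representing each derivation as a sequence of operations and locating the first point at which two derivations diverge. This is a simplification of (55), which is stated over stages rather than operations; the function names `first_divergence` and `merge_before_move` and the step descriptions are hypothetical.

```python
def first_divergence(steps1, steps2):
    """Index of the first step at which two derivations differ, or None."""
    for i, (s1, s2) in enumerate(zip(steps1, steps2)):
        if s1 != s2:
            return i
    return None


def merge_before_move(name1, steps1, name2, steps2):
    """Return the name of the derivation that uses Merge where the other uses Move
    at the first point of divergence; None if the condition does not decide."""
    i = first_divergence(steps1, steps2)
    if i is None:
        return None
    op1, op2 = steps1[i][0], steps2[i][0]
    if op1 == "Merge" and op2 == "Move":
        return name1
    if op2 == "Merge" and op1 == "Move":
        return name2
    return None


# Shared initial part of both derivations (simplified, assumed step inventory).
SHARED = [("Merge", "someone in SpecP"), ("Merge", "embedded I with its complement")]

# (56-a): the expletive is merged in the embedded SpecI and raised later.
D56A = SHARED + [("Merge", "there in embedded SpecI"),
                 ("Merge", "seems"),
                 ("Move", "there to matrix SpecI")]

# (56-b): someone is moved to the embedded SpecI; the expletive is merged later.
D56B = SHARED + [("Move", "someone to embedded SpecI"),
                 ("Merge", "seems"),
                 ("Merge", "there in matrix SpecI")]

print(merge_before_move("56-a", D56A, "56-b", D56B))  # 56-a
```

For (57), whose numeration lacks the expletive, there is only one derivation to consider, so the function is never called and the Move step goes through unchallenged.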
The question arises of whether there is a deeper reason why Merge operations count as more economical than Move operations. Chomsky (1995, 1998) suggests that Move is to be defined in terms of Merge, which would make it inherently more complex, and this fact might ultimately be exploited in an attempt to derive the Merge before Move condition. Chomsky (1998:14) himself remarks: "Good design conditions would lead us to expect that simpler operations are preferred to more complex ones, so that Merge ... preempt[s] Move, which is a 'last resort,' chosen when nothing else is possible."
3.5 Conclusion
The four translocal constraints discussed so far do not yet exhaust the list of translocal constraints that have been proposed; see, e.g., the translocal Economy of Representation constraint in Chomsky (1991), or the translocal Preference Principle for Reconstruction in Chomsky (1993). Still, the constraints discussed here can be considered representative. At this point, we can address the question of what the structure of a minimalist syntax with translocal constraints looks like. Such a syntax has two parts. In the first part, derivations are created by structure-building (Merge), movement (Move), deletion, and perhaps other operations. Convergent derivations are assembled in reference sets according to criteria that must be decided on (see the above definitions of reference sets for some options). In the second part, translocal constraints choose among the competing derivations and thus determine the wellformedness of sentences.
In essence, then, it turns out that a minimalist syntax with translocal constraints has exactly the shape that Prince & Smolensky (1993) attribute to an optimality-theoretic grammar: A first generator part (called Gen) creates the candidate set (= reference set, in minimalist syntax); Gen has only local constraints. A second "harmony"-evaluation part (called H-Eval) determines the optimal candidate(s) (= derivation(s), in minimalist syntax) in the candidate set. More generally, we will see that all kinds of competition-based syntax have this structure, which is schematically shown in (58).
(58) Structure of a competition-based syntax:
     a. Gen creates the candidate set {C1, C2, ...}.
     b. H-Eval determines the optimal candidate(s) Ci (Cj, ...) in {C1, C2, ...}.
Two issues concerning the notions of optimality and grammaticality in minimalist syntax remain to be clarified. 20 First, does optimality equal grammaticality? Whereas this is the case in optimality-theoretic syntax (see below), things are slightly more involved in minimalist syntax. As we have seen, it has been argued that derivations that converge can enter the competition, even though they may violate certain local constraints (recall the discussion of freezing effects in English). Accordingly, an optimal candidate may be one that violates a local constraint, and is therefore ungrammatical. Second, we have so far left open the question of how optimality evaluation proceeds in the presence of more than one translocal economy constraint in the grammar. In this case, conflicts may arise. As a simple, abstract example, suppose that there are two translocal constraints (T! and T 2 ), and only three derivations (D l t D 2 , and D 3 ) in the reference set.21 Suppose further that T] prefers Di over D 2 and D 3 ; that T 2 prefers D 2 over Di and D 3 ; and that a derivation Do that would be preferred by both Ti and T 2 fails to converge, so that it cannot participate in the competition. In such a situation, various possibilities arise. A first possibility would be what we can call "tolerance." On this view, it suffices to be selected by one translocal constraint to be optimal (hence, potentially grammatical); consequently, both Di and D 2 would be classified as optimal. A second possibility would be "ranking": The conflict among translocal constraints is resolved by a ranking, such that that derivation is optimal that is preferred by the higher-ranked constraint in the case of conflict. If T] is ranked higher than T 2 , this would imply that only D] is optimal. Finally, a third possibility is what we can call "breakdown": In the case of conflicting instructions made by translocal constraints, no derivation can emerge as optimal. 
It turns out that this last possibility is the one that is generally assumed, and though it is not easy to come up with decisive evidence, it also strikes us as the most adequate one (see Collins 1994, Sternefeld 1997, and Müller 2000). On this basis, we can conclude that grammaticality can be defined as follows in a minimalist syntax with translocal constraints:
(59) Grammaticality: A derivation Di is grammatical iff (a) and (b) hold:
     a. Di does not violate a local constraint.
     b. Di is optimal.

(60) Optimality in minimalist syntax: A derivation Di is optimal iff there is no derivation Dj in the same reference set that is preferred over Di by a translocal constraint.
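The three ways of resolving a conflict between translocal constraints can be compared in a short Python sketch. It encodes only the abstract example with T1, T2 and D1-D3; the representation of preferences as (preferred, dispreferred) pairs and all function names are illustrative assumptions, not part of any of the cited proposals.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (preferred derivation, dispreferred derivation)

# Hypothetical encoding of the abstract example: T1 prefers D1 over D2 and D3,
# T2 prefers D2 over D1 and D3. D0 does not converge and is therefore absent.
REFERENCE_SET: List[str] = ["D1", "D2", "D3"]
PREFERS: Dict[str, Set[Pair]] = {
    "T1": {("D1", "D2"), ("D1", "D3")},
    "T2": {("D2", "D1"), ("D2", "D3")},
}


def unbeaten(constraint: str, pool: List[str]) -> List[str]:
    """Derivations in pool over which the constraint prefers no pool member."""
    pairs = PREFERS[constraint]
    return [d for d in pool if not any((e, d) in pairs for e in pool)]


def tolerance(pool: List[str]) -> List[str]:
    """Optimal: selected by at least one translocal constraint."""
    return [d for d in pool if any(d in unbeaten(t, pool) for t in PREFERS)]


def ranking(pool: List[str], order: List[str]) -> List[str]:
    """Optimal: what survives filtering by the constraints in ranked order."""
    survivors = list(pool)
    for t in order:
        survivors = unbeaten(t, survivors)
    return survivors


def breakdown(pool: List[str]) -> List[str]:
    """Optimal in the sense of (60): dispreferred by no translocal constraint."""
    return [d for d in pool if all(d in unbeaten(t, pool) for t in PREFERS)]


print(tolerance(REFERENCE_SET))               # ['D1', 'D2']
print(ranking(REFERENCE_SET, ["T1", "T2"]))   # ['D1']
print(breakdown(REFERENCE_SET))               # []
```

Under these assumptions, only `breakdown` coincides with definition (60): D1 is dispreferred by T2 and D2 by T1, so the reference set yields no optimal derivation.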
The minimalist system that emerges in this way is not without problems. Some of those show up in all versions of competition-based syntax. For one thing, since a minimalist syntax of this type involves a global competition in a reference set that may be large, or even infinite, the overall complexity of the system is significantly increased. For another, we have seen that it is difficult to come up with a single, unified definition of reference set that accommodates all available evidence that one may want to treat in terms of translocal constraints. Other problems are more specific and confined to the particular notion of optimality that is employed in minimalist syntax. Most notably, the H-Eval metric is not maximally homogeneous and simple (because it may depend on a number of formally unrelated translocal constraints); however, it is rather inflexible nevertheless. Specifically, all translocal constraints must be classifiable as economy constraints in some sense (thus, properties of sentences that are not related to economy considerations cannot be subject to optimization). Even more importantly, it implies that all variation among languages must take place in the Gen part of the grammar - there is no room for parameterization in the H-Eval system. It is not always obvious that this position can be maintained in the light of conflicting empirical evidence. As an example, consider the effect that the Shortest Paths condition has on wh-movement in German. Recall that the Shortest Paths condition accounts for the superiority effect with wh-movement in languages like English; cf. (42). As has often been noted (see, e.g., Haider 1983), German does not exhibit superiority effects of this kind:

(61) a. Ich frage mich [CP wer1 C t1 was2 gekauft hat ]
        I ask myself who what bought has
     b. Ich frage mich [CP was2 C wer1 t2 gekauft hat ]
        I ask myself what who bought has
Still, it seems clear that the path from t1 to wer1 in (61-a) is shorter than the path from t2 to was2 in (61-b). To avoid the result that the Shortest Paths
condition blocks (61-b) in favor of (61-a), additional assumptions concerning Gen are therefore necessary. 22 Not least because of problems like this, there is a strong tendency in recent versions of the minimalist program to dispense with translocal constraints - and hence, with the concept of competition - altogether; see in particular Collins (1997), Frampton & Gutman (1999), and also Chomsky (1995, 1998).23 Local (derivational) constraints like the Last Resort condition and the Minimal Link Condition (MLC) have been developed in Chomsky (1995) and much recent work as economy conditions that can take over at least some of the work that was done by translocal constraints like Fewest Steps and Shortest Paths. Similarly, effects that were attributed to Procrastinate and Merge before Move have been shown to be derivable without invoking translocal constraints. Apart from these theory-internal considerations, it is interesting to note that the fall of translocal constraints (and with it the fall of the concept of competition) in minimalist syntax goes hand in hand with the rise of optimality theory, and hence optimality-theoretic syntax, which inherently relies on translocality and competition. However, before turning to this approach, we will discuss another model of competition-based syntax, one that developed concurrently with (and as an extension of) government and binding theory: blocking syntax.
4 Blocking Syntax

Blocking syntax was developed by Di Sciullo & Williams (1987) and Williams (1997) on the basis of Aronoff's (1976) approach to blocking in morphology. To a large extent, it is equivalent to a syntactic theory that incorporates the Elsewhere Condition which has played a major role in phonology (see Kiparsky 1982 and the references cited there). A version of blocking syntax that is even closer to Kiparsky's approach is developed in Fanselow (1989, 1991); this approach relies on the Proper Inclusion Principle. A blocking syntax has the same general form as a minimalist syntax, viz., that in (58): Gen generates the candidates (in blocking syntax typically S-structure representations) which are assembled in candidate sets. The competing candidates are then subjected to an H-Eval procedure that determines the optimal candidate(s). The underlying idea of blocking syntax is that synonyms are not tolerated in natural languages. Consequently, candidate sets are defined in terms of identity of meaning:

(62) Candidate Set: Two candidates Ci and Cj are in the same candidate set iff they (i) have the same meaning, and (ii) respect all local constraints.
A candidate is grammatical iff it is optimal (given (62), an optimal candidate cannot violate a local constraint). The concept of optimality is different from that adopted in the minimalist program. The optimal candidate is the most specific one. Thus, more specific candidates block less specific ones: This is the Blocking Principle.

(63) Optimality in blocking syntax: A candidate Ci is optimal iff there is no candidate Cj in the same candidate set that is more specific.
This approach crucially depends on how specificity is understood. In morphology, irregular forms count as more specific than regular forms. In syntax, we can assume that Ci is more specific than Cj if local constraints (or rules) lead to Ci's distribution being more restricted than that of Cj.

4.1 Comparative Formation in Williams (1997)

With this in mind, let us consider an example that is discussed in Williams (1997): English comparative formation. As shown in (64), English has two ways of comparative formation: a morphological one and a syntactic one. The two strategies appear to be in complementary distribution.

(64) a. hot → hotter, *more hot
     b. happy → happier, *more happy
     c. colorful → *colorfuller, more colorful
Williams suggests that these data can be accounted for by the following two rules, which we call rule A and rule B.

(65) a. Rule A (morphological): Comparatives can be formed by attaching the suffix -er to monosyllabic adjectives, and to disyllabic adjectives ending in y.
     b. Rule B (syntactic): Comparatives can be formed by adding more in the syntax.
The candidate that employs the morphological comparative *colorfuller violates rule A, while the candidate that uses the syntactic comparative more colorful respects both rule A (vacuously) and rule B. In contrast, candidates like hotter or happier respect rule A (and, vacuously, rule B); but what is left open by rule A and rule B is why candidates like *more hot or *more happy are ungrammatical. The illformedness of these forms could be derived by replacing rule B with rule B' in (66). However, adopting this rule would lead to a redundancy: The context that permits morphological comparatives in rule A is repeated in an identical form as the context that prohibits syntactic comparatives in rule B'.

(66) Rule B': Comparatives can be formed by adding more in the syntax, unless the adjective is monosyllabic, or disyllabic and ending in y.
To avoid this redundancy, Williams proposes maintaining rule B instead of adopting rule B'. The generalization that syntactic comparatives seem to be possible only if morphological comparatives are impossible is then derived by invoking the notion of blocking embodied in (63). Given rules A and B, morphological comparatives are more restricted in their distribution than syntactic comparatives - in fact, rule B imposes no specific restrictions on syntactic comparative formation at all. Hence, if both morphological and syntactic comparatives respect all local constraints (like rules A and B), the more specific morphological comparative is selected as optimal. If, however, the morphological comparative violates a local constraint (like rule A), the syntactic comparative cannot be blocked anymore, and is consequently selected as optimal.
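The interplay of rules A and B with the Blocking Principle can be sketched in a few lines of Python. The syllable counter and the spelling adjustments are crude assumptions that are meant to cover only the adjectives in (64); the function names are hypothetical.

```python
import re
from typing import List, Optional


def syllables(adjective: str) -> int:
    """Crude vowel-group count; adequate only for the examples in (64)."""
    return len(re.findall(r"[aeiouy]+", adjective))


def rule_a(adjective: str) -> Optional[str]:
    """Rule A: -er on monosyllables and on disyllables ending in y."""
    n = syllables(adjective)
    if n == 1 or (n == 2 and adjective.endswith("y")):
        if adjective.endswith("y"):
            return adjective[:-1] + "ier"            # happy -> happier
        if re.search(r"[^aeiou][aeiou][^aeiouwxy]$", adjective):
            return adjective + adjective[-1] + "er"  # hot -> hotter
        return adjective + "er"
    return None  # the -er form would violate rule A: no candidate


def rule_b(adjective: str) -> str:
    """Rule B: syntactic comparative, without further restrictions."""
    return "more " + adjective


def candidate_set(adjective: str) -> List[str]:
    """Candidates that respect all local rules, most specific first."""
    forms = [rule_a(adjective), rule_b(adjective)]
    return [f for f in forms if f is not None]


def comparative(adjective: str) -> str:
    """Blocking Principle (63): the most specific candidate is optimal."""
    return candidate_set(adjective)[0]


for adj in ("hot", "happy", "colorful"):
    print(adj, "->", comparative(adj))
```

Note that `rule_b` contains no reference to syllable structure: the complementary distribution follows from the ordering by specificity alone, which is the point of preferring rule B to rule B'.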
4.2 Anaphors vs. Pronouns in Fanselow (1991)

Recall from section 2.1 the data that seem to suggest a complementary distribution of anaphors and pronominals in English (at least in the domain under discussion here).

(67) a. John1 likes himself1
     b. *John1 thinks that Mary likes himself1
     c. John1 thinks that Mary likes him1
     d. *John1 likes him1

We have seen that standard government and binding theory accounts for these data by invoking the principles A and B in (68).

(68) a. Principle A: An anaphor is bound in its binding domain.
     b. Principle B: A pronominal is free in its binding domain.
However, as in the case of the comparative formation rules A and B' that were just discussed, it seems that this approach involves a redundancy: A generalization is missed if two separate local constraints are postulated for anaphors and pronominals, where the context that permits one strategy is identical to the context that precludes the other strategy (viz., the binding domain in both cases). As noted by Fanselow (1989, 1991), Burzio (1991), and Richards (1997), among others, a more elegant account can be given if the notion of competition is invoked. Here, we will sketch Fanselow's blocking approach.24 Fanselow's analysis relies on the Proper Inclusion Principle (PIP), a version of the Elsewhere Condition (cf. Kiparsky 1982) that can be viewed as a translocal constraint:

(69) Proper Inclusion Principle (PIP):
     a. Suppose that two feature assignment mechanisms M1 and M2 compete in a given structure. Then, other things being equal, M2 cannot be applied if M1 is more specific.
     b. M1 is more specific than M2 if the application domain of M2 properly includes the application domain of M1.
The feature assignment mechanisms that play a role in the present context are (a) the assignment of the feature [+anaphoric] to an NP, and (b) the assignment of the feature [+pronominal] to an NP - in short, reflexivization (or reciprocalization) and pronominalization. By assumption, the assignment of the feature [+anaphoric] is subject to (something like) Principle A, whereas there is no comparable requirement for the assignment of the feature [+pronominal]; i.e., Principle B is dropped. This implies that, due to Principle A, anaphors are more restricted in their distribution than pronominals; the application domain of pronominalization properly includes the application domain of reflexivization. From this it follows directly that in all those cases where both anaphors and pronominals respect all local constraints, the PIP
forces the choice of the anaphor. Pronominals can emerge only in contexts in which anaphors are precluded (e.g., because of a violation of Principle A, as in the examples presently under consideration). The PIP can be viewed as a version of the blocking principle that is part of the definition of optimality in (63). The only relevant change that must be made for the case at hand concerns the question of which entities compete. We can now assume that the competing items are complete syntactic objects (syntactic candidates), rather than feature assignment mechanisms.
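Clause (b) of the PIP amounts to a proper-subset test on application domains, which the following Python sketch makes explicit. The two context labels ("local" for an antecedent inside the binding domain, "nonlocal" otherwise) and the function names are simplifying assumptions made for illustration only.

```python
from typing import Dict, FrozenSet, List

# Hypothetical application domains: reflexivization is restricted by
# (something like) Principle A, pronominalization is unrestricted.
DOMAINS: Dict[str, FrozenSet[str]] = {
    "reflexivization": frozenset({"local"}),
    "pronominalization": frozenset({"local", "nonlocal"}),
}


def more_specific(m1: str, m2: str) -> bool:
    """PIP (b): the domain of m2 properly includes the domain of m1."""
    return DOMAINS[m1] < DOMAINS[m2]  # '<' is the proper-subset test


def licensed(context: str) -> List[str]:
    """PIP (a): a mechanism is blocked by a more specific competitor."""
    applicable = [m for m, dom in DOMAINS.items() if context in dom]
    return [m for m in applicable
            if not any(more_specific(other, m) for other in applicable)]


print(licensed("local"))     # ['reflexivization']
print(licensed("nonlocal"))  # ['pronominalization']
```

No counterpart of Principle B is stated anywhere in the sketch; the exclusion of the pronominal in the local context is an effect of blocking.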
4.3 Conclusion

Blocking syntax is characterized by the fact that it is fairly simple in various respects. Most importantly, blocking syntax employs a simple concept of optimality in its H-Eval part. There is only one translocal constraint (the Blocking Principle that selects the most specific candidate), not more than one, as in minimalist syntax. In addition, due to the origin of the blocking principle as a means to avoid synonymy, blocking analyses uniformly rely on identity of meaning in the definition of candidate sets, again in contrast to the variability involved in minimalist syntax. However, the simplicity comes at a certain price: Harmony evaluation is even less flexible than in minimalist syntax. First, the blocking principle by its very nature can only have a small domain in which it is active; in general, the role of H-Eval in blocking syntax is smaller than in the minimalist program. Second, there is no room for parameterization in the H-Eval domain at all. And third, since blocking analyses depend on complementarity of distribution, cases of optionality pose problems that are almost insurmountable.25 A competition-based approach that strengthens the role of the H-Eval part of the grammar and increases flexibility in this domain is optimality-theoretic syntax; and it is this model that we finally turn to now.
5 Optimality-Theoretic Syntax

5.1 Basic Concepts

By definition, an optimality-theoretic syntax takes the general form in (58), with the grammar divided into a Gen part that creates the competing candidates, and an H-Eval part that selects the optimal candidate(s). Recall that the notion of optimality in a minimalist syntax or in a blocking syntax is a
comparatively simple one: Optimality is determined by a small set of simple translocal economy constraints in the former case (cf. (60)), and by a single translocal blocking principle (selecting the most specific candidate) in the latter (cf. (63)). In optimality-theoretic syntax, there is only one translocal constraint that determines optimality: Optimal (and grammatical) is a candidate that has the best "constraint profile" - or, more precisely, a candidate for which there is no competitor that has a better constraint profile; cf. (70). This definition makes it possible for more than one candidate to be optimal in a given candidate set.

(70) Optimality in optimality-theoretic syntax: A candidate Ci is optimal (= grammatical) iff there is no candidate Cj in the same candidate set that has a better constraint profile.
However, the evaluation metric is internally highly complex. The notion of constraint profile is defined in (71).

(71) Constraint Profile: A candidate Cj has a better constraint profile than a candidate Ci iff there is a constraint Con such that (a) and (b) hold:
     a. Cj satisfies Con better than Ci; i.e., Cj satisfies Con and Ci violates Con, or Cj violates Con less often than Ci.
     b. There is no constraint Con' ranked higher than Con on which Ci and Cj differ.
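Definitions (70) and (71) translate almost word for word into code. The Python sketch below assumes that a candidate is represented by a mapping from constraint names to violation counts and that the ranking is a list with the highest-ranked constraint first; the constraint names HIGH and LOW and the candidates X, Y, Z are hypothetical.

```python
from typing import Dict, List

Profile = Dict[str, int]  # constraint name -> number of violations


def better_profile(cj: Profile, ci: Profile, ranking: List[str]) -> bool:
    """(71): cj beats ci on the highest-ranked constraint on which they differ."""
    for con in ranking:                       # highest-ranked constraint first
        vj, vi = cj.get(con, 0), ci.get(con, 0)
        if vj != vi:                          # no higher Con' distinguishes them
            return vj < vi                    # cj satisfies Con better than ci
    return False                              # identical profiles: neither is better


def optimal(candidates: Dict[str, Profile], ranking: List[str]) -> List[str]:
    """(70): candidates for which no competitor has a better profile."""
    return [name for name, prof in candidates.items()
            if not any(better_profile(other, prof, ranking)
                       for o_name, other in candidates.items() if o_name != name)]


ranking = ["HIGH", "LOW"]
candidates = {"X": {"LOW": 3}, "Y": {"HIGH": 1}, "Z": {"LOW": 3}}
print(optimal(candidates, ranking))  # ['X', 'Z']
```

In this toy competition, X and Z have identical profiles and are both optimal, while Y loses on HIGH even though it violates no constraint as often as X violates LOW.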
This presupposes that in addition to the local constraints employed by the Gen component, which are inviolable and unranked, the H-Eval component relies on a system of local constraints that are violable and ranked (and, by assumption, universal) in order to determine the best constraint profile, hence, optimality. The ranking among the violable local constraints of the H-Eval component is indicated by the symbol »; the H-Eval constraints themselves are typically written with small capitals. Optimality-theoretic competitions are often illustrated by tables (so-called tableaux); optimality of a candidate is indicated by the pointing finger ☞; violation of a local constraint is shown by a star * in the appropriate column of the table; if this violation is fatal for a candidate (i.e., responsible for its suboptimality), an exclamation mark ! is added (redundantly). In the abstract H-Eval competition in table T1, in which the candidate set consists of C1-C5, C1 emerges as the optimal candidate: It avoids a violation of the high-ranked constraints A and B (unlike C3-C5), and
it minimizes a violation of the low-ranked constraint C (unlike C2). Hence, there is no competing candidate with a better constraint profile than C1.

T1: Determining optimality
Candidates | A | B | C
☞ C1 | | | *
C2 | | | **!
C3 | | *! |
C4 | *! | |
C5 | *! | * |
By reranking the constraints B and C in T1, candidate C3 emerges as the optimal candidate; cf. table T2.

T2: Reranking
Candidates | A | C | B
C1 | | *! |
C2 | | *!* |
☞ C3 | | | *
C4 | *! | |
C5 | *! | | *
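The effect of reranking in tables T1 and T2 can be reproduced with a lexicographic comparison of violation counts, as in the following Python sketch. The encoding of a tableau as a dictionary is an assumption, and the marks that C5 incurs beyond its violation of A are immaterial to the outcome.

```python
from typing import Dict, List, Tuple

Tableau = Dict[str, Dict[str, int]]  # candidate -> constraint -> violations

# Violation counts for the abstract candidates C1-C5.
CANDIDATES: Tableau = {
    "C1": {"C": 1},
    "C2": {"C": 2},
    "C3": {"B": 1},
    "C4": {"A": 1},
    "C5": {"A": 1, "B": 1},  # anything beyond the A violation is immaterial
}


def winners(tableau: Tableau, ranking: List[str]) -> List[str]:
    """Candidates whose violation vector is lexicographically minimal."""
    def key(name: str) -> Tuple[int, ...]:
        return tuple(tableau[name].get(con, 0) for con in ranking)

    best = min(key(name) for name in tableau)
    return [name for name in tableau if key(name) == best]


print(winners(CANDIDATES, ["A", "B", "C"]))  # T1: ['C1']
print(winners(CANDIDATES, ["A", "C", "B"]))  # T2: ['C3']
```

The candidate set and the violation counts are the same in both calls; only the order of the constraints differs.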
Reranking of constraints forms the basis of the concept of parameterization in optimality-theoretic syntax. A further characteristic feature of this approach is that it is essentially non-cumulative; i.e., no number of violations of a low-ranked constraint can outweigh a single violation of a higher-ranked constraint. Thus, suppose that there were an additional, lowest-ranked constraint D in T1 that C1 violated, say, five times, and that C2-C5 did not violate at all. This would not undermine C1's optimality.

Before we turn to some illustrations of optimality-theoretic analyses, something must be said about the nature of candidates and candidate sets. Optimality-theoretic syntax is strongly influenced by work in optimality-theoretic phonology. Since the latter is characterized by an orientation that is predominantly representational (cf. Prince & Smolensky 1993 and McCarthy & Prince 1995), it does not come as a surprise that many approaches in optimality-theoretic syntax postulate that the competing candidates created by Gen are surface structure representations. This holds, e.g., for what can arguably be viewed as the three most influential analyses in optimality-theoretic syntax so far, viz., Grimshaw (1997), Pesetsky (1998), and Legendre, Smolensky & Wilson (1998). However, there is no inherent reason
why the candidates that are subject to optimization should not be syntactic objects of a more complex type, like (D-structure, S-structure, LF) tuples as in government and binding theory, or, indeed, complete derivations, as in the minimalist program.26 The choice of candidate type goes hand in hand with the choice of local constraint type that shows up in the H-Eval part as violable and ranked: If candidates are representations, constraints will be representational; if candidates are derivations, constraints will be derivational; and if candidates are (D-structure, S-structure, LF) tuples as in government and binding theory, constraints can take any of the forms sketched in section 2.

Similarly, candidate sets can be defined in various ways, which of course significantly influences the nature of the competition. Basically, all of the definitions of reference sets in minimalist syntax that have been proposed (see section 3 and Sternefeld 1997) are also potential definitions of candidate sets in optimality-theoretic syntax. A further influential definition of candidate sets comes from Grimshaw (1997). She postulates that two candidates (S-structure representations) compete iff they are realizations of the same predicate/argument structure and have non-distinct logical forms (or non-distinct interpretations).

By making optimality depend on an intricate system of violable and ranked constraints, H-Eval - and hence, the concept of competition - becomes even more important than in minimalist syntax and blocking syntax. As a matter of fact, much work in optimality-theoretic syntax has tried to minimize the role of the Gen component, and maximize the role of the H-Eval component (but see Pesetsky 1997, 1998 for some cautionary remarks). An optimality-theoretic approach gains immediate support in all those contexts where postulating a competition of syntactic objects is initially plausible.
This includes, but is by no means confined to, contexts where notions of economy seem to play a role. A prototypical case is one in which the wellformedness of a sentence Si that exhibits an otherwise peculiar property seems to depend on the unavailability of another sentence Sj that exhibits the property one would normally expect. Here, Si is often referred to as a "repair" form; a typical instance is the English do-support construction. Accordingly, do-support was among the first phenomena to be tackled in optimality-theoretic syntax (see Speas 1995 and Grimshaw 1997). Most of the constructions discussed in sections 2-4 can also be viewed as suggesting an underlying competition; and indeed, they can fruitfully be addressed in optimality-theoretic syntax. This is shown in the following section.
5.2 Case Studies

5.2.1 Anaphors vs. Pronouns in Wilson (1999)

Let us begin with the competition between reflexivization and pronominalization. The following optimality-theoretic account is based on Wilson (1999).27 Recall the generalization that, by and large, pronominals are allowed to express binding relations in English in just those cases in which anaphors are not allowed to do so. To account for this, a prerequisite is that two sentences which differ only with respect to the choice of anaphor vs. pronominal in a given position must compete. The ranking LOC-ANT » REF-ECON of the two constraints in (72) then produces the right results.

(72) a. LOC-ANT ("Local Antecedent"): If a binding domain contains an anaphor, then it must also contain the anaphor's antecedent.
     b. REF-ECON ("Referential Economy"): A (referentially dependent) argument must not have lexical φ-feature specification.
LOC-ANT is a version of Principle A; it requires a local antecedent for an anaphor. Hence, as a tendency, this constraint favors pronominals, which satisfy it vacuously. REF-ECON, on the other hand, inherently prefers anaphors to pronominals if we are willing to make the assumption that anaphors do not have a lexical φ-feature specification, whereas pronominals do. Consequently, when an anaphor can respect LOC-ANT, the violation of REF-ECON incurred by a pronominal is fatal; cf. table T3.

T3: Reflexivization
Candidates | LOC-ANT | REF-ECON
☞ C1: John1 likes himself1 | |
C2: John1 likes him1 | | *!
However, when an anaphor cannot find an antecedent in its binding domain and must violate LOC-ANT, a violation of the lower-ranked REF-ECON constraint becomes possible, and pronominalization turns out to be optimal; cf. table T4.
T4: Pronominalization
Candidates | LOC-ANT | REF-ECON
C1: John1 thinks that Mary likes himself1 | *! |
☞ C2: John1 thinks that Mary likes him1 | | *

5.2.2 Complementizer-Trace Effects in Grimshaw (1997)
In section 2, we noted that government and binding theory accounts for the complementizer-trace effect in (4-a) on a purely local basis, without postulating a competition with the complementizer-less variant in (4-b) from which only the latter would emerge as optimal. This view is abandoned in Déprez (1991), which is the basis of the optimality-theoretic account advanced in Grimshaw (1997). As background, Grimshaw assumes that the size of clauses is variable. Clauses are extended projections of V; they are minimally VPs, but they can be IPs, CPs, or functional projections of an even bigger size, depending on the outcome of optimization. Bridge verbs in English permit both CP-embedding (with a complementizer - a declarative CP without a complementizer will typically fatally violate a high-ranked constraint that precludes empty head positions) and IP- or VP-embedding (without a complementizer). In the latter case, IP must be chosen if an auxiliary or do is present (i.e., if the need arises to accommodate an additional lexical head); VP can be chosen otherwise. The main constraints that are needed in the account of complementizer-trace effects are listed in (73). A possible ranking for English is OP-SPEC » T-LEX-GOV » STAY.28

(73) a. OP-SPEC ("Operator in Specifier"): Wh-operators must occupy a specifier position from which they c-command all elements of the extended V projection over which they take scope.
     b. T-LEX-GOV ("Lexical Government of Traces"): A trace is lexically governed.
     c. STAY ("Economy of Movement"): Trace is not allowed.
STAY is a local version of the translocal economy constraint Fewest Steps; OP-SPEC is a version of the Wh-Criterion that is often postulated in government and binding theory (see, e.g., Lasnik & Saito 1992). The ranking OP-SPEC » STAY ensures overt wh-movement in simple questions in English.
T-LEX-GOV corresponds to (a part of) the ECP. Assuming that candidates with and without that compete, the complementizer-trace effect is derived as shown in table T5.29

T5: Subject wh-movement
Candidates | OP-SPEC | T-LEX-GOV | STAY
C1: ... who1 you think [CP that [IP t1 will leave ]] | | *! | *
☞ C2: ... who1 you think [IP t1 will leave ] | | | *
C3: ... you think [CP that [IP who1 will leave ]] | *! | |
C4: ... you think [IP who1 will leave ] | *! | |
C3 and C4 fatally violate OP-SPEC. Both C1 and C2 violate STAY, but C1 violates T-LEX-GOV in addition: t1 in C1 is not lexically governed (that being unable to do so), whereas t1 in C2 is lexically governed (by the matrix V).30 In contrast, an embedded V governs object traces throughout, irrespective of the presence or absence of a complementizer that; hence, T-LEX-GOV is satisfied equally well by C1 and C2 in table T6. Given that C1 and C2 do not differ with respect to any other constraint either, optionality of a complementizer is correctly predicted in cases of object extraction, due to an identical constraint profile.

T6: Object wh-movement
Candidates | OP-SPEC | T-LEX-GOV | STAY
☞ C1: ... who1 you think [CP that [IP she will invite t1 ]] | | | *
☞ C2: ... who1 you think [IP she will invite t1 ] | | | *
C3: ... you think [CP that [IP she will invite who1 ]] | *! | |
C4: ... you think [IP she will invite who1 ] | *! | |
Thus far, there is no evidence for treating T-LEX-GOV as a violable constraint in the H-Eval part of the grammar (rather than as an inviolable constraint in the Gen part). Such evidence can be gained by considering adjunct extraction. In this case, T-LEX-GOV is violated by both candidates involving wh-movement (C1, C2). However, given that there is no competing candidate that can avoid a violation of T-LEX-GOV without violating a higher-ranked constraint (e.g., a candidate that employs a resumptive pronoun; see below), C1 and C2 can emerge as optimal despite this violation.
T7: Adjunct wh-movement
Candidates | OP-SPEC | T-LEX-GOV | STAY
☞ C1: ... why1 you think [CP that [IP she has left t1 ]] | | * | *
☞ C2: ... why1 you think [IP she has left t1 ] | | * | *
C3: ... you think [CP that [IP she has left why1 ]] | *! | |
C4: ... you think [IP she has left why1 ] | *! | |
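The three competitions for subject, object, and adjunct extraction can be run through one evaluator under the ranking OP-SPEC » T-LEX-GOV » STAY. The Python sketch below assumes the violation marks of the three tableaux; the dictionary encoding and the helper `winners_if_inviolable`, which simulates moving a constraint into Gen, are illustrative assumptions rather than part of Grimshaw's analysis.

```python
from typing import Dict, List, Tuple

Tableau = Dict[str, Dict[str, int]]  # candidate -> constraint -> violations

RANKING = ["OP-SPEC", "T-LEX-GOV", "STAY"]
IN_SITU: Tableau = {"C3": {"OP-SPEC": 1}, "C4": {"OP-SPEC": 1}}

TABLEAUX: Dict[str, Tableau] = {
    "subject": {"C1": {"T-LEX-GOV": 1, "STAY": 1}, "C2": {"STAY": 1}, **IN_SITU},
    "object": {"C1": {"STAY": 1}, "C2": {"STAY": 1}, **IN_SITU},
    "adjunct": {"C1": {"T-LEX-GOV": 1, "STAY": 1},
                "C2": {"T-LEX-GOV": 1, "STAY": 1}, **IN_SITU},
}


def winners(tableau: Tableau, ranking: List[str]) -> List[str]:
    """Candidates with the lexicographically smallest violation vector."""
    def key(name: str) -> Tuple[int, ...]:
        return tuple(tableau[name].get(con, 0) for con in ranking)

    best = min(key(name) for name in tableau)
    return [name for name in tableau if key(name) == best]


def winners_if_inviolable(tableau: Tableau, ranking: List[str],
                          constraint: str) -> List[str]:
    """Treat one constraint as a Gen filter instead of a ranked constraint."""
    kept = {n: p for n, p in tableau.items() if p.get(constraint, 0) == 0}
    return winners(kept, [con for con in ranking if con != constraint])


for kind, tableau in TABLEAUX.items():
    print(kind, winners(tableau, RANKING))
print(winners_if_inviolable(TABLEAUX["adjunct"], RANKING, "T-LEX-GOV"))
```

With T-LEX-GOV ranked and violable, the adjunct competition is won by both movement candidates; if the same constraint is instead treated as inviolable, only the in-situ candidates C3 and C4 survive in this toy candidate set, which illustrates why the adjunct data bear on the status of the constraint.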
5.2.3 Subjacency and Resumptive Pronouns in Pesetsky (1998) and Legendre, Smolensky & Wilson (1998)

Recall that resumptive pronouns often seem to be possible only as last resort strategies in cases where traces are blocked (see (8)). Competition-free models like government and binding theory have no obvious means to relate one construction to the other (at least, as long as they are supposed to stay strictly competition-free; see note 3); but the case is different in optimality-theoretic syntax. An optimality-theoretic account of resumptive pronoun strategies is developed in Legendre, Smolensky & Wilson (1998) (on the basis of evidence from Chinese) and Pesetsky (1998) (on the basis of English data comparable to those in (8), as well as evidence from Hebrew, Russian, and Polish). The details of the two analyses differ a great deal, but the gist of the explanation is identical; it centers around two constraints like those in (74).31

(74) a. CNPC ("Complex Noun Phrase Condition"): Traces must not be separated by a complex noun phrase from their antecedents.
     b. RES ("Resumptive Constraint"): Resumptive pronouns are prohibited.
The CNPC prohibits traces in certain (non-local) environments; RES disfavors resumptive pronouns (i.e., pronouns that are bound from an A-bar position) in general. The ranking is CNPC » RES. (Thus, the two constraints and their ranking are analogous to what we have seen with LOC-ANT and REF-ECON in the domain of binding theory.) As with the wh-movement construction discussed in the last section, it must be ensured that overt movement of the relative operator takes place in examples like those in (8). We assume that this is independently taken care of.32 Based on these assumptions, consider table T8.
T8: Trace vs. resumptive pronoun in transparent contexts
Candidates | CNPC | RES
☞ C1: the man [CP who(m)1 I saw t1 ] | |
C2: the man [CP who(m)1 I saw him1 ] | | *!
Both candidates respect CNPC. Consequently, the RES violation incurred by the resumptive pronoun in C2 becomes fatal, and C1 is optimal. However, in the competition illustrated in table T9, C1 violates CNPC. In this case, C2's RES violation is tolerable, and the resumptive pronoun strategy emerges as optimal.33

T9: Trace vs. resumptive pronoun in CNPC contexts
Candidates | CNPC | RES
C1: the man [CP who(m)1 [IP I don't believe [NP the claim [CP t'1 that anyone saw t1 ]]]] | *! |
☞ C2: the man [CP who(m)1 [IP I don't believe [NP the claim [CP that anyone saw him1 ]]]] | | *
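The parallel between LOC-ANT » REF-ECON in binding and CNPC » RES in relativization can be made explicit by running both through the same two-candidate competition. In the Python sketch below, the labels "anaphor", "pronominal", "trace", and "resumptive pronoun", the Boolean `opaque` (true when the economical form would violate the higher-ranked constraint), and the function names are assumptions introduced for illustration.

```python
from typing import Dict, List, Tuple

# (higher-ranked constraint, lower-ranked constraint,
#  economical form, repair form); labels are hypothetical.
PAIRS: Dict[str, Tuple[str, str, str, str]] = {
    "binding": ("LOC-ANT", "REF-ECON", "anaphor", "pronominal"),
    "relativization": ("CNPC", "RES", "trace", "resumptive pronoun"),
}


def winners(tableau: Dict[str, Dict[str, int]], ranking: List[str]) -> List[str]:
    """Candidates with the lexicographically smallest violation vector."""
    def key(name: str) -> Tuple[int, ...]:
        return tuple(tableau[name].get(con, 0) for con in ranking)

    best = min(key(name) for name in tableau)
    return [name for name in tableau if key(name) == best]


def choose(domain: str, opaque: bool) -> List[str]:
    """Build the two-candidate tableau for a context and evaluate it."""
    high, low, economical, repair = PAIRS[domain]
    tableau = {
        economical: {high: int(opaque)},  # violates the high constraint if opaque
        repair: {low: 1},                 # always violates the low constraint
    }
    return winners(tableau, [high, low])


print(choose("binding", opaque=False))         # ['anaphor']
print(choose("binding", opaque=True))          # ['pronominal']
print(choose("relativization", opaque=False))  # ['trace']
print(choose("relativization", opaque=True))   # ['resumptive pronoun']
```

In both domains the repair form wins only where the economical form incurs a violation of the higher-ranked constraint, which is the last resort pattern described above.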
A lot more could be said about relativization in English and other languages in an optimality-theoretic approach (in particular, concerning that-relatives and their relation to wh-relatives), but these considerations will have to suffice for now; cf. Grimshaw (1997) and Pesetsky (1998).

5.2.4 Avoid Pronoun

Consider now the Avoid Pronoun facts that were discussed in section 2 (cf. (10) and (12)). In English gerunds, PRO and a lexical pronoun can both occur in principle; however, PRO must be used instead of a lexical pronoun if it can fulfill the Control Rule. A transfer of Chomsky's (1981) approach into optimality theory is straightforward. The Control Rule in (11) can directly be viewed as an optimality-theoretic constraint (with the same qualification as in government and binding theory; see note 4); cf. (75-a). The Avoid Pronoun Principle in (13) can be simplified by turning this translocal constraint into a local (though violable) one; cf. (75-b).34 The ranking for English is CONTROL » *PRON.

(75) a. CONTROL ("Control Rule"): If PRO is minimally dominated by a declarative clausal object α, then it must be controlled by an antecedent within the minimal CP dominating α.
     b. *PRON ("Avoid Pronoun"): Pronouns are prohibited.
Suppose that candidate sets are defined in such a way that candidates with PRO and candidates with a lexical pronoun can compete, but, crucially, that sentences with different indexings (hence, different logical forms) do not compete. Then, the facts fall into place. The blocking of a lexical pronoun by PRO in cases where CONTROL can be satisfied is illustrated in table T10.

T10: PRO vs. pronoun under co-indexing
Candidates | CONTROL | *PRON
C1: John1 would much prefer [ his1 going to the movie ] | | *!
☞ C2: John1 would much prefer [ PRO1 going to the movie ] | |
Table T11 illustrates the case where PRO is not co-indexed with the matrix antecedent, thereby violating CONTROL. Here, the *PRON violation incurred by all pronouns is non-fatal, and the pronoun strategy is optimal.

T11: PRO vs. pronoun under contra-indexing
Candidates | CONTROL | *PRON
☞ C1: John1 would much prefer [ his2 going to the movie ] | | *
C2: John1 would much prefer [ PRO2 going to the movie ] | *! |

5.2.5 Superiority Effects
The question arises of whether the evidence that is accounted for by translocal constraints in minimalist syntax can also be reanalyzed in optimality-theoretic syntax. At least to some extent, this seems to be the case. As noted above, the STAY constraint adopted in Grimshaw (1997), Legendre, Smolensky & Wilson (1998), and much related work, is essentially a local version of the translocal Fewest Steps condition. Similarly, a local counterpart has been suggested for the translocal economy constraint Shortest Paths. Let us reconsider the superiority phenomenon as one of the core applications of Shortest Paths. Some relevant examples are repeated in (76).

(76) a. I wonder [CP who1 C [IP t1 bought what2 ]]
     b. *I wonder [CP what2 C [IP who1 bought t2 ]]
Ackema & Neeleman (1998) propose a local version of Shortest Paths that we may call MIN-CHAIN ("Minimize Chain Length"). This constraint records a star * for every node crossed by a movement chain.³⁵ Assuming that only overt movement counts for the purposes of this constraint, (76-a) can successfully block (76-b) under MIN-CHAIN: Other things being equal, the wh-chain in (76-a) violates MIN-CHAIN twice (IP and C' are crossed), whereas the wh-chain in (76-b) violates MIN-CHAIN four times (VP, I', IP, and C' are crossed), the third violation being fatal already. Another account is developed by Legendre, Smolensky & Wilson (1998). They start out with BAR, which is nearly identical to the CED given above:

(77) BAR ("Barriers Condition"): A chain link must not cross a barrier.
By a general operation of local constraint conjunction, BAR can be conjoined with various constraints, including itself. Reflexive local conjunction of BAR and BAR yields a new constraint BAR & BAR = BAR² that is violated if a chain link crosses two barriers. Local conjunction of BAR² and BAR yields BAR³, which is violated if a chain link crosses three barriers, and so on. This mechanism recursively produces a subhierarchy of BARⁱ constraints which has a fixed internal ranking, given a universal meta-restriction on constraint conjunction: Con₁ & Con₂ ≫ Con₁, Con₂.³⁶ The violability of BARⁱ subhierarchy constraints makes it possible to adopt a simple theory of barriers according to which a significant number of XPs are barriers (see Köster 1987 for this general idea). Adopting a strict interpretation of Chomsky (1986a), Legendre, Smolensky & Wilson (1998) assume that all non-L-marked XPs are barriers, including VP, IP, subject NPs, adjunct CPs, etc. This makes the BARⁱ subhierarchy a good means to measure path lengths and derive typical Shortest Paths effects. In the superiority case currently under consideration, subject wh-movement as in (76-a) crosses only an IP barrier, violating BAR, whereas object wh-movement as in (76-b) crosses a VP barrier and an IP barrier, violating BAR², which is universally ranked higher. Given that the two candidates do not differ with respect to other constraints, and that they have a better constraint profile than all other competitors, it follows that the availability of short subject movement blocks the possibility of longer object movement, as desired.

To end this section, note that Legendre, Smolensky & Wilson (1998) also succeed in reanalyzing the Tagalog wh-movement evidence discussed above in an optimality-theoretic way by invoking the BARⁱ subhierarchy. More generally, we can conclude that all of the analyses involving translocal constraints that have been proposed in minimalist syntax or blocking syntax can be recast in optimality-theoretic terms by employing local, violable constraints.³⁷
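The two local measures of path length discussed in this section can be made concrete with a small Python sketch. It is not taken from the works cited: the function names and the list encodings of crossed nodes and barriers are assumptions introduced for illustration, and the sketch simplifies by charging a link that crosses k barriers only to the corresponding power of BAR, not to the lower powers as well.

```python
from typing import List, Tuple


def min_chain(nodes_crossed: List[str]) -> int:
    """MIN-CHAIN: one violation per node crossed by the (overt) movement chain."""
    return len(nodes_crossed)


def bar_profile(barriers_per_link: List[int], max_power: int = 3) -> Tuple[int, ...]:
    """Violations of BAR^max_power >> ... >> BAR^2 >> BAR^1, highest first.

    Simplifying assumption: a chain link that crosses k barriers is charged
    only to BAR^k (capped at max_power).
    """
    profile = [0] * max_power
    for k in barriers_per_link:
        if k >= 1:
            power = min(k, max_power)
            profile[max_power - power] += 1
    return tuple(profile)


# (76-a): subject wh-movement; (76-b): object wh-movement.
subject_nodes = ["IP", "C'"]
object_nodes = ["VP", "I'", "IP", "C'"]
subject_bar = bar_profile([1])  # one link crossing one barrier (IP)
object_bar = bar_profile([2])   # one link crossing two barriers (VP, IP)

print(min_chain(subject_nodes), min_chain(object_nodes))  # 2 4
print(subject_bar, object_bar)                            # (0, 0, 1) (0, 1, 0)
print(subject_bar < object_bar)                           # True
```

Because the profile lists the highest power first, ordinary tuple comparison reproduces the fixed internal ranking of the subhierarchy: the subject-movement profile is better, so (76-a) blocks (76-b) on either measure.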
5.3 Some Open Issues

Optimality-theoretic syntax inherits the complexity problem from minimalist syntax. Candidate sets are typically large (as in Pesetsky 1998), often infinite (as in Grimshaw 1997). In addition, there are several open issues that are specific to the optimality-theoretic approach. We will focus on two of these in what follows, and briefly mention five others after that.

5.3.1 Inputs and Faithfulness
An important optimality-theoretic concept that has played no role so far in our discussion is the notion of input. In optimality theory (cf. Prince & Smolensky 1993), Gen does not create competing candidates (the outputs) freely; rather, it does so on the basis of a given input. In phonology, inputs are underlying representations stored in the lexicon; here, inputs qualify as roughly the same types of objects as outputs. In syntax, it is much less clear what the input might look like (see Archangeli & Langendoen 1996). The null hypothesis, that the input is a completely articulated potential sentence of the same type as the output candidates, is not unproblematic because it would seem to imply the assumption that all possible sentences are "stored," which cannot possibly be true. To find out what the input in syntax is, it is instructive to consider its theory-internal functions. By and large, there are two. First, the input is standardly taken to define the competition. Second, the input serves as a basis for faithfulness constraints that demand input/output identity and thereby minimize deviations from the input in the optimal output. Let us consider the second function first. Faithfulness constraints play an important role in phonology. Constraints of the PARSE (or MAX) family prohibit deletion of input material in the output; constraints of the FILL (or DEP) family prohibit insertion of output material that is not part of the input; constraints of the IDENT family prohibit modifying input material. Faithfulness constraints have also been adopted in much recent work in optimality-theoretic syntax. The following two constraints are taken from Legendre, Smolensky & Wilson (1998) and Bakovic & Keer (1999), respectively.
(78) a. PARSE[SCOPE]: Scope assignment in the input must be realized by chain formation in the output.
     b. FAITH[COMP]: The output value of [±COMP] is the same as the input value.

Note that (78-a) implies that the input is a more complex object than just a collection of words (a numeration) or a predicate/argument structure; it must be a highly structured representation that encodes the relative scope of operators. (78-b) presupposes an abstract feature [±COMP] that for the present purposes we can assume to be located on a V that selects a proposition. Let us consider candidates that violate these constraints. Suppose that (79-a) is the input for output candidate (79-b), and (79-c) is the input for output candidate (79-d). Legendre, Smolensky & Wilson (1998) assume that (79-b) violates PARSE[SCOPE] because matrix scope for how₁ in the input (79-a) (indicated by [+wh]₁) is reduced to embedded scope in the output (again indicated by [+wh]₁). Similarly, Bakovic & Keer (1999) assume that (79-d) violates FAITH[COMP] because a [-COMP] specification in the input contrasts with a [+COMP] specification (hence, a complementizer) in the output.³⁸

(79) a. [+wh]₁ ... wonder[+wh] [ [+wh]₂ ... what₂ ... how₁ ... ] (input)
     b. You wonder[+wh] [CP [+wh]₁ [+wh]₂ how₁ John did what₂] (output)
     c. ... V[-comp] [ ... ] (input)
     d. I think [CP that [PP on him]₁ no coat looks good t₁] (output)
At this point, we need not go into the actual analyses in which these constraints play a role (as it happens, both faithfulness violations turn out to be non-fatal, i.e., (79-b,d) are optimal). The crucial question is: Is it really necessary to refer to the concept of input here, or is it possible to read the respective violations off the output forms, without any reference to inputs? At least for the cases at hand, the answer is straightforward: By enriching output representations in ways that have independently been proposed, a reference to inputs becomes unnecessary. (79-a,b) is a case where the intended matrix scope is not reached by chain formation in the candidate. Employing abstract scope markers (Σ) in S-structure representations (cf., e.g., Williams 1986), we can equivalently encode this input information in the output, as in (80-a).³⁹ As for the case in (79-c,d), the only assumption that we have to make (and which strikes us as innocuous, in fact, completely standard) is that selectional properties of lexical heads are accessible in syntax; cf. (80-b).
(80) a. Σ₁ you wonder[+wh] [CP [+wh]₁ [+wh]₂ how₁ John did what₂] (output)
     b. I think[-comp] [CP that [PP on him]₁ no coat looks good t₁] (output)

PARSE[SCOPE] and FAITH[COMP] can now be modified in obvious ways, without reference to inputs.

(81) a. PARSE[SCOPE] (revised): Scope markers must be reached by chain formation.
     b. FAITH[COMP] (revised): Lexical [±COMP] selection requirements must be respected.
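To see that the revised constraints really can be evaluated on an output alone, consider the following Python sketch. The encoding is entirely hypothetical: clause positions are reduced to the labels "matrix" and "embedded", and the function and parameter names are introduced here for illustration only. The point is simply that neither function takes an input representation as an argument.

```python
from typing import Dict


def parse_scope_revised(scope_markers: Dict[int, str], chain_heads: Dict[int, str]) -> int:
    """PARSE[SCOPE] (revised): count scope markers not reached by chain formation.

    scope_markers maps an index to the clause hosting its scope marker;
    chain_heads maps an index to the clause hosting the head of its chain.
    Both are read off the output candidate (hypothetical encoding).
    """
    return sum(1 for index, clause in scope_markers.items()
               if chain_heads.get(index) != clause)


def faith_comp_revised(selects_comp: bool, has_complementizer: bool) -> int:
    """FAITH[COMP] (revised): one violation if the lexical [±COMP] selection
    requirement of V is not respected by the clause it selects."""
    return int(selects_comp != has_complementizer)


# (80-a): the scope marker with index 1 sits in the matrix clause,
# but the chain of 'how' ends in the embedded CP.
print(parse_scope_revised({1: "matrix"}, {1: "embedded"}))             # 1

# (80-b): 'think' is lexically [-comp], yet a complementizer is realized.
print(faith_comp_revised(selects_comp=False, has_complementizer=True))  # 1
```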
If this result can be generalized, and all syntactic faithfulness constraints can be reanalyzed in this way, we can conclude that these constraints do not support the concept of input anymore. Why should it be that the notion of input is relevant for phonological faithfulness constraints, but not for their syntactic counterparts? The answer, we believe, follows from what appears to be a fundamental difference between syntax and phonology: Syntax is an information-preserving system with richly structured output candidates, whereas phonology is a system that loses information, so that reference to an underlying input is necessary in constraints. With this in mind, let us turn to the other input function noted above, that of defining candidate sets. Since syntactic output candidates are richly structured, all the relevant information that they must share in order to compete can be read off them, independently of what notion of candidate set is adopted; again, this is in sharp distinction to phonology. Thus, it is possible to explicitly define candidate sets without reference to the concept of input. For instance, if we follow Grimshaw (1997) in assuming that competing candidates must have the same predicate/argument structure, we can read this information off the potentially competing candidates themselves. As a matter of fact, it turns out that an input-independent characterization of candidate sets cannot even be avoided in Grimshaw's own approach. Recall that Grimshaw (1997) postulates that two candidates compete only if they have non-distinct logical forms (in addition to identical predicate/argument structures). If the input fully determines the candidate set, this presupposes that an input is a complex object that exhibits all relevant logical form information. It is generally assumed that outputs can deviate from inputs in many ways, subject only to faithfulness constraints. 
Hence, if nothing else is said, we expect that output candidates can be semantically unfaithful to the input by, e.g., applying scope reduction (such that, e.g., a wh-phrase with
matrix scope in the input is interpreted with embedded scope in the output). This clearly implies that candidates with distinct logical forms can compete. This consequence is embraced by Legendre, Smolensky & Wilson (1998) (cf. (79-b)). However, such a result is incompatible with Grimshaw's (1997) assumptions, according to which competing candidates must have (not: go back to) non-distinct logical forms. Thus, even in this approach, the input cannot completely determine the competition; the requirement of non-distinct logical forms must be stipulated on top of it. More generally, it emerges that an input-free characterization of candidate sets is both readily available and independently motivated. Hence, reference to inputs is unnecessary for the purpose of defining competition in syntax. From all this, we would like to conclude that it may eventually be possible to dispense with the notion of input in syntax; but further research is needed in this domain (also see note 43 below).

5.3.2 Absolute Ungrammaticality
Another important open question in optimality-theoretic syntax is how to account for the phenomenon of "absolute ungrammaticality" or "ineffability," i.e., cases where there does not seem to be a candidate in a candidate set that is grammatical. As an example, consider the following ungrammatical example involving wh-extraction across an adjunct island in German:

(82) *Was₁ ist Fritz eingeschlafen [CP nachdem er t₁ gelesen hat]?
     what is Fritz fallen asleep after he read has
Let us apply the suggestions that can be found in the literature to the case at hand. First, Pesetsky (1997, 1998) emphasizes that certain sentences may be ungrammatical not because they are classified as suboptimal in the H-Eval part of the grammar, but because they cannot be generated by Gen in the first place. Thus, a constraint like (83) might be part of Gen.

(83) ADJ-ISL ("Adjunct Island Constraint"): A trace must not be separated by an adjunct clause from its antecedent.
Second, it is suggested in Grimshaw (1994) and Müller (1997) that certain optimal candidates may have properties that make them inaccessible for other domains of the language faculty, like, e.g., semantic interpretation. ADJ-ISL might be part of H-Eval, but ranked higher than OP-SPEC. On this view, (84) could block (82) as suboptimal; but this optimal candidate would be uninterpretable (indicated by #) and, hence, unusable.

(84) #Fritz ist eingeschlafen [CP nachdem er was₁ gelesen hat]?
     Fritz is fallen asleep after he what read has
These two approaches have in common that they allow the possibility that absolute ungrammaticality is not located in the H-Eval component of grammar, but in a component that precedes (Gen) or follows (interpretation) optimization. If, however, H-Eval is to be held responsible for the ungrammaticality of (82), there must be a competing candidate with a better constraint profile that blocks it. A priori, this might be a candidate that employs a resumptive pronoun strategy, which is only legitimate in this context as a last resort. If this were so, the ineffability problem would be spurious in the case at hand. However, (85) shows that the resumptive pronoun strategy is not an option in German (a constraint like RES must outrank ADJ-ISL and other locality constraints in German):

(85) *Was₁ ist Fritz eingeschlafen [CP nachdem er es₁ gelesen hat]?
     what is Fritz fallen asleep after he it read has
What, then, could the optimal candidate blocking (82) look like? Following Prince & Smolensky (1993), Ackema & Neeleman (1998) propose that the empty candidate ∅ (the "null parse") is part of every candidate set. This candidate violates the constraint in (86), which is typically ranked high.⁴⁰

(86) *∅ ("Avoid Null Parse"): ∅ is prohibited.
Constraints that are ranked higher than *∅ in effect become inviolable (given that there is no constraint except *∅ that ∅ can violate). In this sense, *∅ introduces a dividing line into rankings. Thus, if both ADJ-ISL and the constraint that triggers wh-movement (e.g., OP-SPEC) outrank *∅, adjunct islands become inviolable. This is shown in table T12.

T12: Adjunct islands and the null parse

   Candidates                                ADJ-ISL   OP-SPEC   *∅
   C1: was₁ ... [CP nachdem er t₁ V]         *!
   C2: -  ... [CP nachdem er was₁ V]                   *!
☞ C3: ∅                                                         *
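The "dividing line" effect of the null parse can be checked mechanically. The following Python sketch is an illustration under stated assumptions: the constraint name `*NULL` stands in for *∅, the candidate labels and the dictionary encoding are hypothetical, and the reranked grammar at the end is a constructed contrast, not a claim about any particular language.

```python
from typing import Dict, List


def optimal(candidates: Dict[str, Dict[str, int]], ranking: List[str]) -> List[str]:
    """Return the candidate(s) whose violation profile is best under strict domination."""
    def profile(name: str) -> tuple:
        return tuple(candidates[name].get(c, 0) for c in ranking)

    best = min(profile(name) for name in candidates)
    return [name for name in candidates if profile(name) == best]


# Candidate set for extraction from an adjunct; "*NULL" stands in for the
# constraint against the null parse.
island = {
    "C1: extraction": {"ADJ-ISL": 1},
    "C2: wh in situ": {"OP-SPEC": 1},
    "C3: null parse": {"*NULL": 1},
}

# Both ADJ-ISL and OP-SPEC outrank *NULL: the null parse wins (ineffability).
print(optimal(island, ["ADJ-ISL", "OP-SPEC", "*NULL"]))  # ['C3: null parse']

# Constructed contrast: if *NULL outranked both, the lowest-ranked of the
# remaining constraints would be violable and wh in situ would win.
print(optimal(island, ["*NULL", "ADJ-ISL", "OP-SPEC"]))  # ['C2: wh in situ']
```

Every constraint ranked above `*NULL` is unviolated by the winner in any competition that contains the null parse, since the null parse itself violates nothing else.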
A final possibility to be discussed here is the neutralization approach to absolute ungrammaticality in syntax. Such an approach has been adopted by Legendre, Smolensky & Wilson (1998), Schmid (1998), Bakovic & Keer (1999), and Wilson (1999), among others. For the present case, a neutralization analysis might posit that the optimal candidate blocking (82) is (87).

(87) Fritz ist eingeschlafen [CP nachdem er was₁ gelesen hat]
     Fritz is fallen asleep after he something read has
The crucial difference from (84) is that was₁ is turned into an indefinite pronoun, and the matrix C[+wh] is turned into a C[-wh]. Thus, there is a feature change from [+wh] in (82) to [-wh] in (87), and the sentence is interpreted as declarative, rather than as a question.⁴¹ If (87) is to block (82) as suboptimal, this presupposes that candidates that differ in their wh-feature specification can compete. But then, the problem arises that we would also wrongly expect one of the sentences in (88) to block the other.

(88) a. Was₁ hat er t₁ gelesen?
        what has he read
     b. Er hat was₁ gelesen
        he has something read
The neutralization approach solves this problem as follows. The [±wh] specification is unambiguously specified in the input; an input with a [+wh] specification on some item and a minimally different input with a [-wh] specification count as different, and define different candidate sets. The important assumption is that there is a faithfulness constraint that demands preservation of the [±wh] feature specification in the output:

(89) FAITH[WH]: The output value of [±wh] is the same as the input value.

Suppose now that ADJ-ISL and OP-SPEC are ranked higher than FAITH[WH]. Then, (87) will have a better constraint profile than (82) both in the competition that has a [-wh] specification in the input, and in the competition that has a [+wh] specification in the input. Thus, there is a "neutralization" of different input specifications in the output. This is shown in tables T13 and T14.⁴²
T13: Adjunct islands and neutralization; [-w] in the input

   Candidates                                      ADJ-ISL   OP-SPEC   FAITH[WH]
   C1: was₁[+w] ... [CP nachdem er t₁ V]           *!                  *
   C2: -  ... [CP nachdem er was₁[+w] V]                     *!        *
☞ C3: -  ... [CP nachdem er was₁[-w] V]

T14: Adjunct islands and neutralization; [+w] in the input

   Candidates                                      ADJ-ISL   OP-SPEC   FAITH[WH]
   C1: was₁[+w] ... [CP nachdem er t₁ V]           *!
   C2: -  ... [CP nachdem er was₁[+w] V]                     *!
☞ C3: -  ... [CP nachdem er was₁[-w] V]                               *
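Neutralization can be reproduced with a small Python sketch. Everything about the encoding is an assumption made for illustration: candidates are reduced to two booleans (whether the item has moved, and whether it is [+wh]), Gen is restricted to the three candidates shown in the tableaux, and the ranking ADJ-ISL >> OP-SPEC >> FAITH[WH] is built into the order of the profile tuple.

```python
from typing import List, Tuple

# Candidates as (label, moved out of the clause?, item is [+wh]?).
CANDIDATES: List[Tuple[str, bool, bool]] = [
    ("C1: moved, [+wh]", True, True),
    ("C2: in situ, [+wh]", False, True),
    ("C3: in situ, [-wh]", False, False),
]


def profile(moved: bool, is_wh: bool, input_wh: bool, island: bool) -> Tuple[int, int, int]:
    """Violations in ranking order: ADJ-ISL >> OP-SPEC >> FAITH[WH]."""
    return (
        int(moved and island),     # ADJ-ISL: movement across an adjunct island
        int(is_wh and not moved),  # OP-SPEC: a [+wh] item left in situ
        int(is_wh != input_wh),    # FAITH[WH]: output value differs from input
    )


def winner(input_wh: bool, island: bool) -> str:
    """Optimal candidate for a given input specification and context."""
    return min(CANDIDATES, key=lambda c: profile(c[1], c[2], input_wh, island))[0]


# Island context: both inputs converge on the same output (neutralization).
print(winner(input_wh=False, island=True))   # C3: in situ, [-wh]
print(winner(input_wh=True, island=True))    # C3: in situ, [-wh]

# Transparent context: the faithful candidate wins in each competition.
print(winner(input_wh=True, island=False))   # C1: moved, [+wh]
print(winner(input_wh=False, island=False))  # C3: in situ, [-wh]
```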
In transparent contexts, where movement may occur without a violation of a high-ranked locality constraint like ADJ-ISL (cf. (88)), FAITH[WH] violations become fatal, and the candidate that maintains the [±wh] specification of the input emerges as optimal.⁴³

Of the four approaches to absolute ungrammaticality discussed here (Gen, interpretation, null parse, neutralization), the neutralization approach is arguably the most elegant one. Still, it is not without problems. One conspicuous peculiarity is that neutralization creates massive derivational ambiguity. A well-formed sentence like (87) can have different "histories," being an optimal candidate in two candidate sets with different inputs. This vacuous ambiguity may be considered problematic from the point of view of language acquisition and parsing; and it can only be avoided by additional meta-optimization procedures that compare the competitions in T13 and T14; cf. the notion of input optimization in Prince & Smolensky (1993) (called lexicon optimization in phonology).

5.3.3 Residual Issues
As remarked above, this does not exhaust the list of open issues that are currently under debate in optimality-theoretic syntax. We end this section by briefly mentioning a few others.
Optionality

In the best of all possible worlds, one would not expect optionality to arise in a theory that selects the best candidate. The solutions that have been proposed in view of this situation center around concepts like (i) true optionality, according to which more than one candidate can be optimal due to an identical constraint profile (recall the above discussion of complementizer-trace effects); (ii) constraint ties, which come in various versions (global and local, ordered, conjunctive, and disjunctive) and all somehow incorporate the idea that two (or more) constraints are equally important; (iii) pseudo-optionality, which rests on the idea that the observed optionality is only apparent, and reducible to different optimization procedures in different candidate sets; and (iv) neutralization again, essentially an elaborate version of (iii). It turns out that none of these solutions is completely unproblematic. See Müller (2000:chapter 5) for a critical overview.

Degrees of Grammaticality

According to the definition of optimality in (70), an optimal candidate is grammatical, and a suboptimal candidate is invariably ungrammatical, no matter what the relative quality of its constraint profile is in comparison with other suboptimal candidates. Without further assumptions, this makes it impossible to account for degrees of grammaticality (or acceptability) in a syntax-internal way, in contrast to what is the case in government and binding theory (albeit only by stipulation; cf., e.g., the traditional distinction between "mild" Subjacency violations and "strong" ECP violations).

Cumulativity

A related property of optimality-theoretic syntax is that, in its standard form, it does not capture cumulative effects; in government and binding theory, cumulativity manifests itself in the assumption that a sentence gets "more ungrammatical," the more constraints it violates. The reason for optimality theory's failure to integrate cumulativity is that many violations of a lower-ranked constraint cannot outweigh a single violation of a higher-ranked constraint. However, as we have seen, this consequence does not hold if we adopt the mechanism of local constraint conjunction. Whether this is a positive or negative result remains to be seen.
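The point about cumulativity can be illustrated with a toy comparison in Python. The constraint names HIGH and LOW, the two candidates, and the treatment of the whole candidate as a single local domain for the self-conjunction LOW² are all assumptions made for this sketch; it shows only the logic of the argument, not any particular analysis.

```python
from typing import Dict, List, Tuple


def profile(violations: Dict[str, int], ranking: List[str]) -> Tuple[int, ...]:
    """Violation profile in ranking order.

    Simplifying assumption: the self-conjunction LOW^2 is violated once if
    LOW is violated at least twice, treating the whole candidate as one
    local domain.
    """
    v = dict(violations)
    v["LOW^2"] = 1 if v.get("LOW", 0) >= 2 else 0
    return tuple(v.get(c, 0) for c in ranking)


a = {"HIGH": 1}  # one violation of the higher-ranked constraint
b = {"LOW": 5}   # five violations of the lower-ranked constraint

plain = ["HIGH", "LOW"]              # standard strict domination
conjoined = ["LOW^2", "HIGH", "LOW"]  # the conjunction outranks its conjuncts

# Without conjunction, b wins no matter how often it violates LOW.
print(profile(b, plain) < profile(a, plain))          # True

# With LOW^2 ranked above HIGH, the accumulated LOW violations become fatal.
print(profile(a, conjoined) < profile(b, conjoined))  # True
```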
Parameterization

Work in government and binding theory and the minimalist program has focussed on morphological properties of lexical items as factors that determine parameterization. Such a view can in principle be reconciled with optimality-theoretic syntax without too much ado (one and the same syntactic constraint ranking may yield different optimal candidates if the morphological properties of these candidates differ from language to language, and there are constraints that refer to these morphological properties). However, in practice, work in optimality-theoretic syntax has often sought to account for syntactic parameterization exclusively in terms of syntactic reranking, either denying a relation to morphology, or viewing morphological properties not as the basis, but as a reflex of syntactic parameterization. Again, this issue is far from being settled; for opposing views, see, e.g., Grimshaw & Samek-Lodovici (1998) and Legendre, Smolensky & Wilson (1998) on the one hand, and Vikner (2000) on the other. Another recurring question in the optimality-theoretic approach to parameterization is whether every reranking of constraints that is logically possible is also linguistically plausible (i.e., results in a potential grammar). The hypothesis that it is, is known as factorial typology, and is the focus of much recent work.

Multiple Optimization
Following Prince & Smolensky (1993), it is standardly assumed that there is exactly one optimization procedure in syntax; the candidates are evaluated only once. An alternative that is considered in Prince & Smolensky (1993) is that optimization procedures can affect candidates more than once. Recently, this idea has been pursued in various ways in optimality-theoretic syntax. Several proposals rely on the distinction between interpretive optimization and expressive optimization: Interpretive optimization may precede expressive optimization (see Wilson 1999), expressive optimization may precede interpretive optimization (see Hendriks & de Hoop 1999), or the two procedures may influence each other (see Blutner 2000 and Jäger & Blutner 2000). Heck (1998:this volume) argues that the government and binding model can be transferred into optimality-theoretic syntax by assuming that optimization applies three times: First, D-structures are subject to optimization; second, the optimal D-structure output serves as the input to S-structure optimization; finally, the optimal S-structure output serves as the input to LF optimization.
Fanselow, Kliegl & Schlesewsky (1999) develop an optimality-theoretic approach to parsing that is based on the idea that parsing can be viewed as an iteration of optimization procedures that stop when the final word of a sentence has been taken in. Finally, Heck & Müller (2000) adopt a minimalist syntax in which each cyclic node (XP) created in the derivation is subject to optimization; only the optimal XP is submitted to the next step of the derivation, and so on, until the optimal root node is determined. Thus, in this system, optimization is not just multiple; it is local in the sense that each optimization procedure affects only a small unit. None of these cases of multiple optimization can be viewed as a notational variant of standard, single optimization. It remains to be seen to what extent multiple optimization is a viable alternative.
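The architecture shared by these proposals, in which the winner of one optimization is the only object handed on to the next, can be sketched as a simple pipeline in Python. The sketch is deliberately abstract: the stage names follow the three-level model described above, but the toy Gen functions and the constraints `subj_first` and `obj_first` are invented purely for illustration and carry no linguistic claim.

```python
from typing import Callable, List, Sequence, Tuple

Order = Tuple[str, ...]
Constraint = Callable[[Order], int]
Gen = Callable[[Order], List[Order]]


def best(candidates: List[Order], constraints: Sequence[Constraint]) -> Order:
    """Pick the candidate with the best profile (constraints listed highest first)."""
    return min(candidates, key=lambda c: tuple(con(c) for con in constraints))


def run(start: Order, stages) -> List[Tuple[str, Order]]:
    """Serial optimization: each stage sees only the winner of the previous one."""
    current, history = start, []
    for name, gen, constraints in stages:
        current = best(gen(current), constraints)
        history.append((name, current))
    return history


def swaps(order: Order) -> List[Order]:
    """Toy Gen: the order itself plus the order with its first two items swapped."""
    return [order, (order[1], order[0]) + order[2:]]


def identity(order: Order) -> List[Order]:
    """Toy Gen that offers no alternatives."""
    return [order]


# Hypothetical toy constraints, one violation if not satisfied.
def subj_first(order: Order) -> int:
    return 0 if order[0] == "SUBJ" else 1


def obj_first(order: Order) -> int:
    return 0 if order[0] == "OBJ" else 1


stages = [
    ("D-structure", swaps, [subj_first]),
    ("S-structure", swaps, [obj_first]),
    ("LF", identity, []),
]
for step in run(("OBJ", "SUBJ"), stages):
    print(step)
```

With a single, global optimization the S-structure competition would have access to every D-structure candidate; in the pipeline it does not, which is one way of seeing why multiple optimization is not a notational variant of the standard model.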
6 The Contributions to This Volume

Most of the papers in this volume originate from a workshop at the 21st Annual Conference of the DGfS (German Linguistic Society), which took place at the University of Constance in February 1999. The contributions have in common that they discuss pieces of empirical evidence for which a competition-based approach has some initial plausibility. They are all primarily concerned with optimality theory, and they take up a number of the open issues that were just mentioned.

Büring's paper is a study of free word order in German, a domain that has been tackled in terms of violable and ranked constraints in pre-optimality work going back to the 70's and 80's. Like Choi (1999), Büring presents an optimality-theoretic analysis that rests on Lenerz's (1977) seminal work. Central theoretical notions that play a role include optionality, degrees of grammaticality, and, in particular, the prosody/syntax interface.

Fanselow & Cavar adopt the copy theory of movement and assume that overt and covert movement both apply before spell-out. The crucial difference relates to the question of which members of a copy chain are pronounced, and which are deleted. To give a comprehensive answer to this question, the authors discuss evidence from a variety of languages that includes long-distance and partial wh-movement, the wh-copy construction, the NP split construction, and instances of head movement. They develop an optimality-theoretic approach that reconciles features of the analyses in Pesetsky (1998) and Grimshaw (1997), and that relies on a system of multiple (local) optimization which integrates Chomsky's (1998) concept of a phase.
As in the case of word order, it has often been argued in pre-optimality-theoretic analyses of relative quantifier scope that the notions of violability and ranking (or weight) of interacting factors play an important role. Fischer sets out to transfer some main results of one such study (viz., Pafel's 1998 approach to quantifier scope in German) into optimality theory. In view of the fact that this approach also employs the notions of optionality (in the guise of scope ambiguity) and cumulativity, Fischer develops an analysis that rests on constraint ties and local constraint conjunction.

Heck is also concerned with quantifier scope in German. Based on the observation that scope relations at LF are highly dependent on word order at S-structure, which is in turn strongly influenced by the variable order of arguments at D-structure, he argues for a new system of multiple optimization (called "cyclic optimization"). This system takes the government and binding organization of grammar as a starting point and postulates three optimization procedures: at D-structure, at S-structure, and at LF.

In his experimental study of English gapping constructions, Keller observes the influence of various interacting constraints. On the basis of this evidence, he argues for an approach that incorporates features of optimality-theoretic syntax but also provides room for (a) cumulativity of constraint violation; (b) gradient acceptability of candidates (related to the number and quality of constraint violations); and (c) a distinction between "hard" and "soft" constraints (e.g., a hard clause-mateness constraint, and soft subject/predicate and minimal distance constraints), the latter being violable and subject to choice of context.

Like Büring, Lenerz addresses free word order structures in German. This paper complements Büring's, since Lenerz argues that the empirical evidence does not in fact support a competition-based approach.
Going through all the main pieces of word order evidence that have been analyzed in terms of competition, Lenerz shows that an analysis that focusses on the variable semantic and pragmatic contributions of definite and indefinite NPs in different positions in the German middle field can yield empirically adequate results without any recourse to the notion of competition; the particular approach that he develops relies on choice functions and the partitioning of clauses into domains with background-determined reference and with immediate sentence constituent reference. Schmid's paper is a close investigation of different ways to handle optionality in optimality-theoretic syntax. After reviewing the options that exist in the literature, Schmid focusses on a comparison of a specific (global) notion of constraint tie and the concept of neutralization. The cases of optionality that
serve as the empirical basis are (a) complementizer drop in English, (b) wh-movement in French root clauses, and (c) the German "Ersatzinfinitiv" (IPP) construction. For each of these phenomena, a global tie analysis is compared with a neutralization analysis; general strategies are suggested that permit a transfer from one type of approach to the other; and a conclusion is drawn that ultimately favors the neutralization solution.

The focus of Vikner's paper is the conflict that arises between two well-motivated constraints in Icelandic: First, the relative scope of quantified items must correspond to their surface order; second, NPs can undergo object shift in front of an adverbial only if the main verb has undergone movement. Interestingly, it seems as though relative scope does not have to correspond to surface order in exactly those contexts in which object shift is blocked. Vikner shows that this supports an optimality-theoretic analysis in which the first constraint is ranked below the second one, and is thus violable in the case of conflict. Finally, the analysis is extended to German.

Vogel takes as a starting point the observation that free relative constructions by their very nature strongly suggest constraint violability and constraint ranking: They are incompatible with the standard assumption that there is a one-to-one correspondence between Case assigners and items that are assigned Case. Moreover, Case conflicts can show up in free relatives which are often resolvable by ranking (but may also result in absolute ungrammaticality). On this basis, Vogel develops an optimality-theoretic analysis of free relative constructions in German, and he investigates the typological implications that result from reranking the proposed constraints; among other things, the analysis sheds new light on the concepts of factorial typology and neutralization.
Finally, Wanner observes that there are conflicts between linking rules which become manifest in the domain of psych verbs in English. For instance, the CONTROL-RULE favors experiencers as external arguments, whereas the CAUSER-RULE prefers causers as external arguments; in an optimality-theoretic approach, the conflict can be resolved by ranking the latter rule above the former, and this is what explains the difference between Mary frightens John (where the theme is a causer) and John fears Mary (where the theme is not a causer). An interesting theoretical aspect of this analysis is that the competing candidates are not sentences, but argument structures.

We believe that the papers collected in this volume give a fair indication of both the potential and the limitations of optimality-theoretic syntax, and of competition-based syntax in general. To us, they strongly suggest that it is
fruitful to further explore the concept of syntactic competition, even though an eventual success of this enterprise cannot be taken for granted at this point.
Acknowledgments

We would like to thank Kirsten Brock for the enormous amount of excellent work that she put into the present volume. We are also grateful to Oliver Avieny and Annette Farhan for their editorial assistance. Müller's work was supported by DFG grants MU 1444/1-1,2-1; Sternefeld's work was supported by a DFG grant within the SFB 441.
Notes
1. Our use of the term global follows its original interpretation in Lakoff (1971) throughout this introduction. Sometimes, global is understood in a rather different sense in the literature (including Chomsky 1995 and Collins 1997), as a synonym for translocal or transderivational (see below). As we will see, in this second interpretation, a global constraint can in fact not be checked by exclusively looking at a given syntactic object S_i.
2. The resumptive pronoun strategy is by itself marginal in English and is chosen here mainly for expository reasons; see Chomsky (1981:173) for a discussion of the case at hand. However, resumptive pronouns as a last resort in cases where movement is blocked are widely attested in other languages. See Shlonsky (1992), Pesetsky (1998), and the references cited there.
3. Note, however, that Chomsky (1982:63f.) envisages an account in terms of the Avoid Pronoun Principle, which, as we will see, is an exception insofar as it is in fact a non-local constraint in government and binding theory.
4. As Manzini shows, the Control Rule is actually a theorem that can be derived from more primitive assumptions. This need not concern us here.
5. The Avoid Pronoun Principle has been applied to pro-drop phenomena in languages like Italian by Haegeman (1994:217). The idea here is that the availability of the empty pronominal pro in the subject position of finite clauses tends to make the use of an overt pronoun impossible; on this view, overt subject pronouns can only show up in pro-drop languages if they fulfill a function that pro cannot fulfill (like, e.g., focus interpretation). Also recall Chomsky's (1982) analysis of resumptive pronouns that was mentioned above.
6. Also see Reinhart (1983) on a version of binding theory that relies on pragmatic constraints of this type.
7. The notion of a numeration is first introduced in Chomsky (1993), so this is strictly speaking an anachronism.
8. This view is later abandoned in the minimalist program. Thus, Chomsky (1998:6) speculates that "language design might be optimal... approaching a 'perfect solution' to minimal design specifications."
9. We can assume that a position is "appropriate" for insertion of intermediate traces if the resulting structure does not violate local constraints - e.g., those on improper movement (an A-bar trace must not end up within an A-chain, see May 1979; an adjoined trace must not end up within a chain headed by an antecedent in a specifier position (and vice versa), see Müller & Sternefeld 1996).
10. At least, this holds as long as we are not prepared to assume that embedded V/2 in German is derived by deletion of a complementizer daß that is present in the numeration.
11. All XPs which are not L-marked are barriers. XPs which are not in complement position are therefore always barriers.
12. Note that the concept of Form Chain cannot undermine this reasoning because the two instances of chain formation applying to the wh-phrase in D3 are not adjacent, but interrupted by another operation - that of NP raising to subject position.
13. See Chomsky (1993, 1995) for discussion of the various options that arise here.
14. Also compare Kitahara's (1997) reconstruction of Procrastinate effects in terms of Fewest Steps; see section 3.3 on Procrastinate.
15. Note in passing that Chomsky's (1991) way out in terms of 'stylistic movement' that can be chosen in the case of optional overt movement is not viable here for obvious reasons, LF movement never being 'stylistic.'
16. The derivation originally envisaged by Collins (1994) is actually even more complex since it involves two additional VP-adjunction operations. The derivation in (46) is sufficient for our present purposes, though.
17. One might think that Fewest Steps would also suffice to block D1 in favor of D2. However, assuming that the two movement operations in D1 can be reanalyzed as a single instance of Form Chain, this is not the case. That said, it is worth noting that Shortest Paths would indeed suffice to account for the ban on V-in situ in French that was explained by invoking Fewest Steps in Chomsky (1991). The derivation in (17) also instantiates yo-yo movement; the only essential difference from the derivation in (46) is that yo-yo movement is interrupted by spell-out in the former case, but not in the latter.
18. Two additional assumptions must be clarified. First, in line with what is probably the majority of literature on the topic, Nakamura (1998) postulates that wh-constructions in Tagalog do not actually involve movement of the wh-phrase, but rather movement of an empty operator in a relative clause-like construction. That is, English questions like "What did Juan buy?" are rendered as "What is it
Op_i that Juan bought t_i?" For expository purposes, we will ignore this complication in what follows, but the correct structure is still reflected in the translation. Second, note that actual positions of items that are overtly visible do not always reflect the position that is theoretically relevant in Nakamura's (1998) analysis. In particular, he assumes that the structural subject position SpecT is left-peripheral, and in many cases can only be filled at LF. Still, subject NPs behave in every respect as if they occupied the SpecT position overtly. This covert subject raising with overt effects is indicated here by italicizing the relevant subject NP; thus italicization is meant to imply that the italicized NP is pronounced in the position of its trace. Note that these complications do not arise in a language like Toba Batak, which otherwise exhibits the same general effect; see Schachter (1984) and Sternefeld (1995).
19. Note, however, that a residue of the Procrastinate condition still shows up in Chomsky's (1998:14) translocal principle that prefers Agree over Move. Also see the next section.
20. It is worth noting that the notion of optimality has systematically been used in minimalist syntax, apparently without recourse to optimality theory as developed by Prince & Smolensky (1993), and at a time when optimality-theoretic syntax papers did not yet exist. See, e.g., Chomsky (1993:4), and, for explicit uses of the notion, Collins (1994:46), Kitahara (1997:18), and Frampton & Gutman (1999:5).
21. See Müller (2000:chapter 4) for a slightly more realistic (albeit still simplified) example.
22. See Fanselow (1997), who argues that was2 can be scrambled to a position in front of wer1 before wh-movement takes place in (61-b). However, to avoid a blocking of (61-a) by (61-b) in a system with translocal constraints (which Fanselow does not assume), it would then also have to be ensured that the two derivations do not compete; this could be achieved by assuming that the presence vs. absence of the trigger for was-scrambling creates two different reference sets. An alternative would be to assume that whereas translocal constraints cannot be parameterized, the definition of reference set can be. Without recourse to intermediate scrambling, reference sets might then be defined in German in such a way that (61-a) and (61-b) do not compete, whereas they could be defined differently in English, so that the English counterparts of these derivations do compete. See Sternefeld (1997) for an extensive discussion of this option.
23. Recall, however, that Chomsky retains some translocal constraints even in more recent work, though often hesitantly and with a sense that if truly necessary, translocality would qualify as an "imperfection" of language. Thus, directly after suggesting the Shortest Paths account of the ban on the acyclic derivation of freezing effects with NP raising cited above, Chomsky (1995:328) comes close to revoking it by stating: " - though the issue is nontrivial, in part because we are
invoking here a 'global' [i.e., translocal] notion of economy of the sort we have sought to avoid."
24. Also see Hornstein (2000), who gives the same kind of account in a minimalist setting.
25. The standard way out of the problems created by optionality chosen by proponents of blocking syntax is to find subtle semantic differences between the relevant sentences - in other words, to deny true optionality.
26. The former strategy is discussed in Heck (1998; this volume); the latter strategy is pursued in Müller (1997).
27. We hasten to add that all the case studies in this section are simplified versions of the actual analyses proposed in the literature. In the present context, we are mainly interested in the logic of the argument, not in the specific (or maximally elegant) formulation of the constraints. Accordingly, we leave open the questions of defining candidates and candidate sets where they do not seem to be important for our present purposes. Note also that the simplification is particularly radical in Wilson's (1999) case. Based on evidence from binding theory, Wilson argues for an elaborate model of multiple optimization in syntax (see section 5.3.3 below); he is concerned with many more data and, eventually, typological universals that the naive analysis presented here cannot possibly account for.
28. The ranking of T-LEX-GOV and STAY could also be reversed; this ranking is not determined by the cases we are interested in here.
29. To avoid the issue of do-support in root clauses, which is orthogonal to the issue of complementizer-trace effects in embedded clauses, we have chosen examples here in which the SpecC[+wh] target position is in an embedded clause.
30. Note that a violation of T-LEX-GOV will automatically imply a violation of the more general STAY constraint. Hence, given that there is no other constraint on which C1 and C2 differ, it follows that C1's constraint profile is better than that of C2 under any ranking; i.e., C1 harmonically bounds C2.
31. The relevant constraints are BAR3 (see below) and FILL in Legendre, Smolensky & Wilson (1998), and ISLAND-COND and SILENT-TRACE in Pesetsky (1998). CNPC should be viewed as a placeholder for one or more general conditions that yield the described effects; RES is arguably part of a more general system of constraints on pronouns. Also see Hornstein (2000) on resumptive pronouns and islands, including CNPC, in English.
32. Operator movement in relative clauses in English can be achieved by (something along the lines of) Grimshaw's (1997) ranking OP-SPEC » STAY (plus STAY » RES); this option is chosen by Legendre, Smolensky & Wilson (1998). In contrast, Pesetsky (1997) does not assume the movement operation in relative clauses to be subject to optimization; in his view, Gen does not generate the in-situ version in English in the first place.
33. It seems that in order to achieve compatibility of this account of resumptive pronouns with the account of the lack of complementizer-trace effects with adjuncts
sketched in the preceding section, the ranking RES » T-LEX-GOV would have to be assumed. That said, one will probably have to assume independent, high-ranked constraints that block resumptive pronouns in adjunct chains, anyway.
34. Is *PRON confined to personal (and possessive) pronouns, or does it also cover anaphoric pronouns? Under the first option, REF-ECON and *PRON might be the same constraint. Under the second option, we would in fact face what is known as a subhierarchy of constraints: A general constraint *PRON prohibits all kinds of pronouns, a more specific constraint *PERS-PRON (= REF-ECON) prohibits only personal pronouns, and an even more specific constraint *RES-PERS-PRON (= RES) prohibits only personal pronouns used as resumptives.
35. Ackema & Neeleman call this constraint STAY, but this may be somewhat unfortunate, given that MIN-CHAIN differs substantially from Grimshaw's (1997) STAY, in the same way that Shortest Paths differs from Fewest Steps.
36. Local constraint conjunction makes it possible to reintroduce the concept of cumulativity into optimality-theoretic syntax: Multiple violations of a given constraint Con1 may not directly outweigh a single violation of a higher-ranked constraint Con2, but can do so indirectly by triggering a violation of an even higher-ranked constraint Con1'.
37. An interesting question is whether a translation of translocal constraints into local constraints is actually needed in optimality theory; in other words: Could not some of the violable and ranked constraints in the H-Eval part be translocal themselves, just like the basic optimality principle is? For instance, one could envisage a translocal SHORTEST PATHS that fulfills the same task as MIN-CHAIN or BAR': SHORTEST PATHS selects the candidate C_i with the shortest movement paths in a given candidate set, and this can be signalled by stipulating that all candidates except C_i are assigned a star * under this constraint. Such an approach may raise additional complexity issues, and it has - to the best of our knowledge - not yet been proposed in optimality-theoretic syntax. Still, it seems to us to be viable in principle. Indeed, a translocal constraint of this type has been proposed for phonology in Prince & Smolensky (1993) (H-NUC, which, however, is eventually replaced there by a subhierarchy of local constraints that are derived by a process of harmonic alignment).
38. In both cases, only those aspects of the input are considered that matter for the faithfulness constraints under consideration.
39. Note that the distinction between the actual scope position for a wh-item (here designated by [+wh]) and the "intended" scope position for a wh-item (here designated by Σ) is fundamental in the analysis by Legendre, Smolensky & Wilson (1998), and not an artefact of the input-independent approach.
40. Assuming the concept of input, this constraint amounts to the statement that the input must not be left completely unrealized.
41. Here we exploit the fact that was is ambiguous between a wh-reading and an indefinite reading in (colloquial) German. This does not hold for other wh-phrases
like welches Buch ('which book'), which, however, also cannot be extracted from adjunct islands. For these cases, the neutralization approach would have to be complicated in such a way that the candidate with the [-wh] NP (perhaps ein Buch ('a book')) deviated from the one that must be blocked not just in feature specification, but also in morphological shape. Such complications do not affect the general argument, though.
42. A side remark: The candidates in (79-b) and (79-d) that were discussed in the previous section also signal input neutralization; these candidates are also optimal in candidate sets where they do not violate the respective faithfulness constraint.
43. This account rests on the concept of input. Is it possible to maintain the analysis without reference to this notion? It is, but the task is slightly more difficult here than in the cases that were discussed in the last section. We have to ensure that an output candidate like (87), with a C[-wh] and a was[-wh], has abstract [+wh] or [-wh] markers that encode the postulated input difference, and that can be referred to by an appropriately revised FAITH[WH] constraint.
References
Ackema, Peter — Ad Neeleman 1998 Optimal questions. Natural Language and Linguistic Theory 16: 443-490.
Archangeli, Diana — D. Terence Langendoen 1996 Afterword. In: Diana Archangeli & D. Terence Langendoen (eds.), Optimality Theory: An Overview, 200-215. Oxford: Blackwell.
Aronoff, Mark 1976 Word Formation in Generative Grammar. Cambridge, MA: MIT Press.
Baker, Carl L. 1970 Notes on the description of English questions: The role of an abstract question morpheme. Foundations of Language 6: 197-219.
Bakovic, Eric — Ed Keer 1999 Optionality and ineffability. Ms., Harvard University & UMass., Amherst. To appear in: Géraldine Legendre, Jane Grimshaw & Sten Vikner (eds.), Optimality-Theoretic Syntax, Cambridge, MA: MIT Press.
Blutner, Reinhard 2000 Some aspects of optimality in natural language interpretation. Ms., Humboldt-Universität Berlin.
Burzio, Luigi 1991 The morphological basis of anaphora. Journal of Linguistics 27: 81-105.
Choi, Hye-Won 1999 Optimizing Structure in Context: Scrambling and Information Structure. Stanford: CSLI Publications.
Chomsky, Noam 1973 Conditions on transformations. In: Stephen Anderson & Paul Kiparsky (eds.), A Festschrift for Morris Halle, 232-286. New York: Academic Press.
Chomsky, Noam 1981 Lectures on Government and Binding. Dordrecht: Foris.
Chomsky, Noam 1982 Some Concepts and Consequences of the Theory of Government and Binding. Cambridge, MA: MIT Press.
Chomsky, Noam 1986a Barriers. Cambridge, MA: MIT Press.
Chomsky, Noam 1986b Knowledge of Language. New York: Praeger.
Chomsky, Noam 1991 Some notes on economy of derivation and representation. In: Robert Freidin (ed.), Principles and Parameters in Comparative Grammar, 417-454. Cambridge, MA: MIT Press.
Chomsky, Noam 1993 A minimalist program for linguistic theory. In: Kenneth Hale & Samuel Jay Keyser (eds.), The View from Building 20, 1-52. Cambridge, MA: MIT Press.
Chomsky, Noam 1995 Categories and transformations. (Chapter 4). In: The Minimalist Program, 219-394. Cambridge, MA: MIT Press.
Chomsky, Noam 1998 Minimalist inquiries. Ms., MIT, Cambridge, MA.
Chomsky, Noam — Howard Lasnik 1993 Principles and parameters theory. In: Joachim Jacobs, Arnim von Stechow, Wolfgang Sternefeld & Theo Vennemann (eds.), Syntax, vol. I, 506-569. Berlin: de Gruyter.
Cole, Peter 1982 Subjacency and successive cyclicity: Evidence from Ancash Quechua. Journal of Linguistic Research 2: 35-58.
Collins, Chris 1994 Economy of derivation and the generalized proper binding condition. Linguistic Inquiry 25: 45-61.
Collins, Chris 1997 Local Economy. Cambridge, MA: MIT Press.
Déprez, Viviane 1991 Economy and the that-t effect. In: Proceedings of the Western Conference on Linguistics 4: 74-87.
DiSciullo, Anna-Maria — Edwin Williams 1987 On the Definition of Word. Cambridge, MA: MIT Press.
Epstein, Samuel David 1992 Derivational constraints on A'-chain formation. Linguistic Inquiry 23: 235-259.
Fanselow, Gisbert 1989 Konkurrenzphänomene in der Syntax. Linguistische Berichte 123: 385-414.
Fanselow, Gisbert 1991 Minimale Syntax. Habilitation thesis, Universität Passau.
Fanselow, Gisbert 1997 The proper interpretation of the minimal link condition. Ms., Universität Potsdam.
Fanselow, Gisbert — Reinhold Kliegl — Matthias Schlesewsky 1999 Optimal parsing. Ms., Universität Potsdam.
Fox, Danny 1995 Economy and scope. Natural Language Semantics 3: 283-341.
Frampton, John — Sam Gutman 1999 Cyclic computation. Syntax 2: 1-27.
Grimshaw, Jane 1994 Heads and optimality. Handout, Universität Stuttgart.
Grimshaw, Jane 1997 Projection, heads, and optimality. Linguistic Inquiry 28: 373-422.
Grimshaw, Jane — Vieri Samek-Lodovici 1998 Optimal subjects and subject universals. In: Pilar Barbosa et al. (eds.), Is the Best Good Enough?, 193-219. Cambridge, MA: MIT Press & MITWPL.
Haegeman, Liliane 1994 Introduction to Government and Binding Theory. Oxford: Blackwell.
Haider, Hubert 1983 Connectedness effects in German. Groninger Arbeiten zur Germanistischen Linguistik 23: 82-119.
Heck, Fabian 1998 Relativer Quantorenskopus im Deutschen - Optimalitätstheorie und die Syntax der Logischen Form. M.A. thesis, Universität Tübingen.
Heck, Fabian — Gereon Müller 2000 Repair-driven movement and the local optimization of derivations. Ms., Universität Stuttgart & IDS Mannheim. Short version in: Glow Newsletter 44: 26-27.
Hendriks, Petra — Helen de Hoop 1999 Optimality theoretic semantics. Ms., University of Groningen. (Cognitive Science and Engineering Prepublications 98-3.)
Hornstein, Norbert 2000 Is the binding theory necessary? Ms., University of Maryland.
Jäger, Gerhard — Reinhard Blutner 2000 Against lexical decomposition in syntax. Ms., ZAS & Humboldt-Universität Berlin.
Kiparsky, Paul 1982 From cyclic phonology to lexical phonology. In: Harry van der Hulst & Neil Smith (eds.), The Structure of Phonological Representations, vol. 1, 131-175. Dordrecht: Foris.
Kitahara, Hisatsugu 1993 Deducing 'superiority' effects from the shortest chain requirement. Harvard Working Papers in Linguistics 3: 109-119.
Kitahara, Hisatsugu 1997 Elementary Operations and Optimal Derivations. Cambridge, MA: MIT Press.
Koster, Jan 1987 Domains and Dynasties. Dordrecht: Foris.
Lakoff, George 1971 On generative semantics. In: Danny Steinberg & Leon Jakobovits (eds.), Semantics, 232-296. Cambridge: Cambridge University Press.
Lasnik, Howard — Mamoru Saito 1992 Move α. Cambridge, MA: MIT Press.
Legendre, Géraldine — Paul Smolensky — Colin Wilson 1998 When is less more? Faithfulness and minimal links in wh-chains. In: Pilar Barbosa et al. (eds.), Is the Best Good Enough?, 249-289. Cambridge, MA: MIT Press & MITWPL.
Lenerz, Jürgen 1977 Zur Abfolge nominaler Satzglieder im Deutschen. Tübingen: Stauffenburg.
Manzini, Rita 1983 On control and control theory. Linguistic Inquiry 14: 421-446.
Marantz, Alec 1995 The minimalist program. In: Gert Webelhuth (ed.), Government and Binding Theory and the Minimalist Program, 351-382. Oxford: Blackwell.
May, Robert 1979 Must COMP-to-COMP movement be stipulated? Linguistic Inquiry 10: 719-725.
McCarthy, John — Alan Prince 1995 Faithfulness and reduplicative identity. In: Jill Beckman, Laura Walsh-Dickie & Suzanne Urbanczyk (eds.), Papers in Optimality Theory, 249-384. Amherst, MA: UMass Occasional Papers in Linguistics 18.
Müller, Gereon 1997 Partial wh-movement and optimality theory. The Linguistic Review 14: 249-306.
Müller, Gereon 2000 Elemente der optimalitätstheoretischen Syntax. Tübingen: Stauffenburg.
Müller, Gereon — Wolfgang Sternefeld 1996 A-bar chain formation and economy of derivation. Linguistic Inquiry 27: 480-511.
Nakamura, Masanori 1998 Reference set, minimal link condition, and parameterization. In: Pilar Barbosa et al. (eds.), Is the Best Good Enough?, 291-313. Cambridge, MA: MIT Press & MITWPL.
Pafel, Jürgen 1998 Skopus und logische Struktur - Studien zum Quantorenskopus im Deutschen. Habilitationsschrift, Universität Tübingen.
Pesetsky, David 1997 Optimality theory and syntax: Movement and pronunciation. In: Diana Archangeli & D. Terence Langendoen (eds.), Optimality Theory: An Overview, 134-170. Oxford: Blackwell.
Pesetsky, David 1998 Some optimality principles of sentence pronunciation. In: Pilar Barbosa et al. (eds.), Is the Best Good Enough?, 337-383. Cambridge, MA: MIT Press & MITWPL.
Pollock, Jean-Yves 1989 Verb movement, universal grammar, and the structure of IP. Linguistic Inquiry 20: 365-424.
Prince, Alan — Paul Smolensky 1993 Optimality Theory: Constraint Interaction in Generative Grammar. Ms., Rutgers University. To appear: Cambridge, MA: MIT Press.
Reinhart, Tanya 1983 Anaphora and Semantic Interpretation. London: Croom Helm.
Richards, Norvin 1997 Competition and disjoint reference. Linguistic Inquiry 28: 178-187.
Schachter, Paul 1984 Studies in the Structure of Toba Batak. UCLA Occasional Papers in Linguistics 5.
Schmid, Tanja 1998 West Germanic "Infinitivus Pro Participio" (IPP) constructions in optimality theory. In: Tina Cambier-Langeveld, Anikó Lipták, Michael Redford & Erik Jan van der Torre (eds.), Proceedings of Console VII, 229-244. Leiden: SOLE.
Shlonsky, Ur 1992 Resumptive pronouns as a last resort. Linguistic Inquiry 23: 443-468.
Speas, Margaret 1995 Generalized control and null objects in optimality theory. In: Jill Beckman, Laura Walsh-Dickie & Suzanne Urbanczyk (eds.), Papers in Optimality Theory, 637-653. Amherst, MA: UMass Occasional Papers in Linguistics 18.
Sternefeld, Wolfgang 1991 Chain formation, reanalysis, and the economy of levels. In: Hubert Haider & Klaus Netter (eds.), Representation and Derivation in the Theory of Grammar, 71-137. Dordrecht: Kluwer.
Sternefeld, Wolfgang 1995 Voice phrases and their specifiers. FAS Papers in Linguistics 3: 48-85.
Sternefeld, Wolfgang 1997 Comparing reference sets. In: Chris Wilder, Hans-Martin Gärtner & Manfred Bierwisch (eds.), Economy in Linguistic Theory, 81-114. Berlin: Akademieverlag.
Vikner, Sten 2000 Checking strong verbal inflection in optimality theory. Ms., Universität Stuttgart.
Williams, Edwin 1986 A reassignment of the functions of LF. Linguistic Inquiry 17: 265-299.
Williams, Edwin 1997 Blocking and anaphora. Linguistic Inquiry 28: 577-628.
Wilson, Colin 1999 Bidirectional optimization and the theory of anaphora. Ms., Johns Hopkins University. To appear in: Géraldine Legendre, Jane Grimshaw & Sten Vikner (eds.), Optimality-Theoretic Syntax, Cambridge, MA: MIT Press.
Let's Phrase It! Focus, Word Order, and Prosodic Phrasing in German Double Object Constructions
Daniel Büring
This paper presents a case study in the interaction of word order, prosody and focus. The construction under consideration is the double object construction in German. The analysis proposed is in line with the following more general hypotheses: First, focus and word order do not interact directly. There are no grammatical rules that relate focus to specific phrase structural positions. Rather, focus interacts with prosodic phrasing, which in turn may interact with word order. Second, the kind of word order variation under investigation here is governed by two potentially conflicting types of constraints: morphosyntactic constraints that express ordering preferences relating to case, definiteness and possibly other categories, and prosodic constraints that define what a prosodic structure should look like. If these constraint families call for incompatible demands, languages may allow only the morphosyntactically perfect structure, or only the prosodically perfect structure, or, as is arguably the case in German, both. Third, violable ranked constraints provide a well-suited framework to account for these kinds of phenomena. Both the morphosyntactic and the prosodic constraints, as well as those governing the relation between prosody and focus, are implemented as markedness constraints. Their relative (non-)ranking accounts for the variation observed within a language and cross-linguistically.
1 Introduction
German, like many of its Germanic cousins, is a verb-second language. What sets it, along with Dutch, apart from the other Germanic verb-second languages is what Bech (1955/57) calls its Klammerstruktur (lit.: 'bracket structure'). All non-finite verb forms appear at the very end of the clause, so that
the finite verb in second position and the non-finite ones in final position together form a sort of bracket around the main body of the clause.
(1) initial position | finite verb | Mittelfeld | non-finite verb forms
As indicated, this main body of the clause, as delimited by the finite verb to its left and the non-finite ones to its right, is traditionally called the Mittelfeld ('middle field'). In embedded clauses, the initial position usually remains empty and the finite verb is found at the end, too. In its place the subordinating complementizer constitutes the left bracket of the Mittelfeld.
(2) complementizer | Mittelfeld | non-finite verb forms | finite verb
The Mittelfeld contains all non-clausal complements of the verb, some non-finite clausal ones, and most adverbials (almost any of these can alternatively occupy the initial position in declarative main clauses, a fact we can ignore here). The relative order among the elements in the Mittelfeld is basically free. In particular, German, unlike Dutch, allows reordering among the nominal arguments quite freely. Subject and object as well as the two objects in a ditransitive construction can be found in various orders. The following examples of embedded clauses from Müller (1998) (his (31) and (36)) illustrate this:
(3) nominative-accusative-order
a. ... dass eine Frau den Fritz geküsst hat.
   that a woman the-ACC Fritz kissed has
b. ... dass den Fritz eine Frau geküsst hat.
   that the-ACC Fritz a woman kissed has
   '... that a woman kissed Fritz.'
(4) dative-accusative-order
a. ... dass man das Buch dem Fritz geschickt hat.
   that one the book the-DAT Fritz sent has
b. ... dass man dem Fritz das Buch geschickt hat.
   that one the-DAT Fritz the book sent has
   '... that someone sent Fritz the book.'
All arguments are nominal. Overt case marking for nominative, dative and accusative is found on articles. As one might suspect, (4) allows even more
different orderings involving subject-object-reordering, which we did not list here. It has long been observed that various factors determine the acceptability of a given word order in a particular case, among them case, definiteness, animacy, and focus (cf. Lenerz 1977, Uszkoreit 1987, Müller 1998, among others). In the present study we will concentrate on the particular role that focus plays in relation to case (which we take as representative of the other morphosyntactic constraints). We also limit our discussion to the relative ordering of accusative and dative objects in double object constructions.
2 Focus and Word Order: A Summary of the Proposal
In his seminal study on German word order, Lenerz (1977) found that there are two main semantic/pragmatic factors that co-determine object ordering in German double object constructions: definiteness and focus. Simplifying slightly, the generalizations in (5) hold:
(5) a. Definite NPs precede indefinite NPs.
    b. Non-focused NPs precede focused NPs.
An equally important finding of that study was that there is one purely morphosyntactic factor involved, too:1
(6) Dative NPs precede accusative NPs.
As Lenerz observed, these three conditions interact in a complex and interesting fashion: Either one or both of (5) can be violated, as long as (6) is met; and (6) can be violated only if both conditions in (5) are met. Put differently, if the dative object precedes the accusative object (henceforth DatO > AccO order), any distribution of focus and (in)definiteness between the objects is possible; but the accusative object can precede the dative object (henceforth AccO > DatO order) only if DatO is in focus and AccO is definite. Lenerz (1977) concluded from this that DatO > AccO is the "unmarked" word order, and that deviance from it is only justified in compliance with the conditions in (5). The focus-case interaction is demonstrated in (7) and (8) (Lenerz' (2) and (3), p. 43). To control for focus, a context-question as in (7) and (8) is provided; the focus in the answer can then be identified as the constituent that corresponds to the wh-phrase in the question ([...]F brackets indicate focus, capitals represent pitch accents).
The DatO > AccO order in (a) is fine in both cases, whereas the AccO > DatO order in (b) is only acceptable if DatO is in focus (or, as we shall sometimes say, F-marked).
(7) Wem hast du das Geld gegeben?
    'Who did you give the money to?'
a. [+def. DatO]F > [+def. AccO]
   Ich habe [dem KasSIErer]F das Geld gegeben.
   I have the teller the money given
b. [+def. AccO] > [+def. DatO]F
   Ich habe das Geld [dem KasSIErer]F gegeben.
   I have the money the teller given
   'I gave the money to the teller.'
(8) Was hast du dem Kassierer gegeben?
    'What did you give to the teller?'
a. [+def. DatO] > [+def. AccO]F
   Ich habe dem Kassierer [das GELD]F gegeben.
   I have the teller the money given
b. [+def. AccO]F > [+def. DatO]
   ?*Ich habe [das GELD]F dem Kassierer gegeben.
   I have the money the teller given
   'I gave the teller the money.'
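The interaction just described can be restated as a small decision procedure. The following Python sketch is only an illustration of the generalization as worded above, not part of Lenerz' or the present analysis: the class `DoubleObjectClause`, the function `predicted_acceptable`, and the dictionary `EXAMPLES` are hypothetical names, and the sketch assumes that the "only if" condition on AccO > DatO order can be read as "if and only if".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DoubleObjectClause:
    """One Mittelfeld ordering of a dative and an accusative object."""
    acc_before_dat: bool  # True: AccO > DatO; False: DatO > AccO
    dat_focused: bool     # DatO is F-marked
    acc_definite: bool    # AccO is definite


def predicted_acceptable(clause: DoubleObjectClause) -> bool:
    """Apply the generalization as restated in the text.

    DatO > AccO is the unmarked order and is always allowed.
    AccO > DatO is allowed only if DatO is focused and AccO is definite.
    Assumption of this sketch: 'only if' is treated as 'if and only if'.
    """
    if not clause.acc_before_dat:
        return True
    return clause.dat_focused and clause.acc_definite


# Hypothetical encoding of examples (7) and (8); the labels are ours.
EXAMPLES = {
    "7a": DoubleObjectClause(acc_before_dat=False, dat_focused=True, acc_definite=True),
    "7b": DoubleObjectClause(acc_before_dat=True, dat_focused=True, acc_definite=True),
    "8a": DoubleObjectClause(acc_before_dat=False, dat_focused=False, acc_definite=True),
    "8b": DoubleObjectClause(acc_before_dat=True, dat_focused=False, acc_definite=True),
}

for label, clause in EXAMPLES.items():
    verdict = "acceptable" if predicted_acceptable(clause) else "unacceptable"
    print(label, verdict)
```

On this encoding only (8-b) is predicted to be out, which matches the judgments reported in (7) and (8). The same function also covers the definiteness condition: an AccO > DatO clause with an indefinite AccO is rejected even when DatO is focused.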
The definiteness-case interaction is illustrated in (9) and (10) (Lenerz' (18) and (20) on p. 52f.). DatO > AccO order is possible with an indefinite preceding a definite, contra (5-a), as in (9).

(9)   Was hast du einem Schüler geschenkt?
      'What did you give to a student?'
      [-def. DatO] > [+def. AccO]F
      Ich habe einem Schüler [das BUCH]F geschenkt.
      I   have a-DAT student  the book   given
      'I gave a student the book.'
But AccO > DatO order is unacceptable if AccO is indefinite; cf. (10) (note that in both examples the focus follows the non-focus, in accordance with (5-b)):

(10)  Wem hast du ein Buch geschenkt?
      'Who did you give a book?'
      [-def. AccO] > [+def. DatO]F
      *Ich habe ein Buch [dem SCHÜler]F geschenkt.
       I   have a book    the student   given
      'I gave a book to the student.'

In an unpublished paper (Büring 1996) I proposed reinterpreting Lenerz' findings in the following terms: DatO > AccO is the base-generated VP-internal order of objects in German; AccO > DatO is the result of a syntactic movement operation called scrambling, which adjoins AccO to the VP (this follows a common line of syntactic analysis for German; cf. Webelhuth 1989, Müller 1991, Vikner 1991). There are two constraints on scrambling, which can be phrased as in (11):

(11)  a.  Don't scramble a focused NP!
      b.  Don't scramble an indefinite NP!
To derive these I proposed utilizing two constraints along the lines of (12) and (13):

(12)  FINALFOCUS (FF)
      Focus should be sentence final.

(13)  IND(EFINITES)
      Indefinites must be properly contained in VP (if they are to receive an existential reading).

Both these constraints have been proposed in the literature and can be seen to be independently motivated. I will return to this issue below. They interact with a general syntactic faithfulness constraint that penalizes movement, including scrambling, which we will call STAY (cf. Grimshaw 1997). Optionality of movement results where the base order violates FF and the derived order violates STAY but respects INDEFINITES and FF. Movement is prohibited where the base order fulfills both STAY and FF; it is also prohibited if the derived order violates INDEFINITES.2

In order to discuss the workings of this system I will implement it in the form of an optimality grammar, as proposed in Choi (1996) and, independently, in Büring (1997a) (it is the latter proposal I am going to discuss here, although Choi's analysis uses essentially the same constraint tie, between her CN2 (dative precedes accusative) and NEW (roughly: a non-focused argument precedes a focused one), to derive focus-related word order variation; since I will propose a fundamental reanalysis later in this paper, I will not attempt a comparison of the two accounts here). To achieve the desired results, INDEFINITES must be undominated, while FINALFOCUS and STAY are tied. The proposed ranking is thus the one in (14):

(14)  INDEFINITES » STAY, FINALFOCUS (STAY and FINALFOCUS tied)
The DatO > AccOF candidate fulfills both STAY and FF (and INDEFINITES), while the scrambled candidate AccOF > DatO violates them both. That is, under either resolution of the tie the in situ version is optimal. Movement is blocked:

      Input: [VP DatO [V' dAccOF V]]                  IND   STAY   FF
      a. ☞  [VP DatO [V' dAccOF V]]
      b.    [VP dAccOF [VP DatO [V' tAccO V]]]               *      *
As I already noted in that earlier work, this system also derives a case not considered in Lenerz (1977), but observed in Eckardt (1996): If both objects are focused, scrambling is excluded, regardless of (in)definiteness. In terms of the system proposed: STAY must not be violated if no improvement in terms of FF results:

(18)  Input: [VP DatOF [V' dAccOF V]]                 IND   STAY   FF
      a. ☞  [VP DatOF [V' dAccOF V]]
      b.    [VP dAccOF [VP DatOF [V' tAccO V]]]
It was finally observed that scrambling of indefinites is possible if these are not to receive an existential interpretation (cf. the exact wording of the constraint in (13)).3 To illustrate this, let us compare two sentences with focus on DatO and an indefinite AccO. In the first version, the indefinite AccO is meant to be existential (indicated as iAccO∃):

(19)  Input: [VP DatOF [V' iAccO∃ V]]                 IND   STAY   FF
      a. ☞  [VP DatOF [V' iAccO∃ V]]                                *
      b.    [VP iAccO∃ [VP DatOF [V' tAccO V]]]       *!     *
The scrambled structure is blocked for its violation of INDEFINITES: Since the indefinite is existential, it ought to stay within VP. Turning now to the second version, we observe that scrambling the indefinite AccO becomes an option if the indefinite is supposed to be interpreted as a generic NP (indicated by subscript Gen):

(20)  Input: [VP DatOF [V' iAccOGen V]]               IND   STAY   FF
      a. ☞  [VP DatOF [V' iAccOGen V]]                              *
      b. ☞  [VP iAccOGen [VP DatOF [V' tAccO V]]]            *
The INDEFINITES constraint does not apply here since AccO is generic. Accordingly, (20-a) is optimal if the tie is resolved to STAY » FF, while (20-b) is optimal if it is resolved to FF » STAY. Movement of the indefinite across the definite is thus (optionally) possible.4

This quick overview illustrates all the relevant aspects of the system as proposed in Büring (1996) in its application to German double object constructions. Empirically successful though it is, many questions remain open. Some of them regard the nature of the constraints. Why should they hold in the way they do? Others regard the technical set-up of the system. What advantages does it have to specify focus patterns (rather than, say, contexts, accent patterns, or nothing at all) in the input?

Regarding the first set of questions, the INDEFINITES constraint in (13) is a fairly direct adaptation of the seminal proposals in de Hoop (1992) and Diesing (1992). If the position taken in these works is basically correct, positional preferences of indefinites can be explained in terms of the way syntax is mapped onto semantics. The effects of FINALFOCUS can and should, I believe, be derived from the way syntax is mapped onto prosody, utilizing ideas found in Truckenbrodt (1995, 1999) and Büring (1997a). It is this latter aspect that the present paper is mainly concerned with. The system I will present below shares many essential properties with the one sketched in this section and preserves its basic tenets: Object ordering in German is determined by morphosyntactic and focus-related constraints, F-marking is specified in the input, and optional reordering is derived by a constraint tie in the very way illustrated above.

I will not, however, continue to use the particular constraints STAY, FINALFOCUS and INDEFINITES. In section 3 I propose replacing FINALFOCUS by a group of constraints relating focus, prosody and syntax. Their net effect will be similar to that observed with FINALFOCUS above; in contradistinction to this single constraint, however, their empirical coverage is much broader, and they are compatible with and well motivated by current work in prosodic phonology. In section 4 I will then introduce a constraint DAT, which takes over the work of STAY. The effects of DAT will be the same as those of STAY; it is chosen merely to avoid commitment to a derivational syntactic framework. The issue of (in)definiteness and its influence on object order will be ignored in what follows, along with the constraint introduced to handle it; reintegration of it within the analysis developed below will have to await a later occasion (Büring in prep.).

Regarding the second set of questions, the issue to be addressed here regards the specification of the input. The system sketched above and elaborated in what follows crucially specifies F-marking (and the different readings of indefinites) in the input, but not, e.g., accenting or prosodic phrasing. This choice could be made differently. I don't think that the present paper presents conclusive evidence in favor of the set-up chosen here. Its purpose is to show that such a system can be devised, and explore what properties it will have, facilitating further discussion. I will touch upon some of the issues involved after the main exposition in section 5 below.
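The evaluation procedure behind the tableaux of this section can be made concrete in a few lines of code. The following is only a minimal sketch, not part of the original proposal: it assumes that a constraint tie is interpreted as "a candidate is grammatical if it wins under at least one total resolution of the tie", and the names `total_orders`, `winners`, `RANKING` and the candidate labels are hypothetical. The violation profiles are copied from the tableau for narrow focus on AccO and from (19) and (20).

```python
from itertools import permutations, product

def total_orders(ranking):
    """Yield every total constraint order compatible with a ranking.

    `ranking` is a list of strata; constraints inside one stratum are tied.
    """
    per_stratum = [list(permutations(stratum)) for stratum in ranking]
    for combo in product(*per_stratum):
        yield [c for stratum in combo for c in stratum]

def winners(candidates, ranking):
    """Return the candidates that are optimal under some resolution of the ties.

    `candidates` maps a label to a dict {constraint: number of violations}.
    """
    result = set()
    for order in total_orders(ranking):
        # Strict domination = lexicographic comparison of violation vectors.
        profile = {name: tuple(viol.get(c, 0) for c in order)
                   for name, viol in candidates.items()}
        best = min(profile.values())
        result |= {name for name, p in profile.items() if p == best}
    return result

# (14): INDEFINITES dominates the tied pair STAY / FINALFOCUS.
RANKING = [["IND"], ["STAY", "FF"]]

# Narrow focus on a definite AccO: scrambling violates STAY and FF.
acc_focus = {"DatO > AccO_F": {}, "AccO_F > DatO": {"STAY": 1, "FF": 1}}
# (19): focus on DatO, existential indefinite AccO.
t19 = {"DatO_F > iAccO_ex": {"FF": 1},
       "iAccO_ex > DatO_F": {"IND": 1, "STAY": 1}}
# (20): focus on DatO, generic indefinite AccO.
t20 = {"DatO_F > iAccO_gen": {"FF": 1}, "iAccO_gen > DatO_F": {"STAY": 1}}

for label, tableau in [("AccO focus", acc_focus), ("(19)", t19), ("(20)", t20)]:
    print(label, sorted(winners(tableau, RANKING)))
```

Under these assumptions only (20) yields two winners, which corresponds to the optional scrambling of a generic indefinite; in the other two tableaux the in situ order wins under either resolution of the tie.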
3 Deconstructing FINALFOCUS

This section explores the rationale behind a constraint like FINALFOCUS, and proposes replacing it with more precise and natural constraints on prosodic phrasing. Likewise, we will no longer assume the constraints INDEFINITES and STAY (which will be replaced by a less committing constraint called DAT(IVE) in section 4 below).
3.1 Phrasing, Stress, and Accent

Let me start by clarifying some of the assumptions about the relation between context, focus and accent I am making. I follow Selkirk (1984, 1995), Rochemont (1986), and many others in assuming an overall picture as in (21).

(21)  [Context (specified by, e.g., a question)]  →  [Syntactic Structure with F-marking]  →  [Prosodic Structure with stress and pitch accents]
The context determines which constituents in the syntactic structure need to be F-marked. I will adopt the most straightforward characterization of this relation, as proposed in Schwarzschild (1999): Any constituent which is not contextually Given (or c(ontext)-construable in Rochemont's terms) needs to be F-marked. Usually this will be the constituent that corresponds to the wh-phrase in a context question (see Selkirk 1995 and Schwarzschild 1999 for enlightening discussion), plus all or most of its sub-constituents. In this paper I will have nothing more to say on the Context-F-marking relation; my subject will be the correspondence between the two boxes on the right in (21), focus realization.

In English, focus is signalled by pitch accents, i.e., movements of the fundamental frequency of the speaker's voice, centered around prominent syllables. But, as is well known, not every terminal element that bears F must receive a pitch accent. Selkirk (1984), like many following her, assumes two additional steps on the way from the context-determined F-marking to the actual prosodic structure: First, a set of conditions on the possible F-patterns within the syntactic tree, usually cast in terms of focus projection rules; second, a correspondence condition between F-marked terminals and pitch accents, e.g., the basic focus rule of Selkirk (1984: 207). In what follows I want to explore a different line: I will derive the relevant effects of the focus projection rules in terms of prosodic principles (in keeping with a larger project aimed at eliminating focus projection rules altogether; cf. Büring 1997a, Drubig 1994, Schwarzschild 1999), and I will follow Truckenbrodt (1995, 1999) in assuming that there is no rule that directly relates F-marking to pitch accents, but that the focus-accent relation is mediated through prosodic phrasing.

My assumptions about prosodic phrasing are fairly standard: Lexical heads, sometimes together with lighter material accompanying them, form prosodic words (PWds). Prosodic words are grouped into intermediate prosodic categories which I will call accent domains, ADs (a term used by Uhmann 1991, similar to Gussenhoven's 1984 focus domains, Pierrehumbert & Hirschberg's 1990 intermediate phrases, and Truckenbrodt's 1999 phonological phrases), which in turn are grouped into intonational phrases (iPs).
Following Selkirk (1984) and many others, I assume that prosodic phrasing is exhaustive, strictly layered, and non-recursive.5 Each such prosodic category has a unique head. The head is the most prominent element of the category. For example, the syllables sie, geld and ge (marked by capitals) in (22), which is the end of sentence (8-a) above, are the heads of their respective prosodic words; they receive a grid mark at the word-level, and hence are more prominent than all the other syllables.

(22)  ...                    x                    )iP
      (      x       )(      x                    )AD
      (      x       )(      x    )(     x        )PWd
       dem KasSIErer     das GELD      geGEben
       the-DAT teller    the money     given
The prosodic words (dem Kassierer and das Geld) are the heads of their respective accent domains. Accordingly they are more prominent than the prosodic word (gegeben), which means that their most prominent syllables receive AD-level stress. Finally, the AD (das Geld gegeben) is the head of the iP that wraps the entire sentence (the dots indicate that the iP extends further to the left) and thus receives a grid mark at the iP-level (note that this is different from the notation used in Halle & Vergnaud 1987, where heads are indicated by grid marks on the next higher level).

As noted, the grid marks represent stress, where higher columns represent a higher degree of stress. Finally, stressed syllables are associated with pitch accents (I will not be concerned with the choice of pitch accent here, see Pierrehumbert & Hirschberg 1990 for general discussion, and Büring 1997b on German). For our purposes it is sufficient to state that each sentence contains at least one pitch accent, and that if a syllable gets a pitch accent associated with it, every other syllable with the same or higher degree of stress must get a pitch accent, too; the range of the pitch movement (the perceived "intensity" of the accent) is positively correlated with the level of stress on the syllable the accent is associated with (cf. Pierrehumbert 1980). The result will be that the head of iP always bears a pitch accent. A common pattern in German is that all AD-heads have a pitch accent, too (cf., e.g., Uhmann 1991). We stipulate that syllables with only PWd-level stress never bear pitch accents.

The convention we will use where no prosodic trees are given is the following: AD-heads are marked by capitalization of the pertinent syllable, the iP-head by capitalization plus underlining of the word; PWd-heads aren't marked at all. (22) can thus be abbreviated as in (23):

(23)  dem KasSIErer das GELD gegeben.
Given what we said above, Geld must bear a pitch accent here, while KasSIErer may (along with every other AD-head that may precede it); the V gegeben cannot. The pitch accent on Geld will be the most prominent one (the nuclear stress).6

Let us start by elaborating on the notion of accent domain (I will ignore the issue of prosodic word formation, because nothing hinges on it in the present context). An AD has an "ideal size", which is described by the constraints in (24). Since its two parts, PRED and XP, don't conflict in the examples I discuss in this paper, I will treat them as one constraint, ADF, in the tableaux that follow:
(24)  ADF (ACCENTDOMAINFORMATION):
      a.  PRED: A predicate shares its AD with at least one of its arguments.
      b.  XP: AD contains an XP. If XP and YP are within the same AD, one contains the other (where X and Y are lexical categories).
As stated quite explicitly in (24-b), the ideal aimed at is to map lexically headed XPs onto ADs. Two special cases arise: If one such XP contains another, the dominating one will be mapped onto an AD. And if an argument XP is adjacent to its predicate, the predicate will be integrated into the XP's AD (borrowing Jacobs' 1992 apt term). For example, the NP and its selecting predicate will be mapped onto one AD in the German (25-a) (the same results obtain mutatis mutandis for English VO structures). This candidate doesn't violate XP, because even though it contains two lexically headed maximal projections, NP and VP, the latter contains the former.7 It doesn't violate PRED either because the predicate, i.e., the verb, gets to share the AD with its NP argument.

(25)  das Geld geben
      the money give ('give the money')

      a. ☞  (                )AD
            (das Geld)(geben)PWd     ✓
      b.    (        )(      )AD
            (das Geld)(geben)PWd     * (PRED: V is alone in its AD)
      c.    ((       )       )AD
            (das Geld)(geben)PWd     * (illegal recursion of AD)
As said above, every AD has one head, which is the most prominent element within it, indicated by a grid mark within the AD. If an AD consists of more than one prosodic word, as in (25-a), the head will be determined by (26):

(26)  A/P (ARGUMENT-OVER-PREDICATE): Within AD, an argument is more prominent than a predicate.

(27)  a.    (            x   )AD     violates A/P
            (das Geld)(geben)PWd
      b. ☞  (    x           )AD     satisfies A/P
            (das Geld)(geben)PWd
The effect of (26) is demonstrated in (27) (note that here and henceforth I don't indicate prosodic words in a separate line; hence, PWd-heads are no longer marked with a grid mark at all).
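The accent-association conventions of this subsection (at least one pitch accent per sentence; an accented syllable requires accents on all syllables of equal or higher stress; no accents on syllables with only PWd-level stress) can be illustrated by enumerating the admissible accent patterns for a given grid. This is a hedged sketch under the assumption that stress can be encoded as an integer level (1 = PWd, 2 = AD, 3 = iP); the function name `accent_patterns` and the encoding are hypothetical and serve illustration only.

```python
def accent_patterns(stress):
    """Return all admissible sets of pitch-accented syllables.

    `stress` maps a syllable label to its grid height:
    1 = PWd-level only, 2 = AD-level, 3 = iP-level.
    Admissible patterns are exactly the sets "all syllables at or above a
    threshold", where the threshold is an occupied level of at least 2.
    """
    thresholds = sorted({level for level in stress.values() if level >= 2})
    return [{syl for syl, level in stress.items() if level >= t}
            for t in thresholds]

# Grid of (22)/(23): dem KasSIErer das GELD gegeben
grid_22 = {"SIE": 2, "GELD": 3, "GE": 1}

for pattern in accent_patterns(grid_22):
    print(sorted(pattern))
```

For the grid in (22) this yields two patterns, one with accents on both KasSIErer and GELD and one with an accent on GELD alone, in line with the statement that Geld must bear a pitch accent, KasSIErer may, and gegeben cannot.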
3.2 Simple Foci

ADF and A/P in tandem govern phrasing and prominence, all other things being equal. How does focus enter the picture? I submit that one simple constraint, (28), is all that is needed:8

(28)  FP (FOCUSPROMINENCE)
      Focus is most prominent.

Importantly, (28) inspects the prosodic structure of the sentence; for example, if an AD contains two prosodic words, only one of which contains an F-marked node, FP will demand that that PWd become the head of AD; likewise for higher prosodic categories (to which I will turn below). In German, FP is crucially ranked above A/P; for reasons that will become clear later, ADF must be ranked in-between them.

To understand the workings of FP, let us consider AD-formation in cases in which exactly one immediate constituent of the clause bears an F-feature (that constituent may in turn contain more F-features, which I will ignore here); I will refer to these as simple or narrow focus cases. What we observe here is that among the elements within VP, only the constituent in focus has AD-level stress (and, for reasons to be discussed in a moment, the main pitch accent). (Since narrow V-foci are hard to elicit by a wh-question, I will use contrasting contexts, which, in accordance with Schwarzschild (1999), I assume work the same in all relevant respects.)

(29)  a.  (Was hast du dem Kassierer gegeben? Ich habe dem Kassierer) das GELDF gegeben.
          'What did you give the teller? I gave the teller [the money]F.'
      b.  (Ich habe nicht gesagt, du sollst dem Kassierer das Geld beschreiben, sondern du sollst dem Kassierer) das Geld GEbenF.
          'I didn't say you should describe the money to the teller, but you should [give]F the money to the teller.'
The pitch accents on Geld and geben, respectively, tell us that these must be AD-heads, while their respective sisters are not (otherwise they could at least bear a secondary accent, which I would have indicated by capitals). Let us see how this follows from our assumptions: If the AccO alone is F-marked (case (29-a)), FP will require that the prosodic word containing it become the head of the AD. This is the case in (30-a) and (30-c), but not in (30-b), which is therefore blocked ((30-b) violates A/P on top of this, which is irrelevant here). Between (30-a) and (30-c), ADF prefers the former, due to the reason already seen in (25-b): The V alone is not a good AD, violating ADF/PRED.

(30)  Input: [VP [NP das Geld]F geben]      FP    ADF   A/P
      a. ☞  (    x            )AD
            (das GeldF)(geben)PWd
      b.    (             x   )AD
            (das GeldF)(geben)PWd            *!          *
      c.    (    x    )(  x   )AD
            (das GeldF)(geben)PWd                  *!
If V alone is F-marked (case (29-b)), it has to be head of AD, which blocks (31-a). Both (31-b) and (31-c) achieve this goal, though by different means. The former shifts the accent within an otherwise perfect AD, violating A/P; the latter makes V an AD of its own, which then trivially satisfies FP.

(31)  Input: [VP [NP das Geld] gebenF]      FP    ADF   A/P
      a.    (    x            )AD
            (das Geld)(gebenF)PWd            *!
      b. ☞  (             x   )AD
            (das Geld)(gebenF)PWd                        *
      c.    (    x   )(   x   )AD
            (das Geld)(gebenF)PWd                  *!
Note that A/P is not violated in (31-c), since there is no AD containing a predicate and its argument. Yet (31-c) is blocked by (31-b) since ADF dominates A/P.

Let us now include the intonational phrase, iP, in the picture. Since we are only concerned with VP-internal focus here, it suffices to assume that every sentence (or inflection phrase) is mapped onto one iP. Intonational phrases are strictly right-headed in German: The head will be that AD which is aligned with iP's right edge. We implement this by assuming (32) as an undominated constraint (cf. McCarthy & Prince 1993).

(32)  iP-HEAD-RIGHT:
      ALIGN(iP, right, head(iP), right)

Main prominence within iP will therefore be on the most prominent element of its final AD (this is essentially equivalent to Selkirk's 1984 final strengthening). Since at least the most prominent syllable of a sentence must be associated with an accent (see above), the iP-head will be perceived as the main accent or "nuclear stress" of the sentence; on casual inspection it might even be perceived as the only prominence, even though that constitutes an oversimplification in most cases. Since (32) is not violated in any of the examples I discuss in this paper I won't include it in the tableaux and will indicate violations of it separately, where necessary. The full representations of (30-a) and (31-b) are then (33-a) and (33-b), respectively (the dots again indicate that iP extends to the left):

(33)  a.  ...     x            )iP
            (     x            )AD
            (das GeldF)(geben)PWd
      b.  ...             x    )iP
            (             x    )AD
            (das Geld)(gebenF)PWd
Notice that, also at the iP-level, the structures in (33) respect FP: The AD containing the focus ends up being the head of iP. Note in particular that although there is plenty of material following the most prominent syllable Geld within the iP in (33-a), there is no AD following the AD containing Geld. Therefore iP-HEAD-RIGHT in (32) is respected here: The rightmost daughter AD is the head of iP.9

To conclude the simple focus cases, what if the VP-initial dative argument is the sole focus? As already discussed in section 2 above, the nuclear accent then falls on the DatO; moreover, no secondary accents on either AccO or V are allowed.

(34)  (Wem hast du das Geld gegeben? Ich habe) dem KasSIErerF das Geld gegeben.
      'Who did you give the money? I gave [the teller]F the money.'
This is predicted: The dominant constraint FP will force the head of AD and iP to be on the focused DatO. As Truckenbrodt (1995: ch. 5) was the first to point out, this, together with iP-HEAD-RIGHT, excludes the presence of any AD following the one containing the focus. Consider (35-a) and (35-b); in both, the final AD (das Geld geben) becomes the head of the iP, consonant with the right-headedness of iP, (32). But this violates FP at the iP level, since the AD (dem KassiererF) is not most prominent in iP. Since FP dominates ADF, (35-a) and (35-b) are blocked by (35-d). Alternatively consider (35-c). Here the AD containing the focus, dem KassiererF, becomes the head of iP, satisfying FP, but violating iP-HEAD-RIGHT, which again dominates ADF.

(35)  Input: [VP [dem Kassierer]F [NP das Geld] geben]      FP    ADF   A/P
      a.                          x              )iP
            (      x       )(     x              )AD
            (dem KassiererF)(das Geld)(geben)PWd             *!
      b.                                   x     )iP
            (      x       )(     x  )(    x     )AD
            (dem KassiererF)(das Geld)(geben)PWd             *!    *
      c.           x                             )iP        iP is not right-headed
            (      x       )(     x              )AD
            (dem KassiererF)(das Geld)(geben)PWd
      d. ☞         x                             )iP
            (      x                             )AD
            (dem KassiererF)(das Geld)(geben)PWd                   *
The only way to have a non-final argument be most prominent is thus to make it the head of the final AD. In other words, all subsequent AD boundaries are "deleted", or put more accurately: No more ADs are formed. This blatantly violates ADF, in particular its XP sub-constraint (cf. (24-b)): DatO and AccO are both lexically headed XPs, and neither one contains the other. But since ADF is dominated by both FOCUSPROMINENCE and iP-HEAD-RIGHT, these violations are unavoidable. Consequently the post-focal stretch of the sentence gets totally de-structured. There can be no AD-level stresses and, accordingly, no pitch accents. This effect has been observed in various languages; cf. again the discussion in Truckenbrodt (1995: ch. 5).

3.3 Complex Foci

We now turn to cases in which more than one immediate constituent of the sentence is in focus. (36) is an example of this sort (quite possibly there is another F-mark on the VP here and in (38) below; I'll address this issue in the next sub-section).

(36)  (Wie soll die NSF dabei helfen? - Sie soll) das GELDF gebenF.
      'How can the NSF help? They should giveF the moneyF.'
The final V, though F-marked itself, does not have AD-level stress and cannot bear a pitch accent. That means that object and verb continue to form an AD; (37) shows how this is accounted for:

(37)  Input: [VP [NP das Geld]F gebenF]      FP    ADF   A/P
      a.                   x   )iP
            (    x    )(   x   )AD
            (das GeldF)(gebenF)PWd            *     *!
      b. ☞       x             )iP
            (    x             )AD
            (das GeldF)(gebenF)PWd            *
      c.                   x   )iP
            (              x   )AD
            (das GeldF)(gebenF)PWd            *           *!
      d.         x             )iP           iP is not right-headed
            (    x    )(   x   )AD
            (das GeldF)(gebenF)PWd
The winner in this case has exactly the same prosodic structure as in the object-only-F case in (30). The tableaux look considerably different though. In particular, notice that even the winner in (37) has one violation of FP. This is unavoidable if a sentence has two F-marked constituents, given that every phrase has only one head: At some level, one of the F-constituents must become the non-head.

The most instructive candidates to compare are (37-a) and (37-b). In (37-a), V is the head of its own AD, and the AD with the AccO is subordinated at the level of iP, inducing an FP violation. In (37-b), V is subordinated, but already at the AD level. It incurs a violation of FP, too, this time because the PWd containing gebenF is not the head of the AD containing it. Note that since there is only one AD (which then is the head of iP), no further violations of FP occur. The choice pro (37-b) is made by the lower constraint ADF, which prefers the "integrated" structure (37-b) over the "split" one in (37-a): A perfect AD cannot consist of just a predicate as in (37-a) (the fact that the prominence within the AD is on the object rather than on the verb, as in (37-c), is then regulated by A/P).

What happens if the verb and two arguments are F-marked? In this case we get the nuclear accent on the rightmost argument and a secondary accent, or at least AD-level stress, on the VP-initial one.

(38)  (Was hast du gemacht? - Ich habe) dem KasSIErerF das GELDF gegebenF.
      'What did you do? - I gaveF the tellerF the moneyF.'
Turning to the tableau, there will inevitably be two violations of FP (since there are three prosodic words, two arguments and the predicate, only one of which will be prominent at all levels). But this time, since the ideal AD contains at most one of the arguments (cf. (24-b) above), each argument will get its own AD, and they will only be "merged" at the level of the iP. The predicate will integrate with the closest argument for reasons of ADF, as seen before.

(39)  Input: [VP [dem Kassierer]F [NP das Geld]F gebenF]      FP    ADF   A/P
      c. ☞                         x               )iP
            (      x       )(      x               )AD
            (dem KassiererF)(das GeldF)(gebenF)PWd             **
      [Losing candidates of (39): (a) and (b) with a single AD, (d) and (e) with two ADs, (f) with three ADs, each with two FP violations. Their grid marks and their fatal violations under ADF and A/P are not recoverable from the source.]
It should be noted once more that the crucial difference is between predicates and non-predicates in complex-F-constructions. While predicates have an incentive to join the AD of one of their arguments and therefore integrate at that lower level, arguments do not. In fact, ADF prefers for them not to share an AD with any other argument, which is why they end up forming their own AD. Incidentally, this reasoning applies to adjuncts, too, except that these never join ADs with a predicate, given that they are never selected by a predicate (cf. again the definition in (24)); it is beyond the scope of this paper to go into this, though.
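The interaction of FP, ADF and A/P in sections 3.2 and 3.3 can be replayed mechanically. The sketch below is not the author's implementation but an illustration under several simplifying assumptions, all of which are mine: the VP is treated as a flat sequence of prosodic words; GEN is restricted to right-headed iPs (so iP-HEAD-RIGHT is built in rather than evaluated); FP is counted as the number of F-marked prosodic words that are not the head of the iP; ADF counts one violation for each argument beyond the first in an AD and one for each AD consisting of a predicate without an argument. The names `PWd`, `partitions`, `violations` and `best` are hypothetical.

```python
from itertools import product
from typing import NamedTuple

class PWd(NamedTuple):
    form: str
    role: str    # "arg" or "pred"
    focus: bool  # F-marked in the input?

def partitions(words):
    """All ways of cutting the word sequence into contiguous accent domains."""
    for cuts in product([False, True], repeat=len(words) - 1):
        ads, start = [], 0
        for i, cut in enumerate(cuts, start=1):
            if cut:
                ads.append(words[start:i])
                start = i
        ads.append(words[start:])
        yield ads

def violations(ads, heads):
    """Violation vector (FP, ADF, A/P); the iP head is the head of the last AD."""
    ip_head = (len(ads) - 1, heads[-1])
    fp = sum(w.focus and (i, j) != ip_head
             for i, ad in enumerate(ads) for j, w in enumerate(ad))
    adf = ap = 0
    for ad, h in zip(ads, heads):
        n_args = sum(w.role == "arg" for w in ad)
        has_pred = any(w.role == "pred" for w in ad)
        adf += max(0, n_args - 1) + (has_pred and n_args == 0)
        ap += ad[h].role == "pred" and n_args > 0
    return (fp, adf, ap)

def best(words):
    """Optimal phrasings under the strict ranking FP >> ADF >> A/P."""
    scored = [(violations(ads, heads), ads, heads)
              for ads in partitions(words)
              for heads in product(*(range(len(ad)) for ad in ads))]
    top = min(s[0] for s in scored)
    return [(tuple(tuple(w.form for w in ad) for ad in ads), heads)
            for v, ads, heads in scored if v == top]

# Input of (31): narrow focus on the verb.
v_focus = [PWd("das Geld", "arg", False), PWd("geben", "pred", True)]
# Input of (35): narrow focus on the dative object.
dat_focus = [PWd("dem Kassierer", "arg", True), PWd("das Geld", "arg", False),
             PWd("geben", "pred", False)]
# Input of (39): both objects and the verb F-marked.
all_focus = [PWd("dem Kassierer", "arg", True), PWd("das Geld", "arg", True),
             PWd("geben", "pred", True)]

for label, words in [("(31)", v_focus), ("(35)", dat_focus), ("(39)", all_focus)]:
    print(label, best(words))
```

Each result pairs the grouping of prosodic words into ADs with the index of the head in each AD. Under the stated assumptions the sketch selects the single AD headed by the verb for (31), the "super-big" single AD headed by the dative object for (35), and the two-AD phrasing with the verb integrated into the AccO's AD for (39).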
3.4 F on a Verbal Projection

In the above cases we have concentrated on F's sitting on immediate constituents of the clause such as DatO, AccO and V. I remarked above (see discussion of (36)) that sometimes, there may be F-marking on clausal projections such as V', VP, IP, etc., as well. It would lead us too far afield to discuss the pragmatic differences leading to, say, a [VP NPF VF] pattern as opposed to a [VP NPF VF]F pattern, especially since the issue seems controversial. Fortunately, no commitment is needed, because, as we will see, nothing changes with or without additional F-marking on clausal projections. Consider (40), which is mostly a repetition of (37) with an F on VP added.10

(40)  Input: [VP [das Geld]F gebenF]F      FP    ADF   A/P
      a. ☞       x             )iP
            (    x             )AD
            (das GeldF)(gebenF)PWd          *
      b.                   x   )iP
            (              x   )AD
            (das GeldF)(gebenF)PWd          *           *!
      c.                   x   )iP
            (    x    )(   x   )AD
            (das GeldF)(gebenF)PWd          *     *!
In (40-a) and (40-b), the smallest prosodic constituent containing VPF is the AD. Since that AD is the head of iP, FOCUSPROMINENCE is met.11 If we look at a double object example along the lines of (39), a similar reasoning applies; (41) repeats the winning candidate for this structure with an F on VP added:

(41)  ...                        x              )iP
      (       x       )(         x              )AD
      [VP dem KassiererF   das GeldF   gebenF]F

No smaller prosodic unit than iP contains the VP, and since iP is the highest category, nothing is more prominent than iP. Therefore, no additional violation of FP occurs, hence no other candidate will improve relative to (41)/(39-c).
3.5 Deaccenting

So far we have looked at simple foci (DatOF, AccOF, and VF) or complex foci in which all VP-internal arguments were F-marked. Now I will turn to examples that contain deaccenting. As a starting point, recall that a ditransitive VP/IP focus without deaccenting results in a structure with the last AD consisting of a prominent AccO and a non-prominent V. (42)/(43) illustrates this with a new example (foci on, external to, and above VP are not indicated):

(42)  'Why was Veronika arrested?' → DatOF AccOF VF
      Weil sie ihrem   MAcker den     KaMINhaken überzog.
      bec. she her-DAT man    the-ACC poker      landed
      'Because she beat her man with the poker.'
(43)  Input: [VP DatOF AccOF VF]                                FP    ADF   A/P
      d. ☞                         x                  )iP
            (      x      )(       x                  )AD
            (ihrem MackerF)(den KaminhakenF)(überzogF)PWd        **
      [Losing candidates (a)-(c) and (e) of (43), one of which is excluded because its iP is not right-headed, are not recoverable from the source.]
If we now introduce AccO as part of the context question, it no longer needs to be F-marked (since it is Given). But its surrounding elements, DatO and V, still are (and so are, presumably, its dominating VP and IP). Such a structure is realized with the nuclear accent on V, no stress or secondary accent on AccO, and AD-level stress (and usually a secondary pitch accent) on DatO. So the contextually given AccO must remain unstressed and unaccented between its two stressed neighbors, which is presumably why cases like this have been labelled "deaccented".

(44)  'Why was Veronika arrested? Only because she had a poker in her trunk?' → DatOF AccO VF
      Nein, ...
      No, ...
      a.   ...weil sie ihrem   MAcker den     Kaminhaken ÜBerzog.
      b.  #...weil sie ihrem   MAcker den     Kaminhaken überzog.
           ...bec. she her-DAT man    the-ACC poker      landed

It should also be noted that, unlike all the other cases of XP+V focus we have considered thus far, this type of context strictly excludes an unstressed V, as (44-b) shows. This result is predicted, as tableau (45) demonstrates.

(45)  Input: [VP DatOF AccO VF]                                FP    ADF   A/P
      a. ☞                                     x     )iP
            (      x      )(                   x     )AD
            (ihrem MackerF)(den Kaminhaken)(überzogF)PWd        *           *
      b.                                       x     )iP
            (      x      )(      x       )(   x     )AD
            (ihrem MackerF)(den Kaminhaken)(überzogF)PWd        *     *!
      c.                          x                  )iP
            (      x      )(      x                  )AD
            (ihrem MackerF)(den Kaminhaken)(überzogF)PWd        **!
      d.           x                                 )iP
            (      x                                 )AD
            (ihrem MackerF)(den Kaminhaken)(überzogF)PWd        *     *!
In contradistinction to the case of a narrowly focused DatO - (35), whose winner is structurally parallel to (45-d) - the verb together with the AccO will form an AD in this case. The reason is that this last AD does not threaten to violate FP: It contains a focus, and to make that focus prominent, it merely needs to violate A/P. Before I go on I would like to emphasize some points in which the present account differs from others in the literature: First, the integration effect (Fmarked V remains unstressed if its argument is stressed) is arrived at without any syntactic conditions on F-patterns (such as the second phrasal focus rule in Selkirk 1984: 207, or the third focus assignment rule of Rochemont 1986: 85), a result which brings us closer to a theory that dispenses with focus projection rules altogether. Second, it maintains that stress assignment and accenting are related to focus through prosodie phrasing (unlike the proposal in Schwarzschild 1999: sec. 6, which otherwise shares many properties with the present one). Third, it generalizes to complex F-patterns and deaccenting,
cases not discussed in, e.g., Truckenbrodt (1995, 1999). And fourth, it allows integration with a theory of word order variation, as I will demonstrate now.
4 Word Order Variation
4.1 A First Look
In (35) above we derived the fact that single focus on the DatO yields a structure as in (46). FP requires that DatO be the head of AD and iP; since iP is right-headed, this blocks insertion of further AD boundaries to the right of AD. Due to this, a "super-big" AD is formed, violating ADF.
(46) ...   x                )iP
     (     x                )AD
         DatO_F AccO V
Let us now see what happens if word order permutations enter the picture. In this case, the same F-pattern could be realized without violating any constraint by utilizing AccO > DatO order. And in fact, this option exists alongside the one in (46) in German, as noted in (7) above. I repeat both variants for convenience here (note that I have added the indication of AD-level prominence on AccO in (47-b), which may or may not be associated with a pre-nuclear pitch accent):
(47) Who did you give the money to?
     a. Ich habe dem KasSIErer das Geld gegeben. (DatO > AccO = (46))
     b. Ich habe das GELD dem KasSIErer gegeben. (AccO > DatO = (48))
     'I gave the teller the money/the money to the teller.'
Let us examine (47-b) more closely, since we haven't done so before. Its prosodic structure is (48). It is perhaps worth pointing out that the prosodic structure of (47-b)/(48) is identical to that of the parallel DatO-AccO-V example in (38)/(39) above; in particular, V and the adjacent DatO integrate into one AD in just the same way that V and AccO do.
(48) ...            x         )iP
     ( x   )(       x         )AD
      AccO       DatO_F V
(49-a) is the constraint profile for this AccO > DatO order. No violations occur. (I will henceforth leave out the iP-level for reasons of space; the head of iP - which is predictably the rightmost AD - will be indicated by a capital bold face grid mark: X.)
(49) i: [VP AccO DatO_F V]; constraints: FP, ADF, A/P
☞ a. ( x )( X )AD  (das Geld)(dem Kassierer_F)(geben)PWd | no violations
   b. [AD grid not recoverable]  (das Geld)(dem Kassierer_F)(geben)PWd | FP *!, ADF or A/P *
   c. ( X )AD  (das Geld)(dem Kassierer_F)(geben)PWd | FP *!, ADF *
Likewise, the deaccentuation case in (44)/(45) above can be realized in an optimal prosodic structure with AccO > DatO order. As (50) illustrates, this structure maps the V and its argument into one AD, avoiding a violation of ADF. (50)
'Why was Veronika arrested? Only because she had a poker in her trunk?' AccO DatO_F V_F
Nein, weil sie den KaMINhaken ihrem MAcker überzog.
no bec. she the-ACC poker her-DAT man landed
(51) i: [VP AccO DatO_F V_F]; constraints: FP, ADF, A/P
   a. ( x )( X )AD  (den Kaminhaken)(ihrem Macker_F)(überzog_F)PWd | FP *, A/P *!
   b. ( x )( x )( X )AD  (den Kaminhaken)(ihrem Macker_F)(überzog_F)PWd | FP *, ADF *!
☞ c. ( x )( X )AD  (den Kaminhaken)(ihrem Macker_F)(überzog_F)PWd | FP *
   d. ( X )AD  (den Kaminhaken)(ihrem Macker_F)(überzog_F)PWd | FP **!, ADF *
[The horizontal alignment of the grid marks, which distinguishes (a) from (c), is not recoverable.]
If the F-marked argument is capable of forming a "perfect AD" with the verb, which it is when it is adjacent to V, this option is preferred - (51-c). Phrasing and accenting the verb separately - (51-b) - is just as impossible as with AccO_F V_F in (37).
4.2 Competing with Movement
Suppose now that AccO > DatO structures as in (49) and (51) were to enter into competition with their DatO > AccO siblings such as (35) and (45). That is, suppose that the input was not specified with respect to object ordering, allowing outputs with either order to compete with one another (I'll use set notation in the input specification to indicate this).¹² Then the AccO > DatO structure would be the sole winner. It clearly beats even the best DatO > AccO candidate. (52) demonstrates this for the simple DatO_F case. It compares the optimal structure from (51) with that from (35), rendering the latter sub-optimal.
(52) Who did you give the money?
i: {VP AccO, DatO_F, V}; constraints: FP, ADF, A/P
☞ a. ( x )( X )AD  (das Geld)(dem Kassierer_F)(geben)PWd | no violations
   b. ( X )AD  (dem Kassierer_F)(das Geld)(geben)PWd | ADF *!
If German had a ranking like that in (52), the only object order we would ever find for simple focus cases would be one where the F-marked object follows the F-less one, so the former can be maximally prominent (satisfying FP) and the latter can form an AD (satisfying ADF). Put in derivational terms, we would find obligatory scrambling of non-focused objects around focused ones. While this is obviously not the case in German, a situation much like this can arguably be found in many Romance languages, e.g., Spanish, Italian and, to a lesser degree, French (Ladd 1996, Zubizarreta 1998). In Spanish for example, an NP which is the single focus in a sentence must occur postverbally, in VP-final position. The interpretation of this that I have in mind runs along the following lines: Any structure in which the focus isn't sentence final would require an AD which contains the focus and everything following it, similar to (52-b) (otherwise the AD wouldn't be iP-final, hence not the head of iP, yielding a violation of FP). But such a structure will always be suboptimal compared to one in which the focus occurs in sentence final position, so that every element preceding it can form its own AD, similar to the structure in (52-a). Derivationally put, we find obligatory movement of the focus to a peripheral position (see Gutiérrez-Bravo 1999 for an optimality analysis along these lines).
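The strict-ranking evaluation behind a tableau like (52) can be spelled out in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `optimal`, the dictionary encoding of candidates, and the candidate labels are introduced here for illustration and are not part of the analysis. The violation counts follow the discussion of (52): the AccO > DatO candidate incurs no violations, while the DatO > AccO candidate with its "super-big" AD violates ADF once.

```python
from typing import Dict, List

# A violation profile maps constraint names to violation counts.
# Constraints missing from a profile count as zero violations.
Profile = Dict[str, int]


def optimal(candidates: Dict[str, Profile], ranking: List[str]) -> List[str]:
    """Return the candidates whose violation vectors are lexicographically
    minimal under a strict ranking (highest-ranked constraint first)."""

    def vector(name: str) -> tuple:
        return tuple(candidates[name].get(c, 0) for c in ranking)

    best = min(vector(name) for name in candidates)
    return [name for name in candidates if vector(name) == best]


# Hypothetical labels; violation counts as described for tableau (52).
candidates_52 = {
    "a: AccO > DatO_F": {},          # no violations
    "b: DatO_F > AccO": {"ADF": 1},  # one big AD, violating ADF
}

# A grammar without any word order constraint: FP » ADF » A/P.
winners = optimal(candidates_52, ["FP", "ADF", "A/P"])
print(winners)
```

Under this ranking only the AccO > DatO candidate survives, which is the obligatory-scrambling pattern described in the text for a Spanish-type language.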
So how is German different from a Spanish-type language? Why is there optionality of object re-ordering at least in some cases, as discussed in section 2? Assume a constraint that disfavors the AccO > DatO word order in (52) (such as STAY from section 2 above). Such a constraint, if ranked high enough, in particular higher than ADF, would be able to tip the scales in favor of (52-b) again. To derive the German case, then, the pertinent constraints must be arranged so as to allow two optimal candidates sometimes. This we achieve by ...
4.3 Tying the Focus Constraints with DAT
As in section 2 above, we will use a constraint-tie to derive the optionality. Essentially, we want to tie the constraint that enforces a good prosodic structure with the one that enforces DatO > AccO word order. Above we used the constraint STAY for the latter purpose. We do not need to commit ourselves to the syntactic assumptions connected to this constraint, though (i.e., a particular base order, movement, VP-adjunction). We can simply follow the guide of Müller (1998: 22) and use the constraint in (53):¹³
(53) DAT(IVE): Dative NPs precede accusative NPs.
Implementing this, however, requires an additional complication. Since we have two relevant constraints regarding prosodic structure - ADF and A/P - the question arises as to which of them we should tie with the word order constraint DAT. Upon closer inspection it turns out that the alternation - where it exists - is always between the DatO > AccO order under some mediocre prosodic phrasing and the AccO > DatO order under an optimal phrasing (i.e., one which satisfies both ADF and A/P in an optimal way). What this means is that we want the tie to be resolved into (54-a) or (54-b):
(54) a. DAT » ADF » A/P
     b. ADF » A/P » DAT
It will turn out that we crucially never want the tie to resolve into any of (55):
(55) a. A/P » DAT » ADF
     b. A/P » ADF » DAT
     c. DAT » A/P » ADF
     d. ADF » DAT » A/P
Put differently, the two prosodic constraints ADF and A/P never change their ranking relative to each other, but only as a block relative to DAT. This special property of the system will become relevant only in the deaccenting cases, but for reasons of comparability and uniformity I will use it right from the beginning. The notation I invent for this purpose is that in (56):
(56) FP | DAT ⋮ prosodic constraints [ADF | A/P]
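The difference between this block tie and an unrestricted tie can be checked mechanically. The sketch below is illustrative only: the function names, the encoding, and the candidate labels are assumptions made here. The three violation profiles are the ones note 14 gives for the deaccenting case (60), namely (60-a) violating A/P, (60-c) violating ADF, and (60-f) violating DAT; FP is left out of the profiles on the assumption that it does not distinguish these three candidates. A tie is modelled as the set of strict rankings it may resolve into, and a candidate is a winner if it is optimal under at least one of them.

```python
from itertools import permutations
from typing import Dict, List

Profile = Dict[str, int]


def optimal(candidates: Dict[str, Profile], ranking: List[str]) -> List[str]:
    """Candidates with the lexicographically smallest violation vector."""

    def vector(name: str) -> tuple:
        return tuple(candidates[name].get(c, 0) for c in ranking)

    best = min(vector(name) for name in candidates)
    return [name for name in candidates if vector(name) == best]


def winners_under_tie(candidates: Dict[str, Profile],
                      resolutions: List[List[str]]) -> List[str]:
    """A candidate wins if it is optimal under some resolution of the tie."""
    winners = set()
    for ranking in resolutions:
        winners.update(optimal(candidates, ranking))
    return sorted(winners)


# Profiles for three candidates of (60), following note 14.
deaccenting = {
    "a": {"A/P": 1},  # DatO > AccO, V more prominent than its argument
    "c": {"ADF": 1},  # DatO > AccO, one big AD
    "f": {"DAT": 1},  # AccO > DatO, perfect phrasing
}

# Block tie: only the two resolutions in (54), with FP on top.
block_tie = [["FP", "DAT", "ADF", "A/P"],
             ["FP", "ADF", "A/P", "DAT"]]

# Unrestricted tie: every order of DAT, ADF and A/P, including those in (55).
full_tie = [["FP", *order] for order in permutations(["DAT", "ADF", "A/P"])]

block_winners = winners_under_tie(deaccenting, block_tie)
full_winners = winners_under_tie(deaccenting, full_tie)
print(block_winners)
print(full_winners)
```

With the block tie only (a) and (f) come out as winners; once the resolutions in (55) are admitted, (c) wins as well, which is the result the restriction on the tie is meant to exclude.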
The narrow-focus cases fall out straightforwardly again as in section 2 above: With AccO_F all constraints pull in the same direction, making the DatO > AccO order the only one possible (I use set notation to designate the input again):
(57) What did you give the teller?
i: {DatO, AccO_F, V}; constraints: FP | DAT ⋮ pros. cons. [ADF | A/P]
☞ a. ( x )( X )AD  (dem Kassierer)(das Geld_F)(geben)PWd | no violations
   b. ( X )AD  (das Geld_F)(dem Kassierer)(geben)PWd | DAT *!, ADF *!
Since candidate (57-a) violates neither DAT nor any of the prosodic constraints, it will be the winner regardless of how the tie is resolved. With DatO_F, the focus constraints - in particular ADF - favor AccO > DatO, while DAT favors the opposite order. Since both constraints are tied, both structures emerge as optimal.
(58) Who did you give the money?
i: {DatO_F, AccO, V}; constraints: FP | DAT ⋮ pros. cons. [ADF | A/P]
☞ a. ( X )AD  (dem Kassierer_F)(das Geld)(geben)PWd | ADF *
☞ b. ( x )( X )AD  (das Geld)(dem Kassierer_F)(geben)PWd | DAT *
   c. ( X )AD  (das Geld)(dem Kassierer_F)(geben)PWd | DAT *!, ADF *!
Focus on both arguments (with or without the verb) again allows only the order preferred by DAT. While scrambling doesn't make the phrasing worse, it doesn't improve it either and is therefore excluded:
(59) What will you do?
i: {DatO_F, AccO_F, V_F}; constraints: FP | DAT ⋮ pros. cons. [ADF | A/P]
☞ a. ( x )( X )AD  (dem Kassierer_F)(das Geld_F)(geben_F)PWd | FP **
   b. ( x )( X )AD  (dem Kassierer_F)(das Geld_F)(geben_F)PWd | FP **, A/P *!
   c. ( x )( x )( X )AD  (dem Kassierer_F)(das Geld_F)(geben_F)PWd | FP **, ADF *!
   d. ( x )( X )AD  (das Geld_F)(dem Kassierer_F)(geben_F)PWd | FP **, DAT *!
   e. ( X )AD  (das Geld_F)(dem Kassierer_F)(geben_F)PWd | FP **, DAT *!, ADF *!
   f. ( x )( x )( X )AD  (das Geld_F)(dem Kassierer_F)(geben_F)PWd | FP **, DAT *!, ADF *!
[The horizontal alignment of the grid marks, which distinguishes (a) from (b), is not recoverable.]
Let us then turn to the deaccenting cases. If DatO and V are F-marked and AccO is not, two structures emerge as grammatical: the one that fulfills DAT (and violates the prosodic constraint A/P), and the one that satisfies the prosodic constraints (but violates DAT).¹⁴
(60) Why was Veronika arrested? Because she had a poker in her trunk? No, because she...
i: {AccO, DatO_F, V_F}; constraints: FP | DAT ⋮ pros. cons. [ADF | A/P]
☞ a. ( x )( X )AD  (ihrem Macker_F)(den Kaminhaken)(überzog_F)PWd | FP *, A/P *
   b. ( x )( x )( X )AD  (ihrem Macker_F)(den Kaminhaken)(überzog_F)PWd | FP *, ADF *!
   c. ( X )AD  (ihrem Macker_F)(den Kaminhaken)(überzog_F)PWd | FP *, ADF *!
   d. ( x )( X )AD  (den Kaminhaken)(ihrem Macker_F)(überzog_F)PWd | FP *, DAT *, A/P *!
   e. ( x )( x )( X )AD  (den Kaminhaken)(ihrem Macker_F)(überzog_F)PWd | FP *, DAT *, ADF *!
☞ f. ( x )( X )AD  (den Kaminhaken)(ihrem Macker_F)(überzog_F)PWd | FP *, DAT *
[The horizontal alignment of the grid marks, which distinguishes (d) from (f), is not recoverable.]
If DatO is non-F, on the other hand, AccO > DatO order is excluded because the DatO > AccO order already allows a perfect prosodic structure.
(61) Why was Veronika arrested? Because her man disappeared? No, because she...
i: {DatO, AccO_F, V_F}; constraints: FP | DAT ⋮ pros. cons. [ADF | A/P]
   a. ( x )( X )AD  (ihrem Macker)(den Kaminhaken_F)(überzog_F)PWd | FP *, A/P *!
   b. ( x )( x )( X )AD  (ihrem Macker)(den Kaminhaken_F)(überzog_F)PWd | FP *, ADF *!
☞ c. ( x )( X )AD  (ihrem Macker)(den Kaminhaken_F)(überzog_F)PWd | FP *
   d. ( X )AD  (ihrem Macker)(den Kaminhaken_F)(überzog_F)PWd | FP *, ADF *!
   e. ( x )( X )AD  (den Kaminhaken_F)(ihrem Macker)(überzog_F)PWd | FP *, DAT *!, A/P *!
   f. ( x )( x )( X )AD  (den Kaminhaken_F)(ihrem Macker)(überzog_F)PWd | FP *, DAT *!, ADF *!
   g. ( X )AD  (den Kaminhaken_F)(ihrem Macker)(überzog_F)PWd | FP *, DAT *!, ADF *!
[The horizontal alignment of the grid marks, which distinguishes (a) from (c), is not recoverable.]
To sum up then, the proposed system correctly predicts when we get two different object orders with the same focus pattern, and when we don't. It generalizes across the various cases because it doesn't invoke the notion of an optimal focus placement (as the system in section 2 did), but only the notion of an optimal prosodic structure. For each of the grammatical structures, it delivers a unique prosodic phrasing, which in turn can be used to derive the set of its possible accent patterns.
5 Concluding Remarks and Some Thoughts On Markedness
One way of looking at the constraint tie is that there are actually two grammars at work: One that finds the prosodically optimal candidate, and one that finds the morphosyntactically optimal one (which we have equated with the one that displays DatO > AccO order here). All candidates which are prosodically and morphosyntactically sub-optimal are predicted to be simply ungrammatical. A reasonable objection to the present proposal is that a sentence like, say, (8-b), repeated here as (62-a), is awkward, but still much better than, say, (62-b), and that (62-a) should therefore be question-marked, but not starred, as is done here.
(62) a. ??Ich habe das GELD dem Kassierer gegeben.
          I have the money the teller given
     b. *Ich habe das GELD gegeben dem Kassierer.
          I have the money given the teller
        'I gave the money to the teller.'
Intuitively, (62-b) violates a hard and fast constraint about verb-argument ordering in German, namely that nominal arguments cannot be post-verbal. Constraints like DAT, ADF, A/P and their kin seem "softer" than that. Ignoring the numerous interesting issues about the relation between grammaticality and acceptability that come up here, let us grab the bull by the horns and directly implement the intuition described above. We simply need to say that a candidate C* which is sub-optimal to the prosodically or morphosyntactically optimal candidate C only by virtue of violating any of the constraints in DAT, ADF, A/P - the word order constraints - is marked, but not ungrammatical.¹⁵ In practice this means that all sub-optimal candidates discussed in this paper that lose on constraints lower than FP are no longer ungrammatical, but merely marked. The result of this amendment is a system which is similar to, though much less refined than, the one proposed in Müller (1998). I mention this possibility merely to illustrate that everything proposed in this paper can relatively easily be made to conform to such a system. There is no inherent incompatibility between the type of constraints used here and the kind of system Müller proposes in order to derive both grammaticality and markedness, and the types of judgement reported above make it seem advantageous to actually combine them.
Note though that the notion of markedness as introduced in the last paragraph is only defined among members of the same competitor set. And given our decision to include F-marking in the input, this means that otherwise identical sentences with different F-markings are not in competition with each other. It follows that there will not be F-patterns that are sui generis more marked than others.
To give an example, the structures in (63-a) and (63-b) are both optimal. Since they involve different F-patterns they do not even compete. Hence neither is marked with respect to the other.
(63) a.    ( x )iP
           ( x )AD
           dem Kassierer_F das Geld geben
     b.    ( x )iP
           ( x )( x )AD
           dem Kassierer_F das Geld_F geben_F
     c. ?* ( x )iP
           [AD grid not recoverable]
           das Geld_F dem Kassierer_F geben_F
(63-c), on the other hand, is a competitor to (63-b), and sub-optimal with respect to it (it violates DAT and ADF). Hence it is marked (or ungrammatical, on the previous interpretation). This state of affairs reflects the intuition that in the appropriate contexts (e.g., "Who did you give the money to?" and "What did you do?", respectively) (63-a) and (63-b) are judged perfect, while (63-c), even in the context that suits its F-pattern (again "What did you do?") sounds at best marginal. It is of course possible to define different notions of markedness in addition. For example, (63-a) requires a much more specific context than (63-b) (roughly one in which "give X the money" is already under discussion). It has been observed that speakers, when presented with sentence-tokens out of the blue, judge (63-b) as more "normal" than (63-a), presumably because (63-a) forces them to accommodate a more specific context than (63-b) does (cf. Höhle's 1982 explication of the notion "normal intonation"). I believe, however, that this sort of markedness, call it pragmatic markedness, is essentially different from the one manifested in, e.g., (63-c), which we might call structural markedness. Empirically, the first only occurs in the absence of a context and disappears once the appropriate context is provided, while the second invariably remains. As a consequence, pragmatic markedness should be explained in pragmatic terms: It quite likely correlates with the amount of accommodation speakers are required to make. Structural markedness is a matter of grammar proper: It is no more reducible to non-structural facts than, say, the requirement that noun phrases need case. In the present proposal this has been done by specifying F-marking in the input.
Accordingly, for every input I with a specified F-pattern there are one or more optimal structures, which are predicted to be fully acceptable in a context that complies with the F-pattern of I. Sub-optimal output structures for I are structurally marked (or ungrammatical) by virtue of the system developed, and thus predicted to be considerably less acceptable in that context. No structural markedness ranking among different inputs with different F-patterns is defined (though, as I indicated above, it could be on a different scale). In this respect the present system essentially differs from that proposed in Müller (1998).
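The three-way distinction between optimal, marked, and ungrammatical candidates within one competitor set can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function names and encoding are introduced here, and "losing on FP" is operationalized, by assumption, as having more FP violations than the best competitor. The profiles follow the text: (63-c) violates DAT and ADF relative to (63-b), and the candidate labelled `x` is a hypothetical FP-violating structure of the kind note 15 calls strictly impossible. FP violations shared by all candidates are left out.

```python
from typing import Dict, List

Profile = Dict[str, int]


def optimal(candidates: Dict[str, Profile], ranking: List[str]) -> List[str]:
    """Candidates with the lexicographically smallest violation vector."""

    def vector(name: str) -> tuple:
        return tuple(candidates[name].get(c, 0) for c in ranking)

    best = min(vector(name) for name in candidates)
    return [name for name in candidates if vector(name) == best]


def classify(candidates: Dict[str, Profile],
             resolutions: List[List[str]]) -> Dict[str, str]:
    """Label each candidate 'optimal', 'marked' or 'ungrammatical'.

    Assumption: a sub-optimal candidate is ungrammatical only if it does
    worse on FP than the best competitor; losing on DAT, ADF or A/P alone
    makes it merely marked.
    """
    winners = set()
    for ranking in resolutions:
        winners.update(optimal(candidates, ranking))
    best_fp = min(p.get("FP", 0) for p in candidates.values())
    status = {}
    for name, profile in candidates.items():
        if name in winners:
            status[name] = "optimal"
        elif profile.get("FP", 0) > best_fp:
            status[name] = "ungrammatical"
        else:
            status[name] = "marked"
    return status


# The two resolutions of the block tie.
block_tie = [["FP", "DAT", "ADF", "A/P"],
             ["FP", "ADF", "A/P", "DAT"]]

competitors = {
    "63-b": {},                    # optimal for focus on DatO, AccO and V
    "63-c": {"DAT": 1, "ADF": 1},  # scrambled order, worse phrasing
    "x": {"FP": 1},                # hypothetical: prominence misses the focus
}

status = classify(competitors, block_tie)
print(status)
```

Since the classification is computed per competitor set, structures with different F-patterns, such as (63-a) and (63-b), never receive a relative markedness value.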
In conclusion, this paper has offered a particular way of looking at word order variation, exemplified with double object constructions in German. It was proposed that two families of constraints co-determine word order: morphosyntactic constraints and prosodic constraints. The apparent influence of information structure (here: focus marking) on word order is really only indirect, since information structure interacts with prosodic phrasing (this line of argumentation was, I believe, first explicitly pursued in the works of Zubizarreta and Reinhart, as documented in Zubizarreta 1998 and Neeleman & Reinhart 1998, though in these works, unlike in the present one, information structure interacts with accenting directly). Depending on the relative ranking of these constraint families, languages may display strict word order (morphosyntactic constraints outrank prosodic ones), free word order (prosodic constraints outrank morphosyntactic ones), or some mixture thereof (the two constraint types are intertwined or tied). The German Mittelfeld presents an instance of the third type. In the present paper we have only included one exemplary morphosyntactic constraint. We mainly tried to explore the prosodic constraints, attempting to connect up approaches like the ones mentioned above with current work in prosodic phonology. We have shown that such a "deconstruction" of accent assignment rules is not only compatible with the general approach to word order variation, but even serves to derive a wide and comprehensive range of data in a satisfactory fashion.
Notes
This paper is built on an earlier economy-theoretic paper of mine (Büring 1996) and a series of optimality-based talks I gave at the SFB 282 Colloquium "Die Intonationale" in Cologne 7-96, the "Interfaces of Grammar" conference in Tübingen 10-96, and the Stuttgart "Workshop on OT Syntax" 10-97 (Büring 1997a). I would like to thank the audiences at these conferences for their comments and discussion, especially Kai Alter, Katharina Hartmann, Gerhard Jäger, Inga Kohlhof, Gereon Müller, Roger Schwarzschild and Hubert Truckenbrodt. Judith Aissen, Armin Mester and Line Mikkelsen offered invaluable comments and suggestions that greatly helped to shape and improve the present version. All remaining errors and shortcomings have been retained deliberately in order to stimulate future work.
1. The issue of whether the DatO > AccO order is preferred for all verbs or just a lexically specified sub-group is controversial; cf. among others Haider (1993), Fortmann & Frey (1997), Vogel & Steinbach (1998), and Müller (1998). Since this question is orthogonal to the issue at hand we will leave it unresolved, concentrating on verbs that are uncontroversially among the DatO > AccO ones.
2. I ignore cases of obligatory movement here, which I argue in Büring (1996) exist in German, too.
3. In fact, the situation is more complicated since I argued that (13) is not accurate, a point I won't go into here.
4. To derive Diesing's (1992) original generalization, INDEFINITES in (13) would have to be strengthened to a biconditional, requiring that indefinites are existential if and only if they are VP-internal. In Büring (1996) I argue, however, that this generalization is too strong, i.e., that VP-internal indefinites can also be generic; cf. also Büring (in prep.).
5. These conditions, as well as the one requiring headedness of prosodic phrases introduced in the next paragraph, should be properly understood as undominated markedness constraints in an optimality framework. It should be understood that this only holds for German, then, leaving open the possibility that they are ranked in a more interesting way in other languages.
6. Note that not all F-marked XPs will need to have a pitch accent on this view, though they all will have AD-level stress. Likewise, non-focused XPs can be pitch accented (if all focused ones are, too), since they usually receive AD-level stress as well. Crucially, however, non-focused AD-heads cannot have accents if the focused AD-heads don't. This is the empirical generalization reported, e.g., in Uhmann (1991). Others, for example Féry (1993), claim that (in our terminology) all AD-heads must receive a pitch accent if they are part of a focused phrase, while non-focused ones may or may not. This latter viewpoint can be implemented by reformulating FP below, but it necessitates further complications. Given that the empirical situation seems unclear, I will therefore stick with the easier generalizations offered in Uhmann (1991).
7. PRED and XP are close relatives of Truckenbrodt's (1999) WRAP and STRESS-XP constraints, respectively. Just like these they will favor two ADs for adjunct-head, argument-argument and adjunct-argument structures (since neither phrase contains the other and no predication is involved), and with head-complement structures they unanimously favor one AD (since bare heads aren't XPs). Note, however, that PRED - unlike STRESS-XP - favors (NP V)AD-integration, even if V ends up being the head of the AD (cf. (31) below). Also, PRED applies even if the predicate is an XP of its own. I'm thinking of subject-intransitive-verb, object-secondary-predicate, and NP-short relative clause structures, which all have been reported to allow, if not require, single ADs. This can be achieved in the present system by ranking PRED above XP.
8. A more precise rendering is (i):
(i) FP (FOCUSPROMINENCE)
    a. If α is the smallest prosodic constituent that contains an F-marked syntactic node β, α is called a prosodic focus.
    b. If α is a prosodic focus at level n it is the head at level n+1.
9. It is perhaps worth noting that the second best candidate in (31), (31-c), will receive a very similar overall realization to the winner (31-b). In particular both (31-b) and (31-c) have main prominence (= the head of iP) on V. The difference is merely the presence of AD-level prominence on the AccO in (31-c), which means that it can, but doesn't have to, bear a secondary pitch accent. As far as I can tell, the data are inconclusive with regard to this issue (a lot depends upon the relation between prominence and pitch accents; cf. note 6 above). (31-b) owes its optimality to the fact that A/P is ranked below ADF. If both were tied, both (31-b) and (31-c) would be grammatical. Within the set of data that I discuss in this paper, this seems to be the only case in which ranking ADF above A/P is crucial. If required on empirical or theoretical grounds, this ranking could be given up, ruling in (31-c) as an optimal output.
10. The F-marking on VP is not represented in the outputs. This is because I only indicate the prosodic structure in the output, with some F-marks added for convenience, so there is no natural place for them. Strictly speaking the outputs should be pairs of syntactic phrase markers with F-marking and prosodic structures without (or, perhaps, only the latter).
11. Pedantically speaking, (40-c) violates FP as formulated in note 8 above, since the smallest prosodic constituent containing VP_F, iP, is not the head of the next higher prosodic category, because there is no such higher category. But since this affects all structures in the competition equally, no change will arise from this. In the main text I will ignore these extra violations for the sake of perspicuousness. Put differently, I will interpret FP to say "...is the head at level n + 1, if there is such a level."
12. Another implementation would specify object order in the input, but allow GEN to change it. Nothing hinges on this in the present context.
13. As stressed in Müller 1998, this also allows us to add more morphosyntactic word-order constraints upon demand, yielding different morphosyntactically unmarked word orders without committing to the assumption of different base-generated argument orders.
14. Note that if only DAT and ADF were tied, (60-f) could never emerge as an optimal candidate; it would always lose to (60-a). If instead all of DAT, ADF and A/P were tied, (60-a) and (60-f) would be permitted (as desired), both having one violation. But so would (60-c), which has only one violation, too. But (60-c) is not acceptable in this case. This is where the more complicated construction that I introduced at the beginning of this section pays off: Among the structures that violate DAT, only (60-f) is grammatical, because it violates none of the prosodic constraints. This candidate is optimal under the ranking DAT » ADF » A/P. Among the structures that satisfy DAT, only (60-a) is optimal, because it violates the lower prosodic constraint A/P, rather than the higher one ADF, as (60-c) does. This corresponds to the outcome under the ranking ADF » A/P » DAT. Crucially, for (60-c) to win the constraints would have to be ordered with A/P dominating ADF (either one of those in (55) above), but this is not permitted by the kind of tie that is assumed here.
15. Note that FP is not included here, owing to the empirical fact that a sentence with, say, main prominence on the AccO can absolutely not be used in a context that requires focus on the DatO. In other words, a structure like (i) is strictly impossible, not just marked.
(i) (          X      )iP
    ( x     )( x      )AD
    (DatO_F)(AccO)(V)
References
Bech, Gunnar 1955/57 Studien ueber das deutsche verbum infinitum. København (Det Kongelige Danske Videnskabernes Selskabs Historisk-filologiske Meddelelser 35, No. 2, 1955/ 36, No. 6, 1957).
Büring, Daniel 1996 Interpretation and movement: Towards an economy-theoretic treatment of German 'Mittelfeld' word order. Ms., Frankfurt University.
Büring, Daniel 1997a Perfect or just optimal? Towards an OT account of German Mittelfeld word order. Talk presented at the Workshop on OT Syntax, October 1997, Stuttgart University.
Büring, Daniel 1997b The Meaning of Topic and Focus - The 59th Street Bridge Accent. London: Routledge.
Büring, Daniel in prep. What do definites do that indefinites definitely don't? Ms., UC Santa Cruz.
Choi, Hye-Won 1996 Optimizing structure in context: Scrambling and information structure. Ph.D. dissertation, Stanford University (to appear with CSLI Publications, Stanford).
Diesing, Molly 1992 Indefinites. Cambridge, MA: MIT Press.
Drubig, Hans Bernhard 1994 Island Constraints and the Syntactic Nature of Focus and Association with Focus. Arbeitspapiere des Sonderforschungsbereichs 340, No. 51. University of Tübingen.
Eckardt, Regine 1996 Intonation and Predication: An Investigation in the Nature of Judgement Structure. Arbeitspapiere des Sonderforschungsbereichs 340, No. 77. Stuttgart & Tübingen.
Féry, Caroline 1993 German Intonational Patterns. Tübingen: Niemeyer.
Fortmann, Christian & Werner Frey 1997 Konzeptuelle Struktur und Grundabfolge der Argument im Deutschen. In: F.-J. d'Avis and U. Lutz (eds.) Zur Satzstruktur im Deutschen, 143-170. (Arbeitspapiere des Sonderforschungsbereichs 340.) University of Tübingen.
Grimshaw, Jane 1997 Projection, heads and optimality. Linguistic Inquiry 28: 373-422.
Gussenhoven, Carlos 1984 On the Grammar and Semantics of Sentence Accents. Dordrecht: Foris.
Gutiérrez-Bravo, Rodrigo 1999 An OT account of the crosslinguistic differences in focus and word order in English, Spanish and French. Ms., UC Santa Cruz.
Haider, Hubert 1993 Deutsche Syntax - Generativ. Tübingen: Narr.
Halle, Morris & Jean-Roger Vergnaud 1987 An Essay on Stress. Cambridge, MA: MIT Press.
Höhle, Tilman 1982 Explikation für 'normale Betonung' und 'normale Wortstellung'. In: W. Abraham (ed.) Satzglieder im Deutschen, 75-153. Tübingen: Narr.
de Hoop, Helen 1992 Case configuration and noun phrase interpretation. Ph.D. dissertation, Rijksuniversiteit Groningen.
Jacobs, Joachim 1992 Neutral stress and the position of heads. In: J. Jacobs (ed.) Informationsstruktur und Grammatik, 220-244. (Linguistische Berichte Sonderheft 4.) Opladen: Westdeutscher Verlag.
Ladd, Robert D. 1996 Intonational Phonology. Cambridge, UK: Cambridge University Press.
Lenerz, Jürgen 1977 Zur Abfolge nominaler Satzglieder im Deutschen. Tübingen: Narr.
McCarthy, John & Alan Prince 1993 Generalized alignment. Ms., University of Massachusetts, Amherst & Rutgers University.
Müller, Gereon 1991 In support of dative movement. In: S. Barbiers, M. den Dikken, and C. Levelt (eds.) Proceedings of LCJL 3, 201-217. Leiden.
Müller, Gereon 1998 German Word Order and Optimality Theory. Arbeitspapiere des Sonderforschungsbereichs 340, No. 126. Stuttgart & Tübingen.
Müller, Gereon 1999 Optionality in optimality-theoretic syntax. GLOT International 4(5): 3-8.
Neeleman, Ad & Tanya Reinhart 1998 Scrambling and the PF interface. In: M. Butt and W. Geuder (eds.) The Projection of Arguments: Lexical and Compositional Factors, 309-353. Stanford: CSLI Publications.
Pierrehumbert, Janet 1980 The phonology and phonetics of English intonation. Ph.D. dissertation, MIT.
Pierrehumbert, Janet & Julia Hirschberg 1990 The meaning of intonational contours in the interpretation of discourse. In: P. Cohen, J. Morgan and M. Pollock (eds.) Intentions in Communications, 271-311. Cambridge, MA: MIT Press.
Prince, Alan & Paul Smolensky 1993 Optimality Theory: A Theory of Constraint Interaction. RuCCS Technical Reports 2. Rutgers University (to appear with MIT Press).
Rochemont, Michael 1986 Focus in Generative Grammar. Amsterdam/Philadelphia: John Benjamins.
Schwarzschild, Roger 1999 GIVENness, AvoidF and other constraints on the placement of accent. Natural Language Semantics 7(2): 141-177.
Selkirk, Elisabeth 1984 Phonology and Syntax: The Relation between Sound and Structure. Cambridge, MA: MIT Press.
Selkirk, Elisabeth 1995 Sentence prosody: Intonation, stress, and phrasing. In: J. Goldsmith (ed.) Handbook of Phonological Theory, 550-569. Cambridge, MA/Oxford, UK: Blackwell.
Truckenbrodt, Hubert 1995 Phonological phrases: Their relation to syntax, focus, and prominence. Ph.D. dissertation, MIT. (Published 1998 by MITWPL.)
Truckenbrodt, Hubert 1999 On the relation between syntactic phrases and phonological phrases. Linguistic Inquiry 30(2): 219-255.
Uhmann, Susanne 1991 Fokusphonologie. Tübingen: Niemeyer.
Uszkoreit, Jürgen 1987 Word Order and Constituent Structure in German. Stanford: CSLI Publications.
Vikner, Sten 1991 Verb movement and the licensing of NP-positions in the Germanic languages. Ph.D. dissertation, University of Geneva.
Vogel, Ralf & Markus Steinbach 1998 The dative - An oblique case. Linguistische Berichte 173: 65-90.
Webelhuth, Gert 1989 Syntactic saturation phenomena and the modern Germanic languages. Ph.D. dissertation, University of Massachusetts, Amherst.
Zubizarreta, Maria Luisa 1998 Prosody, Focus and Word Order. (Linguistic Inquiry Monographs 33.) Cambridge, MA: MIT Press.
Remarks on the Economy of Pronunciation
Gisbert Fanselow & Damir Cavar

1 Introduction and Overview

The idea that syntactic movement is composed of two steps, a copying operation followed by a deletion operation (the C&D-theory of movement) - as illustrated in (1) - has again become popular with the rise of the Minimalist Program (Chomsky 1993). In one of the straightforward extensions of the C&D-approach, at least certain instances of so-called covert movement arise from the overt copying of a full phrase before SPELLOUT, followed by the deletion of the higher rather than the lower copy - an assumption that implies that spellout conventions regulate whether the target or the source position of the copying operation is realized phonetically (see, e.g., Bobaljik 1995, Groat & O'Neil 1996, Pesetsky 1997, 1998a, Roberts 1997 (for head movement), Sabel 1998, among others) - as illustrated in (2) for Chinese.
(1) Overt Movement
    a. (it does not matter) she likes who
       => COPYING =>
    b. (it does not matter) who she likes who
       => DELETION OF SOURCE =>
    c. (it does not matter) who she likes <who>

(2) Covert Movement
    a. ta weishenme da meigeren
       he why hit everyone
       => COPYING =>
    b. weishenme [ ta weishenme da meigeren ]
       => DELETION OF TARGET =>
    c. <weishenme> [ ta weishenme da meigeren ]

(Angle brackets mark a copy whose phonetic matrix has been deleted.)
In this paper, we will discuss four constructions which we believe have a fairly simple analysis in a C&D-theory of movement only: true partial wh-movement as in Bahasa Indonesia (3), wh-copying (4), left-branch extractions/split constituents as in German (5), and (apparent) head movement. We agree with Pesetsky (1997, 1998a) in the conviction that the particular success of a C&D-theory of movement (as compared to other models) hinges on its interaction with principles of sentence pronunciation in an optimality theoretic fashion.¹

(3) a. Bill tahu Tom men-cintai siapa
       Bill knows Tom loves who
    b. Bill tahu [ siapa yang Tom cintai ]
       Bill knows who FOC Tom loves
    c. Siapa yang Bill tahu Tom cintai
       'Who does Bill know that Tom loves?'

(4) Wen denkst du wen sie liebt?
    who think you who she loves
    'Who do you think that she loves?'

(5) Was hat sie [ t für Bücher ] gelesen?
    what has she for books read
    'What kind of books has she read?'
Furthermore, the analysis of the constructions discussed here relies on the notion of cyclic optimization in syntax, as recently proposed by Müller (1999) and Heck & Müller (1999).

The paper is organized as follows. We will introduce the basic idea of pronunciation economy approaches in the next section, with an analysis of partial wh-movement. It will be argued that partial wh-movement data are exactly the kind of construction one would expect to find if the C&D-theory of movement is correct. Certain syntactic differences among the languages that have partial movement will be analyzed in an optimality theoretic fashion. The economy aspect of the approach defended here will become clear in section 3, where we show that deletion in chains may be incomplete if certain constraints are ranked above the principle of pronunciation economy. In section 4, we show that semantic and phonological conditions may imply that deletion is distributed over both copies in a movement chain. One particularly promising aspect of this account is that it allows us to reduce head movement to phrasal movement without being confronted with notorious problems, like missing freezing effects, that arise in other approaches (e.g., Kayne 1998, Koopman & Szabolcsi 1999, Mahajan 1999).
2 True Partial Wh-Movement

An integration of the C&D-theory into an OT-approach to syntax (see Grimshaw 1997, Pesetsky 1998a, Legendre (in press), among many others) involves the following ingredients: Copying and deletion apply freely in the GEN-component of grammar, but the effects they have on grammatical outputs are determined by a number of principles on syntactic structure and sentence pronunciation. For concreteness, we will assume without further discussion that the creation of copies ("movement") is forced by the need to check features of an attracting head (as in Chomsky 1995, but nothing hinges on this assumption),² and we assume that movement is successive-cyclic. Chains that are created in this way will "originally" contain multiple copies of the same phonetic and semantic material. These copies may, but need not, be subjected to a deletion operation, so that GEN will generate at least the candidates in (6)³ for a question like who do you think she will invite? (deleted copies are enclosed in angle brackets).

(6) a. who do you think who she will invite who
    b. who do you think who she will invite <who>
    c. who do you think <who> she will invite who
    d. <who> do you think who she will invite who
    e. who do you think <who> she will invite <who>
    f. <who> do you think who she will invite <who>
    g. <who> do you think <who> she will invite who
    h. <who> do you think <who> she will invite <who>
In the standard case of movement, only one of these copies is actually pronounced. This follows from an interaction of the principles PRONECON and RECOV.⁴ PRONECON favors those structures in which the deletion of phonetic matrices in chains is maximized, but deletion is subject to recoverability, so that normally⁵ exactly one copy will be retained in each chain. In other words, in most situations, only (6-e-g) are potential winners.

(7) a. Pronunciation Economy (PRONECON)⁶
       *Phonetic Matrix
    b. Recoverability (RECOV)⁷
       The content of unpronounced elements must be recoverable from a local antecedent.
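The interaction of PRONECON and RECOV over the candidate set in (6) can be made concrete with a small sketch. The Python code below is only an illustration under stated assumptions, not the authors' formalization: each candidate is a triple of booleans (one per copy, highest copy first), PRONECON is counted as one violation per pronounced copy, RECOV is simplified to "at least one copy of the chain is pronounced", and RECOV is assumed to outrank PRONECON. All function names are hypothetical.

```python
from itertools import product


def gen(n_copies):
    """GEN (sketch): every way of keeping (True) or deleting (False) each copy."""
    return list(product((True, False), repeat=n_copies))


def pronecon(candidate):
    """PRONECON (*Phonetic Matrix): one violation per pronounced copy."""
    return sum(candidate)


def recov(candidate):
    """RECOV, simplified (assumption): violated once if no copy is pronounced."""
    return 0 if any(candidate) else 1


def eval_ot(candidates, ranking):
    """EVAL: keep the candidates whose violation profile is lexicographically
    smallest with respect to the ranking (highest-ranked constraint first)."""
    def profile(cand):
        return tuple(constraint(cand) for constraint in ranking)

    best = min(profile(c) for c in candidates)
    return [c for c in candidates if profile(c) == best]


# Assumed ranking: RECOV >> PRONECON
winners = eval_ot(gen(3), ranking=[recov, pronecon])
for w in winners:
    # Deleted copies are shown in angle brackets.
    print(" ... ".join("who" if kept else "<who>" for kept in w))
```

Under these assumptions the surviving candidates are exactly the three in which a single copy is pronounced, which corresponds to the pattern described for (6-e-g).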
Ceteris paribus, this approach leads to the expectation that any copy in a chain may be the one that is spelled out, with all the others being deleted. So-called "partial wh-movement"⁸ as can be found in Bahasa Indonesia (cf. (3), repeated here as (8), and Saddy 1991, 1992) or Malay (Cole & Hermon 1998) seems to bear this prediction out. In a wh-question, the wh-phrase may either appear in situ (8-a), or be realized in its scope position (8-c), but it can also show up in the specifier positions of any of the CPs that may intervene between the root position of the wh-phrase and its scope position, as (8-b) illustrates ("true partial wh-movement").

(8) a. Bill tahu Tom men-cintai siapa
       Bill knows Tom loves who
    b. Bill tahu [ siapa yang Tom cintai ]
       Bill knows who FOC Tom loves
    c. Siapa yang Bill tahu Tom cintai
       'Who does Bill know that Tom loves?'
Cole & Hermon (1998) argue that partial wh-movement is not focus movement; see also Basilico (1998) for arguments that partial movement in Slave cannot be reduced to focus movement. The most straightforward analysis for (8) (considered but rejected in Cole & Hermon 1998) assumes that siapa has in fact been attracted to its scope position in all examples, forming the chain indicated in (9-a). Due to the interaction of PRONECON and RECOV, all but one of the copies of siapa must not be pronounced. In the optimal state of affairs, any of the copies may be the one that is realized overtly, as the abstract structures (9-b-d) illustrate (deleted copies in angle brackets), which (roughly) correspond to (8).

(9) a. [CP siapa ... [CP siapa ... siapa ]]
    b. [CP <siapa> ... [CP <siapa> ... siapa ]]
    c. [CP <siapa> ... [CP siapa ... <siapa> ]]
    d. [CP siapa ... [CP <siapa> ... <siapa> ]]
There is at least one argument for analyzing true partial wh-movement along these lines. As Saddy (1991, 1992) and Cole & Hermon (1998) observe, partially moved wh-phrases behave as if they have moved to the scope position at least in terms of island conditions: There must be no movement island between the partially moved wh-phrase and its scope position. Thus, a wh-phrase cannot be moved out of an adjunct clause in Malay, and partially moved wh-phrases must not occur within adjuncts either, as (10) (taken from Cole & Hermon 1998: 227, 236) illustrates. The same holds, for example, for subject islands or for wh-islands: Wh-phrases cannot be extracted out of such islands, and wh-phrases that seem to have undergone "partial" wh-movement are not tolerated in these constructions either.

(10) a. *Apa (yang) Ali dipecat kerana dia beli t
         what (that) Ali was-fired because he bought
     b. *Ali dipecat apa (yang) kerana dia beli t
        'What is the thing such that Ali was fired because he bought it?'

Such observations are explained straightforwardly if - as (9) suggests - "partial" movement is in fact full wh-movement, involving a "non-standard" deletion part though: The constellation in (11-a) created by copying is ruled out by standard island theories. It does not matter then whether the uppermost or an intermediate copy of this (illicit) wh-chain fails to undergo deletion triggered by PRONECON (deleted copies appear in angle brackets in (11)).

(11) a. *wh-phrase ... [island ... wh-phrase ... wh-phrase ... ]
     b. *<wh-phrase> ... [island ... wh-phrase ... <wh-phrase> ... ]
     c. *wh-phrase ... [island ... <wh-phrase> ... <wh-phrase> ... ]

Island facts thus favor the C&D-analysis of partial wh-movement. Saddy (1991) and Cole & Hermon (1998) point out, however, that argumental wh-phrases in situ can appear within syntactic islands (although they cannot be moved out of these islands), as (12) illustrates.

(12) Ali dipecat kerana Fatimah fikir dia mem-beli apa
     Ali was-fired because Fatimah thinks he bought what
(12) illustrates, in fact, a fairly widespread property: Unlike their adjunct counterparts, argumental wh-phrases in situ do not obey any island constraints in a number of languages (but not in all), among them Chinese (see, e.g., Huang 1981). If island conditions affect overt and covert movement in the same way - as they have to if the difference between the two movement types is one of pronunciation only - (12) cannot involve movement. Rather, argumental wh-phrases in situ must be assumed to be bound by a (null) question operator in the appropriate Comp (see, e.g., Aoun & Li 1993, Tsai 1994, Cole & Hermon 1998).⁹ More precisely, the [+wh]-Comp as in (13) may or may not have a syntactic feature F that attracts a wh-phrase.¹⁰ If the feature F is missing, the wh-phrase cannot move to the specifier position of Comp, but a binding relation can be established for wh-phrases that are "referential" enough to be bound (i.e., for argumental wh-phrases). Then no island effects are to be expected.¹¹ If the attracting feature F is present, the wh-phrase moves, so that island effects arise, irrespective of where the wh-phrase is spelled out in the end.

(13) Comp[+wh] ... wh-phrase ...
Chinese illustrates the prediction that certain in situ wh-phrases can be island sensitive: Wh-adjuncts can stay in situ, but they must not appear in islands. Note that adjuncts cannot be bound by an (argumental) question operator base generated in Comp. Therefore, wh-adjuncts can form a part of a question only if a chain is built up which links the adjunct to its scope position. Wh-adjuncts can thus be realized phonetically in situ only if a copy-chain (respecting islands) is built up to the scope position, in which the lowest copy surfaces after the deletions as forced by (7).¹²

We therefore follow Cole & Hermon (1998) in making the assumption that two strategies for forming questions coexist in Malay at least: copying of the wh-phrase to its scope position, and the binding of wh-arguments in situ. See Pesetsky (1998b) for a related but slightly different view on English, German and Slavic questions. A phonetic sequence such as (14) in which an overt copy of the wh-phrase appears in its base position is thus ambiguous in our account (but not in Cole & Hermon 1998): buah apa may be bound by a [+wh]-Comp, or it may be the copy of a chain link to the matrix Spec,C position that is spelled out phonetically. Given that the binding-in-situ strategy is, in general, more liberal than the formation of questions by movement, (nearly) all examples that are grammatical under a movement analysis are generatable with a binding analysis, too - so that the existence of an ambiguity is both hard to establish and also hard to refute.

(14) Mary (mem)-beli buah apa di kedai
     Mary prefix-buy fruit what at shop
     'What fruit did Mary buy at the shop?'
This prediction of an ambiguity may come closer to what holds in Singaporean Malay than in Bahasa Indonesia. According to Saddy (1991), wh-phrases in situ in Bahasa Indonesia are special in that they always take widest possible scope with respect to other operators, and one can take this as an argument against the systematic structural ambiguity of wh-phrases in situ that we predict; however, the relevant judgments seem not to be shared by native speakers of Singaporean Malay (Cole & Hermon 1998: 225), so that we may uphold our analysis at least for the latter language.

As for the situation in Bahasa Indonesia, systematic differences between wh-phrases that are phonetically realized in situ and those that appear in other positions can be accounted for in the following way. Müller (1997) argues for a principle like (15) as one of the determinants of parametric variation among languages with respect to question formation (we have adapted the formulation to the needs of the system we develop here).

(15) WH-IN-SPEC
     A wh-phrase must be phonetically realized in the specifier position of a CP.

WH-IN-SPEC blocks the phonetic realization of wh-phrases in situ whenever a chain has been formed which contains more than two members (and if at least one of the chain members is a specifier of CP). Thus, unless other principles override it, (15) implies that wh-phrases in situ are not part of a chain reaching Spec,CP, as required for Bahasa Indonesia. As long as RECOV is ranked above WH-IN-SPEC, however, wh-phrases that are bound by a null operator in Comp (rather than being part of a chain created by copying) can nevertheless be realized in situ, because their omission would violate recoverability. For languages like Chinese in which wh-phrases show up in root positions only, the effects of (15) must be counteracted by a further principle like STAY*, which has been proposed in a somewhat different form in various works (see, e.g., Grimshaw 1997, Müller 1997, Legendre et al. 1998, and Ackema & Neeleman 1998) and which can be formulated as in (16) in our approach.

(16) STAY* (Nonstandard formulation)¹³
     If the phonetic matrix of α c-commands a member of the chain of β, then it c-commands the phonetic matrix of β.
When STAY* » WH-IN-SPEC, the Chinese type of question formation arises (no visible effects of copying whatsoever); when WH-IN-SPEC » STAY*, the Bahasa Indonesia system comes into being (wh-phrases in chains are always displaced phonetically,¹⁴ but bound wh-phrases may be realized in situ). Finally, if there is a tie between STAY* and WH-IN-SPEC, the grammar of Singaporean Malay arises, in which wh-phrases that belong to movement chains may surface in situ and in derived positions.
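The three rankings just described can be checked mechanically with a small sketch. The Python code below is an illustration under explicit assumptions rather than the authors' own formalization: the candidate set is restricted to realizations of exactly one copy of a three-membered chain (as PRONECON and RECOV would require), STAY* is simplified to "one violation whenever the phonetic matrix is not in the base position", and a tie is read as a global tie, so that a candidate is optimal if it wins under at least one resolution of the tie. The position labels and function names are hypothetical.

```python
from itertools import permutations, product

# Hypothetical chain positions, lowest first; exactly one of them is pronounced.
CHAIN = ["base", "Spec,CP-1", "Spec,CP-2"]


def wh_in_spec(pos):
    """WH-IN-SPEC: violated unless the wh-phrase is pronounced in a Spec,CP."""
    return 0 if pos.startswith("Spec,CP") else 1


def stay(pos):
    """STAY*, simplified (assumption): violated by any phonetic displacement."""
    return 0 if pos == "base" else 1


def resolutions(strata):
    """All total rankings compatible with a ranking given as a list of strata;
    constraints inside one stratum are tied."""
    for combo in product(*(permutations(s) for s in strata)):
        yield [c for stratum in combo for c in stratum]


def optimal(strata, candidates=CHAIN):
    """Union of the winners under every resolution of the ties (global ties)."""
    out = set()
    for ranking in resolutions(strata):
        def profile(pos):
            return tuple(c(pos) for c in ranking)

        best = min(profile(p) for p in candidates)
        out |= {p for p in candidates if profile(p) == best}
    return out


print(sorted(optimal([[stay], [wh_in_spec]])))   # STAY* >> WH-IN-SPEC
print(sorted(optimal([[wh_in_spec], [stay]])))   # WH-IN-SPEC >> STAY*
print(sorted(optimal([[stay, wh_in_spec]])))     # tie
```

With these simplifications, the first ranking leaves only the in situ realization, the second leaves only the Spec,CP realizations (partial or full displacement), and the tie admits all three positions.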
A further observation on Malay and Bahasa Indonesia discussed in Saddy (1991) and Cole & Hermon (1998) seems to be incompatible with our analysis and figures as the key argument against the spellout account of partial movement in Cole & Hermon (1998: 251).¹⁵ Transitive verbs in Bahasa Indonesia and Malay optionally combine with certain prefixes like meng. These must be absent, however, when a wh-phrase has moved across them, but they can be present when the wh-phrase is in situ. This rule holds for long wh-movement as well (see Cole & Hermon 1998: 230-234 for details). Crucially, in partial wh-movement constructions, meng must be absent between the root position and the position of the phonetically realized wh-phrase only; it can appear between the overt position of the wh-phrase and the latter's scope position, as (17) indicates.

(17) Ali (mem) beritahu kamu tadi apa yang Fatimah (*men)-baca
     Ali meng told just now what that Fatimah meng read
     'What did Ali tell you just now that Fatimah was reading?'
Whether this observation creates a problem for the C&D-analysis of "partial" movement (as Cole & Hermon 1998 claim it does) or not depends, of course, on the details of the rule system that governs the distribution of meng. (18) appears to capture the empirical facts, and if such statements are allowed as (language-particular?) constraints in grammars, particle distribution cannot even be used as an argument against a fully representational interpretation of a C&D-analysis of partial movement.

(18) *MENG, whenever MENG is c-commanded by the phonetic matrix of wh-phrase α and when it c-commands a trace of α.

A more convincing analysis of the particle facts can be constructed, however, if we assume a notion of cyclic optimization, as proposed by Müller (1999) and Heck & Müller (1999). In standard OT, the generator component of grammar (GEN) constructs a set of syntactic objects from an input. These syntactic objects are the candidates for the evaluation procedure (EVAL), which selects optimal candidates, which are then grammatical sentences. In a cyclic model of optimization, this interaction of GEN and EVAL applies sequentially, building up and optimizing larger and larger syntactic objects.

For concreteness, suppose that by applying merger and copying operations to lexical entries or syntactic objects previously formed, GEN generates a set of syntactic objects, until the elements so formed correspond to cyclic nodes (NPs and CPs) or to "phases" in the sense of Chomsky (1998). These candidates are then subjected to the EVAL procedure, yielding an optimal candidate. The optimal candidates for the expression of cyclic categories/phases so formed may then be fed into the GEN component again, in order to generate even larger structures, until the level of cyclic nodes or phases is reached again, at which the EVAL procedure selects the optimal structure again. In such a system, the question of which of the copies created by movement can be retained, and which copies are deleted phonetically, poses itself each time the construction of alternative structural representations has reached the cyclic node level.

Consider, then, a stage in a derivation in which a wh-phrase has been copied to a higher position, crossing an occurrence of meng in this context (19-a). Suppose that Σ is cyclic, so that optimization can and must start. Because of PRONECON, one of the two wh-phrase copies must disappear.¹⁶ If the upper copy loses its phonetic matrix (19-b), nothing seems to have to happen to meng, i.e., it can be retained. In structures in which meng has been retained, the uppermost specifier position of CP therefore does not have a phonetic matrix, and it will not be able to re-acquire this phonetic matrix in later copying steps for more or less obvious reasons.¹⁷ Therefore, above a retained meng, no copy in a wh-chain originating lower than meng can have a phonetic matrix. Assume, however, that there is a principle requiring that meng must be deleted (19-c) when the upper copy is retained phonologically. This can (and must) be checked locally in each cyclic domain relevant for optimization. Thus, the empirical generalizations that concern meng-distribution are very well compatible with a C&D-approach when it is executed cyclically.¹⁸

(19) a. [Σ wh-phrase meng wh-phrase ]
     b. <wh-phrase> meng wh-phrase
     (angle brackets mark a copy without a phonetic matrix)
A more straightforward version of this account assumes that cyclic wh-movement always targets the specifier positions of the relevant "phases", i.e., it assumes that cyclic wh-movement always passes through Spec,CP and the specifier position of a functional projection below the subject position but above VP (see Chomsky 1986, 1998; the relevant functional head might, e.g., be AGR-O relative to some earlier versions of the minimalist program, or the "outer specifier of vP" as in Chomsky 1998). The ban on the use of the meng-prefix in situations in which it has been "passed" by overt movement can then be reduced to (20), which bears an obvious similarity to the Doubly-Filled Comp Filter.

(20) *[AGR-O-P WH-PHRASE [AGR-O meng [VP ... ]]], if WH-PHRASE has a phonetic matrix.
When the wh-phrase has been copied to Spec,AGR-O, a phase is completed and the output must be optimized. If meng deletes, the wh-phrase in Spec,AGR-O may or may not be the one that retains its phonetic matrix. If meng fails to delete, the lower copy of the wh-phrase - and not the one in the specifier position of AGR-O - must be the one that retains its phonetic matrix. The upper copy (being stripped of its phonetic features) is, however, the one that will undergo further (and therefore invisible) movement (see footnote 17 for a precise argumentation). Thus, the fact that meng is never "crossed" by overt movement is derived.

That the wh-phrase is, apparently, never realized phonetically in Spec,AGR-O seems to follow from the interaction of STAY* and WH-IN-SPEC. STAY* only favors the root position of a wh-phrase, while WH-IN-SPEC disfavors the realization of wh-phrases in anything but Spec,CP. We now seem to run into a problem, however: If optimization applies cyclically, WH-IN-SPEC forces the phonetic realization of a wh-phrase in the lower Spec,CP-1 position when Σ is optimized in (21). Spec,AGR-O is then left with a wh-phrase lacking a phonetic matrix - and this is the only one that can be copied to higher positions. Spec,CP-2 thus seems to never be able to acquire a phonetic matrix.

(21) [Σ* Spec,CP-2 ... [Σ Spec,AGR-O ... Spec,CP-1 ... ]]
Note, however, that we need a further principle anyhow in order to capture languages that neither follow the wh-strategy (Chinese) nor partial movement (Malay). Consider in this respect PARSESCOPEO, borrowed from Legendre et al. (1998), but adapted to our current needs:

(22) PARSESCOPEO (Nonstandard formulation)
     If α has scope over β then the phonetic matrix of α c-commands the phonetic matrix of β.
Suppose PARSESCOPEO is tied with WH-IN-SPEC, while STAY* is low (or tied). If α is a wh-phrase, the optimization of Σ in (23) will imply that α's phonetic matrix appears in Spec,AGR-O-1 in (23) - only STAY* runs counter to this conclusion, but STAY* has a lower rank. When Σ* is optimized, the phonetic matrix of α is moved even further, because PARSESCOPEO and WH-IN-SPEC pull in the same direction.

(23) [Σ*** Spec,CP-2 ... [Σ** Spec,AGR-O-2 ... [Σ* Spec,CP-1 ... [Σ Spec,AGR-O-1 ... α ... ]]]]
The wh-phrase appears now in the lowest Spec,CP position. The derivation bifurcates when Σ** is reached: α climbs up phonetically if PARSESCOPEO is given more weight, while its phonetic material stays in Spec,CP-1 when the tie is resolved towards WH-IN-SPEC. The former derivation will finally copy the phonetic material of α further to Spec,CP-2 (because the two constraints in question have the same implications for the last derivational step); the latter cannot but leave α phonetically at Spec,CP-1. In other words, where there is a tie between WH-IN-SPEC and PARSESCOPEO, partial wh-movement arises. When PARSESCOPEO dominates WH-IN-SPEC, the phonetic material of the wh-phrase will be realized in the highest chain position under consideration (WH-IN-SPEC cannot block scope driven movement up to Spec,AGR-O) - this characterizes languages with full and multiple wh-movement like Romanian and Bulgarian.

The factorial typology leads us to expect that there are also languages in which WH-IN-SPEC dominates PARSESCOPEO. Slave could be such a language: In Slave (Basilico 1998), the wh-in-situ strategy is less restrictive than overt movement, as (24-a-b) show: The complements of so-called "indirect discourse verbs" are barriers for overt movement, but wh-in-situ is licensed. It is thus surprising that (24-c), which involves partial movement within the island, is in fact grammatical, quite in contrast to what one would expect from the situation we found in Malay.

(24) a. *?Ode netá nimbáa enáih?á kenéhdzáh
          where your father tent 3pitch 3tried
          'Where did your father try to pitch the tent?'
     b. Raymond Jane judeni ri yili kodisho
        Raymond Jane where FOC 3is 3knows
     c. Raymond [ judeni ri Jane yili ] kodisho
        'Where does Raymond know Jane to be?'
     (= 1875a,b of Rice 1989)

Slave differs from Malay in a further respect: There are configurations in which partial movement is ruled out, while complete movement is not: Complements of direct discourse verbs are transparent for movement, but disallow partial movement:
(25) a. John beya judeni ráwozée sudeli
        John my-son where 3opt-hunt 3wants 1
        'Where does John want my son to hunt?'
     b. *John judeni beya ráwozée sudeli
     c. Hodi nurse egháuhndá néndi
        where nurse 1opt-see-2sg 3told2
        'Where did the nurse tell you she would see you?'
Basilico (1998) suggests that complements of direct discourse verbs lack a Spec,CP node, so that (25-b) cannot possibly arise. Thus, the grammaticality of (24-c) is the only surprising property of Slave. We can understand the contrast between (24-a) and (24-c), however, if we assume that complements of indirect discourse verbs are transparent for movement, but that some constellation C of principles rules out that any of the occurrences of wh-phrases that have been copied out of the complement CP could ever bear a phonetic matrix: Wh-movement then has to be obligatorily "partial". We have already seen what this constellation of principles is: WH-IN-SPEC dominating PARSESCOPEO inevitably prevents wh-phrases from leaving a Spec,CP position they have reached if wh-movement needs to additionally pass through an AGR-O-position in the next higher clause. The predictions of the factorial typology constructible from STAY*, PARSESCOPEO and WH-IN-SPEC are therefore borne out.
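The cyclic derivation over the chain in (23), and the typology it yields, can be simulated in a few lines. The Python sketch below is only an approximation under stated assumptions and should not be read as the authors' own implementation: PARSESCOPEO is simplified to "the phonetic matrix sits in the highest copy built so far", STAY* to "the phonetic matrix sits in the base position", ties are treated as global ties (a candidate survives if it wins under some resolution), and only a copy that still carries its phonetic matrix can pass it on to the next landing site. Function names are hypothetical; the site labels follow (23).

```python
from itertools import permutations, product

# Landing sites of the chain in (23), lowest first.
SITES = ["base", "Spec,AGR-O-1", "Spec,CP-1", "Spec,AGR-O-2", "Spec,CP-2"]


def wh_in_spec(pos, top):
    """Violated unless the phonetic matrix is in a Spec,CP position."""
    return 0 if pos.startswith("Spec,CP") else 1


def parsescopeo(pos, top):
    """Simplification (assumption): violated unless the phonetic matrix
    is in the highest copy of the current cyclic domain."""
    return 0 if pos == top else 1


def stay(pos, top):
    """Simplification (assumption): violated by any phonetic displacement."""
    return 0 if pos == "base" else 1


def resolutions(strata):
    """All total rankings compatible with a list of strata (ties)."""
    for combo in product(*(permutations(s) for s in strata)):
        yield [c for stratum in combo for c in stratum]


def optimize(candidates, top, strata):
    """EVAL for one cyclic domain; winners under any tie resolution survive."""
    out = set()
    for ranking in resolutions(strata):
        def profile(pos):
            return tuple(c(pos, top) for c in ranking)

        best = min(profile(p) for p in candidates)
        out |= {p for p in candidates if profile(p) == best}
    return out


def derive(strata):
    """Cyclic optimization: positions where the wh-phrase may end up pronounced."""
    states = {"base"}
    for i in range(1, len(SITES)):
        top, prev_top = SITES[i], SITES[i - 1]
        nxt = set()
        for pos in states:
            # A silent upper copy cannot re-acquire a phonetic matrix.
            cands = [pos, top] if pos == prev_top else [pos]
            nxt |= optimize(cands, top, strata)
        states = nxt
    return states


print(sorted(derive([[parsescopeo, wh_in_spec], [stay]])))    # tie
print(sorted(derive([[parsescopeo], [wh_in_spec], [stay]])))  # PARSESCOPEO on top
print(sorted(derive([[wh_in_spec], [parsescopeo], [stay]])))  # WH-IN-SPEC on top
```

Under these assumptions, the tie allows realization in either Spec,CP-1 or Spec,CP-2, PARSESCOPEO on top forces realization in Spec,CP-2, and WH-IN-SPEC on top strands the phonetic matrix in Spec,CP-1, which is the obligatorily "partial" pattern suggested for Slave.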
3 Wh-Copying

3.1 Preliminary Remarks on Question Formation in German

Before we discuss wh-copying in German, we will start with a few remarks on the embedding of German in the principle system developed so far. German is not a wh-in-situ language, and does not allow partial wh-movement of the kind we find in Malay languages, at least not in simple questions. In multiple questions, only one wh-phrase appears in its scope position:

(26) a. Wen hat er wem gezeigt?
        whoACC has he whoDAT showed
        'Who did he show to whom?'
     b. *Wen wem hat er gezeigt?
On obvious grounds, (26) can be analyzed in two ways: Taking up ideas proposed by Grewendorf (1999) and Sabel (1998), we may hypothesize that all wh-phrases move to their scope position, but that there is a principle that bans the spelling out of more than one wh-phrase per Spec,CP position. Alternatively, we may follow Müller (1997) in the assumption that there is a principle that bans the (phrasal) movement of more than one wh-phrase to Spec,CP in German. The wh-phrase in situ would then have to be bound in situ (or undergo feature movement in the system of Pesetsky 1998b).

When the two wh-phrases of a multiple question originate in different clauses, no uniform pattern emerges: In addition to the constellation in (27-a), which closely mirrors the English counterpart and which characterizes standard German, there are dialects in which the multiple question cannot be formed in the way it is in (27-a).¹⁹ In such dialects, the lower wh-phrase either has to undergo "partial" wh-movement to the specifier position of the complement clause (as in (27-b), which is acceptable for at least some speakers in Potsdam and surroundings), or the lower wh-phrase must be the one that undergoes overt movement (as in (27-c), blatantly violating superiority thereby).²⁰

(27) a. !Wer denkt, dass sie wen liebt?
         who thinks that she who loves
         'Who thinks that she loves who?'
     b. !Wer denkt, wen sie liebt?
     c. !Wen denkt wer, dass sie liebt?
The latter two dialects thus resemble Iraqi Arabic (see Ouhalla 1996) and Hindi (Mahajan 1990) in the sense that (a) wh-phrases in situ cannot take scope out of the minimal finite clause they are contained in (unless they fill this clause's specifier position) and (b) the distribution of wh-in-situ is therefore more constrained than the distribution of wh-phrases that have undergone overt movement. If German wh-phrases in situ are not moved covertly, and are subject to additional binding requirements of the sort we find in Iraqi Arabic (Ouhalla 1996), the unavailability of a multiple question interpretation for (27-a) in the relevant dialects is captured fairly easily, while it is less clear why covert movement should have to fulfill less liberal island conditions than overt movement, if the major difference between the two operations is one of the location of spellout. The dialects that rule out (27-a) thus suggest that German wh-phrases in situ do not involve covert movement. This is quite in line with the conclusions arrived at (for what he terms covert phrasal movement) by Pesetsky (1998b) on quite different grounds.
Obviously, the dialects with the stricter binding restrictions on wh-in-situ solve the pertinent problem in two ways, either by allowing partial movement to move the wh-phrase to a position in which it can be bound from outside (27-b) or by moving the lower phrase directly to its scope position (as in (27-c)). The latter strategy cannot help with triple questions for straightforward reasons: The lowest wh-phrase still is separated from its scope position by a finite clause boundary.

(28) *Wen denkt wer, dass sie wem vorgestellt hat?
      whoACC thinks who that she whoDAT introduced has
      'Who thinks that she introduced who to whom?'

On the other hand, (29) is fine in the dialects that tolerate partial movement, so we must assume that wem fulfills the locality requirements on binding because of the presence of wen in the next Spec,CP position.

(29) !Wer glaubt wen sie wem vorgestellt hat?
      who believes who she who introduced has
      'Who thinks that she introduced who to whom?'
Let us now turn to the different strategies of forming long-distance questions in German: Like Romani or Frisian, German is fairly rich in this respect.

(30) a. !Wen denkst du dass sie liebt?
         who think you that she loves
     b. Wen denkst du liebt sie?
     c. Was denkst du wen sie liebt?
        what think you who she loves
     d. !Wen denkst du wen sie liebt?
         who think you who she loves
     e. !/*Wen denkst du was sie liebt?
        'Who do you think that she loves?'
(30-a) exemplifies standard long wh-movement of arguments, which is grammatical in some,²¹ but not all varieties of German (see, e.g., Kvam 1983, Fanselow, Kliegl & Schlesewsky 2000). Extractions from so-called verb-second-complements (30-b) are well formed in all dialects of German.²² (30-c) exemplifies so-called "wh-scope marking", and is often analyzed as involving partial wh-movement plus the insertion of a scope marker (see McDaniel 1989, Müller 1997, and the contributions to Lutz et al. 2000). If Fanselow & Mahajan (1996, 2000) are correct, however, the constructions involve quite a different analysis. Presupposing that the latter approach is correct, we will ignore the construction in the rest of this paper.

(30-d) is, however, a construction of particular interest in the context of pronunciation economy: It appears as if more than one copy within a wh-chain formed by overt movement is spelled out phonetically (Copy-Construction, CC). Similar constructions exist in Frisian (Hiemstra 1986), Afrikaans (du Plessis 1977) and Romani (McDaniel 1989). We will focus our attention on this construction in this section, after some brief comments on remaining options.

McDaniel, Chiu & Maxfield (1995: 741) state that structures like (30-e) are ungrammatical in German although they are used, e.g., in Romani with the wh-marker so. Comparable constructions involving what exist in Child English, but it is indeed hard to find native speakers of German who accept (30-e). Note that the "wh-scope marker" so appearing in the Romani counterparts of (30-e) is homophonous with the complementizer in Romani, an observation that suggests a straightforward analysis of the construction: was/so/what is not a question word or a scope marker in (30-e), but rather the agreeing form of a complementizer - which agrees with its specifier position hosting a (silent) copy of a wh-phrase created by overt movement of an element into an even higher clause. We assume that this analysis is correct.

(31-a,b) are taken from Anyadi & Tamrazian (1993), who have located speakers who accept these sentences in the Ruhr dialect of German. This does not appear entirely correct, but there are dialects which tolerate these structures.²³ We will comment on (31) at the end of this section.

(31) a. !Welchem Mann glaubst du wem sie das Buch gegeben hat?
         which man believe you who she the book given has
         'Which man do you think that she gave the book to?'
     b. !Mit welchem Werkzeug glaubst du womit Ede das Auto repariert hat?
         with which tool think you what-with Ede the car repaired has
         'With which tool do you think that Ede repaired the car?'
3.2 Some Facts about the Copy Construction
Let us turn, then, to the Copy Construction (CC), and see how it fits into our analysis. The CC is characterized by a number of interesting generalizations, two of which are fairly standard. First, no copy may appear in the root position of the wh-chain.
(32) *Wen denkst du wen sie [VP wen liebt]
     who think you who she who loves
     'Who do you think she loves?'
Overt copies show up in Spec,CP only. If infinitive clauses have no Spec,CP position in German, the ill-formedness of (33) is explained immediately.
(33) a. *Wen versuchst du wen zu küssen?
        who try you who to kiss
     b. *Wen batest du mich wen zu küssen?
        who asked you me who to kiss
Keeping an overt copy in a root position (in addition to the copy in the scope position) not only implies violations of PRONECON, it also incurs a further violation of WH-IN-SPEC. As long as there is no reason to keep a copy there (and there is none), (32)-(33) always give way to alternative structures in which the lowest copy is not spelled out.
A second generalization concerns the nature of all overt copies of the wh-chain (but the lowest one): They may not be syntactically complex:
(34) a. *Wessen Studenten denkst du wessen Studenten wir kennen?
        whose student think you whose student we know
        'Whose student do you think that we know?'
     b. *Wieviel Studenten denkst du wieviel Studenten wir kennen?
        how many students think you how many students we know
        'How many students do you think that we know?'
(35) Womit denkst du womit er sie verletzt hat?
     with what think you with what he her hurt has
     'With what do you think that he hurt her?'
(36) (?*)Mit was denkst du mit was er sie verletzt hat?
     with what think you with what he her hurt has
     'With what do you think that he hurt her?'
While CCs that involve wh-phrases consisting of a single word are perfect in those dialects that allow the CC at all, the situation differs radically when the wh-phrase is syntactically complex: In the CC, ungrammaticality arises in quite a number of dialects/idiolects as soon as the upper copy contains two or more words (see (34)). (35)-(36) form a nice minimal pair in this respect - the structures do not differ in meaning but just in the fact that womit is a single word, in contrast to mit was. For some (most?) speakers, a contrast exists between (35) and (36) - with the ungrammaticality of the latter example being only mild. There is practically nobody who would go beyond (36) in terms of the complexity of the copied wh-phrase, though.
It has been assumed (Fanselow & Mahajan 1996, Höhle 1996) that this anti-complexity restriction affects all copies in the same way, but this claim overlooks the greater flexibility we observe for the lowest copy:
(37) Wen denkst du wen von den Studenten man einladen sollte?
     who think you who of the students one invite should
     'Which of the students do you think that one should invite?'
(38) Wieviel sagst du wieviel Schweine ihr habt?
     how many say you how many pigs you have
     'How many pigs do you say that you have?'
That there is a syntactically complex wh-phrase in the specifier position of the complement clause is obvious in (38), and can be argued for easily in the case of (37) as well: Note that the boldface material precedes the pronoun man, to the left of which it is normally impossible to scramble PPs. That the boldfaced material of the lower copy forms one constituent (and not two, with the PP being scrambled to second position) is also obvious in those dialects of German that allow a mixing of long movement and CC, so that constellations like (39) may arise, in which the boldface copy wen von den Studenten can have reached the intermediate clause only by mono-constituental wh-extraction, since there is no long scrambling in German.
(39) Wen denkst du wen von den Studenten sie sagte dass man einladen sollte?
     who think you who of the students she said that one invite should
     'Which of the students do you think she said that one should invite?'
(40), on the other hand, shows that it is not sufficient for grammaticality that one copy only in the chain is syntactically complex:
(40) *Wen von den Studenten denkst du wen man einladen sollte?
     which of the students think you who one invite should
Finally, in those dialects which have few problems with (36), (41) is perfect as well. PPs are strict islands for movement in German, so aus Konstanz could not possibly ever have left an wen aus Konstanz by standard movement. Thus, there is no alternative to an analysis of (41) in which the lower Spec,CP position is occupied by an wen aus Konstanz, and the upper one by an wen.
(41) An wen denkst du an wen aus Konstanz man das schicken darf?
     at who think you to who from Constance one that send may
     'To which person from Constance do you think one is allowed to send that?'
The complexity restriction thus does not apply to the lowest copy. The complexity restriction holding for upper copies renders wh-copying ungrammatical whenever a wh-phrase cannot be split or "separated", as is, e.g., the case for which-phrases.
(42) a. Welches denkst du welches er nehmen wird?
        which think you which he take will
        'Which one do you think he will take?'
     b. *Welches denkst Du welches Schweinderl er nehmen wird?
        which think you which piggie he take will
        'Which piggie do you think he will take?'
     c. *Welches Schweinderl denkst Du welches Schweinderl er nehmen wird?
The CC obeys stricter locality restrictions than standard long overt movement, as Höhle (1996) and others have observed; consider (43) - corresponding examples with simple long movement would be grammatical. What we get is exactly analogous to the intervention effect Beck (1996) and Pesetsky (1998b) identify for German wh-in-situ: No operator may intervene between the copies of the wh-phrase.
(43) a. *Wen glaubt keiner wen sie liebt?
        who believes nobody who she loves
        'Who does nobody believe that she loves?'
     b. *Wen glaubt jeder wen sie liebt?
        who believes everybody who she loves
        'Who does everybody believe that she loves?'
3.3 Three Analyses
There have not been too many proposals for an analysis of the CC, but Inge Hiemstra's (1986) theory of the construction (and its Frisian counterpart) is certainly outstanding in many respects. Published nearly ten years before Chomsky (1995), her contribution preempts insights of much work in featural movement theory in a number of respects. The central idea of her analysis of (44) is that when a structure requires wh-movement, this may be carried out as any of the following:
- movement of the wh-feature alone,
- the pied-piping of the φ-features aligned with the wh-feature, or
- the pied-piping of the whole phrase bearing the wh-feature.
The resulting system is, thus, quite reminiscent of a movement theory generally adopted later in the mid-nineties. The first two options for effecting movement must then be complemented by a theory of spellout for the displaced feature complexes. According to Hiemstra, it is the most unmarked lexical element bearing the relevant features that will realize the feature complex in question. A single [+wh]-feature is therefore realized as was (= (44-a)), and the feature complex [+wh, 3rd ps., acc] as wen (= (44-b)).
(44) a. Pure feature movement:
        Was denkst du wen sie eingeladen hat?
        what think you who she invited has
     b. Pied-piping of φ-features:
        Wen denkst du wen sie eingeladen hat?
     c. Pied-piping of full phrase:
        Wen denkst du dass sie eingeladen hat?
        'Who do you think she has invited?'
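Hiemstra's spellout idea, as summarized above, can be illustrated with a small sketch. The Python fragment below is only an illustration under simplifying assumptions: the two-item lexicon, the feature labels, and the function name spell_out are hypothetical, and "most unmarked" is approximated as "bearing the fewest features overall".

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical toy lexicon: each item is paired with the features it bears.
# The feature sets are simplifying assumptions made for this sketch.
LEXICON: Dict[str, FrozenSet[str]] = {
    "was": frozenset({"wh"}),
    "wen": frozenset({"wh", "3rd", "acc"}),
}


def spell_out(features: Iterable[str]) -> str:
    """Pick the least specified lexical item that bears all moved features."""
    target = frozenset(features)
    # An item can realize the complex only if it bears every moved feature.
    compatible = [word for word, feats in LEXICON.items() if target <= feats]
    if not compatible:
        raise ValueError("no lexical item bears all of the given features")
    # "Most unmarked" is modelled here as "fewest features overall".
    return min(compatible, key=lambda word: len(LEXICON[word]))


print(spell_out({"wh"}))                # pure feature movement, cf. (44-a)
print(spell_out({"wh", "3rd", "acc"}))  # pied-piping of phi-features, cf. (44-b)
```

Under these assumptions, a bare [+wh] feature is realized as was, and the complex [+wh, 3rd ps., acc] as wen, which mirrors the contrast between (44-a) and (44-b).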
If Fanselow & Mahajan (1996, 2000) are correct, was in (44-a) is a sentential wh-expletive originating in the object position of the matrix clause. The parallel between (44-a) and (44-b) would thus be a spurious one. It is also not too clear to what extent we want to consider wieviel as in (38) or an wen as in (41) to be mere spellouts of complexes of φ-features. For welche, one would have to give an answer to the question of why it allows φ-featural copying only if what is left behind is a single word item. Thus, a more promising version of her approach would seem to have to move closer to the ideas developed in Chomsky (1995), in assuming that the minimal element that can be moved overtly is the word carrying the attracted feature (= wen, wieviel, welche). We can then analyze the CC as a sequence of two types of movement steps, the first one involving the pied-piping of the phrase dominating the attracted word, the second one confining itself to the overt displacement of the word-level category. With the exception of those dialects that can prepose a minimal PP like an wen in the CC, this approach is descriptively correct, but it leaves open some questions: Why is the wh-word that moves in the second step not deleted in the phrasal copy, as in (45) (which would be grammatical if the second occurrence of wieviel had been kept phonetically)?
(45) *Wieviel denkst du w̶i̶e̶v̶i̶e̶l̶ Schweine sie sagt dass wir haben?
     how many think you how many pigs she says that we have
     'How many pigs do you think she says that we have?'
Fanselow & Mahajan (1996, 2000) concentrate on the development of an account for (44-a), but add a sketch of an analysis of the CC, too. Their analysis makes use of the fact that the Comp position cannot be empty phonetically in German embedded clauses 24 (except in indirect question complements), and they assume that the copies of the wh-phrase occupying the lower Spec,C position cliticize onto Comp whenever this position is not filled in another way. For wh-phrases of the size of a single word, this approach works smoothly, but they do not take into account the principled availability of more complex constructions like (37) and (38), for which it is hard to believe that the complete wh-phrase occupies the Comp position.
Pesetsky's (1997, 1998a) crucial insight concerning uneconomical pronunciations of wh-chains is that they typically arise in contexts where standard movement would violate island conditions. In fact, one often finds the CC in dialects that do not allow long movement of arguments. In a form slightly adapted to the general approach we pursue here, the relevant Island Constraint takes the form (46):
(46) ISLAND
     *α ... [Σ ... β ... ] where α, β belong to a single chain, α or β are unpronounced, and Σ is an island.
Suppose in the dialects allowing CCs, CPs are (or, can be) barriers for extraction. In the first derivational step for a long distance question, the wh-word will be copied to Spec,C at some stage:
(47) wen dass du wen eingeladen hast
     who that you who invited have
Whether dass may be preserved in the overt presence of phonetic material in Spec,C is, partially, a function of the rank of the Doubly Filled Comp Filter in German - it is violable in some but not all varieties of German (see, e.g., Heck 1997 for a pertinent OT analysis), and we will not go into this issue here. When it comes to the optimization of the pronunciation of (47), a number of principles come into play. In addition to PARSESCOPEO introduced above, LEFTEDGECP taken over from Pesetsky (1998a: 341) 25 seems relevant:
(48) LEFTEDGECP (LEC)
     The first pronounced word in a CP must be the complementizer projecting that CP.
If PARSESCOPEO has a high rank in German (as seems to be the case), PARSESCOPEO » LEC guarantees that wen can be kept in the initial position of (47) whenever (47) represents a complete complement question or forms part of a larger movement structure. PRONECON implies that one of the two occurrences of wen disappears; if PARSESCOPEO » LEC, it is the higher copy that must be retained. We thus end up with (49).
(49) wen dass du w̶e̶n̶ eingeladen hast
     who that you who invited have
Suppose that the head and the specifier of a CP/a phase (but no other elements) are accessible in the next optimization cycle. (50) represents the AgrOP or vP of a matrix clause that will end up as a matrix question. If Σ is not interpreted as an island, PARSESCOPEO implies that the upper copy of wen is retained, and LEC implies that the lower copy of wen should disappear.
(50) wen denkst [Σ wen dass du t eingeladen hast]
     who think 2sg who that you invite have
If Σ is interpreted as an island, ISLAND blocks the deletion of the lower copy if ISLAND » LEC. In fact, we might capture the dialectal variation we find in German by assuming that Σ = CP is always counted as a barrier and that the ranking of ISLAND and LEC is not fixed in the same way in the various dialects of German: the copy construction arises when ISLAND » LEC, while we get long movement when LEC » ISLAND, as tables 1 and 2 show.
(51) a. wen denkst wen dass du eingeladen hast
     b. wen denkst w̶e̶n̶ dass du eingeladen hast
     c. wen denkst wen dass du eingeladen hast
     d. wen denkst wen dass du eingeladen hast
Table 1 (ranking: PARSESCOPEO » ISLAND » LEC)
   ☞ (51-a)   * (LEC)
      (51-b)   *! (ISLAND)
      (51-c)   *!, *
      (51-d)   *!, *
Table 2 (ranking: PARSESCOPEO » LEC » ISLAND)
      (51-a)   *! (LEC)
   ☞ (51-b)   * (ISLAND)
      (51-c)   *!, *
      (51-d)   *!, *
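The effect of reranking ISLAND and LEC can be made concrete with a minimal sketch of strict-domination evaluation. The Python fragment below is an illustration, not part of the analysis itself: it encodes only the two candidates whose violations are described in the text (the copy construction (51-a), which violates LEC, and long movement (51-b), which violates ISLAND), and the names CANDIDATES and optimal are hypothetical.

```python
from typing import Dict, List, Tuple

# Violation profiles (constraint -> number of violations).
# Assumption of this sketch: only (51-a) and (51-b) are modelled, with the
# single violation each of them incurs according to the discussion above.
CANDIDATES: Dict[str, Dict[str, int]] = {
    "(51-a)": {"LEC": 1},     # copy construction: wen opens the lower CP
    "(51-b)": {"ISLAND": 1},  # long movement: lower copy deleted inside the island
}


def optimal(ranking: List[str], candidates: Dict[str, Dict[str, int]]) -> str:
    """Return the candidate that is best under strict constraint domination."""
    def profile(name: str) -> Tuple[int, ...]:
        # Violations listed from the highest-ranked to the lowest-ranked constraint.
        return tuple(candidates[name].get(constraint, 0) for constraint in ranking)
    # Lexicographic comparison of the profiles implements strict domination.
    return min(candidates, key=profile)


TABLE_1_RANKING = ["PARSESCOPEO", "ISLAND", "LEC"]
TABLE_2_RANKING = ["PARSESCOPEO", "LEC", "ISLAND"]

print(optimal(TABLE_1_RANKING, CANDIDATES))  # -> (51-a)
print(optimal(TABLE_2_RANKING, CANDIDATES))  # -> (51-b)
```

Comparing violation vectors lexicographically is one common way of modelling strict domination: a single violation of a higher-ranked constraint outweighs any number of violations of lower-ranked ones.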
A cyclic application of the principles discussed so far, together with the assumption that the specifiers and heads remain accessible for optimization from outside, thus yields the CC under the ranking given in table 1.
Why is it that upper copies must be non-complex? We can derive this from the principles PRONECON and PARSESCOPEO if we interpret them properly. In showing how, we may confine our attention to the derivational step linking a wh-phrase in Spec,C to its first landing site in the matrix clause. In (53), we ignore one candidate structure to which we will return later.
(52) wen von den Studenten denkst wen von den Studenten dass ...
     who of the students think who of the students that
(53) a. wen von den Studenten denkst wen von den Studenten dass ...
     b. wen v̶o̶n̶ d̶e̶n̶ S̶t̶u̶d̶e̶n̶t̶e̶n̶ denkst wen von den Studenten dass ...
     c. wen von den Studenten denkst wen v̶o̶n̶ d̶e̶n̶ S̶t̶u̶d̶e̶n̶t̶e̶n̶ dass ...
Presupposing the ranking ISLAND » LEC, we need to consider only those candidates that retain phonetic material both in the lower Spec,C and in the first matrix clause landing site position. Obviously, PRONECON is violated more often in (53-a) than in (53-b,c) - it has three more words than the two other candidates. If PRONECON is ranked below ISLAND (so that copying is possible at all), the former principle will still block (53-a). The decision between (53-b,c) seems to follow from PARSESCOPEO if we interpret the principle as applying to semantic units, and if we make the fairly standard assumption that the restrictor of a wh-quantifier should not appear in the scope position of the operator, i.e., if (54-a) is preferred over (54-b) (see, e.g., Chomsky 1993, Fox 1995).
(54) a. wh-x (Pred(x)) ...
     b. wh-x, x a Pred x ...
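The word count behind the PRONECON comparison of (53-a-c) can be spelled out as follows. This Python fragment is a rough sketch under two assumptions that are ours: PRONECON is approximated as one mark per pronounced word of the wh-chain, and the pronounced material of each candidate is taken from the prose description (full phrase in both positions for (53-a), bare wen in the matrix position for (53-b), bare wen in the lower Spec,C for (53-c)). The name pronounced_words is hypothetical.

```python
from typing import Dict, List, Tuple

FULL_PHRASE: List[str] = "wen von den Studenten".split()
OPERATOR_ONLY: List[str] = ["wen"]

# Each candidate: (words pronounced in the matrix landing site,
#                  words pronounced in the lower Spec,C).
CANDIDATES: Dict[str, Tuple[List[str], List[str]]] = {
    "(53-a)": (FULL_PHRASE, FULL_PHRASE),
    "(53-b)": (OPERATOR_ONLY, FULL_PHRASE),
    "(53-c)": (FULL_PHRASE, OPERATOR_ONLY),
}


def pronounced_words(candidate: Tuple[List[str], List[str]]) -> int:
    """Count the pronounced words of the wh-chain (a stand-in for PRONECON marks)."""
    upper, lower = candidate
    return len(upper) + len(lower)


for label, candidate in CANDIDATES.items():
    print(label, pronounced_words(candidate))
```

On this count, (53-a) pronounces eight words of the chain and (53-b,c) five each, so PRONECON alone separates (53-a) from the other two but cannot decide between (53-b) and (53-c).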
The higher the restriction of a wh-operator is moved in a tree, the more violations of PARSESCOPEO arise relative to it, so that (53-b) is favored over (53-c). The core properties of the CC thus seem derived. Note, however, that an account of (53-b) vs. (53-c) in terms of PARSESCOPEO makes the incorrect prediction that a wh-phrase that can be split up must be so. This is false, as (55) shows.
(55) a. (i)  Wen von den Studenten kennst du?
             who of the students know you
             'Who of the students do you know?'
        (ii) Wen kennst du von den Studenten?
     b. (i)  Was für Studenten kennst du?
             what for students know you
             'What kind of students do you know?'
        (ii) Was kennst du für Studenten?
     c. (i)  Wieviel Studenten kennst du denn?
             how many students know you then
             'How many students do you know?'
        (ii) Wieviel kennst du denn Studenten?
We therefore need a principle that penalizes structures that have been contiguous at level L but cease to be so at level L'.
(56) Contiguity In Syntax (CIS)
     The phonetic material corresponding to a constituent must be spelled out in one position only.
CIS disfavors separation, whereas PARSESCOPEO requires it. When the two constraints are tied, the constellation we find in (55) arises. 26 The tie with PARSESCOPEO implies a fairly high rank for CIS; in particular, it dominates PRONECON. Therefore, we must understand CIS in such a way that it is satisfied when there is at least one copy of a phrase that is pronounced in an unsplit fashion - otherwise, the CC would be ruled out because it would always imply a CIS violation. We must make sure, however, that the tie between PARSESCOPEO and CIS does not imply that complex wh-phrases (which always contain a restrictor that should be left in situ) do not have to move at all (because the PARSESCOPEO violation by the operator part is always counterbalanced by the PARSESCOPEO violation of the restrictor). This is effected by the principle WH-IN-SPEC introduced above.
Consider, then, the consequence of CIS for the two crucial movement steps - the one from the root position to Spec,C, and the subsequent step mapping the wh-phrase into the matrix clause. As table 3 shows, we correctly predict the distribution of grammaticality in the first movement step of (57).
(57) wen von den Studenten du wen von den Studenten einlädst?
     who of the students you who of the students invite
(58) a. wen von den Studenten du wen von den Studenten einlädst
     b. wen von den Studenten du w̶e̶n̶ v̶o̶n̶ d̶e̶n̶ S̶t̶u̶d̶e̶n̶t̶e̶n̶ einlädst
     c. wen v̶o̶n̶ d̶e̶n̶ S̶t̶u̶d̶e̶n̶t̶e̶n̶ du w̶e̶n̶ von den Studenten einlädst
     d. w̶e̶n̶ v̶o̶n̶ d̶e̶n̶ S̶t̶u̶d̶e̶n̶t̶e̶n̶ du wen von den Studenten einlädst
Table 3 (constraints in ranking order: PARSESCOPEO/CIS, ISLAND, WH-IN-SPEC, PE, LEC)
               PARSESCOPEO/CIS
      (58-a)   *(restrictor)
   ☞ (58-b)   *(restrictor)
   ☞ (58-c)   *(contiguity)
      (58-d)   *(operator)
If the derivation proceeds with (58-c), nothing new happens: A wh-phrase consisting of a single word cannot violate CIS. If (58-b) is chosen, we proceed as discussed in the context of (53-a-c) (repeated here as (59-a-c)), but it is also obvious that we deliberately ignored a candidate above: (59-d). Nevertheless, (59-b) is still optimal: CIS is respected by the lower copy, we do not move the restrictor of the wh-operator further up, and the wh-operator moves to a position c-commanding its scope.
(59) a. ... wen von den Studenten denkst wen von den Studenten dass ...
     b. ... wen v̶o̶n̶ d̶e̶n̶ S̶t̶u̶d̶e̶n̶t̶e̶n̶ denkst wen von den Studenten dass ...
     c. ... wen von den Studenten denkst wen v̶o̶n̶ d̶e̶n̶ S̶t̶u̶d̶e̶n̶t̶e̶n̶ dass ...
     d. ... wen v̶o̶n̶ d̶e̶n̶ S̶t̶u̶d̶e̶n̶t̶e̶n̶ denkst w̶e̶n̶ von den Studenten dass ...
Table 4 (constraints in ranking order: PARSESCOPEO/CIS, ISLAND, WH-IN-SPEC, PE, LEC)
               PARSESCOPEO/CIS
      (59-a)   *!(restrictor)
   ☞ (59-b)
      (59-c)   *!(restrictor)
      (59-d)   *!(contiguity)
Therefore, we have derived the fact that the copy construction allows complex overt wh-phrases in the lowest Spec,C position only. As we have remarked above, this restriction can be minimally violated in certain dialects in which a PP may be copied; cf. (36), repeated here as (60).
(60) (?*)Mit was denkst du mit was er sie verletzt hat?
     with what think you with what he her hurt has
     'With what do you think that he hurt her?'
Given what we have seen so far, the optimal candidate should be one that "strands" the preposition in the lower copy. For the constraint that makes (60) possible by overriding PRONECON, a natural formulation comes to mind. Note that the copying operation moves a PP category upwards, and one may assume that phonetic material that does not contain a preposition in the head position cannot be a phonetic realization of a PP:
(61) LeftEdgePP (LEP)
     The leftmost element realized phonetically in a PP must be the preposition from which that PP was projected.
If LEP is ranked above PRONECON, the prepositional head may be retained in a CC.
Our final task in describing the CC is a discussion of the intervention effect. As (62) shows (see also Beck 1996), an operator like sentential negation cannot intervene between a wh-phrase and its restrictor - the problem can be solved in various ways, by respecting contiguity (62-a) or by scrambling the wh-phrase in front of the operator (62-c) before it is split up.
(62) a. Was für Bücher hat er nicht gelesen?
        what for books has he not read
     b. *Was hat er nicht für Bücher gelesen?
     c. Was hat er für Bücher nicht gelesen?
According to Pesetsky (1998b), the relevant anti-intervention constraint implies that a wh-operator must not be separated from its restrictor by a further operator. This applies to those copy constructions in a straightforward way in which a restrictor is indeed stranded; the same effect shows up in (63), i.e., when the wh-phrase consists of a single word only. (63) fits into Pesetsky's proposal if we assume that, semantically, there is a restrictor part present for wen as well, and that this restrictor part is left behind with the lowest visible copy of the phrase. Alternatively, the intervention constraint might be reformulated so that it requires that links in a chain in which both elements contain either phonetic or semantic material must not be separated by an operator.
(63) a. *Wen glaubt keiner wen sie liebt?
        who believes nobody who she loves
        'Who does nobody believe that she loves?'
     b. *Wen glaubt jeder wen sie liebt?
        who believes everybody who she loves
        'Who does everybody believe that she loves?'
The only potential problem that arises, then, is related to ineffability. A sufficiently high rank of the intervention constraint will be able to block the copy construction in the situations where this is called for, so that the winning competitor is a long movement construction. The same consequence arises for constellations in which the wh-phrase must not be split up (which-phrases, or PPs, in certain dialects). This is an acceptable result for those dialects in which the copy construction co-exists with long movement, but it does not capture the ineffability effect that can arise for long distance dependencies when a dialect forbids long movement and an intervention effect rules out the copy construction at the same time. A standard solution (see Legendre et al. 1998) would be to rank the intervention constraint higher than faithfulness constraints concerning scope assignments for the intervening operators.
3.4 Some Related Issues
As we have mentioned above, certain varieties of German (e.g., Lower Rhine area, Bavarian Suabia) allow questions to be constructed in the form given in (64).
(64) Welchen Mann denkst du wen er kennt?
     which man think you who he knows
     'Which man do you think he knows?'
We have little to say about this construction, except for the observation that it is not likely to be a subcase of a CC. While the lower occurrence of a wh-element is a non-complex one, it does not copy the wh-operator of the upper wh-phrase. It is rather the minimal spellout of the wh-features that should be present in the lower Spec,C position, as (64) and (65) show: 27
(65) a. Wieviel Bier denkst du was er trinkt?
        how-much beer think you what he drinks
        'How much beer do you think that he drinks?'
     b. *Wieviel Bier denkst du wieviel er trinkt?
One simple analysis would analyze wen and was as agreeing forms of the complementizer. This would be consistent with the observation that (31-b) is judged worse than (31-a) (repeated here as (66)), if we assume that womit makes a bad [+wh]-complementizer.
(66) a. !Welchem Mann glaubst du wem sie das Buch gegeben hat?
        which man believe you who she the book given has
        'Which man do you think that she gave the book to?'
     b. !Mit welchem Werkzeug glaubst du womit Ede das Auto repariert hat?
        with which tool think you what-with Ede the car repaired has
        'With which tool do you think that Ede repaired the car?'
Alternatively, the contrast in (66) might be caused by the fact that German dialects tend not to block standard long movement when PPs are affected, so that (66-b) might be blocked by a candidate involving long movement. We might also consider wen, was, and womit as spellout forms for φ-features of a wh-phrase that has lost its original phonetic content. In a dialect that ranks ISLAND over LEC, the insertion of "expletive" phonetic material spelling out wh-φ-features is an alternative means of avoiding an ISLAND violation. Suppose that FI blocks the use of expletive material, and suppose that PRONECON works in such a way that it blocks the repetition of phonetic material only. Then if FI » PRONECON, we still get the CC, but if the ranking is reversed, the situation in (64)-(66) arises. We do not wish to commit ourselves to this analysis, though.
German dialects allow at least two further examples of "uneconomical pronunciation". When the highest verbal element of a clause is topicalized as in (67), the problem arises that the second position should be filled by exactly this element, too, given that the V/2 constraint is not violated in German. The problem may be solved by expletive insertion (67-b) or by uneconomical pronunciation (as in (67-a)). Retention of two copies is confined to modals, though.
(67) a. Können kann ich nicht
        can can I not
        'I am not ABLE to.'
     b. Können tue ich nicht
        can do I not
     c. Schlafen tue ich nicht
        sleep do I not
        'Sleeping is not what I do.'
     d. *Schlafen schlafe ich nicht
        sleep sleep I not
In a simple split construction (68) (see also next section), the phonetic material belonging to a single constituent is distributed over two places in the sentence without any repetitions, but there are two exceptions to this property of split XPs. In those dialects in which a PP can enter the split construction, the preposition must appear in both positions in which the PP is spelled out partially (68-b). 28 Given the high rank of LEP, this is not unexpected. Likewise, van Riemsdijk (1989) observes that the indefinite article may (and sometimes must) be repeated in split noun phrases - a fact we can relate to the observation that singular count noun phrases may never be realized phonetically without an initial determiner.
(68) a. Teure Bücher habe ich viele
        expensive books have I many
        'I have many expensive books.'
     b. In Schlössern habe ich noch in keinen gewohnt
        in castles have I yet in no lived
        'I have not yet lived in castles.'
     c. *In keinen habe ich noch in Schlössern gewohnt
     d. Einen amerikanischen Wagen kann ich mir nur einen grünen leisten
        an American car can I me only a green afford
        'I can only afford a green American car.'
Thus, the split construction supports the idea that the economization of pronunciation is in general quite subject to other constraints.
4 A Few Remarks on the Split Construction and So-Called Head Movement
A detailed analysis of the split construction is beyond the scope of the present paper, and would mostly repeat what is said in Cavar & Fanselow (1997, 2000). The following remarks are meant to prepare the discussion of a further advantage of a pronunciation economy account: It gives a straightforward analysis of so-called head movement. That constructions apparently involving rightward movement might (at least in some contexts) have to be reanalyzed as resulting from the stranding of phonetic material in a leftward movement operation has been proposed, e.g., by Kayne (1994) and by Wilder (1995). It also seems obvious that the "stranding" of β (say, a relative clause) in the process of moving Σ (say, a DP) in (69) could be the result of an incomplete deletion 29 in the source position of movement, followed by an erasure of β in the target position, due to PRONECON.
(69)
... [Σ α β] ... ⇒ ... [Σ α β] ... [Σ α β] ... ⇒ ... [Σ α β] ... [Σ α̶ β] ... ⇒ ... [Σ α β̶] ... [Σ α̶ β] ...
souvent Marie [VP embrasse Marie] ⇒ souvent Marie [VP embrasse t] ⇒ [VP embrasse t] souvent Marie [VP embrasse t] ⇒ [VP embrasse t] souvent Marie t
The movement operations that "evacuate" the VP or any other XP before "head" movement are typically unmotivated in terms of feature checking or semantic considerations, which may (but need not) create a problem in minimalist accounts, but will not necessarily in OT, since the relevant STAY* violations may be called for by the need to respect a higher ranked NPS or OPW. But we have known since Wexler & Culicover (1980) that movement has a freezing effect on a phrase P in the sense that P becomes an island after movement, as demonstrated impressively in Müller (1998b). "Head" movement constellations do not induce such island effects, however:
(81) Über wen liest sie eine Geschichte t
     about who read she a story
     'Who is she reading a story about?'
In (81), the verb has been preposed. If this implies phrasal fronting in the sense that eine Geschichte über wen has been extracted out of VP before the rest of VP (= liest) was preposed, then eine Geschichte über wen should have become an island for movement, which it has not: über wen can still be moved out of this phrase. Our account avoids this problem because the splitting up of the VP into a preposed verb and a stranded rest is effected by pronunciation laws (and not by remnant movement) - there is no particular reason why the object should thereby become an island. A comparison of attempts to guarantee that certain movements have no freezing effects (Müller 1999) with our approach may thus help to identify the proper way of eliminating head movement.
Notes
Parts of this paper were presented at the Workshop on Conflicting Rules in Phonology and Syntax at the University of Potsdam (Dec. 15-16, 1999), and at the Linguistics Colloquium at the University of Leipzig. We would like to thank the audiences for useful comments and criticism. Particular thanks for helpful hints and for support in various respects related to this article go to Joanna Błaszczak, Caroline Féry, Susann Fischer, Gereon Müller, and Douglas Saddy. We would also like to thank Artemis Alexiadou, Hans-Martin Gärtner, Anoop Mahajan, Matthias Schlesewsky, Peter Staudacher, and Chris Wilder. Research for this paper was supported by the grant INK 12/B1 "Innovationskolleg Formale Modelle kognitiver Komplexität" financed by the Federal Ministry of Education and Research and administered by the German Research Foundation. The paper was written while Damir Cavar was employed by the University of Potsdam.
1. However, we do not share Pesetsky's view that the optimality theoretic aspect of syntax is confined to such principles of sentence pronunciation.
2. In an OT-framework, one expects that the link between the (uninterpretable) features of an attracting head and the creation of copies can be violated in two ways: There should be movement that does not check such features, and there should be uninterpretable features that are not checked by movement/copy formation. The former assumption helps at least in analyzing non-terminal movement steps in cyclic extractions; see also Chomsky (1998) for an approach in which the strict connection between the triggering of movement and feature checking is given up.
3. We confine our attention here to those candidates that are well formed with respect to conditions such as subjacency, those in which no superfluous movement has taken place, those in which every necessary movement step has been carried out, etc., i.e., we confine our attention to the effects of deletion on chains that are fully grammatical in every other respect.
4. Given that PRONECON never seems to outrank RECOV, it is more adequate to collapse the two principles into a single one that just bans the realization of phonetic material that is predictable from the syntactic environment. This principle would also be more in the spirit of Pesetsky (1998a).
5. That there will be phonetic material in one copy only follows from (7) if we make a further assumption: The phonetic material must not be scattered over the various copies in the chain. See below for an elaboration of this point.
6. PRONECON is an obvious extension of Pesetsky's (1998a: 344) TEL-principle, which penalizes the use of a phonetic matrix for function words.
7. See Pesetsky (1998a: 342) for a slightly different version of this principle.
8. In a considerable number of languages (see Fanselow 2000 for an overview), a further option exists which is also discussed under the heading "partial wh-movement":
(i) Was denkst du wie die Rosen riechen?
    what think you how the roses smell
Wie seems to have matrix scope in (i), yet it is moved to the specifier position of the complement clause only. The apparent scope position of the wh-phrase is filled by a different wh-element (was 'what'). Malay and Bahasa Indonesia lack this element in the scope position, at least overtly. There are various proposals for the proper analysis of (i) (see, e.g., the contributions to Lutz, Müller & v. Stechow 2000). If Fanselow & Mahajan (1996, 2000) are correct, both was and wie are in fact in their scope positions, i.e., the partial nature of wh-movement in (i) is only apparent.
9. Alternatively, the lack of island effects for argumental wh-phrases in situ can also be explained by assuming that they are subject to featural attraction in the sense of Chomsky (1995) and Pesetsky (1998b), if featural attraction (or "Agreement", see Chomsky 1998) is not constrained by subjacency, as Pesetsky (1998b) argues. In such an account, there are two types of "covert" movement: standard phrasal
copying followed by the deletion of the phonetic material in the landing site (= the topic of the present paper), and featural attraction.
10. An EPP-feature in the sense of Chomsky (1998), or a D-feature, as argued by Fanselow & Mahajan (2000).
11. More precisely, it is island effects related to subjacency that one does not expect. Intervention effects as discussed by Beck (1996) or Pesetsky (1998b) are not excluded in this way. Furthermore, the binding of wh-phrases in situ may be subject to binding conditions, as Ouhalla (1996) points out. That the binding option is restricted to wh-phrases in situ is a consequence of the fact that wh-phrases can appear in the specifier position of a CP only if they have been attracted to that position. In other words, if the Comp-position to which the wh-phrase is semantically linked has an attracting feature, this feature must be checked by copying, so that island effects automatically become relevant.
12. The feature attracting wh-phrases to Spec,CP can be optionally absent in languages with "overt wh-movement" (as seems to be the case for French matrix Comps) and in languages without it (Chinese). German is a language with "overt" wh-movement in which the wh-attracting feature of Comp cannot be absent (if we ignore echo questions). Hindi, on the other hand, seems to be a wh-in-situ language in which wh-in-situ phrases must not appear in islands (see Mahajan 1990). In the system presupposed above, this array of facts suggests that the attracting feature of Comp is again always present (wh-arguments cannot be simply bound by an operator, just as in German). Therefore, there seem to be languages with (German) and without (Hindi) overtly visible wh-movement which require the wh-attracting feature of Comps to be present obligatorily. The only option that appears to be unrealized is a language in which the attracting feature of Comp is obligatorily absent (in such a language, adjunct questions could not arise at all).
13. Movement that is string-vacuous in the strictest possible sense is thus not penalized by STAY* if the principle relates to the realization of phonetic matrices. This may be a welcome result for various kinds of "evacuation" operations necessary in the context of remnant movement that are not feature driven. See, e.g., Müller (1999) for a discussion.
14. Violations of STAY* must be assumed to not be cumulative, because otherwise wh-phrases would then move to the lowest Spec,C position only.
15. That island effects can be captured straightforwardly in our system has been shown above; the pertinent argument Cole & Hermon bring forward in this respect applies to a particular formulation of what they call "Multiple Spellout" theories only, but not to the system we develop here.
16. More precisely, the GEN component produces some candidates in which all copies in a chain retain their phonetic matrices, and others in which all but one have lost theirs (and still many further candidates), and only the latter have a chance of winning the competition because they violate PRONECON as little as
is possible in the light of RECOV. We will continue to use (slightly misleading) derivational formulations of optimization for pronunciation, however, because they are simpler.
17. More precisely, we assume that "deletion" means that the phonetic information of (part of) a copy is marked as unpronounced. In principle, such a phonetic matrix could be promoted to the status of being pronounced in later derivational steps, but such syntactic objects are not likely to be optimal candidates. To see why, consider the following derivation. Suppose that B = [A ... A ...] has been formed, with B being cyclic, so that the optimal pronunciation must be selected. Suppose B* = [A1 ... A2 ...] is optimal. Then B* may be subjected to further operations, but note that only the specifier (and perhaps the head) of B* is accessible to such further copying or deletion steps (Chomsky 1998); that is, in particular, the phonetic matrix of A2 can no longer be affected: it cannot be marked as unpronounced in further derivational steps. Thus, when the object G = [A0 ... [B* A1 ... A2] ...] is subjected to EVAL, A0 must lose its phonetic matrix as long as chains are realized in one position only (= the standard case), because the decision that A2 is pronounced can no longer be retracted. In other words, A0 has a chance for phonetic realization only if other principles than RECOV crucially override PRONECON. See below for such a case.
18. Wilder (1995) shows that deletion operations operating from left (retained) to right (deleted) are subject to intervention effects from certain heads. Meng-distribution fits this proposal, because the phonetic presence of meng excludes the transformation of ... XP ... meng ... XP ... to ... XP ... meng ... XP ..., so that we are left with ... XP ... meng ... XP ... as the only option.
19. At least this holds for questions asking for lists.
20. "!" indicates that the structure is acceptable in some versions of German only.
21. Long movement is, e.g., certainly acceptable in all dialects spoken to the south of the Main river (the so-called white-sausage equator), but also in Eastfalian and in some dialects in Schleswig-Holstein.
22. According to Reis (1995), however, this construction is parenthetical in nature and therefore does not involve movement at all.
23. It is hard to find speakers from the Ruhr area (Dortmund, Bochum) who accept and use these constructions, but they seem fine to at least some speakers from the Lower Rhine area (around Wesel, Kleve), Eastern Westphalia (around Bielefeld), and Bavarian Swabia (Kempten). Thanks to Susanne Anschütz, Daniel Büring, Sascha Felix, Peter Gebert, Barbara Lenz, Sandra Muckel, Barbara Stiebels and in particular to Susann Fischer for their help in this respect.
24. Recall that in German verb-second complement clauses like (i), the embedded Comp position is filled by the verb attracted to that position.
(i) Ich denke er hat sie geküsst.
    I think he has her kissed
    'I think he has kissed her.'
25. Pesetsky (1998a) later revises this first version of LEC. The reformulation is not relevant for the type of data we discuss in the main body of the article, though.
26. Given the cyclic nature of optimization, there seems little hope for an attempt to guarantee that the tie between CIS and PARSESCOPEO is not resolved differently in different movement steps. When a phrase is split, it will not be put together in later derivational steps for obvious reasons, but, on the other hand, one expects a phrase to be able to split at later derivational steps as well. (ii), modeled after corresponding Dutch data in Barbiers (1999), suggests that at least for some versions of German and some wh-phrases, the expectation is borne out. In addition, (ii) corroborates the view that long movement passes through an AGR-O position, as Barbiers observes.
(i) Was für Frauen hast denn du gedacht dass er einladen will?
    what for women have ptc you thought that he invite wants
(ii) Was hast denn du für Frauen gedacht dass er einladen will?
(iii) ?Was hast denn du gedacht dass er für Frauen einladen will?
27. We are obliged to Susann Fischer for helping us get informants' judgments here. 28. We owe this observation to Josef Bayer, p. c. 29. This has been proposed recently by Hinterhölzl (1999). 30. Caroline Féry (p. c.) suggests the following alternative explanation for split XPs that avoids the assumption of specific focus positions: XPs are split because of their suboptimal phonological properties. Notice that two prominent accents should not be adjacent in a string. If a noun phrase has two independent foci, it must realize two prominent accents. The splitting of the phrase avoids a situation in which these two accents would be too close to each other.
References
Ackema, Peter & Ad Neeleman 1998 WHOT? In: P. Barbosa, D. Fox, P. Hagstrom, M. McGinnis and D. Pesetsky (eds.) Is the Best Good Enough?, 15-33. Cambridge, MA: MIT Press.
Anyadi, Stefanie & Armine Tamrazian 1993 Wh-movement in Armenian and Ruhr German. UCL Working Papers in Linguistics 5: 1-22.
Aoun, Joseph & Yen-Hui A. Li 1993 Wh-elements in situ: Syntax or LF? Linguistic Inquiry 24: 199-238.
Barbiers, Sjef 1999 Remnant stranding and the theory of movement. Paper presented at the Workshop on Remnant Movement, Feature Movement and Their Implications for the T-Model. Potsdam, July 1999.
Basilico, David 1998 Wh-movement in Iraqi Arabic and Slave. The Linguistic Review 15(4): 301-339.
Beck, Sigrid 1996 Quantified structures as barriers for LF movement. Natural Language Semantics 4: 1-56.
Bobaljik, Jonathan 1995 Morphosyntax. The syntax of verbal inflection. Ph.D. dissertation, MIT.
Cavar, Damir 1999 Aspects of the syntax-phonology interface. Ph.D. dissertation, University of Potsdam.
Cavar, Damir & Gisbert Fanselow 1997 Split constituents in Germanic and Slavic. Paper presented at the International Conference on Pied Piping, Jena.
Cavar, Damir & Gisbert Fanselow 2000 Discontinuous constituents in Slavic and Germanic languages. Ms., University of Potsdam.
Chomsky, Noam 1986 Barriers. Cambridge, MA: MIT Press.
Chomsky, Noam 1993 A minimalist program for linguistic theory. In: K. Hale and S.J. Keyser (eds.) The View from Building 20, 1-52. Cambridge, MA: MIT Press.
Chomsky, Noam 1995 The minimalist program. Cambridge, MA: MIT Press.
Chomsky, Noam 1998 Minimalist inquiries: The framework. Ms., MIT.
Chomsky, Noam & Howard Lasnik 1993 The theory of principles and parameters. In: J. Jacobs, A. v. Stechow, W. Sternefeld and Th. Vennemann (eds.) Syntax: An International Handbook of Contemporary Research, 506-569. Berlin: de Gruyter.
Cole, Peter & Gabriella Hermon 1998 The typology of wh-movement: Wh-questions in Malay. Syntax 1: 221-258.
du Plessis, Hans 1977 Wh movement in Afrikaans. Linguistic Inquiry 8: 723-726.
Fanselow, Gisbert 1988 Aufspaltung von NPn und das Problem der 'freien' Wortstellung. Linguistische Berichte 114: 91-113.
Fanselow, Gisbert 1993 Die Rückkehr der Basisgenerierer. Groninger Arbeiten zur Germanistischen Linguistik 36: 1-74.
Fanselow, Gisbert 2000 Partial movement. SynCom Project. Utrecht Institute of Linguistics.
Fanselow, Gisbert & Anoop Mahajan 1996 Partial movement and successive cyclicity. In: U. Lutz and G. Müller (eds.) Papers on Wh-Scope Marking, 131-161. (Arbeitspapier des Sonderforschungsbereichs 340, No. 76.) Stuttgart & Tübingen.
Fanselow, Gisbert & Anoop Mahajan 2000 Towards a minimalist theory of wh-expletives, wh-copying, and successive cyclicity. In: U. Lutz, G. Müller and A. v. Stechow (eds.) Wh-Scope Marking. Amsterdam: Benjamins.
Fanselow, Gisbert, Reinhold Kliegl & Matthias Schlesewsky 2000 'Long' movement in Northern German: A training study. Ms., University of Potsdam.
Fox, Danny 1995 Condition C effects in ACD. MIT Working Papers in Linguistics 27: 105-120.
van Geenhoven, Veerle 1996 Semantic incorporation and indefinite descriptions. Ph.D. dissertation, University of Tübingen.
Grewendorf, Günther 1999 Multiple wh-fronting. Ms., University of Frankfurt.
Grimshaw, Jane 1997 Projection, heads, and optimality. Linguistic Inquiry 28: 373-422.
Groat, Erich & John O'Neil 1996 Spellout at the LF-interface. In: W. Abraham, S. D. Epstein, H. Thráinsson and J. W. Zwart (eds.) Minimal Ideas: Syntactic Studies in the Minimalist Framework, 113-139. Amsterdam: Benjamins.
Heck, Fabian 1997 Komplementierer und ihre Spezifikatoren. Ms., University of Tübingen.
Heck, Fabian & Gereon Müller 1999 Repair is local. Paper presented at the Workshop on Conflicting Rules in Phonology and Syntax. Potsdam, December 1999.
Hiemstra, Inge 1986 Some aspects of wh-questions in Frisian. NOWELE 8: 97-110.
Hinterhölzl, Roland 1999 Licensing movement and stranding in the West Germanic OV languages. Ms., University of Potsdam.
Höhle, Tilman 1996 German w...w-constructions. In: U. Lutz and G. Müller (eds.) Papers on Wh-Scope Marking, 37-58. (Arbeitspapier des Sonderforschungsbereichs 340, No. 76.) Stuttgart & Tübingen.
Huang, C.-T. James 1981 Move wh in a language without wh-movement. The Linguistic Review 1: 369-416.
Kayne, Richard 1994 The antisymmetry of syntax. Cambridge, MA: MIT Press.
Kayne, Richard 1998 Overt vs. covert movement. Syntax 1: 128-191.
Koopman, Hilda & Anna Szabolcsi 1999 Verbal complexes. Ms., UCLA.
Kuhn, Jonas 1998 Resource sensitivity in the syntax-semantics interface and the German split NP construction. Ms., Universität Stuttgart.
Kvam, Sigmund 1983 Linksverschachtelung im Deutschen und Norwegischen. Tübingen: Niemeyer.
Legendre, Géraldine in press An introduction to optimality theoretic syntax. In: G. Legendre, J. Grimshaw, and S. Vikner (eds.) Optimality Theoretic Syntax. Cambridge, MA: MIT Press.
Legendre, Géraldine, Paul Smolensky & Colin Wilson 1998 When is less more? Faithfulness and minimal links in wh-chains. In: P. Barbosa, D. Fox, P. Hagstrom, M. McGinnis and D. Pesetsky (eds.) Is the Best Good Enough?, 249-289. Cambridge, MA: MIT Press.
Lutz, Uli, Gereon Müller & Arnim von Stechow (eds.) 2000 Wh-Scope Marking. Amsterdam: Benjamins.
Mahajan, Anoop 1990 The A/A-bar distinction and movement theory. Ph.D. dissertation, MIT.
Mahajan, Anoop 1999 Against head movement in syntax. Ms., UCLA.
McDaniel, Dana 1989 Partial and multiple wh-movement. Natural Language and Linguistic Theory 7: 565-604.
McDaniel, Dana, Bonnie Chiu & Thomas L. Maxfield 1995 Parameters for wh-movement types: Evidence from child English. Natural Language and Linguistic Theory 13: 709-753.
Müller, Gereon 1997 Partial wh-movement and optimality theory. The Linguistic Review 14: 249-306.
Müller, Gereon 1998a Order preservation, parallel movement, and the emergence of the unmarked. Ms., ROA 275-0798.
Müller, Gereon 1998b Incomplete Category Fronting. Dordrecht: Kluwer.
Müller, Gereon 1999 Shape conservation and remnant movement. To appear in: Proceedings of the 30th Annual Conference of the North-Eastern Linguistic Society. Amherst, MA: GLSA.
Ouhalla, Jamal 1996 Remarks on the binding properties of wh-pronouns. Linguistic Inquiry 27: 676-708.
Pesetsky, David 1997 Optimality theory and syntax: Movement and pronunciation. In: D. Archangeli and D.T. Langendoen (eds.) Optimality Theory: An Overview, 134-170. Oxford: Blackwell.
Pesetsky, David 1998a Some optimality principles of sentence pronunciation. In: P. Barbosa, D. Fox, P. Hagstrom, M. McGinnis and D. Pesetsky (eds.) Is the Best Good Enough?, 337-383. Cambridge, MA: MIT Press.
Pesetsky, David 1998b Phrasal movement and its kin. Ms., MIT.
Reis, Marga 1995 Extractions from verb-second clauses in German? In: U. Lutz and J. Pafel (eds.) On Extraction and Extraposition in German, 45-88. Amsterdam: Benjamins.
Rice, Keren 1989 A Grammar of Slave. Berlin: Mouton de Gruyter.
van Riemsdijk, Henk 1989 Movement and regeneration. In: P. Benincà (ed.) Dialect Variation and the Theory of Grammar, 105-136. Dordrecht: Foris.
Roberts, Ian 1991 Excorporation and minimality. Linguistic Inquiry 22: 209-217.
Roberts, Ian 1997 Restructuring, head movement, and locality. Linguistic Inquiry 28: 423-460.
Roberts, Ian 1998 Have/Be raising, Move F and Procrastinate. Linguistic Inquiry 29: 113-125.
Sabel, Joachim 1998 Principles and parameters of wh-movement. Habilitation thesis, University of Frankfurt.
Saddy, Doug 1991 Wh-scope mechanisms in Bahasa Indonesia. MIT Working Papers in Linguistics 15: 183-218.
Saddy, Doug 1992 A versus A-bar-movement and wh-fronting in Bahasa Indonesia. Ms., University of Queensland.
Schmid, Tanja 1998 Optional and Obligatory IPP Constructions in Westgermanic. Paper presented at the Second Workshop on Optimality Theory Syntax, October 1998, University of Stuttgart.
Speas, Margaret 1990 Phrase Structure in Natural Language. Dordrecht: Kluwer.
Tsai, Wei-Tien 1994 On economizing the theory of A-bar dependencies. Ph.D. dissertation, MIT.
Wexler, Kenneth & Peter Culicover 1980 Formal Principles of Language Acquisition. Cambridge, MA: MIT Press.
Wilder, Chris 1995 Rightward movement as leftward deletion. In: U. Lutz and J. Pafel (eds.) On Extraction and Extraposition in German, 273-309. Amsterdam: Benjamins.
Wilder, Chris 1997 Some properties of ellipsis in coordination. In: A. Alexiadou and T.A. Hall (eds.) Studies on Universal Grammar and Typological Variation, 59-107. Amsterdam: Benjamins.
On the Integration of Cumulative Effects into Optimality Theory Silke Fischer
1 Introduction
The goal of this paper is to discuss the question of whether cumulative theories are indispensable, because they are needed in order to capture certain linguistic phenomena, or whether cumulative effects can be expressed equally well in an optimality-theoretic framework. If so, cumulative theories could be integrated into Optimality Theory (OT). At first sight, the two theories seem to behave very differently. In OT, the number of violations of low-ranked constraints does not play any role as long as the constraint that is decisive for the outcome of the competition is higher-ranked. In a cumulative theory, on the other hand, the situation is somewhat different, because the underlying principle is that the weights of the involved factors are added up. Thus it can happen that some factors which individually do not have much weight and are therefore unimportant on their own become decisive as soon as they cooccur or appear repeatedly.
As empirical background I will use Pafel's cumulative approach to quantifier scope in German (cf. Pafel 1998). I will discuss whether it is possible to "translate" it into OT, where the difficulties lie, and what kind of assumptions one might have to make. What I will not do is discuss Pafel's theory as such, that is, discuss whether it is able to capture the phenomenon of quantifier scope or where its advantages and disadvantages might lie; nor is the aim of this paper to provide an adequate optimality-theoretic account of quantifier scope in general (for this purpose see Heck, this volume). Pafel's theory only serves as a case study for a more theoretical debate; therefore, the approach itself as well as the judgments on the sentences are neither changed nor commented on.
2 Pafel's Approach to Quantifier Scope
Pafel introduces a number of factors that seem to have an impact on the scopal behavior of quantifiers, i.e., whether they tend to take wide scope over other quantifiers or not. Each factor is assigned some weight. In order to decide which one of two quantifiers in a given sentence tends to take wide scope, one has to determine which factors are relevant for each quantifier in the given context. Then one can calculate the scopal value (SV) of each quantifier by adding up the values of the relevant factors. The scopal behavior can then be determined from the difference between the scopal values as follows:
(i) |SV(Q1) − SV(Q2)| ≥ 1: The quantifier with the larger SV takes wide scope (i.e., the sentence is unambiguous).
(ii) |SV(Q1) − SV(Q2)| < 1: Either quantifier may take wide scope (i.e., the sentence is ambiguous).
a.
b.
0 1. Thus only the factor with the larger scopal value should be able to take wide scope, and the corresponding constraint should be higher-ranked than the other one. So it must be concluded that we face a problem with regard to transitivity.
4 The Transitivity Problem
As far as the examples (1) to (4) are concerned, it seems to be possible to derive the predictions of Pafel's cumulative theory (CT) by means of an optimality-theoretic analysis somehow. However, there is one essential difference between the two theories, which probably constitutes the main difficulty for the integration of cumulative effects into OT. If we compare the behavior of two quantifiers in Pafel's theory, there are three possible results: The absolute value of the difference between the scopal values might be ≥ 1, = 0, or ∈ ]0, 1[. In OT, on the other hand, we basically have two possibilities to describe the relation between two constraints. One can be higher-ranked than the other, or they can be tied. As mentioned before, it seems to be reasonable to assume the following "translation rules" (where A and B are factors relevant for scope, W(X) := the weight of factor X, and Con(X) := the constraint derived from factor X):
(i) W(A) = W(B) → Con(A) ◦ Con(B)
(ii) W(A) − W(B) ≥ 1 → Con(A) » Con(B)
However, the third possibility, where 0 < W(A) − W(B) < 1, [...]
a. SV(Q1) − SV(Q2) = 0.5 → predicts ambiguity
b. SV(Q2) − SV(Q3) = 0.5 → predicts ambiguity
but:
c. SV(Q1) − SV(Q3) = 1 → predicts no ambiguity
OT: According to the result in (a.), one would like to say that A ◦ B; but according to the result in (b.), one would like to say that B ◦ C.
→ Because of transitivity, we would have to assume A ◦ C. This contradicts the result in (c.), according to which we would expect that A » C.
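The conflict in (a.) to (c.) can be checked mechanically. The following is only a minimal sketch: the three scopal values are hypothetical numbers chosen so that the differences match (a.) to (c.), and the "ambiguous iff the difference is below 1" test is read off from those three cases rather than taken from Pafel's own formulation.

```python
# Hypothetical scopal values, chosen only so that the pairwise
# differences are 0.5, 0.5 and 1 as in (a.), (b.) and (c.).
SV = {"Q1": 2.0, "Q2": 1.5, "Q3": 1.0}

def ambiguous(x, y, sv=SV):
    """Scope between x and y counts as ambiguous iff the values differ by less than 1."""
    return abs(sv[x] - sv[y]) < 1

def is_transitive(items, rel):
    """True iff rel(x, y) and rel(y, z) always imply rel(x, z)."""
    return all(
        rel(x, z)
        for x in items for y in items for z in items
        if rel(x, y) and rel(y, z)
    )

for pair in [("Q1", "Q2"), ("Q2", "Q3"), ("Q1", "Q3")]:
    print(pair, "ambiguous" if ambiguous(*pair) else "unambiguous")
print("transitive:", is_transitive(list(SV), ambiguous))
```

Since a tie relation between constraints is normally taken to be transitive, any direct translation of "ambiguous" into "tied" inherits exactly this mismatch.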
So if we assumed a strict transitive order, the consequence would be that all factors belonging to the set T_F would translate into tied constraints, where T_F is defined as the set containing the factor F and all those factors whose scopal values are less than 1 step away from the scopal value of an element belonging to T_F. This domino effect would render most of the constraints equally strong and lead to false predictions, as the following example illustrates. This example (Pafel's number 3.104) contains a new factor, SL-PAT, which is assigned to quantifiers with a slight tendency to be interpreted as Patients. It has the weight 1 and translates into the constraint SL, which says that quantifiers must have a slight tendency to be interpreted as Patients.
(10)
Einem Kind hat er jedes Märchen erzählt.
[a child]_dat has [he]_nom [every fairytale]_acc told
Q1: EX-PRE + SL-PAT    SV(Q1) = 1.5 + 1 = 2.5
Q2: IN-DIS             SV(Q2) = 1
Q1 > Q2: possible
Q2 > Q1: impossible
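The calculation for (10) can be restated as a short sketch. The weights are the ones given in the text for EX-PRE, SL-PAT and IN-DIS; the function names and the threshold test (unambiguous once the difference reaches 1) are illustrative assumptions, not Pafel's own formulation.

```python
# Factor weights as used in example (10).
WEIGHTS = {"EX-PRE": 1.5, "SL-PAT": 1.0, "IN-DIS": 1.0}

def scopal_value(factors, weights=WEIGHTS):
    """Add up the weights of all factors that apply to a quantifier."""
    return sum(weights[f] for f in factors)

def scope_readings(q1_factors, q2_factors):
    """Return the scope orders predicted for a pair of quantifiers."""
    sv1, sv2 = scopal_value(q1_factors), scopal_value(q2_factors)
    if abs(sv1 - sv2) >= 1:          # assumed threshold for unambiguity
        return ["Q1 > Q2"] if sv1 > sv2 else ["Q2 > Q1"]
    return ["Q1 > Q2", "Q2 > Q1"]    # difference below 1: ambiguous

q1 = ["EX-PRE", "SL-PAT"]   # einem Kind
q2 = ["IN-DIS"]             # jedes Märchen
print(scopal_value(q1), scopal_value(q2), scope_readings(q1, q2))
```

With a difference of 1.5, only the reading in which Q1 takes wide scope is returned, in line with the judgments listed for (10).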
Starting with the difference in weight between the two factors E and I, which is 1.5 − 1 = 0.5, we can assume that E ◦ I. Similarly, from the difference between the weights associated with E and SL & I, which is 2 − 1.5 = 0.5, we can conclude that E ◦ SL & I; so according to transitivity we get the relation I ◦ SL & I. On the other hand, SL & I is tied with E & SL, since the relevant difference is 2.5 − 2 = 0.5. Again because of transitivity, we therefore get the result that I ◦ E & SL. But as illustrated in T9, this gives us the wrong predictions with regard to sentence (10), in which only the first quantifier can take wide scope.
T9:
Candidates             | E & SL | I
☞ Q1: einem Kind       |        | *
*☞ Q2: jedes Märchen   | *      |
If we want to make sure that only Q1 wins, E & SL must be ranked higher than I, a ranking which is also suggested by the difference between their corresponding weights, which is 2.5 − 1 = 1.5.
I do not know how to solve this problem without giving up to some extent the idea that constraint orders must be strictly transitive. But if we allow that A ◦ B and B ◦ C do not necessarily imply A ◦ C, we can account for the examples above with the following diagram:
(11) constraint order α: (... A » B » C ...)
     constraint order β: (... A » C » B ...)
     constraint order γ: (... B » A » C ...)
In (11), two global ties are involved, which express the relations A ◦ B and B ◦ C, but still all three resulting constraint orders predict that A is higher-ranked than C. This is possible because in contrast to usual assumptions, according to which the branches of global ties are continued in the same way, the second tie in (11) does not affect all branches, but is only part of the two constraint orders α and β. So we could propose that the occurrence of global ties need not necessarily affect all branches of the ranking structure. With this assumption the transitivity problem can be solved, which means that the idea of strict transitivity in constraint rankings must be given up (and this might be a controversial result). However, transitivity does not have to be given up completely, since each constraint order in itself remains transitive. It seems to me that this is the easiest way to integrate the non-transitive effects of cumulative theories into OT.⁵
The question then arises of how the underlying relation between the constraints A, B, and C, which is illustrated in (11), can be formally expressed. Following a suggestion by Ralf Vogel (p.c.), I propose that it can be captured adequately by the relation (A » C) ◦ B, where this kind of interaction between ties and hierarchical rankings is defined as follows:
(A » C) ◦ B := A ◦ B » C ∨ A » C ◦ B,
i.e., A » B » C ∨ B » A » C ∨ A » C » B (∨ A » B » C)
resulting constraint orders:
(i) A » B » C
(ii) B » A » C
(iii) A » C » B
This definition can be generalized in such a way that it can be applied to all sorts of combinations between ties and (bracketed) asymmetric rankings. The crucial point is that the brackets on hierarchical rankings make it possible to preserve this hierarchy even in a tied order. This means that if the tie is resolved, it will yield only those combinations possible between the tied elements in which the hierarchy indicated in brackets is preserved.
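One way to picture the generalized definition is as an enumeration of all total orders over the tied elements that respect the bracketed rankings. The sketch below is an illustration under that reading only; the function name `resolve` and the representation of bracketed rankings as (higher, lower) pairs are assumptions made for the example.

```python
from itertools import permutations

def resolve(constraints, fixed):
    """List every total order of `constraints` in which, for each pair
    (hi, lo) in `fixed`, hi still outranks lo."""
    orders = []
    for perm in permutations(constraints):
        position = {c: i for i, c in enumerate(perm)}
        if all(position[hi] < position[lo] for hi, lo in fixed):
            orders.append(" » ".join(perm))
    return orders

# (A » C) ◦ B: B is tied with the pair A, C, but A » C must be preserved.
for order in resolve(["A", "B", "C"], fixed=[("A", "C")]):
    print(order)
```

For (A » C) ◦ B this yields exactly the three constraint orders (i) to (iii); orders such as C » A » B are filtered out because they reverse the bracketed hierarchy.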
5 Combining Constraints
In section 3 we considered four sentences that involved the simple constraints E, S, and I. In order to account for the behavior of the quantifiers in these examples, the additional constraint S & I was introduced. But what about constraints like E & S, E & I, or E & S & I? The question that needs to be discussed at this point is what kind of constraint combinations have to be taken into account. Since quantifiers can exhibit all sorts of combined properties, the answer should be that in principle all constraint combinations have to be considered. However, if we examined quantifiers with n different properties, we would have to discuss 2^n − 1 constraints. Since the first four sentences have already shown that a certain subset of all constraints seems to suffice to determine the outcome of the competition for a concrete example, it would be helpful to find out what this subset has to look like. Remember that the last observation in section 3 was that E must not only be tied with S & I, but also with S and I, whereas S & I is higher-ranked than S and I, giving rise to the transitivity problem. In the light of the previous section, we can now assume that the underlying formal relation is (S & I » S ◦ I) ◦ E, which is illustrated by the diagram in (13) (cf. also the calculation in the appendix).
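To make the figure of 2^n − 1 concrete, the following sketch lists every non-empty combination of the three simple constraints E, S, and I. The helper name `all_combinations` is hypothetical and only serves the illustration.

```python
from itertools import combinations

def all_combinations(constraints):
    """Return every non-empty combination of the given simple constraints."""
    return [
        " & ".join(combo)
        for size in range(1, len(constraints) + 1)
        for combo in combinations(constraints, size)
    ]

combos = all_combinations(["E", "S", "I"])
print(len(combos), combos)
```

For n = 3 this gives seven constraints, and each additional property roughly doubles the number, which is why a restriction to a small relevant subset is attractive.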
This constraint ranking is indeed able to predict the ambiguity of sentence (4)⁶ (cf. T10). But if the competition is restricted to the same set of constraints (i.e., {S & I, E, S, I}), it does not make the correct predictions for the unambiguous sentences (2) and (3), in which only the first quantifier can take wide scope (cf. T11 and T12).
(4'') Eine Fuge hat jeder Pianist in seinem Repertoire.
      [a fugue]_acc has [every pianist]_nom in his repertoire
T10 (constraint columns: S & I, E, S, I): ☞ Q1: eine Fuge; ☞ Q2: jeder Pianist
(2'') Jede Fuge hat ein Pianist in seinem Repertoire.
      [every fugue]_acc has [a pianist]_nom in his repertoire
T11 (constraint columns: S & I, E, S, I): ☞ Q1: jede Fuge; *☞ Q2: ein Pianist
(3'') Ein Pianist hat jede Fuge in seinem Repertoire.
      [a pianist]_nom has [every fugue]_acc in his repertoire
T12 (constraint columns: S & I, E, S, I): ☞ Q1: ein Pianist; *☞ Q2: jede Fuge
According to structure (13), not only Q1 but also Q2 is optimal in both tableaux, namely under the constraint orders α and β in the case of T11, and under the constraint orders γ and δ in the case of T12. The conclusion that can be drawn is that the constraint subset relevant for the examples (2) and (3) has not been completely taken into consideration in T11 and T12. Based on our observations concerning sentence (4), it seems reasonable to assume that the relevant subset CON_rel (i.e., the smallest set of constraints to which the competition can be reduced) consists of two members only, namely the combinations of the constraints derived from the properties of each quantifier. As far as the examples (2) and (3) are concerned, this means that the relevant constraint subsets are {E & I, S} and {E & S, I} respectively (cf. T13 and T14).
T13:
Candidates          | E & I | S
☞ Q1: jede Fuge     |       | *
Q2: ein Pianist     | *!    |
T14:
Candidates          | E & S | I
☞ Q1: ein Pianist   |       | *
Q2: jede Fuge       | *!    |
The following example⁷ serves as a further illustration of this generalization concerning CON_rel. It contains two new factors: ST-L-DB refers to strong lexical discourse binding and has the weight 2; FOCUS is assigned to focused quantifiers⁸ and has the weight −1. These factors translate into the following two constraints:
ST-L-DB (ST): Quantifiers must occur in strong lexical discourse binding contexts.
FOCUS (F): Quantifiers must be focused.
(14) Welche Fuge hat jeder Pianist in seinem Repertoire?
     [which fugue]_acc has [every pianist]_nom in his repertoire
Q1: EX-PRE + ST-L-DB + FOCUS    SV(Q1) = 1.5 + 2 − 1 = 2.5
Q2: SUBJECT + IN-DIS            SV(Q2) = 1 + 1 = 2
Q1 > Q2: possible
Q2 > Q1: possible
relevant constraint subset: {E & ST & F, S & I} ⊆ CON, where CON is the set comprising all constraint combinations
constraint ranking: E & ST & F ◦ S & I
T15:
Candidates            | E & ST & F | S & I
☞ Q1: welche Fuge     |            | *
☞ Q2: jeder Pianist   | *          |
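The competitions in T13 and T15 can be simulated with a small evaluator. This is a sketch under stated assumptions: a global tie is resolved into all of its total orders and the winners are pooled; a combined constraint such as E & I is taken to be violated only if the candidate has none of the combined properties (a local-conjunction style reading); and the names `violates`, `winners_under` and `optimal` are hypothetical.

```python
from itertools import permutations, product

def violates(props, constraint):
    # Assumption: a combined constraint is violated only when the candidate
    # lacks every one of the combined properties.
    return props.isdisjoint(constraint)

def winners_under(order, candidates):
    """Standard OT evaluation under one total constraint order."""
    remaining = dict(candidates)
    for constraint in order:
        marks = {name: violates(p, constraint) for name, p in remaining.items()}
        fewest = min(marks.values())
        remaining = {n: p for n, p in remaining.items() if marks[n] == fewest}
    return set(remaining)

def optimal(candidates, strata):
    """Pool the winners over all resolutions of the tied strata.
    `strata` is a list of lists; constraints inside one list are tied."""
    winners = set()
    for resolution in product(*(permutations(s) for s in strata)):
        order = [c for stratum in resolution for c in stratum]
        winners |= winners_under(order, candidates)
    return winners

# T15: E & ST & F tied with S & I (example (14)).
ex14 = {"Q1": {"E", "ST", "F"}, "Q2": {"S", "I"}}
tie = [[frozenset({"E", "ST", "F"}), frozenset({"S", "I"})]]
print(sorted(optimal(ex14, tie)))

# T13: E & I ranked above S (example (2)).
ex2 = {"Q1": {"E", "I"}, "Q2": {"S"}}
ranked = [[frozenset({"E", "I"})], [frozenset({"S"})]]
print(sorted(optimal(ex2, ranked)))
```

Under the tie both candidates come out as optimal, as in T15, whereas the strict ranking in T13 leaves Q1 as the only winner.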
However, sentences in which the quantifiers share some common properties relevant for scope require a slight modification to the definition of CON_rel. Since neither candidate would violate a constraint derived from (one of) these properties, these constraints are irrelevant for the competition and must therefore be excluded from CON_rel. Thus, CON_rel can be defined as follows: The first element of CON_rel [...]
Q1 > Q2: possible
Q2 > Q1: impossible
relevant constraint subset:
(i) {E & ST & F, S & I & F}
(ii) {E & ST, S & I}
constraint ranking:
(i) E & ST & F » S & I & F
(ii) E & ST » S & I
T16(i):
Candidates             | E & ST & F | S & I & F
☞ Q1: welche Fuge      |            |
*☞ Q2: JEder Pianist   |            |
T16(ii):
Candidates             | E & ST | S & I
☞ Q1: welche Fuge      |        | *
Q2: JEder Pianist      | *!     |
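The step from subset (i) to subset (ii) amounts to dropping the properties that both quantifiers share before the combined constraints are formed. A minimal sketch of that step follows; the property sets are the ones read off the subsets in (i), and the function names are hypothetical.

```python
def relevant_subset(q1_props, q2_props):
    """Form one combined constraint per quantifier from the properties
    that the two quantifiers do not share."""
    shared = q1_props & q2_props
    return frozenset(q1_props - shared), frozenset(q2_props - shared)

def name(constraint, order=("E", "ST", "S", "I", "F")):
    """Render a combined constraint in a fixed, readable order."""
    return " & ".join(p for p in order if p in constraint)

# Property sets corresponding to subset (i): both quantifiers are focused.
c1, c2 = relevant_subset({"E", "ST", "F"}, {"S", "I", "F"})
print("{" + name(c1) + ", " + name(c2) + "}")
```

The shared property F disappears from both members, which gives the subset in (ii).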
As far as factors with negative weight are concerned, one might alternatively translate them into negative constraints in order to avoid configurations where X » X & Y, which contradicts the definition of local conjunction. The factor FOCUS, for example, would then translate into the following constraint:
*F: Quantifiers must not be focused.
In fact, we could then also try to replace the factor FOCUS (with weight −1), which is associated with focused quantifiers, with a factor *FOCUS with weight 1, which is associated with unfocused quantifiers. In this way we could generally reinterpret factors with negative weight such that they would all be assigned positive weight. With regard to example (14), we would then have the following configuration, which illustrates that the difference between the scopal values and therefore the predictions on possible scope relations remain unaffected by this reinterpretation.
(14') Welche Fuge hat jeder Pianist in seinem Repertoire?
      [which fugue]_acc has [every pianist]_nom in his repertoire
Q1: EX-PRE + ST-L-DB              SV(Q1) = 1.5 + 2 = 3.5
Q2: SUBJECT + IN-DIS + *FOCUS     SV(Q2) = 1 + 1 + 1 = 3
Q1 > Q2: possible
Q2 > Q1: possible
relevant constraint subset: {E & ST, S & I & *F}
constraint ranking: E & ST ◦ S & I & *F

T17:
Candidates            | E & ST | S & I & *F
☞ Q1: welche Fuge     |        | *
☞ Q2: jeder Pianist   | *      |
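The arithmetic behind (14) and (14') can be checked with a small sketch. The factor weights are read off the two examples; the function names and the rule for differences of 1 or more (the quantifier with the higher value takes wide scope) are assumptions made for illustration, not part of Pafel's or Fischer's formulation.

```python
# Weights as they appear in examples (14) and (14'); *FOCUS is the
# reinterpreted factor with positive weight proposed in the text.
WEIGHTS = {
    "EX-PRE": 1.5,
    "ST-L-DB": 2.0,
    "FOCUS": -1.0,
    "SUBJECT": 1.0,
    "IN-DIS": 1.0,
    "*FOCUS": 1.0,
}


def scopal_value(factors):
    """Sum the weights of the factors associated with a quantifier."""
    return sum(WEIGHTS[f] for f in factors)


def readings(q1_factors, q2_factors):
    """Possible scope orders, assuming ambiguity iff |SV(Q1) - SV(Q2)| < 1."""
    diff = scopal_value(q1_factors) - scopal_value(q2_factors)
    if abs(diff) < 1:
        return {"Q1 > Q2", "Q2 > Q1"}
    # Assumption: otherwise only the quantifier with the higher value scopes wide.
    return {"Q1 > Q2"} if diff > 0 else {"Q2 > Q1"}


# Example (14): FOCUS with weight -1 on the focused wh-phrase.
ex14 = (["EX-PRE", "ST-L-DB", "FOCUS"], ["SUBJECT", "IN-DIS"])
# Example (14'): *FOCUS with weight 1 on the unfocused quantifier instead.
ex14_prime = (["EX-PRE", "ST-L-DB"], ["SUBJECT", "IN-DIS", "*FOCUS"])

diff_14 = scopal_value(ex14[0]) - scopal_value(ex14[1])
diff_14_prime = scopal_value(ex14_prime[0]) - scopal_value(ex14_prime[1])
```

Both configurations yield a difference of 0.5, so both scope orders remain possible under either encoding.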
As far as example (15) is concerned, the factor *FOCUS would not be involved at all, because both quantifiers in the sentence are focused. Hence, *F would not belong to the relevant constraint subset. However, all sentences that contain unfocused quantifiers (like the examples (1)-(4)) are now associated with the factor *FOCUS and therefore with the constraint *F; but as our considerations above have shown, *F will be excluded from CON_rel in case both involved quantifiers are unfocused. Thus the replacement of F/FOCUS by *F/*FOCUS does not affect our earlier examples.

Finally, there is another configuration in Pafel's approach that must be mentioned. If a quantifier is not associated with any property that is relevant for scope, it receives the scopal value 0. Thus it is possible for a sentence containing such a quantifier to be ambiguous in case the second quantifier Q2 has a scopal value with -1 < SV(Q2) < 1. Assume that Q2 has the property A, which translates into the constraint A. As indicated in T18, Q2 fulfils A in contrast to Q1. Thus we are faced with the situation that Q2 will always win if we do not introduce a further constraint which is violated by Q2 but not by Q1.

T18:
Candidates | A
* Q1       | *!
☞ Q2       |
In order to get the right result, we have to think of an additional constraint which is satisfied exactly by those quantifiers which do not have any properties that influence the quantifier's scopal behavior. Such a constraint might look as follows:

NO PROPERTY (N-PR): Quantifiers must not have properties relevant for scope.

On this assumption, the competition works as follows:

(16)
Q1: (none)
Q2: A
SV(Q1) = 0
-1 < SV(Q2) < 1
Q1 > Q2: possible
Q2 > Q1: possible
relevant constraint subset: {N-PR, A}
constraint ranking: N-PR ◦ A

T19:
Candidates | N-PR | A
☞ Q1       |      | *
☞ Q2       | *    |
Note that the constraint N-PR must also come into play if a quantifier shares all its properties with the second quantifier of the sentence. This configuration is illustrated in the following example, where A and B are properties relevant for scope that translate into the constraints A and B respectively.

(17)
Q1: A + B
Q2: A,
where |SV(Q1) - SV(Q2)| < 1, i.e., either quantifier can take wide scope.

As discussed above, the constraint derived from the common property A is excluded from CON_rel. Thus the relevant constraint subset might be:
(i) {B}, or
(ii) {B, N-PR}.

For (ii), the constraint ranking is B ◦ N-PR, because we know from our assumptions in (17) that |weight(B)| < 1. The results we get for (i) and (ii) are illustrated in T20(i) and T20(ii), which show that we have to use the second constraint subset.

T20(i):
Candidates | B
☞ Q1       |
* Q2       | *!

T20(ii):
Candidates | B | N-PR
☞ Q1       |   | *
☞ Q2       | * |
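The way a tie produces two winners in T19 and T20(ii) can be made explicit with a minimal sketch. It assumes the usual reading of global ties: a tie is resolved into every strict order of the tied constraints, and a candidate counts as optimal if it wins under at least one resulting order. The data layout and function names are illustrative only.

```python
from itertools import permutations, product


def resolve_ties(ranking):
    """Yield every strict order of a ranking given as a list of tie groups.

    Example: [["N-PR", "A"]] yields ["N-PR", "A"] and ["A", "N-PR"].
    """
    for combo in product(*(permutations(group) for group in ranking)):
        yield [constraint for group in combo for constraint in group]


def winners_strict(violations, order):
    """Candidates with the lexicographically smallest violation profile."""
    def profile(cand):
        return tuple(violations[cand].get(c, 0) for c in order)

    best = min(profile(cand) for cand in violations)
    return {cand for cand in violations if profile(cand) == best}


def winners(violations, ranking):
    """Union of the winners over all resolutions of the ties."""
    result = set()
    for order in resolve_ties(ranking):
        result |= winners_strict(violations, order)
    return result


# T18: only constraint A, violated by Q1.
t18 = {"Q1": {"A": 1}, "Q2": {}}
# T19: N-PR and A are tied; Q1 violates A, Q2 violates N-PR.
t19 = {"Q1": {"A": 1}, "Q2": {"N-PR": 1}}
# T20(ii): B and N-PR are tied; Q1 violates N-PR, Q2 violates B.
t20_ii = {"Q1": {"N-PR": 1}, "Q2": {"B": 1}}
```

With `winners(t18, [["A"]])` only Q2 survives, whereas `winners(t19, [["N-PR", "A"]])` returns both candidates, which is the ambiguity the tie is meant to capture.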
One further situation that can occur in cumulative theories, but which we do not find in Pafel's approach, is that the cumulative occurrence of one and the same constraint violation might change the outcome of the whole competition. Imagine the following configuration:

T21:
Candidates | A  | B
C1         | *! |
☞ C2       |    | *

T22:
Candidates | A  | B
☞ C1       | *  |
C2         |    | **!
If it is assumed that A » B, we can account for T21, but not for T22, and if we assume that B » A, we get the right prediction for T22, but not for T21. In the light of the ongoing discussion, one way out of the dilemma might be to assume that constraint combinations of the sort X & Y are not only possible in case X ≠ Y, but also if X = Y. The resulting constraint would be a reflexive local conjunction (cf. also Legendre et al. 1998), which would have to be interpreted as follows:

(18) (i) The constraint X & X =: X² is violated iff X is violated twice;
     (ii) more generally: the constraint Xⁿ is violated iff X is violated n times.

On these assumptions, T21 and T22 can be accounted for with the following constraint ranking: B² » A » B. Since A » B, C2 wins in T21, and since B² » A, C1 wins in T22, as illustrated more precisely in T23.

T23:
Candidates | B² | A
☞ C1       |    | *
C2         | *! |
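The effect of the ranking B² » A » B on T21 and T22 can be traced with a short sketch. It assumes that Xⁿ incurs one violation as soon as X is violated at least n times, which is one way of reading (18); the encoding of constraints as (name, power) pairs is purely illustrative.

```python
# Ranking B² » A » B, encoded as (base constraint, power) pairs.
RANKING = [("B", 2), ("A", 1), ("B", 1)]


def profile(counts, ranking=RANKING):
    """Violation profile of a candidate under a strict ranking.

    For power 1 the raw number of violations is used; for power n > 1 the
    derived constraint is violated once iff the base constraint is violated
    at least n times (an assumption about how (18) is to be read).
    """
    out = []
    for name, power in ranking:
        n = counts.get(name, 0)
        out.append(n if power == 1 else int(n >= power))
    return tuple(out)


def optimal(tableau, ranking=RANKING):
    """Return the set of candidates with the best profile."""
    best = min(profile(v, ranking) for v in tableau.values())
    return {c for c, v in tableau.items() if profile(v, ranking) == best}


# T21: C1 violates A once, C2 violates B once.
t21 = {"C1": {"A": 1}, "C2": {"B": 1}}
# T22: C1 violates A once, C2 violates B twice.
t22 = {"C1": {"A": 1}, "C2": {"B": 2}}
```

A single violation of B leaves B² untouched, so A decides T21 in favor of C2; the second violation of B in T22 triggers B², which outranks A and hands the win to C1.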
6 Conclusion

As the discussion showed, it seems to be possible to integrate cumulative effects, as they occur, for example, in Pafel's approach to quantifier scope, into OT if some special assumptions are accepted. In order to get effective constraints, it was first of all necessary to introduce (reflexive) local conjunction, which multiplies the number of constraints enormously and might therefore give rise to criticism. But as could be shown in the previous section, the outcome of the competition only hinges on a small subset of the whole set of constraints.

A much more severe problem was approached in section 4 and concerns the transitivity of constraint rankings. Since transitivity does not need to hold in cumulative theories, we face the problem that we might have to integrate non-transitive effects into a transitive order. I think that this is only possible if the idea of strict or global transitivity, where A ◦ B and B ◦ C necessarily imply A ◦ C, is given up. Thus, I proposed that the occurrence of global ties within global ties might only affect some of the branches. This approach allows on the one hand the integration of non-transitive effects, but preserves on the other hand the transitive order at least locally, because each resulting constraint order remains transitive. Thus, this step is not as radical as it might seem at first sight. Of course, it has to be pointed out that global ties in general increase the amount of complexity tremendously; however, the number of the resulting constraint rankings is again reduced somewhat if global ties do not necessarily have to affect all branches. As far as the formal realization of this relation is concerned, it can be expressed as interaction between ties and bracketed hierarchical rankings. This seems to me to be a natural elaboration of the two basic relations "»" and "◦", which is to some extent reminiscent of the interaction between addition and multiplication.

Finally, the question arose as to how CON_rel, the smallest set of constraints relevant for a competition, can be defined. It is clear that constraints on which the candidates behave alike can be excluded and that furthermore simple constraints which are also part of relevant local conjunctions need not be taken into consideration. (In the latter case, the simple constraints will not be decisive, since the corresponding local conjunctions are higher-ranked.) Moreover, the cumulative character of the constraints ensures that (A & X) » or ◦ (B & X) ⇔ A » or ◦ B, which allows us to ignore certain higher-ranked local conjunctions on which the candidates differ. As far as the integration of Pafel's approach into OT is concerned, it could therefore be concluded that CON_rel contains only two constraints, namely the constraint combinations derived from the properties associated with each quantifier.

There are two questions I have not addressed here. First, it could be asked whether anything would change if CON_rel contained more than two constraints or if more than two candidates were involved. The second question concerns the representation of tendencies in OT, as for example the preference for certain readings. One possibility might be that it can somehow be captured by the number of constraint orders which are affected by certain ties, since this is exactly how ambiguities predicted by the relation 0 < |SV(Q1) - SV(Q2)| < 1 are characterized. However, whether this approach would really work would have to be discussed in more detail.
Appendix

The ranking we finally assumed for the constraints S & I, S, I, and E was (S & I » S ◦ I) ◦ E, which results in eight constraint orders if the ties are resolved (cf. diagram (13)). This outcome can be predicted very easily if we assume the following definition, which is a generalization of definition (12):

Generalization of definition (12):
(A1 » ... » An) ◦ B :=
    A1 ◦ B » A2 » A3 » ... » An
  ∨ A1 » A2 ◦ B » A3 » ... » An
  ∨ ...
  ∨ A1 » A2 » A3 » ... » An ◦ B

Example: (D » A ◦ B) ◦ C

This is the underlying formal relation if A ◦ B, A ◦ C, B ◦ C, D ◦ C, but D » A and D » B. If we apply the definition above, we get the following result:

(D » A ◦ B) ◦ C

(D » A » B) ◦ C  ∨  (D » B » A) ◦ C

D ◦ C » A » B  ∨  D » A ◦ C » B  ∨  D » A » B ◦ C
∨  D ◦ C » B » A  ∨  D » B ◦ C » A  ∨  D » B » A ◦ C

D » C » A » B  ∨  C » D » A » B  ∨  D » A » C » B  ∨  D » C » A » B  ∨  D » A » B » C  ∨  D » A » C » B
∨  D » C » B » A  ∨  C » D » B » A  ∨  D » B » C » A  ∨  D » C » B » A  ∨  D » B » A » C  ∨  D » B » C » A

resulting constraint orders:
(i)    C » D » A » B
(ii)   C » D » B » A
(iii)  D » A » B » C
(iv)   D » A » C » B
(v)    D » B » A » C
(vi)   D » B » C » A
(vii)  D » C » A » B
(viii) D » C » B » A
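The count of eight constraint orders for (D » A ◦ B) ◦ C can be verified mechanically. The sketch below follows the generalized definition step by step: the inner tie A ◦ B is resolved first, then C is tied with each member of the resulting chain in turn, and each such tie is resolved in both directions. The function names are illustrative and not taken from the text.

```python
from itertools import permutations


def expand_chain_tie(chain, extra):
    """Resolve (A1 » ... » An) ◦ B into strict orders.

    For every position i, B is tied with Ai, which yields the two orders
    "B immediately above Ai" and "B immediately below Ai". Duplicates that
    arise from neighboring positions are dropped.
    """
    orders = []
    for i, member in enumerate(chain):
        for pair in ((extra, member), (member, extra)):
            order = chain[:i] + pair + chain[i + 1:]
            if order not in orders:
                orders.append(order)
    return orders


def orders_for_example():
    """All strict orders for (D » A ◦ B) ◦ C."""
    # Step 1: resolve the inner tie A ◦ B below D.
    chains = [("D",) + p for p in permutations(("A", "B"))]
    # Step 2: tie C with each chain and collect the distinct orders.
    result = []
    for chain in chains:
        for order in expand_chain_tie(chain, "C"):
            if order not in result:
                result.append(order)
    return result


ORDERS = [" » ".join(order) for order in orders_for_example()]
```

Each of the two chains contributes four distinct orders (C in one of four slots), which gives the eight orders listed as (i)-(viii).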
Notes

For comments and discussion I want to thank Fabian Heck, Gereon Müller, Tanja Schmid, Wolfgang Sternefeld, Sten Vikner, and Ralf Vogel.

1. The distinction between (ii-a) and (ii-b) is only mentioned for completeness' sake. It does not play any role in the further discussion, since the question of how this difference can be expressed in an optimality-theoretic framework is not addressed here.
2. If this constraint were translated back into Pafel's theory, it would correspond to a factor with the weight 2, since it involves both properties S (weight 1) and I (weight 1).
3. The question might arise of whether it is legitimate to restrict the competition to the four constraints considered in T5. It is true that there are higher-ranked constraints on which Q1 and Q2 differ, namely E & X and any local conjunction containing X and S or I, where X is a constraint that is violated by both candidates. However, the cumulative character of the constraints ensures that (A & X) » or ◦ (B & X) ⇔ A » or ◦ B. Thus, the outcome of a competition involving the constraints A & X, B & X, A, and B does not change if A & X and B & X are not taken into account.
4. It is not possible to provide a concrete example that only involves the two constraints S & I and S or I. These combinations are ruled out, because Pafel's postulation of the two contrasting factors EX-PRE and IN-PRE ensures that one of them is always involved. (The latter property is assigned to quantifiers in the "Mittelfeld" that linearly precede other quantifiers.) But I think the general problem becomes clear nevertheless.
5. The situation in which A ◦ B and B ◦ C, but C » A must be excluded, is not as unusual as it may seem at first sight. It also occurs, for example, in Müller (2000), where it is assumed on the one hand (by transitivity) that A ◦ C, but where on the other hand C » A is excluded because of an underlying meta-constraint which says that A must be higher-ranked than C.
6. The dotted lines in the tableaux indicate that two neighboring constraints X and Y are tied, but that their corresponding weights are not equal.
7. The sentences (14) and (15) correspond to Pafel's examples (3.164') and (3.165).
8. Pafel assumes that wh-phrases are inherently focused (cf. Pafel 1998: 98).
References

Heck, Fabian this volume Quantifier scope in German and cyclic optimization.
Legendre, Géraldine, Paul Smolensky and Colin Wilson 1998 When is less more? Faithfulness and minimal links in wh-chains. In: P. Barbosa, D. Fox, P. Hagstrom, M. McGinnis and D. Pesetsky (eds.) Is the Best Good Enough?, 249-289. Cambridge, MA: MIT Press.
Müller, Gereon 1999 Optionality in optimality-theoretic syntax. GLOT International 4.5: 3-8.
Müller, Gereon 2000 Das Pronominaladverb als Reparaturphänomen. Linguistische Berichte 182: 139-178.
Pafel, Jürgen 1998 Skopus und logische Struktur. Studien zum Quantorenskopus im Deutschen. Technical Report 129, Arbeitspapiere des Sonderforschungsbereichs 340. Universität Tübingen.
Prince, Alan and Paul Smolensky 1993 Optimality Theory: Constraint Interaction in Generative Grammar. Ms., Rutgers University & University of Colorado, Boulder. To appear as Linguistic Inquiry Monograph, Cambridge, MA: MIT Press.
Richards, Norvin 1998 The principle of minimal compliance. Linguistic Inquiry 29: 599-629.
Smolensky, Paul 1995 On the internal structure of Con, the constraint component of UG. Ms., Johns Hopkins University.
Quantifier Scope in German and Cyclic Optimization
Fabian Heck

Standard Optimality Theory (OT) as developed by Prince & Smolensky (1993) or McCarthy & Prince (1993) is based on the assumption that a grammatical structure S_i is derived in the following way: given a certain input I, a function f first generates a (possibly infinite) set S_1 ... S_k of possible structures from I and then performs a computation of optimization to filter out all suboptimal structures, leaving only S_i as the optimal output O. This is satisfying as long as it suffices to refer to two levels of representation. However, in syntax it has often been argued that one needs more levels of representation, for example the levels of D-structure, S-structure, and Logical Form (cf. Chomsky 1981). If OT is applicable to syntax at all, then the null hypothesis is to assume that the computation f holds between all levels of representation. I propose in this paper that this is indeed the case, and that the computation of generation and optimization proceeds in a cyclic fashion. The hypothesis is applied to the description of relative quantifier scope in German.
1 Introduction

The goal of this paper is to account for the phenomenon of relative quantifier scope in German. The discussion basically deals with sentence pairs of the following type:

(1) a. Jeder hat einen Fehler gemacht
       everybody_NOM has one mistake_ACC made
    b. Einen Fehler hat jeder gemacht
       one mistake_ACC has everybody_NOM made
(1-b) is ambiguous. It can either have the meaning described in (2-a) or the meaning described in (2-b):
(2) a. There exists one mistake x such that for every person y the following holds: y made x.
    b. For every person y there exists a mistake x such that y made x.
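The two readings in (2) can also be written as predicate-logic formulas, which makes the difference in quantifier order explicit. The predicate names below are labels chosen for illustration and do not appear in the text.

```latex
\begin{align*}
\text{(2-a)}\quad & \exists x\,[\mathrm{mistake}(x) \land \forall y\,[\mathrm{person}(y) \rightarrow \mathrm{made}(y,x)]] \\
\text{(2-b)}\quad & \forall y\,[\mathrm{person}(y) \rightarrow \exists x\,[\mathrm{mistake}(x) \land \mathrm{made}(y,x)]]
\end{align*}
```

In (2-a) the existential quantifier has scope over the universal one; in (2-b) the scope relation is reversed.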
Interestingly, (1-a) only has the reading (2-b). Hence, the question is: when does a sentence that contains two quantifiers have only one reading, and when does it have two readings? First of all, following May (1977), Stechow (1993), Heim & Kratzer (1997), and others, I assume that every meaning of a sentence is spelled out unambiguously at the level of Logical Form.¹ The relative scope of two quantifiers Q1 and Q2 is encoded by the relationship of c-command (following the definition in Reinhart 1976): if Q1 c-commands Q2 at LF, then Q1 has scope over Q2.

Now, I think that there are two main observations that may lead to a principled account of the given question. First, the relative quantifier scope in German is highly dependent on the given S-structural configuration. That is, if a quantifier Q1 c-commands another quantifier Q2 at S-structure (SS), then Q1 will be able to c-command Q2 at LF as well. In other words, the mapping from S-structure to LF is highly structure preserving (cf., for instance, Kiss 1999 for German, and Kroch 1974, Reinhart 1983, and McCawley 1999 for English). Second, it seems that the scope relations can be inverted on the way from S-structure to LF if the derivation has involved S-structure movement.² This means that whenever there is a quantifier at S-structure that does not fill its D-structure position, the scope relations are destabilised and there may be an accessible reading that does not correspond to the S-structural configuration. Technically this will be spelled out by reconstructing the moved quantifier to its base position. The basic assumption about the transparent LF, together with the first observation, calls for the syntactic levels of S-structure and LF. The second observation calls for the syntactic level of D-structure (DS).
Since I am using Optimality Theory to tackle the problem, I first want to give a motivation for this decision: Often the data suggest that there are different principles at work that stand in conflict with each other, but that nevertheless are all needed. That is, even grammatical structures cannot fulfil every constraint. We nevertheless need all constraints, and hence, constraints must be violable. OT gives us the means to express the concept of a violable but active constraint.
2 Cyclic Optimization

The strategy I will follow here is to reconcile the classical T-model of grammar of Chomsky (1981) with the standard model of Optimality Theory of Prince & Smolensky (1993) and McCarthy & Prince (1993).³ The result of this reconciliation will be an extended version of OT which will be referred to as the model of Cyclic Optimization. Its basic characteristics are the following. Starting with a kind of predicate-argument structure as input, a generator GEN constructs a set of possible D-structures out of some "lexical" material.⁴ This input defines the candidate set:

(3) Definition of candidate sets:
    Two candidates C1 and C2 are in the same candidate set if and only if they descend from the same predicate-argument structure and consist of the same lexical material.
This set will then be optimized in the first cycle. The output will be an optimal D-structure DS_i. DS_i in turn will serve as input for the second cycle, which starts with the generation of a set of possible S-structures, basically using the transformation move-α. This set will again undergo the process of optimization and the output will be an optimal S-structure SS_j. SS_j will be the input for the last cycle. Again using move-α, a set of possible LFs will be generated and one last time optimization will apply, resulting in an optimal Logical Form LF_k. The whole computation can be seen in the diagram below:

[Diagram: First Cycle, Second Cycle, Third Cycle]

To put it in a nutshell: optimal LFs are derived from optimal S-structures, which in turn are derived from optimal D-structures. Optimization proceeds cyclically, the different cycles being the syntactic levels of D-structure, S-structure, and LF. Another important property of the model is that it allows the reranking of the constraints as soon as another cycle is entered. This will become important when we examine the impact of one and the same constraint at different levels of representation. As we will see, all the constraints will be present at each cycle, but will be ranked in different ways depending on the level of representation.
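The control flow of the model (generate, optimize, pass the winner on, rerank) can be sketched in a few lines. Everything concrete in the sketch is a toy assumption: structures are tuples of labels, the two generators and the three constraints (`subj_first`, `obj_first`, `econ`) are hypothetical stand-ins, and only the architecture (one generator and one ranking per cycle, the same constraints in every cycle) is meant to mirror the description above.

```python
from itertools import permutations


def optimize(candidates, ranking, source):
    """Pick the candidate with the best profile under a strict ranking."""
    return min(candidates, key=lambda c: tuple(con(c, source) for con in ranking))


def cyclic_optimize(inp, cycles):
    """Run the cycles in order; the output of one cycle feeds the next."""
    current, history = inp, []
    for gen, ranking in cycles:
        current = optimize(gen(current), ranking, current)
        history.append(current)
    return history


# Toy generators (assumptions): free ordering first, fronting afterwards.
def gen_orders(s):
    return sorted(set(permutations(s)))


def gen_move(s):
    out = [s]
    for i in range(1, len(s)):
        moved = (s[i],) + s[:i] + s[i + 1:]
        if moved not in out:
            out.append(moved)
    return out


# Toy constraints (assumptions); each returns a number of violations.
def subj_first(cand, source):
    return 0 if cand[0] == "subj" else 1


def obj_first(cand, source):
    return 0 if cand[0] == "obj" else 1


def econ(cand, source):
    return 0 if cand == source else 1


# The same three constraints are present in every cycle but ranked differently.
CYCLES = [
    (gen_orders, [subj_first, econ, obj_first]),  # first cycle
    (gen_move, [obj_first, econ, subj_first]),    # second cycle
    (gen_move, [econ, obj_first, subj_first]),    # third cycle
]

HISTORY = cyclic_optimize(("obj", "subj"), CYCLES)
```

In this toy run the first cycle selects the subject-initial order, the second cycle fronts the object because `obj_first` now outranks `econ`, and the third cycle leaves the structure unchanged because `econ` is ranked highest there.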
3 D-Structure

As I have already mentioned, scope inversion sometimes appears to be derivable by LF-reconstructing a moved quantifier to its D-structure position. If we want this prediction to be verifiable (or falsifiable), we first need an explicit theory of D-structure. This is so because we need to know the base position of the quantifier Q in order to decide if reconstruction of Q to this position can lead to scope inversion or not. Therefore, I will first introduce some assumptions about sentence structure in German and then I will give a (somewhat simplified) OT-account of German D-structure. The assumptions are the following:

1. The subject is generated in SpecVP (cf. Haider 1993).
2. Structural cases like nominative and accusative are associated with fixed positions. Nominative is assigned to SpecVP, accusative is assigned to the sister of the verb. The dative is a lexical case in German and can be freely adjoined within the verbal projection (both assumptions are due to Vogel & Steinbach 1998).
3. Nominative will be assigned by the verb.
4. Adjunction to a non-maximal projection is allowed.

This gives us the following structure projected by a ditransitive verb:

(4) [VP IO [VP Subject [V' IO [V' DO [V' IO V ]]]]]
This means that the indirect object (IO) in (4) may in principle occupy three different positions at D-structure: it may be base adjoined above the subject, between the subject and the direct object (DO), or below the direct object. The claim I want to make is that its exact base position can be determined by a process of D-structure optimization. The positions of the subject and the DO are fixed, so the only variation in D-structure will come from the choice of base adjoining the IO at different positions. This choice will give us the basic or unmarked word order, where the term unmarked is to be understood in the sense of Höhle (1982). As we will see, the unmarked word order is not dependent on the verb (contra Haider 1992, Haider 1993) nor is it the result of S-structure optimization (contra Müller 1999), but in this approach it is the result of D-structure optimization.
3.1 Constraints

I will now introduce the first constraints, partially following Abraham (1986), Hoberg (1981), Lenerz (1977), Stowell (1981), Uszkoreit (1986), and Müller (1999). These three constraints will provide us with the traces we need to derive scope inversion by reconstruction. This is the first step to linking relative scope, an LF-phenomenon, to basic word order, which is a property of D-structure. And here is the first constraint:

(5) Constraint of Animacy (ANIM)
    If α and β are arguments, α [+animate] and β [-animate], then α precedes β.
Evidence for ANIM is given by the following examples, in which in the unmarked case the animate argument always precedes the inanimate argument (see the a-examples). If the order is reversed as in the b-examples, the result is marked:⁵

(6) a. daß er der Mutter das Sorgerecht entzogen hat
       that he the mother_DAT the custody_ACC withdrawn has
    b. ?daß er das Sorgerecht der Mutter entzogen hat
       that he the custody_ACC the mother_DAT withdrawn has

(7) a. daß er das Kind dem Einfluß entzogen hat
       that he the child_ACC the influence_DAT withdrawn has
    b. ?daß er dem Einfluß das Kind entzogen hat
       that he the influence_DAT the child_ACC withdrawn has

(8) a. daß einem Patienten ein Medikament geholfen hat
       that a patient_DAT a medicine_NOM helped has
    b. ?daß ein Medikament einem Patienten geholfen hat
       that a medicine_NOM a patient_DAT helped has

(9) a. daß Jakob einem Kind ein Märchen erzählt hat
       that Jacob a child_DAT a tale_ACC told has
    b. ?daß Jakob ein Märchen einem Kind erzählt hat
       that Jacob a tale_ACC a child_DAT told has
The second constraint is called the

(10) Constraint of Agentivity (AGENT)
     If α and β are arguments and α bears the θ-role agent, then α precedes β.

What is particularly interesting here is that AGENT has not had any impact in the examples so far. But AGENT comes into play as soon as ANIM is kept constant:

(11) a. daß ein Blauhelm einem Flüchtling geholfen hat
        that a UN-soldier_NOM a refugee_DAT helped has
     b. ?daß einem Flüchtling ein Blauhelm geholfen hat
        that a refugee_DAT a UN-soldier_NOM helped has

This characteristic property is called emergence of the unmarked (cf. McCarthy & Prince 1994). It means that a constraint that is inactive in many cases suddenly awakes. In OT, this property of constraints is expected because a constraint C1 might be overridden by another constraint C2 that is higher ranked than C1. But as soon as C2 does not have an effect anymore for independent reasons, C1 becomes relevant.⁶ All this suggests that the partial ranking between the two constraints here is ANIM » AGENT. The third constraint is the

(12) Constraint of Adjacency (ADJA)
     If α and β are arguments, α bearing structural case and β bearing lexical case, then α is closer to the case assigning verb than β.
In a sense, the evidence for ADJA shows the same characteristics as the examples cited as evidence for AGENT: this time, if ANIM and AGENT both are kept constant, the effects of ADJA can emerge:⁷

(13) a. daß er einem Beispiel eine Nummer zugeordnet hat
        that he an example_DAT a number_ACC assigned has
     b. ?daß er eine Nummer einem Beispiel zugeordnet hat
        that he a number_ACC an example_DAT assigned has

(14) a. daß er einem Arzt einen Patienten zugeteilt hat
        that he a doctor_DAT a patient_ACC assigned has
     b. ?daß er einen Patienten einem Arzt zugeteilt hat
        that he a patient_ACC a doctor_DAT assigned has

As a consequence the complete hierarchy so far is ANIM » AGENT » ADJA.⁸
The conclusion is that the a-examples are unmarked D-structures which allow for maximal Focus Projection (cf. Höhle 1982). The b-examples are marked and hence they must have been derived by S-structure movement.

3.2 Analysis

We now turn to the explicit computations, which are shown in the OT-tables below. Optimal candidates are indicated by the pointing hand ☞. The tables in (15) and (16) show different unmarked orders for the examples in (6) and (7) respectively, in which the main reason for different word order is a difference in animacy:

(15) daß er der Mutter das Sorgerecht entzogen hat
     that he the mother_DAT the custody_ACC withdrawn has

Input: entzieh-(er,mutter,sorgerecht)
Candidates                             | ANIM | AGENT | ADJA
☞ C1: der Mutter ... das Sorgerecht    |      |       |
C2: das Sorgerecht ... der Mutter      | *!   |       | *

(16) daß er die Kinder dem Einfluß entzogen hat
     that he the children_ACC the influence_DAT withdrawn has

Input: entzieh-(er,einfluß,kinder)
Candidates                             | ANIM | AGENT | ADJA
☞ C1: die Kinder ... dem Einfluß       |      |       | *
C2: dem Einfluß ... die Kinder         | *!   |       |
Examples like these show very clearly that basic word order cannot be totally dependent on the verb. In both cases we face the same verb. However, in one case the basic word order is direct object before indirect object, whereas in the other example it is the inverse. The tables in (17) and (18) show the analysis of the examples in (8) and (11) respectively:

(17) daß einem Patienten ein Medikament geholfen hat
     that a patient_DAT a medicine_NOM helped has

Input: helf-(medikament,patient)
Candidates                                   | ANIM | AGENT | ADJA
☞ C1: einem Patienten ... ein Medikament     |      | *     |
C2: ein Medikament ... einem Patienten       | *!   |       | *

(18) daß ein Blauhelm einem Flüchtling geholfen hat
     that a UN-soldier_NOM a refugee_DAT helped has

Input: helf-(blauhelm,flüchtling)
Candidates                                   | ANIM | AGENT | ADJA
☞ C1: ein Blauhelm ... einem Flüchtling      |      |       | *
C2: einem Flüchtling ... ein Blauhelm        |      | *!    |
As can be seen, the indirect object occurs to the left of the subject in one case but to the right of the subject in the other case. This is due to a difference in animacy of the arguments, and the emergence of AGENT. Finally, we see an example of what happens if even agentivity is neutralized:

(19) daß er einem Beispiel eine Nummer zugeordnet hat
     that he an example_DAT a number_ACC assigned has

Input: zuordn-(er,beispiel,nummer)
Candidates                                 | ANIM | AGENT | ADJA
☞ C1: einem Beispiel ... eine Nummer       |      |       |
C2: eine Nummer ... einem Beispiel         |      |       | *!
Thus, we can finish with the first cycle. The optimal D-structure will now be the input for the next cycle, the S-structure generation and optimization. 9
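The D-structure competitions in (15), (16), (18), and (19) can be replayed with a small sketch of the ranking ANIM » AGENT » ADJA. The feature encoding of the arguments and the pairwise way of counting violations are assumptions made for illustration; in particular, "closer to the verb" is modeled as "further to the right", which presupposes the verb-final clauses used in the examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arg:
    form: str
    animate: bool
    agent: bool
    structural_case: bool  # True for NOM/ACC, False for lexical DAT


def _pairs(order):
    """All pairs (a, b) such that a precedes b."""
    return [(a, b) for i, a in enumerate(order) for b in order[i + 1:]]


def anim(order):
    # Violated when an inanimate argument precedes an animate one.
    return sum(1 for a, b in _pairs(order) if not a.animate and b.animate)


def agent(order):
    # Violated when a non-agent precedes an agent.
    return sum(1 for a, b in _pairs(order) if b.agent and not a.agent)


def adja(order):
    # Violated when a lexical-case argument follows a structural-case one,
    # i.e. ends up closer to the clause-final verb.
    return sum(1 for a, b in _pairs(order) if a.structural_case and not b.structural_case)


RANKING = [anim, agent, adja]  # ANIM » AGENT » ADJA


def optimal(x, y):
    """Return the forms of the optimal order of two arguments."""
    candidates = [(x, y), (y, x)]
    best = min(candidates, key=lambda c: tuple(con(c) for con in RANKING))
    return tuple(a.form for a in best)


mutter = Arg("der Mutter", animate=True, agent=False, structural_case=False)
sorgerecht = Arg("das Sorgerecht", animate=False, agent=False, structural_case=True)
kinder = Arg("die Kinder", animate=True, agent=False, structural_case=True)
einfluss = Arg("dem Einfluß", animate=False, agent=False, structural_case=False)
blauhelm = Arg("ein Blauhelm", animate=True, agent=True, structural_case=True)
fluechtling = Arg("einem Flüchtling", animate=True, agent=False, structural_case=False)
beispiel = Arg("einem Beispiel", animate=False, agent=False, structural_case=False)
nummer = Arg("eine Nummer", animate=False, agent=False, structural_case=True)
```

Because the ranking is strict, comparing violation tuples lexicographically is enough: `optimal(sorgerecht, mutter)` returns the dative-first order of (15), while `optimal(einfluss, kinder)` returns the accusative-first order of (16), although the verb is the same.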
4 S-Structure

I will first clarify the basic assumptions about S-structure that are made here:

1. Topicalization is semantically empty movement. It is triggered by some need for clause typing.
2. Scrambling may be semantically empty if it is triggered by information structural needs (e.g., align focus to the right). But it may also be semantically relevant if it is triggered in order to gain scope.

4.1 Constraints

The next constraint I will adopt is the economy constraint ECON, which prohibits movement (cf. Chomsky 1995, Grimshaw 1997).

(20) Economy (ECON)
     Movement is not allowed.
Since there is ECON, a trigger for S-structure movement is needed. In the case of scope induced scrambling this will be formalised by stipulating an abstract scope marker Q which is generated at D-structure and which c-commands the base position of the quantifier it is supercoindexed with (coindexation meaning that the quantifier and the scope marker share the same scope):¹⁰

(21) [α Q^i ... Q1 ... [β ... Q2^i ... ]]   (D-structure)

If Q can be generated freely, this may lead to over-generation. This problem will be addressed later, when we have a better understanding of what kind of Q-insertion should be allowed and what kind of insertion should be blocked. It is clear that at least at LF, every quantifier has to be near its scope marker - if there is one - because it is at LF that the scope is interpreted. However, in German it seems as if this already has to happen at S-structure (recall the first main observation from the introduction):

(22) [α [Q Q2^i Q^i ] [α ... Q1 ... [β ... t2 ... ]]]   (S-structure/LF)

To force the quantifier to move up to its scope marker we need the next constraint, which follows quite naturally:

(23) Scope Principle (SP)
     Every quantifier has to be adjoined to the scope marker it is coindexed with.

This partially explains why the relative scope is highly dependent on S-structure in German. But it cannot be the whole story. Expressing scope relations at S-structure does not guarantee that these scope relations will be preserved at LF. We therefore will adopt a constraint which is due to Beck (1996):

(24) Quantifier Induced Barrier (QUIB)
     Movement across a scope bearing element is prohibited.
Of course, this constraint should only show its effects at LF because at S-structure we do have movement across scope bearing elements. In this sense QUIB is an LF constraint (and as such it was introduced by Sigrid Beck). The reason why I mention it here is twofold: first, to somehow complete the explanation of S-structure sensitivity of relative scope in German, and second, to demonstrate that one and the same constraint can have different impacts on different levels of representation, the difference being a result of different rankings. That is, at S-structure we have the partial ranking SP » QUIB, whereas at LF we have QUIB » SP. But now back to S-structure. To ensure that scope scrambling will not be prohibited by economy we define the partial ranking SP » ECON » ANIM » AGENT » ADJA. The D-structure constraints are ranked sufficiently low to guarantee that they have no influence on this level of representation.¹¹ Other types of S-structure movement can be derived in a similar fashion. Topicalization will be derived by defining a constraint that forces the Top position¹² to be filled in main clauses (for reasons of clause typing; cf., for instance, Cheng 1991):

(25) Principle of Clause Typing (TYPE)
     Every clause has to be typed.
It is a well-known fact that focused elements tend to go to the right edge of the sentence (in right branching languages). Scrambling that applies in order to align focus (whatever the ultimate reason for this may be; cf., for instance, Samek-Lodovici 1997, Cinque 1993, Reinhart 1997, or Büring, this volume) is due to the following principle:

(26) Align Focus (ALIGN)
     Focus marked arguments have to be aligned to the right periphery of the sentence.

The idea is that an unfocused constituent which separates a focused constituent from the right edge of the sentence is supposed to move leftwards in order to get the focused constituent into the rightmost position. This movement will be called anti-focus scrambling. TYPE and ALIGN must outrank ECON, because we see that they do apply at S-structure. TYPE must also outrank ALIGN, because presumably focused elements can be topicalized.¹³ It seems as if they both also outrank SP. We therefore get the following partial ranking: TYPE » ALIGN » SP » ECON.
4.2 Analysis
Since the main interest here is relative scope at LF and not S-structure movement, I shall only briefly discuss the consequences of this ranking. The input for the computation will be optimal D-structures, that is, the output of the previous cycle. First of all, it is clear that if there is a scope marker, then the quantifier which is coindexed with it has to move in order to fulfil SP:
(27) daß Jakob ein Märchen jedem Kind erzählt hat
     that Jacob one taleACC every childDAT told has
Input_DS: daß Jakob Q^1 jedem Kind [ein Märchen]^1 erzählt hat
Candidates                                        TYPE  SP    ECON
☞ C1: [ein Märchen1 Q^1] ... jedem ... t1                     *
  C2: Q^1 ... jedem ... ein Märchen^1                   *!
On the other hand, if there is no scope marker present, then scrambling will result in a fatal violation of economy:
(28) daß Jakob jedem Kind ein Märchen erzählt hat
     that Jacob every childDAT a taleACC told has
Input_DS: daß Jakob jedem Kind ein Märchen erzählt hat
Candidates                                        TYPE  SP    ECON
☞ C1: jedem ... ein Märchen
  C2: ein Märchen1 ... jedem ... t1                           *!
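The evaluation behind tableaux (27) and (28) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `profile` and `optimal` are hypothetical, the violation counts are those read off the two tableaux, and strict domination is modelled as lexicographic comparison of violation profiles.

```python
from typing import Dict, List, Tuple

# Partial S-structure ranking from the text: TYPE » ALIGN » SP » ECON.
RANKING: List[str] = ["TYPE", "ALIGN", "SP", "ECON"]

def profile(violations: Dict[str, int], ranking: List[str]) -> Tuple[int, ...]:
    """Violation counts ordered from the highest- to the lowest-ranked constraint."""
    return tuple(violations.get(c, 0) for c in ranking)

def optimal(candidates: Dict[str, Dict[str, int]], ranking: List[str]) -> List[str]:
    """Return every candidate whose profile is lexicographically minimal."""
    best = min(profile(v, ranking) for v in candidates.values())
    return [name for name, v in candidates.items() if profile(v, ranking) == best]

# (27): a scope marker is present; C1 moves the object to Q, C2 leaves it in situ.
with_marker = {"C1": {"ECON": 1}, "C2": {"SP": 1}}
# (28): no scope marker; C1 leaves the object in situ, C2 scrambles it.
without_marker = {"C1": {}, "C2": {"ECON": 1}}

print(optimal(with_marker, RANKING))     # the moved candidate survives
print(optimal(without_marker, RANKING))  # the unmoved candidate survives
```

Because SP outranks ECON, movement to the scope marker is worth its economy violation in (27), whereas in (28) nothing offsets that violation.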
If we have an object that is both topic marked and scope marked, then on the one hand it has to move to the Top position in order to satisfy TYPE, but on the other hand it has to move to its scope marker. Since we assumed that TYPE » SP at S-structure, the object will be topicalized:14
(29) Ein Märchen hat Jakob jedem Kind erzählt
     one taleACC has Jacob every childDAT told
Input_DS: -Top daß J. Q^1 jedem Kind [+Top ein Märchen]^1 erzählt hat
Candidates                                        TYPE  SP    ECON
☞ C1: ein Märchen1 ... Q^1 ... jedem ... t1             *     *
  C2: [ein Märchen1 Q^1] ... jedem ... t1         *!          *
  C3: -Top ... Q^1 ... jedem ... ein Märchen^1    *!    *
More or less the same holds for ALIGN. For reasons of space I will only show the result of the complex situation in which there is a conflict between scope marking and focus alignment:15
(30) daß einem Flüchtling jeder BLAUhelm geholfen hat
     that one refugeeDAT every UN-soldierNOM helped has
Input_DS: Q^1 [F jeder BLAUhelm]^1 einem Flüchtling geholfen hat
Candidates                                                ALIGN SP    ECON
☞ C1: Q^1 ... einem Flüchtling1 ... [F jeder]^1 ... t1          *     *
  C2: [Q^1 [F jeder]2] ... t2 ... einem Flüchtling        *!          *
  C3: [Q^1 [F jeder]2] ... einem Flüchtling1 ... t1       *!          **
  C4: Q^1 ... [F jeder]^1 ... einem Flüchtling            *!    *
Candidate C1 wins. It dispenses with scope-driven movement in order to align focus. However, this conclusion is merely theoretical, because some kind of scope-driven movement can apply later at the level of LF (but only due to a conspiracy of two constraints, as we shall see). We know this because (30) has the inverted reading. There is still another candidate, C5, that does better than C1. It is a candidate that raises the subject to its scope marker and scrambles the object even further than this position in order to align focus:
(31) daß einem Flüchtling jeder BLAUhelm geholfen hat
     that one refugeeDAT every UN-soldierNOM helped has
Input_DS: Q^1 [F jeder BLAUhelm]^1 einem Flüchtling geholfen hat
Candidates                                                ALIGN SP    ECON
  C5: einem Flüchtling1 ... [Q^1 [F jeder]2] t2 t1                    **
As we shall see later, the scope marker in (31) stands in an improper position and therefore violates a constraint designed to prevent the proliferation of improper scope marker insertion. We will come back to this constraint later. To sum up: S-structure is the level at which relative scope is determined in most cases in German. This means that a quantifier moves to its scope position. The exceptions are cases where movement triggered by information structure takes priority over scope movement and delays it until LF.
5 Logical Form
We now turn to the last cycle. The basic assumptions are:
1. The verb is interpreted as an open proposition (following Nohl & Stechow 1995).16 This means that there are argument variables which are generated directly at the verb. As a consequence there is no type-driven QR (contra May 1977, May 1985), but quantifiers can be interpreted in situ.
2. Semantically empty movement is obligatorily reconstructed at LF.
In addition to QUIB I will introduce two further constraints that show their effects at LF. Together with the first two cycles these constraints will serve as a means to derive some empirical facts about relative scope in German as they have been noted by Pafel (1997).
5.1 Constraints
It is well known that some quantifiers tend to take wide scope as a mere lexical property (cf., for instance, Milsark 1974 and Pafel 1997). In German these quantifiers are jeder, mancher, and die meisten, and they will be referred to as the strong quantifiers, as in the following constraint:
(32) Quantifier Raising (QR)
Adjoin a strong quantifier somewhere above its base position in the tree.
Together with the next constraint, QR will be responsible for some instances of scope inversion. The next constraint is based on the assumption that semantically empty movement is reconstructed at LF. Reconstruction is to be understood here as syntactic lowering that violates economy. (33)
Reconstruction (REC) Movement is to be undone.
I propose that the following LF hierarchy holds between the constraints met so far: QUIB » SP » REC, QR » ECON, TYPE, ALIGN.17 QUIB is ranked above SP because there is no movement over scope-bearing elements at LF in German. SP is above REC and QR to ensure that interpretable movement will not be undone at LF. REC and QR are above ECON for those operations to be applicable at all. Finally, REC is above TYPE and ALIGN in order to undo semantically empty movement.
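The effect of ranking one and the same constraint set differently at two levels can be sketched as follows. The sketch rests on assumptions that are not part of the proposal itself: the names `profile`, `optimal`, `filled_vorfeld` and `empty_vorfeld` are hypothetical, violations of tied constraints are simply pooled within their stratum, ECON violations are ignored for brevity, and the S-structure order TYPE » ECON, REC is the partial order cited in section 7.

```python
from typing import Dict, List, Tuple

# A hierarchy is a list of strata, highest first; constraints inside one
# stratum are treated as tied (assumption: their violations are pooled).
Hierarchy = List[List[str]]

S_STRUCTURE: Hierarchy = [["TYPE"], ["ECON", "REC"]]
LF: Hierarchy = [["QUIB"], ["SP"], ["REC", "QR"], ["ECON", "TYPE", "ALIGN"]]

def profile(violations: Dict[str, int], hierarchy: Hierarchy) -> Tuple[int, ...]:
    """One pooled violation count per stratum, highest stratum first."""
    return tuple(sum(violations.get(c, 0) for c in stratum) for stratum in hierarchy)

def optimal(candidates: Dict[str, Dict[str, int]], hierarchy: Hierarchy) -> List[str]:
    best = min(profile(v, hierarchy) for v in candidates.values())
    return [n for n, v in candidates.items() if profile(v, hierarchy) == best]

# Simplified illustration: a constituent sitting in the Vorfeld has not been
# reconstructed (REC), one sitting in its base position leaves the clause
# untyped (TYPE).
pair = {"filled_vorfeld": {"REC": 1}, "empty_vorfeld": {"TYPE": 1}}

print(optimal(pair, S_STRUCTURE))  # TYPE decides at S-structure
print(optimal(pair, LF))           # REC decides at LF
```

The same pair of profiles is thus decided in opposite ways, which is the sense in which semantically empty topicalization is forced at S-structure and undone at LF.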
I am now going to present some data about relative scope in German, and then we will see how the proposed constraints can account for them.
5.2 Scope Inversion by Reconstruction
5.2.1 Reconstructing topicalized quantifiers
It is argued in the literature that topicalized quantifiers often behave as if reconstructed to their base position (cf., for instance, Beck 1996, Büring 1996, Höhle 1991, Frey 1993, and Pafel 1997).
(34) a. Ein Märchen hat er allen Kindern erzählt
        one taleACC has he all childrenDAT told
     b. Einem Kind haben alle geholfen
        one childDAT have allNOM helped
Both examples have an inverted reading besides the reading that corresponds to the S-structural configuration. Scope inversion follows if (34-a,b) are instances of the following scheme:18
(35) a. [ ... Q1 ... [ ... Q2 ... [ ... t1 ... ]]]       (S-structure)
     b. [ ... t1^LF ... [ ... Q2 ... [ ... Q1 ... ]]]    (Logical Form)
The topicalized quantifier is reconstructed to its base position. However, sometimes inversion does not seem to be possible:
(36) a. Ein Schüler hat alle Kinder verprügelt
        one schoolboyNOM has all childrenACC thrashed
     b. Einem Kind hat er alle Märchen erzählt
        one childDAT has he all talesACC told
This follows if (36-a,b) are instances of the scheme in (37):
(37) a. [ ... Q1 ... [ ... t1 ... [ ... Q2 ... ]]]       (S-structure)
     b. [ ... t1^LF ... [ ... Q1 ... [ ... Q2 ... ]]]    (Logical Form)
So, if we assume that reconstruction of topicalized constituents is obligatory, then the difference between the two kinds of examples must be due to different base positions. In one case the target of reconstruction is below the embedded quantifier and inversion is possible; in the other, it is above the embedded quantifier and inversion is impossible. D-structure optimization provides us with the appropriate target positions.
5.2.2 Reconstruction in the Mittelfeld
There is no agreement on whether there is reconstruction in the Mittelfeld. Whereas Frey (1993) reconstructs freely in the Mittelfeld, Beck (1996), Büring (1996), and Höhle (1991) claim that reconstruction in the Mittelfeld is impossible. I want to claim that the truth is somewhere in between the two positions: reconstruction is possible if scrambling is semantically empty, for instance if it has applied for information structural reasons. In the following examples a question precedes the relevant sentence in order to set up a context which suggests that scrambling has applied in order to align focus:
(38) a. Für wen gilt, daß er eine Fuge spielen kann?
        for whom holds that heNOM one fugueACC play can
     b. Ich glaube, daß eine Fuge1 [F fast jeder PianIST] t1 spielen kann
        I believe that one fugueACC almost every pianistNOM play can
(39) a. Wen hat er einigen schlechten Einflüssen entzogen?
        whoACC has he some bad influencesDAT withdrawn
     b. Ich glaube, daß er einigen Einflüssen1 schon [F fast jedes KIND] t1 entzogen hat
        I believe that he some influencesDAT PART almost every childACC withdrawn has
(40) a. Wer hat einem Flüchtling geholfen?
        whoNOM has one refugeeDAT helped
     b. Ich glaube, daß einem Flüchtling1 schon [F fast jeder BLAUhelm] t1 geholfen hat
        I believe that one refugeeDAT PART almost every UN-soldierNOM helped has
I think that in these examples reconstruction, and hence inversion, is indeed possible.19 However, if scrambling has applied in order to enlarge the relative scope of a quantifier, this movement should not be reconstructable. These are exactly the cases where movement has not applied for information structural needs.
5.3 Quantifier Raising
Now, what is the use of strong quantifiers that should be raised by QR? Remember that QUIB was introduced to account for the surface orientation of German relative scope. Since QUIB outranks QR, it seems as if QR could never apply. The answer to this question is based on the following observation: sometimes inversion seems to be possible only if movement is involved and a strong quantifier is present as well:
(41) a. Ein Schüler1 hat t1 alle Kinder verprügelt
        one schoolboyNOM has all childrenACC thrashed
     b. daß ein Schüler alle Kinder verprügelt hat
        that one schoolboyNOM all childrenACC thrashed has
(42) a. Einem Kind1 hat Jakob t1 alle Märchen erzählt
        one childDAT has Jacob all talesACC told
     b. daß Jakob einem Kind alle Märchen erzählt hat
        that Jacob one childDAT all talesACC told has
(43) a. Ein Schüler1 hat t1 allen Kindern geholfen
        one schoolboyNOM has all childrenDAT helped
     b. daß ein Schüler allen Kindern geholfen hat
        that one schoolboyNOM all childrenDAT helped has
In the examples (41)-(43) inversion is impossible in both the main clauses and the embedded clauses. The main clauses lack a strong quantifier, but movement has applied. The embedded clauses have neither a strong quantifier nor a moved constituent. Now contrast this with the following examples:
(44) a. Ein Schüler1 hat t1 jedes Kind verprügelt
        one schoolboyNOM has each childACC thrashed
     b. daß ein Schüler jedes Kind verprügelt hat
        that one schoolboyNOM each childACC thrashed has
(45) a. Einem Kind1 hat Jakob t1 jedes Märchen erzählt
        one childDAT has Jacob each taleACC told
     b. daß Jakob einem Kind jedes Märchen erzählt hat
        that Jacob one childDAT each taleACC told has
(46) a. Ein Schüler1 hat t1 jedem Kind geholfen
        one schoolboyNOM has each childDAT helped
     b. daß ein Schüler jedem Kind geholfen hat
        that one schoolboyNOM each childDAT helped has
(44)-(46) are completely analogous to (41)-(43), except that the weak quantifier all- has been replaced by the strong quantifier jed-. Now inversion is possible in the main clauses, which involve topicalization, but still impossible in the embedded clauses! So it seems that movement may destabilise the scope configuration, resulting in inversion if in addition a strong quantifier is present that takes advantage of the destabilisation. Now, the claim is that inversion is created by the following conspiracy of REC and QR, and by movement-induced destabilisation of the structure. First, the strong embedded quantifier is raised by QR across the base position of the topicalized quantifier. Then the topicalized quantifier is reconstructed to its base position. Inversion is the result:
(47) t1^LF [VP jedes Kind2 [VP ein Schüler1 t2^LF verprügelt hat ]]    (LF)
The only additional stipulation I have to make is that QUIB is blind to downward movement. In other words, it is allowed to reconstruct across a scope-bearing element at LF, but it is not allowed to raise across such an element. This is in accordance with speculations about QUIB (in non-negative cases) in Beck (1995).
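The generalization that emerges from sections 5.2 and 5.3 can be condensed into a small decision rule. The following sketch is a summary under assumptions, not part of the analysis itself: the class `Configuration`, its field names and the function `inversion_available` are hypothetical labels, and the rule only restates the prose (reconstruction requires semantically empty movement; QR is restricted to strong quantifiers; QUIB blocks raising but not lowering).

```python
from dataclasses import dataclass

@dataclass
class Configuration:
    moved: bool               # has the higher quantifier been moved at S-structure?
    semantically_empty: bool  # was that movement not scope-driven (no scope marker)?
    base_below_other: bool    # does its trace sit below the other quantifier?
    other_is_strong: bool     # is the other quantifier strong (jeder, mancher, die meisten)?

def inversion_available(c: Configuration) -> bool:
    """Predict whether the inverted reading is derivable at LF."""
    if not (c.moved and c.semantically_empty):
        # Nothing can be reconstructed, and raising alone would violate QUIB.
        return False
    if c.base_below_other:
        # Reconstruction alone inverts scope, as in scheme (35).
        return True
    # The trace is above the other quantifier: QR across the trace followed by
    # reconstruction is needed, as in (47); only strong quantifiers undergo QR.
    return c.other_is_strong

main_clause_weak = Configuration(True, True, False, False)       # (41-a)
main_clause_strong = Configuration(True, True, False, True)      # (44-a)
embedded_strong = Configuration(False, False, False, True)       # (44-b)
object_topicalized_weak = Configuration(True, True, True, False)  # (34-a)

for cfg in (main_clause_weak, main_clause_strong, embedded_strong, object_topicalized_weak):
    print(inversion_available(cfg))
```

The four sample configurations correspond to the contrasts discussed in the text: only the topicalized main clause with a strong embedded quantifier and the case with a low reconstruction target allow inversion.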
5.4 Analysis
I will now present the concrete computation. But first some remarks about the tables:
1. This time the input consists of S-structures that contain the relevant S-structure traces that must be there according to D-structure optimization and S-structure movement.
2. The LF-structures in the tables contain only LF-traces for reasons of space and readability.
3. The scope marker Q will count as relevant lexical material with respect to the definition of the candidate sets.20
5.4.1 Topicalization
Topicalization of the direct object
(48) Mindestens einen Fehler hat jeder gemacht
     at-least one mistakeACC has everybodyNOM made
Input_SS: Mindestens einen Fehler1 hat [ jeder t1 gemacht ]
Candidates                                        QUIB  SP    REC   QR    ECON
☞ C1: t1^LF ... jeder2 ... t2^LF ... mindestens1                          **
  C2: t1^LF ... jeder ... mindestens1                               *!    *
  C3: mindestens ... jeder                                    *!    *
  C4: mindestens ... jeder2 ... t2^LF                         *!          *
  C5: jeder2 ... mindestens ... t2^LF             *!          *           *
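Since REC and QR are not ranked with respect to each other, one may wonder whether the outcome of (48) depends on how that tie is resolved. The sketch below checks every total ranking compatible with the LF hierarchy. It is an illustration only: the helper names are hypothetical, the constraints TYPE and ALIGN are left out because no candidate in (48) differs on them, and the non-fatal marks in `TABLEAU_48` follow my reading of the tableau and should be treated as an assumption.

```python
from itertools import permutations
from typing import Dict, Iterator, List, Tuple

# LF hierarchy restricted to the constraints shown in tableau (48).
STRATA: List[List[str]] = [["QUIB"], ["SP"], ["REC", "QR"], ["ECON"]]

TABLEAU_48: Dict[str, Dict[str, int]] = {
    "C1": {"ECON": 2},                       # reconstruction plus QR
    "C2": {"QR": 1, "ECON": 1},              # reconstruction without QR
    "C3": {"REC": 1, "QR": 1},               # nothing applies
    "C4": {"REC": 1, "ECON": 1},             # string-vacuous QR only
    "C5": {"QUIB": 1, "REC": 1, "ECON": 1},  # raising across the topicalized quantifier
}

def total_orders(strata: List[List[str]]) -> Iterator[List[str]]:
    """Yield every total ranking that respects the order of the strata."""
    if not strata:
        yield []
        return
    for head in permutations(strata[0]):
        for tail in total_orders(strata[1:]):
            yield list(head) + tail

def optimal(candidates: Dict[str, Dict[str, int]], ranking: List[str]) -> Tuple[str, ...]:
    def profile(v: Dict[str, int]) -> Tuple[int, ...]:
        return tuple(v.get(c, 0) for c in ranking)
    best = min(profile(v) for v in candidates.values())
    return tuple(n for n, v in candidates.items() if profile(v) == best)

winners = {optimal(TABLEAU_48, ranking) for ranking in total_orders(STRATA)}
print(winners)
```

Under these assumptions the reconstructing candidate with QR is optimal whichever of REC and QR is taken to dominate the other.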
(48) shows how inversion is derived by reconstruction of the topicalized quantifier. C2 loses because it does not QR its strong quantifier. However, the difference is merely theoretical because it does not affect the relative scope. Neither C3 nor C4 reconstructs, and both therefore incur a fatal violation of REC. C5 is ungrammatical because it tries to derive inversion by raising across a scope-bearing element. Of course, the reading which corresponds to the S-structure configuration must also be derivable: (49)
Mindestens einen Fehler hat jeder gemacht at-least one mistakeAcc has everybodyNOM made
Inputss: Mindestens einen Fehler1, hat [ Q [jeder ti gemacht ]] QUIB SP R E C Q R Candidates E®" CL pre|mini c 2 preQ...
F
*
Q] ... jeder 2 ... tir jeder ... min'i
*!
F C 3 pre\... [mini Q]... jeder ... t^
*!
F C 4 pre 1... jeder 2 ... [mini Q\... tif
*!
l C 5 min'i... Q ... jeder 2 ... t ^
ECO Ν
*
*
*
**
**
*!
*
*
This is achieved by inserting a scope marker. At S-structure TYPE outranks SP, therefore we topicalize. But at LF the ranking is the reverse, so the topicalized quantifier gets reconstructed to its scope marker. Since Q is present in (49) but not in (48), these two examples are in different candidate sets. This is important because if they were in the same set, then the optimal candidate in the set without scope markers would block the optimal candidate in the set with scope markers, since the former incurs one violation of economy fewer. The same point can be made with a weak quantifier (the example with Q which derives the S-structure reading is omitted for reasons of space):
(50) Mindestens einen Fehler haben alle Studenten gemacht
     at-least one mistakeACC have all studentsNOM made
Input_SS: Mindestens einen Fehler1 haben [ alle Studenten t1 gemacht ]
Candidates                                        QUIB  SP    REC   QR    ECON
☞ C1: t1^LF ... alle ... mindestens1                                      *
  C2: t1^LF ... alle2 ... t2^LF ... mindestens1                           *!*
  C3: mindestens ... alle                                     *!
  C4: alle2 ... mindestens ... t2^LF              *!          *           *
Again, a candidate that reconstructs wins. This time, however, QR of the embedded quantifier is not licensed (since it is a weak quantifier), and therefore C2 fatally violates economy. Candidates that do not reconstruct or that raise the embedded quantifier across the topicalized one are ill formed because of fatal violations of REC and QUIB respectively (see C3 and C4). The same holds for two quantifiers in object position if the one that is more deeply embedded is topicalized (the example with Q is again omitted):
(51) Mindestens ein Märchen hat Jakob allen Kindern erzählt
     at-least one taleACC has JacobNOM all childrenDAT told
Input_SS: Mindestens ein Märchen1 hat [ Jakob allen Kindern t1 erzählt ]
Candidates                                        QUIB  SP    REC   QR    ECON
☞ C1: t1^LF ... alle ... mindestens1                                      *
  C2: t1^LF ... alle2 ... t2^LF ... mindestens1                           *!*
  C3: mindestens ... alle                                     *!
  C4: mindestens ... alle2 ... t2^LF                          *!          *
  C5: alle2 ... mindestens ... t2^LF              *!          *           *
Of course, the analogue of candidate C4 in (51) could have been listed in table (50) as well. But it is ill formed anyway, since it does not reconstruct and string-vacuously raises the weak embedded quantifier, causing another violation of economy.
Topicalization of the subject
We now come to a trickier derivation of scope inversion that has already been mentioned. (52) demonstrates how inversion can be derived by the conspiracy of REC and QR (see C1):
(52) Mindestens ein Mann liebt jede Frau
     at-least one manNOM loves every womanACC
Input_SS: Mindestens ein Mann1 liebt [ t1 jede Frau ]
Candidates                                        QUIB  SP    REC   QR    ECON
☞ C1: t1^LF ... jede2 ... mindestens1 ... t2^LF                           **
☞ C2: t1^LF ... mindestens1 ... jede2 ... t2^LF                           **
  C3: jede2 ... mindestens1 ... t2^LF             *!          *           *
  C4: mindestens1 ... jede2 ... t2^LF                         *!          *
  C5: mindestens1 ... jede                                    *!    *
First the quantifier that remained unmoved at S-structure is raised across the S-structure trace of the topicalized quantifier. Then reconstruction of the topicalized quantifier can apply (recall that QUIB is not sensitive to reconstruction). The S-structure reading is derivable by raising the strong quantifier just string-vacuously, such that its target position still remains below the target position of reconstruction (see C2).21 However, without a strong quantifier the example has only the reading corresponding to the surface:
(53) Mindestens ein Mann liebt alle Frauen
     at-least one manNOM loves all womenACC
Input_SS: Mindestens ein Mann1 liebt [ t1 alle Frauen ]
Candidates                                        QUIB  SP    REC   QR    ECON
  C1: t1^LF ... alle2 ... mindestens1 ... t2^LF                           *!*
☞ C2: t1^LF ... mindestens1 ... alle                                      *
  C3: t1^LF ... mindestens1 ... alle2 ... t2^LF                           *!*
  C4: mindestens ... alle                                     *!
  C5: alle2 ... mindestens ... t2^LF              *!          *           *
This is so because in this case raising of the embedded quantifier into a position above the target position of reconstruction causes an additional violation of economy which is fatal.22
Topicalization of the indirect object
We can hold the same mechanism responsible for the readings available in a configuration with a topicalized indirect object and a subject in situ (here one example with a strong and another with a weak quantifier):
(54) Drei Beobachtern ist jeder Spieler aufgefallen
     three observersDAT is every playerNOM noticed
     'Three observers have noticed every player.'
Input_SS: Drei Beobachtern1 ist [ t1 jeder Spieler aufgefallen ]
Candidates                                        QUIB  SP    REC   QR    ECON
☞ C1: t1^LF ... jeder2 ... drei1 ... t2^LF                                **
☞ C2: t1^LF ... drei1 ... jeder2 ... t2^LF                                **
  C3: drei ... jeder2 ... t2^LF                               *!          *
  C4: drei ... jeder                                          *!    *
  C5: jeder2 ... drei ... t2^LF                   *!          *           *
Again, in one case the strong quantifier raises into a position above the target of reconstruction, thereby causing scope inversion. In the other case it does not raise far enough and the S-structural relative scope is preserved at LF. (55) shows the same thing without a strong quantifier. Here the additional violation against economy is decisive and blocks inversion. (55)
Drei Beobachtern sind alle Spieler aufgefallen three observers DAT are all playersNOM noticed 'Three observers have noticed all players.'
Inputss: Drei Beobachtern] sind | tj alle Spieler aufgefallen | QUIB SP REC QR Candidates us3 Q : ρτθ\... drei]... alle
ECON *
C2: pf»\... alle2 ... drei]... t^ F C3: drei ... alle
*!*
C 4 : alle2·.. drei ... tirF
*!
*
*
C 5 : alle 2 ... drei ... t^ F
*!
*
*
Inversion impossible
If there is no appropriate trace, reconstruction cannot apply and inversion is blocked, even if a strong quantifier is present:
(56) Jakob hat einigen Kindern jedes Märchen erzählt
     Jacob has some childrenDAT every taleACC told
Input_SS: Jakob hat einigen Kindern jedes Märchen erzählt
Candidates                                        QUIB  SP    REC   QR    ECON
☞ C1: einigen ... jedes1 ... t1^LF                                        *
  C2: einigen ... jedes                                             *!
  C3: jedes1 ... einigen ... t1^LF                *!                      *
In (56) QUIB blocks inversion by movement of the embedded quantifier. The back door of first raising and then reconstructing below the base position is not available since there is no appropriate trace.
5.4.2 Scrambling
If scrambling is triggered by information structure, then the moved item is reconstructable. It is of no importance whether we insert a strong or a weak quantifier in (57) because the target of reconstruction is already below the embedded quantifier:
(57) daß mindestens eine Fuge fast jeder PiaNIST spielen kann
     that at-least one fugueACC almost every pianistNOM play can
Input_SS: daß mindestens eine Fuge1 fast [F jeder PiaNIST ] t1 spielen kann
Candidates                                        QUIB  SP    REC   QR    ECON
☞ C1: t1^LF ... jeder2 ... t2^LF ... mind.1                               **
  C2: t1^LF ... jeder ... mindestens1                               *!    *
  C3: mindestens ... jeder                                    *!    *
  C4: mindestens ... jeder2 ... t2^LF                         *!          *
  C5: jeder2 ... mindestens ... t2^LF             *!          *           *
If there is a scope marker that has triggered scrambling, it will now block reconstruction together with the Scope Principle. Again, any attempt to derive inversion by QR violates QUIB: (58)
daß mindestens eine Fuge jeder Pianist spielen kann that at-least one fugueAcc every p i a n i s t N O M play can
Inputss: daß [G; [ mind, eine Fuge ]', Ql ] jeder Pianist ti spielen kann Candidates QUIB SP R E C Q R E C O N m- Ci: [mindestens Q] ... jeder2... t ^ C2: [mindestens Q] ...jeder C3: \pr» 1 QM ...jeder ... mind.', C 4 : jeder 2 ... [mindestens Q] ... tir
*
**
*
* *
*! F
* *
*!
The following examples demonstrate how the very same verb may or may not allow scope inversion by reconstruction, depending on the different D-structures which are determined by D-structure optimization.23
(59) daß er mindestens einem Einfluß jedes KIND entzogen hat
     that he at-least one influenceDAT every childACC withdrawn has
Input_SS: daß er mindestens einem Einfluß1 [F jedes KIND ] t1 entzogen hat
Candidates                                        QUIB  SP    REC   QR    ECON
☞ C1: t1^LF ... jedes2 ... t2^LF ... mind.1                               **
  C2: t1^LF ... jedes ... mindestens1                               *!    *
  C3: mindestens ... jedes2 ... t2^LF                         *!          *
  C4: mindestens ... jedes                                    *!    *
  C5: jedes2 ... mindestens ... t2^LF             *!          *           *
Because of ANIM, the D-structure in (59) must be DO before IO. From this it follows that (59) must involve movement which can be reconstructed. In contrast, in (60) no movement has applied. Hence, no reconstruction is possible:24
(60)
daß er mindestens ein Kind jedem Einfluß entzogen that he at-least one childAcc every influenceoAT withdrawn hat has Inputss: daß er mindestens ein Kind jedem Einfluß entzogen hat Candidates QUIB SP REC QR ECON us· Q : mindestens ...jedem C2: jedem ι ... mindestens ... t^F
*
*!
C3: mindestens ... jedem 1 ...
*!*
In (61) and (62) the D-structure relations are the inverse: the indirect object precedes the direct object. Since (61) exhibits this order, no S-structure movement has applied and hence no trace is there that could serve as the target of reconstruction:
(61) daß er mindestens einer Mutter jedes Sorgerecht entzogen hat
     that he at-least one motherDAT every custodyACC withdrawn has
Input_SS: daß er mindestens einer Mutter jedes Sorgerecht entzogen hat
Candidates                                        QUIB  SP    REC   QR    ECON
☞ C1: mindestens ... jedes1 ... t1^LF                                     *
  C2: mindestens ... jedes                                          *!
  C3: jedes1 ... mindestens ... t1^LF             *!                      *
In contrast to this, in (62) there is an appropriate trace and scope inversion is available. Of course, the S-structure reading would be derivable in a variant of (62) that contained a scope marker that blocked reconstruction, and hence, inversion:
(62) daß er einige Sorgerechte schon jeder MUTter entzogen hat
     that he some custodiesACC already every motherDAT withdrawn has
Input_SS: daß er einige Rechte1 schon [F jeder MUTter ] t1 entzogen hat
Candidates                                        QUIB  SP    REC   QR    ECON
☞ C1: t1^LF ... jeder2 ... t2^LF ... einige1                              **
  C2: t1^LF ... jeder ... einige1                                   *!    *
  C3: einige ... jeder                                        *!    *
  C4: einige ... jeder2 ... t2^LF                             *!          *
  C5: jeder2 ... einige ... t2^LF                 *!          *           *
Raising of the strong quantifier after reconstruction applies string vacuously and therefore has no impact.
6 Scope Marker Insertion
Finally, I will address the problem of proper scope marker insertion. As I already mentioned, insertion of Q is free and might therefore create readings that are not borne out empirically. Intuitively, there are two configurations that one wants to block: I will call them vacuous scope marking and downward scope marking. The first kind of improper Q-insertion is shown in (63). We can see in (63-b,c) that the scope marker contributes nothing to the information about relative scope at S-structure. It marks more or less the same scope as the quantifier it is coindexed with:
(63) Vacuous scope marking
     a. [[Top ... ] ... [ Q^2 ... Q1 ... Q2 ... ]]                   (D-structure)
     b. [[Top Q1 ] ... [ Q^2 ... t1 ... Q2 ... ]]                    (S-structure 1)
     c. [[Top Q1 ] ... [ [Q Q^2 Q2 ] ... t1 ... t2 ... ]]            (S-structure 2)
     d. [[Top t1^LF ] ... [ [Q Q^2 Q2 ] ... Q1 ... t2 ... ]]         (Logical Form)
This has an unwanted consequence. Suppose that in (63-b) Q2 is a weak quantifier. Then this quantifier should raise up to its scope marker (at S-structure or at LF, see (63-b,c)), which stands above the base position of the topicalized quantifier Q1. In the next step, the topicalized quantifier gets reconstructed below the raised quantifier: inversion is the result. This derivation opens the door to a proliferation of unwanted scope inversions. Now, as I said, what is intuitively wrong with the S-structures in (63-b,c) is that the scope marker does no job. It just marks the domain the coindexed quantifier already c-commands.25 What might be strange is that the relevant level for the appropriateness of the scope marker is not D-structure, the level at which the scope marker is inserted (there, the scope marker is not vacuous!), but S-structure. The reason may be that in German it is mainly at S-structure that the relative scope is fixed. The second kind of improper Q-insertion is shown in (64) and is rather obvious. Here, Q is generated below the base position of Q1 and therefore causes a change in relative scope at LF:
(64) Downward scope marking
     a. [ ... Q1^1 ... [ ... Q2 ... [ ... Q^1 ... ]]]                (D/S-structure)
     b. [ ... t1^LF ... [ ... Q2 ... [ ... [Q Q^1 Q1 ] ... ]]]       (Logical Form)
The mechanism presented above allows reconstruction to a scope marker, so why should it not be allowed in this case? It seems that scope marking always proceeds upward in the tree, or should proceed upward at least at one step in the derivation. In (64) this is never the case. Now consider an S-structure like (64-a), which contains no traces left behind by movement. This is a configuration that typically does not allow for scope inversion. But this is exactly what downward scope marking does, so it over-generates and it has to be blocked. What is interesting is that both kinds of improper scope marker insertion have something in common, and hence can be blocked by a single constraint:
(65) Proper Q (PROP-Q)
A scope marker Q^i is licensed if and only if it c-commands a contraindexed quantifier Qj that breaks the extended chain that is formed by Q^i and its coindexed quantifier Qi.26
It is clear that this constraint blocks both vacuous scope marking and downward scope marking.27 Vacuous marking is blocked because if a marker does not c-command a contraindexed quantifier, it is vacuous by definition. And for cases of downward marking that could result in inversion it is exactly the same: the scope marker is vacuous. I have no evidence how PROP-Q should be ranked within the hierarchy because it does not stand in conflict with any other constraint. It simply filters out candidates with an improper scope marker as they occur at any level of representation. This might be an indication that PROP-Q is part of GEN.
7 Arguments for the Cycle
In this last section I want to review arguments that help to differentiate the approach of cyclic optimization from another approach. Remember that the goal of this paper was to derive the correspondence between quantifier scope and basic word order, as it is suggested by the data. Now, instead of using cyclic optimization one could think of defining candidates as triples (D-structure, S-structure, LF). It would be absolutely reasonable to do so, because there is a well-defined notion of optimality for such an approach: a triple t = (DS, SS, LF) is optimal with regard to the other triples t1 ... tn iff the constraint profile of t is better than the profiles of t1 ... tn, where the three slots of every triple are computed in a parallel fashion and then the violations are summed up in one big table. I will refer to this approach as the parallel approach. However, I think that there are three arguments that might support the cyclic approach:
1. Complexity of derivation: If in the parallel approach an LF is checked for optimality, there is no way to tell if it will ever have the chance to be a winner, because its D-structure and its S-structure are computed at the same time. So this mechanism has to compute the whole set of structures exhaustively. In the cyclic approach, however, a candidate will only be optimal if all of its levels of representation are optimal. An LF that is based on a non-optimal D-structure cannot be part of a winner even if the LF itself is optimal. In the cyclic approach only those LFs that descend from an optimal D-structure and an optimal S-structure are computed. All other potential LFs are filtered out earlier in the computation.
2. Potential reranking: Under a parallel approach the constraint ranking is fixed during the computation. That is, if a triple t = (DS, SS, LF) is evaluated, all elements of t are evaluated under the same ranking. Under the cyclic approach, however, it is in principle possible to reorder the constraints on the way from one cycle to the next (cf. McCarthy & Prince 1993:12, Mester 1999, Rubach 2000). I would like to point out why this might be a desirable consequence. Recall the constraint TYPE, which was introduced to derive the fact that the Vorfeld in a German sentence has to be filled with a constituent in unembedded contexts. Clearly, in order for TYPE to have any effect it must outrank ECON and REC. At LF, however, it is exactly those topicalized elements that undergo reconstruction, and therefore TYPE must not have any impact at that point of the derivation. There are at least two ways to achieve this. Either one stipulates that certain constraints are "switched on" at a certain level of representation and "switched off" at another level, or one makes use of the mechanism of neutralizing the effects of one constraint by ranking it sufficiently low within the given hierarchy. In the case at hand, this implies that at the surface we are dealing with a partial order TYPE » ECON, REC, whereas at LF we are dealing with the inverted order REC » TYPE, ECON. Since reranking is the mechanism that is supposed to be the locus of parametric change in optimality theory anyway, it seems to me that the second strategy is the most natural way to deal with the problem at hand.
3. Accumulation of violations: The last argument is an empirical one. The point is that cyclic optimization makes the prediction that each time a new cycle is entered the reset button is pushed and all the violations so far are deleted from memory. That is, the next competition restarts without any burden from the past cycles. Now suppose that there is a violation of some constraint at D-structure (that is, the first slot in the parallel approach), and suppose further that this violation does not have any impact on the computation of the optimal candidate. Take a look at this violation in the parallel approach, where all violations from every slot of the triple are accumulated. There this violation might be decisive for the failure of the whole computation, because it might just be the violation that makes the sum of violations of the relevant candidate exceed the sum of violations of some other candidate. Since the two approaches make different predictions at this point, it could serve as a test to check their empirical adequacy. An appropriate configuration for such a test, however, may be hard to find.
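The difference between the two architectures, in particular the first and the third argument, can be made concrete with a toy computation. Everything in the following sketch is hypothetical: the constraints A and B, the structures d1, d2, s1, s2 and their violation counts are invented for illustration, and pairs of levels stand in for the triples (DS, SS, LF) to keep the example short.

```python
from typing import Dict, List, Tuple

RANKING: List[str] = ["A", "B"]  # hypothetical constraints, A » B at every level

def profile(violations: Dict[str, int]) -> Tuple[int, ...]:
    return tuple(violations.get(c, 0) for c in RANKING)

# Hypothetical D-structures and the S-structures derivable from each of them.
D_STRUCTURES = {"d1": {"B": 1}, "d2": {"B": 2}}
S_STRUCTURES = {"d1": {"s1": {"B": 2}}, "d2": {"s2": {}}}

def cyclic() -> List[Tuple[str, str]]:
    """Optimize D-structure first; only its winners feed the next cycle."""
    best_d = min(profile(v) for v in D_STRUCTURES.values())
    winners = []
    for d, v in D_STRUCTURES.items():
        if profile(v) != best_d:
            continue  # filtered out: its S-structures are never computed
        best_s = min(profile(sv) for sv in S_STRUCTURES[d].values())
        winners += [(d, s) for s, sv in S_STRUCTURES[d].items() if profile(sv) == best_s]
    return winners

def parallel() -> List[Tuple[str, str]]:
    """Evaluate whole derivations at once, summing violations across levels."""
    totals = {}
    for d, v in D_STRUCTURES.items():
        for s, sv in S_STRUCTURES[d].items():
            totals[(d, s)] = tuple(x + y for x, y in zip(profile(v), profile(sv)))
    best = min(totals.values())
    return [pair for pair, total in totals.items() if total == best]

print(cyclic())    # the derivation built on the optimal D-structure
print(parallel())  # the derivation with the fewest accumulated violations
```

In this toy case the cyclic computation never looks at the S-structure of the suboptimal D-structure, whereas the parallel computation lets the accumulated total pick a different winner, which is the kind of divergence the third argument relies on.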
Notes This paper is a partial elaboration of my masters thesis. It was supported by the DFG grant MU 1444/2-1 for the project "Optimalitätstheoretische Syntax des Deutschen" at the University of Stuttgart. For comments and discussion I want to thank Jane Grimshaw, Gereon Müller, Tanja Schmid, Arnim von Stechow, Wolfgang Sternefeld, Sten Vikner, Ralf Vogel, and the audience of the DGfS 1999 workshop "Competition in Syntax". Of course, all errors are mine. 1. This assumption is called the concept of transparent LF and is in opposition to May (1985), who allows ambiguity even at the level of LF. 2. By the way, whenever I use the term scope inversion I refer to a configuration that is an inverted variant of the S-structural scope configuration. 3. Actually, in McCarthy & Prince (1993) the authors speculate that it might be useful to define the process of optimization on different levels of morphological
representation. That is, something similar to the current proposal is considered as a possible strategy. However, the considerations remain at best sketchy.
4. There is a variety of different approaches to what should be the input to GEN. The debate is not finished yet, but the assumptions made here are rather unexotic, I think (cf., for instance, Grimshaw 1997).
5. I should stress that the question marks before the b-examples do not indicate ungrammaticality (e.g., due to a violation of economy), but only markedness. In a suitable context the b-examples may be even more appropriate than the a-examples. By and large this amounts to a theory of marked word order as has been proposed by Choi (1996), Costa (1998), or Büring (this volume). The point here is that the base generated structure should be the least specific one and hence should be unmarked in the given neutral context.
6. To show that AGENT can really be blocked by ANIM, one needs examples in which the agent is not animate, because only then is there a potential situation in which the two constraints are in conflict and only then can we tell if animacy is really more important than agentivity. These are very rare; however, (8) may be a relevant instance.
7. The behaviour of psychological verbs that take a dative argument can be explained along the same lines:
(i) a. daß dem Fritz die Tänzerin gefallen hat
       that the Fritz_DAT the dancing-girl_NOM pleased has
       'that Fritz was pleased by the dancing-girl'
    b. ?daß die Tänzerin dem Fritz gefallen hat
       that the dancing-girl_NOM the Fritz_DAT pleased has
       'that Fritz was pleased by the dancing-girl'
However, psychological verbs that take an accusative argument are still mysterious because in the approach of Vogel & Steinbach (1998) accusative case remains unaffected by the word order constraints, and this for good reasons.
8. Examples like those in (11) presuppose a theory of linking as proposed by Wunderlich (1997). The point is that we need to ensure that the θ-role agent is always linked to nominative case (if there is such a role). If it were not, then the following example (with the intended meaning given below it), in which the argument that bears dative case and the argument that bears the θ-role agent coincide, would be a possible candidate:
(i) daß einem Blauhelm ein Flüchtling geholfen hat
    that a UN-soldier_DAT a refugee_NOM helped has
    'that a UN-soldier helped a refugee'
Since (i) respects AGENT and ADJA, in contrast to (11), it would block (11) as suboptimal. However, if we assume that the linking between case and θ-role of an argument is determined by the lexical entry of each verb, then we can avoid
this unwanted consequence. Then (i) would never be generated (at least not with the given meaning).
9. It is clear that the constraints that guide S-structure movement and LF movement should not have any effects on the level of D-structure. One gets this for free if one assumes that there cannot be any movement before D-structure has been built up. In other words, the constraints that will be introduced in the next cycles are present in the first cycle as well, but they are without impact for independent reasons. So we do not have to bother about the explicit ranking of these constraints with respect to the constraints met so far in the hierarchy of D-structure.
10. Basically, the idea is that scope markers are generated freely within the structure. In some sense this is different from Diesing (1996), Diesing (1997), and Vikner (this volume), where it is assumed (in a generative semantics style) that the relative scope is already part of the input, which in turn means that the information about relative scope is doubled: First it is encoded in the input and then it is encoded within the syntactic structure. In opposition to this, here the relative scope is determined by an optimized LF only, not by the input.
11. From now on, I shall dispense with explicitly mentioning the D-structure constraints in the ranking because they will be of no importance anymore, neither at S-structure nor at LF. Simply assume that they are the lowest ranked constraints in the hierarchies of S-structure and LF.
12. This may be SpecCP or the specifier of another functional head, such as the Top head argued for in Müller & Sternefeld (1993).
13. Of course, we cannot satisfy ALIGN first and then move the focused constituent to the Top position, because ALIGN is dependent on S-structure and cannot be satisfied by a trace. This, however, does not mean that I assume every constraint to be dependent on the surface. That traces sometimes can do a job can be seen from examples like the following:
(i) [Which people]_i [ t'_i seem to each other_i [ t_i to be intelligent ]]?
14. SP will also be assumed not to be satisfiable by traces, for reasons that hopefully will become clear.
15. Focal stress is indicated by capital letters.
16. It is not clear who originally came up with this idea. But one of the latest proposals which follow this strategy is due to Wolfgang Sternefeld (cf. Sternefeld 1993).
17. Note also that the S-structure hierarchy is rather different from this, namely: TYPE » ALIGN » SP » QUIB, ECON » QR, REC. Recall that TYPE is above ALIGN because focused elements can be topicalized. TYPE and ALIGN are both above economy for obvious reasons. SP is above QUIB because in German S-structure movement over a scope bearing element is possible. Economy is above QR because there is no S-structure movement of strong quantifiers (without a scope marker). Finally, REC is ranked below economy because it would be unreasonable to stipulate some kind of Yo-Yo movement. That is, we do not want
to assume that on the way from D-structure to S-structure a movement operation will first move a constituent upwards in the tree and then reconstruct it again.
18. According to May (1985), reconstruction leaves behind a little pro which is then deleted. I will follow him in this, but only for reasons of readability. There is nothing that hinges on this assumption. So any time you encounter a barred pro you know that reconstruction has applied.
19. Of course, inversion should be impossible if there are no appropriate traces.
20. The idea behind this is that candidates with the scope marker and candidates without the scope marker are not in the same candidate (or reference) set, hence not in the same competition. What this amounts to is that candidates with different semantics cannot compete, as has been argued for by Fox (1995), for instance. For an extensive discussion about how the notion of reference set should be defined, cf. Sternefeld (1997).
21. This derivation could as well be blocked by inserting a scope marker for the topicalized quantifier. Then the strong quantifier could not move to a position above the scope marker since this is a scope bearing element.
22. Of course, raising the embedded quantifier into any position whatsoever causes a violation of economy. But here we are only interested in movement that causes scope inversion.
23. Since the following tables focus on reconstructability, they only contain candidates without scope markers.
24. A derivation of (60) that first moves the embedded quantifier Q2 to a scope marker across the matrix quantifier Q1 and then in turn raises Q1 across Q2 by anti-focus scrambling would end up in a linear order identical to the optimal D-structure. But since an appropriate trace would then be available, we would expect (60) to have an inverted reading if Q2 is focused. And this is not borne out empirically. However, it can be shown that this derivation can be blocked by PROP-Q; see below.
25. To put it correctly, the domain marked by Ql does not contain any scope bearing element that is not already contained in the domain that is marked by Q1.
26. A constituent breaks a chain if and only if it is c-commanded by at least one element e1 of the chain and in turn c-commands at least one other element