Domains and Dynasties The Radical Autonomy of Syntax
Studies in Generative Grammar

The goal of this series is to publish those texts that are representative of recent advances in the theory of formal grammar. Too many studies do not reach the public they deserve because of the depth and detail that make them unsuitable for publication in article form. We hope that the present series will make these studies available to a wider audience than has hitherto been possible.

Editors: Jan Koster, Henk van Riemsdijk

Other books in this series:
1. Wim Zonneveld, A Formal Theory of Exceptions in Generative Phonology
2. Pieter Muysken, Syntactic Developments in the Verb Phrase of Ecuadorian Quechua
3. Geert Booij, Dutch Morphology
4. Henk van Riemsdijk, A Case Study in Syntactic Markedness
5. Jan Koster, Locality Principles in Syntax
6. Pieter Muysken (ed.), Generative Studies on Creole Languages
7. Anneke Neijt, Gapping
8. Christer Platzack, The Semantic Interpretation of Aspect and Aktionsarten
9. Noam Chomsky, Lectures on Government and Binding
10. Robert May and Jan Koster (eds.), Levels of Syntactic Representation
11. Luigi Rizzi, Issues in Italian Syntax
12. Osvaldo Jaeggli, Topics in Romance Syntax
13. Hagit Borer, Parametric Syntax
14. Denis Bouchard, On the Content of Empty Categories
15. Hilda Koopman, The Syntax of Verbs
16. Richard S. Kayne, Connectedness and Binary Branching
17. Jerzy Rubach, Cyclic and Lexical Phonology: the Structure of Polish
18. Sergio Scalise, Generative Morphology
19. Joseph E. Emonds, A Unified Theory of Syntactic Categories
20. Gabriella Hermon, Syntactic Modularity
21. Jindrich Toman, Studies on German Grammar
22. J. Guéron/H.G. Obenauer/J.-Y. Pollock (eds.), Grammatical Representation
23. S.J. Keyser/W. O'Neil, Rule Generalization and Optionality in Language Change
24. Julia Horvath, FOCUS in the Theory of Grammar and the Syntax of Hungarian
25. Pieter Muysken and Henk van Riemsdijk, Features and Projections
26. Joseph Aoun, Generalized Binding. The Syntax and Logical Form of Wh-interrogatives
27. Ivonne Bordelois, Heles Contreras and Karen Zagona, Generative Studies in Spanish Syntax
28. Marina Nespor and Irene Vogel, Prosodic Phonology
29. Takashi Imai and Mamoru Saito (eds.), Issues in Japanese Linguistics
Jan Koster
Domains and Dynasties
The Radical Autonomy of Syntax
1987 FORIS PUBLICATIONS Dordrecht - Holland/Providence - U.S.A.
Published by:
Foris Publications Holland
P.O. Box 509
3300 AM Dordrecht, The Netherlands

Sole distributor for the U.S.A. and Canada:
Foris Publications USA, Inc.
P.O. Box 5904
Providence RI 02903
U.S.A.

CIP-DATA
Koster, Jan
Domains and Dynasties: the Radical Autonomy of Syntax / Jan Koster. - Dordrecht [etc.]: Foris. - (Studies in Generative Grammar; 30)
With ref.
ISBN 90 6765 270 9 paper
ISBN 90 6765 269 5 bound
SISO 805.4 UDC 801.56
Subject heading: syntax; generative grammar

ISBN 90 6765 269 5 (Bound)
ISBN 90 6765 270 9 (Paper)

© 1986 Foris Publications - Dordrecht
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission from the copyright owner.

Printed in The Netherlands by ICG Printing, Dordrecht.
Contents
Preface

Chapter 1. The Invariant Core of Language
1.1. The research program
1.2. The configurational matrix
1.3. Domain extensions
1.4. Conclusion
Notes

Chapter 2. Levels of Representation
2.1. Introduction
2.2. D-structure
2.3. NP-structure
2.4. Logical Form
2.5. Conclusion
Notes

Chapter 3. Anaphoric and Non-Anaphoric Control
3.1. Introduction
3.2. Where binding and control meet
3.3. Some minimal properties of control
3.4. Infinitival complements in Dutch
3.5. Asymmetries between N and V
3.6. Conclusion
Notes

Chapter 4. Global Harmony, Bounding, and the ECP
4.1. Introduction
4.2. On the nature of local domains
4.3. The Cinque-Obenauer hypothesis
4.4. The parametrization of dynasties
4.5. Global harmony
4.6. The grammar of scope
4.7. Conclusion
Notes

Chapter 5. NP-Movement and Restructuring
5.1. Introduction
5.2. Passives and ergatives in Dutch
5.3. Case, agreement, and subject drop in Dutch
5.4. A difference between English and Dutch
5.5. Reanalysis and covalency
5.6. Against reanalysis
5.7. Transparency without reanalysis
5.8. Restructuring in French
5.9. Conclusion
Notes

Chapter 6. Binding and its Domains
6.1. Introduction
6.2. Reflexives in Dutch
6.3. The principles B and C in English and Dutch
6.4. Principle C effects in parasitic gap constructions
6.5. Conclusion
Notes

Chapter 7. The Radical Autonomy of Syntax

Bibliography
Index of names
General index
Preface
Linguistics, like any other field of inquiry, can only make progress through a certain diversity of viewpoints. Although there have been many challenges to "standard" theories of generative grammar, there have been relatively few major controversies within what is often referred to as the Theory of Government and Binding. The theory presented in this study accepts the major goals of Government and Binding, but differs from the standard view in a number of respects. The basic difference is that the theory of Domains and Dynasties entirely rejects the notion "move alpha" and, therefore, the idea of levels connected by "move alpha". Apart from Lexical Structure and Phonetic Representation, only one level is accepted, namely the level of S-structure. In my opinion the traditional level of D-structure can most appropriately be seen as a substructure of S-structure, while the notion of Logical Form is rejected altogether.
This study grew out of my reactions to Chomsky's Pisa lectures. Shortly before the Pisa lectures, I had published a version of Subjacency (the Bounding Condition) that appeared to be almost indistinguishable from principle A of the binding theory. This strongly suggested that a generalization was being missed. Currently, more than seven years after the Pisa lectures, a condition like the Bounding Condition also shows up in mainstream GB theories under the name 0-subjacency, and also in the idea that all traces are antecedent-governed in a strictly local domain. It seems to me that such a strict locality condition makes traditional Subjacency superfluous and that it brings back into focus what I consider one of the most important problems of the theory of grammar: how is the locality condition for the binding of traces related to the locality domains of other grammatical dependencies? The answer given here is that at an appropriate level of abstraction, there is a uniform locality condition for all grammatical relations of a certain type.
The idea of a uniform locality condition leads to the Thesis of Radical Autonomy. According to this thesis, core grammar is characterized by a configurational matrix of properties that are entirely construction-independent. A further perspective is that the configurational matrix determines the form of a computational faculty that is not intrinsically built for language. Grammar in the traditional generative sense is perhaps only an application of this computational module, in the same way that book-keeping is an application of arithmetic. Language in this view only
originates through the interaction of the abstract computational module with our conceptual systems, whereas the lexicon can be considered the interface among these components. Rules like LF-movement cannot be fundamental computations from such a perspective since they are specific to certain conceptual contents, which belong to a different and presumably equally autonomous system. Research for this book started in 1979 in a project (Descriptive Language) organized by the University of Nijmegen and the Max Planck Institute for Psycholinguistics and sponsored by the Netherlands Organization for the Advancement of Pure Research (Z.W.O.). The original versions of my theory were discussed with Angelika Kratzer of the Max Planck Institute, and with Dick Klein and John Marshall of the University of Nijmegen, among others. The many visitors to the Max Planck Institute, Robert May and Edwin Williams in particular, also contributed much to the development of my views. Also during this time, I had regular meetings with a group of linguists from the Federal Republic of Germany. This book would probably not exist without the many discussions of Chomsky's Pisa lectures I had with Tilman Höhle, Craig Thiersch, Jindra Toman, Hans Thilo Tappe, and many others. I have very good memories of the friendship and encouragement I experienced in this group. Most of the work on this book was done after I joined the faculty of Tilburg University in 1981. Here, I worked under the excellent conditions created by Henk van Riemsdijk. As ever, I felt greatly stimulated by the harmonious combination of friendship and polemics dating back to our student days. Several aspects of this study were discussed with Henk, and also with my other colleagues at Tilburg, including Reineke Bok-Bennema, Norbert Corver, Jan van Eijck, Anneke Groos, Casper de Groot, Anneke Neijt, Rik Smits, and Gertrud de Vries. Furthermore, I was able to discuss my work with several visitors, such as Ken Hale, Jean-Roger Vergnaud, and Maria-Luisa Zubizarreta. More than anything else, the content of this study was inspired by the seminal work of Richard Kayne. I learned very much from our discussions and from the critical comments that Richie gave me on several parts of the text. Likewise, I was inspired by the work of Guglielmo Cinque and Hans-Georg Obenauer, as is clear from several chapters. In addition, I would like to thank Guglielmo Cinque for his detailed comments on large parts of the text. Other colleagues and friends I would like to thank for comments include Hans den Besten, Elisabet Engdahl, Ton van Haaften, Riny Huybregts, David Lebeaux, Robert May, Carlos Otero, Christer Platzack, Thomas Roeper, and Tarald Taraldsen. I am grateful to Gaberell Drachman of Salzburg University, Austria, for giving me the opportunity to present parts of this book at the Salzburg International Summer Schools of 1982 and 1985. I was much encouraged and stimulated by the discussions and the friendship of the many participants. As for the 1982 Summer School, I would like to acknowledge
the contributions of Sascha Felix, Wim de Geest, Liliane Haegeman, Hubert Haider, David Lebeaux, Anna Szabolcsi, and Dong-Whee Yang. Of the 1985 Summer School, I would like to mention Elena Benedicto, Clemens Bennink, Leonardo Boschetti, Anna Cardinaletti, Kirsti Christensen, Günther Grewendorf, Willy Kraan, Martin Prinzhorn, Alessandra Tomaselli, and Gert Webelhuth. The Netherlands Organization for the Advancement of Pure Research (Z.W.O.) gave me the opportunity to visit MIT and the University of Massachusetts at Amherst in the fall of 1983 (grant R30-191), which I hereby gratefully acknowledge. At MIT, I discussed parts of chapter 4 with Noam Chomsky, Danny Jaspers, Carlos Quicoli, Luigi Rizzi, and Esther Torrego, among others. At Amherst, I profited from the comments of David Pesetsky and Edwin Williams. Charlotte Koster read the whole text and proposed many improvements of both content and style. Especially chapter 6 owes much to her ideas on learnability. I would like to thank her in more ways than one, as ever! In preparing the final text, I received excellent editorial assistance from Rita DeCoursey of Foris Publications and technical assistance from the staff of my current department at the University of Groningen. In the department, Corrie van Os helped me with the bibliography and Wim Kosmeijer compiled the index. Versions of chapters 1 and 3 were published earlier, respectively in Theoretical Linguistic Research 2 (1985), 1-36, and Linguistic Inquiry 15 (1984), 417-459, and are reprinted here with kind permission of the publishers. Jan Koster Groningen, December 1986
Chapter 1

The Invariant Core of Language
1.1. The research program

Recently, Noam Chomsky appropriately characterized the goal of generative grammar as a contribution to the solution of "Plato's problem": how can we know so much given that we have such limited evidence?1 Among the cognitive domains that confront us with this problem, our language is a particularly striking and important example. In studying human language, it is difficult not to be impressed by the richness, subtlety, and specificity of the system of knowledge acquired. Since only a fraction of this richness seems to be encoded in the evidence available to the language learner, much of the architecture of the acquired system must spring from the innate resources of the organism itself. Either the learning child possesses rich powers of abstraction and generalization (general learning strategies), or its inborn capacities involve an articulated and specific system that is only triggered and "finished" by the evidence.
There is, to my knowledge, no research program in linguistics that is based on general learning strategies and that is even beginning to come to grips with the richness of our knowledge of language. So far, only the second approach, i.e. the attempt to formulate a highly articulate initial scheme, has attained a promising degree of success. I therefore believe that this is the right approach to Plato's problem in the domain of natural language. This conclusion is sometimes called pretentious or unmotivated, but it is often hard to see what motivates the opposition beyond prejudice. On the one hand there is not the slightest evidence that the data available to the child, or "general learning mechanisms", are rich enough to account for the nature of the system acquired; on the other hand, the program based on the alternative, the assumption of an articulate initial scheme, has led to a very successful research program. I fail to see how critics of the Chomskyan program can account for the total lack of success of the other theories and the continuous development and success of the program criticized.
Even if one fully agrees with Chomsky's approach to Plato's problem, there are different ways to execute the research program based on it. Generative grammar in Chomsky's sense is a much more pluriform enterprise than it is sometimes believed to be. This pluralism is generally
considered healthy and even necessary for progress, as in any other science. It is a truism that one of the most effective tools towards progress is criticizing existing theories by the formulation of challenging alternatives.
Given the Chomskyan approach to Plato's problem, then, we can distinguish several largely overlapping but sometimes conflicting lines of research. The most common line of research has always stressed the importance of distinct levels of syntactic representation. Most of these levels are supposed to be connected by a special mapping, nowadays generally referred to as "move alpha". Chomsky, for instance, distinguishes lexical structure, D-structure, S-structure, Logical Form (LF), and Phonetic Form (PF). Van Riemsdijk and Williams (1981) add yet another level to this series, namely the level of NP-structure.
My own approach differs somewhat from this commonly assumed picture. It has always seemed to me that with the introduction of trace theory in Chomsky (1973), the original arguments for certain levels have lost their force. To a certain extent, this was also observed by Chomsky at the end of "Conditions on Transformations" (1973): as soon as you have traces there is an obvious alternative according to which traces are base-generated at S-structure. In this view, D-structure is not necessarily a separate level, but can also be interpreted as a substructure or a property of S-structure.2
Chomsky has never been convinced of the meaningfulness of the alternative, mainly because of the alleged properties of "move alpha". In Chomsky's view, the alternative could only be formulated with interpretive rules at S-structure that duplicate the unique properties of "move alpha".3 Since I believe that this latter conclusion is false, I have been trying to develop the alternative in Koster (1978c) and subsequent papers. These attempts have nothing to do with a general preference for frameworks without transformations or with a preference for context-free rules in the sense of Gazdar and others.4 I agree with Chomsky (1965) that the significant empirical dimension of the research program has little to do with the so-called Chomsky hierarchy. What is significant is the attempt to restrict the class of attainable grammars (perhaps to a finite class) in a feasible way. From this point of view, formulating grammars with or without transformations is not necessarily a meaningful question (apart from empirical considerations).
My main argument is that I consider the attempts to isolate the properties of "move alpha" entirely unconvincing. "Move alpha" exists only to the extent that it can be shown to have properties. Neither attempts to establish properties of "move alpha" directly, nor attempts to establish movement indirectly by attributing special properties to its effects (traces) have been successful, in my opinion. At the same time, it is understandable that these attempts to isolate "move alpha" as something special have inhibited research into unified theories, i.e. theories that subsume movement and, for instance, anaphora under a common cluster of properties.
Functionally speaking, "move alpha" is insufficiently general for the job that it is supposed to do. Movement can be seen as a transfer mechanism: it connects certain categories with deep structure positions (which are also available at S-structure under trace theory) and transfers the Case- and θ-license of these positions to the moved categories. It is hardly controversial that not all transfer can be done by movement. A standard example demonstrating this is left dislocation:

(1) That book, I won't read it
Originally, such sentences were also derived by movement transformations (see Ross (1967)). But it is generally assumed now that (1) and many similar cases of transfer cannot be accounted for by "move alpha". An example like (1) shows that anaphors like it can transfer θ-roles to NPs (like that book) in non-θ-positions. This independently needed transfer mechanism makes "move alpha" superfluous. Obviously, we can do with only one general transfer mechanism from dependent elements to their antecedents. This transfer mechanism is instantiated by (1) and in a similar way by a "movement" construction like (2):

(2) Which book did you read t?
The trace t in (2) appears to behave like the pronominal it in (1) in the relevant aspects. The burden of proof is certainly on those who claim that we need an entirely new transfer mechanism ("move alpha") beyond what we need anyway for (1). Attempts have been made to meet this burden of proof, but the question is whether these attempts have been successful.
If "move alpha" is superfluous from a functional point of view, it might still be argued that it can be recognized by its special properties. Chomsky (1981b, 56) argues that the products of "move alpha", traces, have the following three distinct properties:

(3) a. trace is governed
    b. the antecedent of trace is not in a θ-position
    c. the antecedent-trace relation satisfies the Subjacency condition
Note, however, that none of these properties uniquely distinguishes movement from other grammatical dependency relations. It is already clear from (1) that the antecedents of lexical anaphors (or pronominals) can also be in non-θ-positions (3b). Also, government (3a) is not a distinguishing property, because all lexical anaphors bear Case and must therefore be governed.5
The only plausible candidate for the status of distinguishing property has always been Subjacency (3c). It is for this reason that I have focused on this property in Koster (1978c) and elsewhere. The crucial question from my point of view, then, is whether Subjacency is really that different
from, say, the locality principles involved in the binding theory of Chomsky (1981b).
If we take a closer look at Subjacency, it can hardly be missed that the form it is usually given (and which is clearly distinct from the anaphoric locality principles) is entirely based on certain idiosyncrasies of English and a few other languages. Under closer scrutiny, Subjacency as a separate property appears to dissolve. The version originally proposed on the basis of English in Chomsky (1973) simply conflates a general locality principle with a small extension for limited contexts in English.
Before I demonstrate this with examples, I would like to stress that I consider Subjacency, or more generally, the idea that "unbounded" movement is built up from a succession of local steps, as one of the most important advances of generative grammar in the 1970s. Thanks to Subjacency, it has become clear for the first time that grammatical dependency relations that look wildly different at the surface might, contrary to appearances, be instantiations of a common underlying pattern. Subjacency has been a crucial conceptual step, and my own attempts at further unification only became possible because of Subjacency, which reduced a mass of seemingly unbounded relations to a simple local pattern. My criticisms do not concern Subjacency as a strict locality principle, but the particular form given to it in Chomsky (1973), which makes it unsuitable for further unification with other locality principles.
If we want a further unification, we have to get rid somehow of the differences between the locality format for movement (Subjacency) and, for instance, for anaphora (principle A of the binding theory). At first sight, this is not so easy because there seem to be some clear differences. These differences can be summarized as follows:

(4) a. Subjacency is often formulated as a condition on derivations, while principle A of the binding theory is a condition on representations
    b. Subjacency involves two domain nodes, while principle A only involves one node (the governing category)
    c. Contrary to Subjacency, principle A involves opacity factors like INFL or SUBJECT
Given the desirability of unification, these differences present themselves as a puzzle: how can we show that "move alpha" and anaphoric binding are governed by the same basic locality principle? Let us consider in turn the differences listed in (4). Originally, Subjacency was formulated as a condition on derivations. But Freidin (1978) and Koster (1978c) claimed that, with traces, it could just as well be formulated as a condition on representations. Also Chomsky (1985a) formulates Subjacency as a condition on S-structure. So, it is questionable whether this point is still controversial: we can simply formulate
Subjacency as a condition on S-structure, just like principle A, as long as there is no evidence to the contrary.
There is also an easy solution to the second difference. In Koster (1978c) it was concluded that even for English, Subjacency could be replaced by a one-node domain statement (like the later principle A for anaphors) for all contexts except one. The standard two-node formulation was based on the peculiar postverbal context of English, which was a bad place to look to begin with. Thus, in general, the bounding facts of English can be formulated by specifying just one bounding node, S' or NP. Much of the subject condition of Chomsky (1973), for instance, follows from a condition that says that elements cannot be extracted from an NP:

(5) *Who did you say that [NP a picture of t] disturbed you?
The one-node format would have been sufficient for these cases, but it did not seem to be for contrasts like the following:

(6) a. Who did you see [NP a picture of t]
    b. *Who did you hear [NP stories about [NP pictures of t]]
Even from English alone, however, it is clear that (6b) is irrelevant for a choice between a one-node and a two-node Subjacency format. The reason is that standard two-node Subjacency is both too strong (7b) and too weak (7a) for English in this context:

(7) a. *Who did you destroy [NP a picture of t]
    b. Which girl did you consider [NP the possibility of [NP a game with t]]
As (7a) shows, one node can already lead to unacceptable sentences, while (7b) and many other examples show that extraction across two or even three bounding nodes may still yield acceptable sentences. In short, one node is sufficient for all contexts of English, except the postverbal context, in which we can find almost anything.
The conclusion that Subjacency is a one-node condition was reinforced by the fact that even (6a) is ungrammatical in most languages, Dutch among them:

(8) *Wie heb je [NP een foto van t] gezien?
It must therefore be concluded that one node is sufficient for Subjacency in almost all languages known to have "unbounded" movement in all contexts, and in some languages, like English, in all contexts but one. In the exceptional context, two-node Subjacency is just as irrelevant as one-node Subjacency.
On the basis of the facts, then, we are justified in also taking the second step towards unification: both bounding and binding involve local domains that specify only one node. Of course, we are left with the problem of how to account for cases like (6a) and (7b), but it seems at least plausible that this problem has nothing to do with Subjacency. Recently, I have tried to give a solution for this problem by adopting certain ideas formulated by Kayne (1983). According to this solution (Koster (1984b) and chapter 4 below), the basic bounding domain is a one-node domain, which can be extended under very specific and partially universal conditions. A bounding domain can be extended only if the last trace of a chain is structurally governed and if all domains up to the antecedent are governed in the same direction. With some qualifications, to which I will return, I believe that bounding is constrained by the one-node format in all other cases.
This part of the puzzle is therefore solved by splitting standard Subjacency in two parts: a universal one-node domain specification, and a domain extension based on the language-particular fact that prepositions can be structural governors in English, together with the fact that the direction of government is rather uniform in English. As I will argue below, the one-node domain that we have split off from Subjacency forms the basis of a construction-independent and universal locality principle. With respect to this one-node locality principle, all languages are alike, while languages differ with respect to the extensions, which are also the loci of parametric variation.
If this hypothesis solves the first two aspects of the unification puzzle, the next step is trying to solve the third aspect by splitting off the same universal domain from the binding conditions for anaphors. In the case of anaphoric domains, it is already generally assumed that the locality format involves only one node, the governing category. The big problem here is how to split off the opacity factors, such as INFL and SUBJECT. It seems to me that the solution is very similar to what we saw in the case of bounding: there is a basic one-node domain defined without opacity factors; these opacity factors only play a role in partially language-specific domain extensions.
As before, English is a poor choice to illustrate this because this language has a relatively impoverished system of anaphors. But in many languages clitics are used in the domain of V, while different pronouns are used for binding into PPs and other constituents. For the clitics, the opacity factors are usually irrelevant: the clitics are simply bound in the minimal Xmax (S') in which they are governed, just like traces.6 Thus, a clitic governed by V is bound in its minimal S', just like a trace governed by V. Often clitics cannot be bound in any other environment. French, for instance, uses a reflexive se in the domain of a verb, but other forms, like lui-même, in the domain of P and other categories (see chapter 6 for a more elaborated account).
Dutch forms a very interesting illustration of this point of view. This
language has at least two reflexives, zich and zichzelf. The crucial fact is that these reflexives overlap in the domain of V, but contrast in other contexts (i.e. in extended domains), for instance in the domain of P. The following examples illustrate this:

(9) a. Jan wast zichzelf
       Jan washes himself
    b. Jan wast zich
       Jan washes himself
It is not the case that both reflexives occur with all verbs in this context, which is probably a lexical fact. The point is that verbs that select both forms can have them in the same context, namely the domain of V. We can account for the sentences in (9) by a domain statement that does not refer to opacity factors like SUBJECT. We can simply say that both zich and zichzelf are bound in the minimal Xmax of the governor V (under the assumption that this domain is S'). I assume that in the unmarked case both Dutch reflexives are bound in their minimal Xmax (in practice only the minimal S') without any reference to opacity factors. Opacity factors only play a role in the marked case, under so-called "elsewhere" conditions. Thus, if the reflexives are not bound in their minimal Xmax, they contrast with respect to the notion subject: zichzelf must be bound in the minimal domain containing a subject, while zich must be free in this domain. The contrast is illustrated in the following examples, in which the reflexives are bound across a PP boundary (and therefore not bound in their minimal governing Xmax):

(10) a. Jan schiet [PP op zichzelf]
        Jan shoots at himself
     b. *Jan schiet [PP op zich]
Thus, in Dutch the distinction between the basic domain and the extended domain (which involves opacity factors) can be detected by the fact that the two reflexives overlap in the former domain while they are in complementary distribution with respect to the latter domain. There is much more to say about Dutch reflexives (see Koster (1985) and chapter 6 below), but the basic approach is clear from these simple examples.
The path towards unification, then, can only be followed if we see that neither standard Subjacency (with its two nodes) nor binding principle A (with its opacity factors) formulates the primitive locality domain for the dependency relations in question. Both conditions conflate the common universal part with language-particular extensions. If we split off the extensions, it appears that bounding and binding are governed by exactly the same basic locality principle.
The approach taken here involves a theory of markedness.
The unmarked locality principle for all local dependencies in all languages is a simple one-node domain principle that says that an element must be connected with its antecedent in the minimal Xmax in which it is governed. Beyond this, there are only marked extensions from which languages may or may not choose. Both directionality factors in the sense of Kayne (1983) and opacity factors in the binding theory belong to the theory of markedness. The theory of markedness is also the main locus of parametrization. The basic, unmarked domain might be part of all languages without parametrization; this certainly is the strongest possible hypothesis, one that we would like to maintain as long as possible.
If all this is correct, the unmarked format for Subjacency (the Bounding Condition of Koster (1978c)) is indistinguishable from the unmarked locality format for binding. None of the properties in (3), then, distinguishes "move alpha" from any other dependency relation in the unmarked case. If "move alpha" can be detected neither by its functional role nor by its properties, then without new evidence, there is no reason to assume that "move alpha" exists.
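To make the unification claim concrete, the unmarked locality check can be pictured procedurally. The sketch below is not part of the theory proposed here and is written in Python purely for illustration; it assumes a toy labelled tree, a fixed inventory of maximal projections, and coindexation standing in for antecedency. The point it illustrates is that one and the same one-node check, in the spirit of the Bounding Condition, applies to a Dutch reflexive like zich in (9b) and to the Wh-trace in (2) alike.

```python
# Illustrative sketch only (not the formalism of the text): a toy check of
# the unmarked one-node locality requirement.  Trees are labelled nodes,
# MAXIMAL lists the projections treated as X-max, and antecedency is
# encoded by shared integer indices.

MAXIMAL = {"S'", "NP", "PP"}          # toy inventory of maximal projections

class Node:
    def __init__(self, label, children=(), index=None):
        self.label = label
        self.children = list(children)
        self.index = index            # coindexation stands in for antecedency
        self.parent = None
        for child in self.children:
            child.parent = self

    def ancestors(self):
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def descendants(self):
        for child in self.children:
            yield child
            yield from child.descendants()

def minimal_xmax(node):
    """The first maximal projection properly containing the node."""
    return next(a for a in node.ancestors() if a.label in MAXIMAL)

def locally_bound(dependent):
    """Unmarked locality: the dependent element must be coindexed with
    some other element inside its minimal X-max."""
    domain = minimal_xmax(dependent)
    return any(other is not dependent and other.index == dependent.index
               for other in domain.descendants())

# A reflexive, as in (9b) 'Jan wast zich': zich is bound in its minimal S'.
jan = Node("NP", index=1)
zich = Node("Refl", index=1)
clause_9b = Node("S'", [jan, Node("V"), zich])

# A Wh-trace, as in (2) 'Which book did you read t': t is bound in its S'.
which_book = Node("Wh-NP", index=2)
trace = Node("t", index=2)
clause_2 = Node("S'", [which_book, Node("NP"), Node("V"), trace])

print(locally_bound(zich))     # True: the same check as for the trace
print(locally_bound(trace))    # True
```

On this picture, the marked domain extensions discussed below (opacity factors, dynasties) would amount to letting a parameter replace the minimal Xmax by a larger licensed domain, while the check itself remains constant.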
1.2. The configurational matrix

The most fundamental notion of the theory of grammar is the dependency relation. Most grammatical relations are dependency relations of some kind between a dependent element δ and an antecedent α:

(11)  ... α ... , ... δ ...
              R
In anaphoric relations, for instance, the anaphors are dependent on their antecedent. Similarly, subcategorized elements that receive a θ-role or Case are dependent on some governor, usually the head of a phrase. There are many different types of dependency relations, but all have something in common, both functionally and formally.
Functionally speaking, dependency relations have the following effect:

(12) α and δ share a property
Any kind of property can be shared by two properly related elements. Antecedent and anaphor, for instance, share a referential index, which entails that they have the same intended referent. A "moved" lexical category and its trace share one lexical content (found at the landing site) and one set of licensing properties (found at the trace position). Formally speaking, all dependency relations have the same basic form, while some have their basic form extended in a certain way. As already
indicated in the previous section, domain extensions are language-particular options that result from parameter setting, and which fall within the limits of a very narrow hypothesis space, which is defined by Universal Grammar. Domain extensions for empty categories involve chains of equally oriented governors, and domain extensions for other anaphors involve the opacity factors or chains of governors that agree with respect to some factor. More will be said on domain extensions in the next section.
In this section, I will only define the basic, unextended form of dependency relations. First, I will mention and briefly illustrate the properties of the relation R (of (11)). Then, I will discuss the question to what extent the list of properties has some internal structure. I will conclude this section with a discussion of the scope of the properties in question.
As I have discussed elsewhere, it seems to me that basic dependency relations of type R (in (11)) have at least the following four properties:7

(13) a. obligatoriness
     b. uniqueness of the antecedent
     c. c-command of the antecedent
     d. locality
The first property, obligatoriness, is almost self-explanatory. All dependency relations with the properties of (13) are obligatory in the sense that the dependent elements in the relation must have an antecedent. Thus, a reflexive pronoun does not occur without a proper antecedent:

(14) *I hate himself
A structure like (14), in which no antecedent for the reflexive can be found, is ill-formed, and if there is an appropriate antecedent, it cannot fail to be the antecedent:

(15) John hates himself
In this respect, the binding of reflexives differs from the binding of other pronouns, like the (optional) binding of him in:

(16) John thinks that Mary likes him
As is well known, we can optionally connect him with the possible antecedent John, but we may also leave the pronoun unbound.
The second property, uniqueness, applies only to antecedents. Thus, we may connect an antecedent with more than one anaphor:

(17) They talked with each other about each other
But we can only have one antecedent for an anaphor; in other words, split antecedents are impossible:

(18) *John confronted Mary with each other
Again, this is not a necessary property of anaphoric connections. Pronominals differ from bound anaphors in that they can take split antecedents, as has been known since the 1960s:

(19) John told Mary that they had to leave
The third property, c-command, is so well known that it hardly stands in need of illustration here. In (20a), himself is not c-commanded by the antecedent John. For pronominals, c-command is not necessary, as shown by (20b):

(20) a. *[NP The father of John] hates himself
     b. [NP The father of John] thinks he is happy
The form of c-command that I have in mind is the more or less standard form proposed by Aoun and Sportiche (1983), according to which the minimal Xmax containing the antecedent must also contain the anaphor.
The fourth property, locality, is illustrated by the following contrast:

(21) a. John hates himself
     b. *John thinks that Mary hates himself
Again, it can be observed that pronominals like him are not constrained by the locality principle in question:

(22) John thinks that Mary likes him
The standard form of locality for anaphors is given by principle A of the binding theory of Chomsky (1981b, ch. 3): anaphors must be bound in their governing category. A governing category is the minimal Xmax containing the governor of an anaphor and a SUBJECT (subject or AGR) accessible to the anaphor. The basic form of locality that I am assuming here differs from this standard format. Instead, I will assume that the Bounding Condition of Koster (1978c) is basic, not only for empty categories, but for all local dependencies:

(23) Bounding Condition
     A dependent element δ cannot be free in:
     ... [β ... δ ... ] ...
     where β is the minimal Xmax containing δ (and the governor of δ)
This locality principle accounts for the contrast between (24a) and (24b), under the assumption that S' is the relevant Xmax:

(24) a. [S' John hates himself]
     b. *[S' John thinks [S' that himself is sick]]
The following acceptable sentence is not accepted by the basic locality principle (23), because himself is not bound in the minimal PP in which it is governed:

(25) John depended [PP on himself]
This sentence is only accepted by adding a marked option to the basic locality principle. According to this "elsewhere" condition, a reflexive must be bound in the extended domain defined as the minimal Xmax that contains a subject. Thus, principle A of the binding theory is considered a marked, extended domain from this point of view.8
Apart from this not unsubstantial modification, the properties listed under (13) are well known, especially c-command and locality. What has not received much attention, however, is the fact that the properties in question form a cluster: if a dependency relation involves locality, it usually also involves c-command and uniqueness. The fact that these properties co-occur suggests that there might be some further structure to this collection.
It seems to me that the relation R is in fact a function. According to the definition of a function, there is a unique value in the co-domain for each argument in the domain. Suppose now that we take dependent elements in a given structure as arguments. In that case, we can consider antecedents in the same structure as values. The function is not defined in structures without appropriate antecedents, and these structures are rejected. In this way, we account for the obligatoriness of R (property (13a)). Similarly, we account for the uniqueness property: a function always gives a unique value for a given argument, in this case a unique antecedent.
Assuming that R is a function, the only two substantial properties are (13c) and (13d): c-command and locality, respectively. It seems to me that these two properties are not unrelated either. In fact, both properties are locality principles. C-command is locality seen from the perspective of the antecedent. It can be formulated as follows:

(26) C-command
     A potential antecedent α cannot be free in:
     ... [β ... α ... ] ...
     where β is the minimal Xmax containing α
This is very similar to the Bounding Condition (23), repeated here for convenience:
(27) Bounding Condition
     A dependent element δ cannot be free in:
     ... [β ... δ ... ] ...
     where β is the minimal Xmax containing δ (and the governor of δ)

The similarity between (26) and (27) is just too striking to be accidental. I assume therefore that R is a bilocal function, a function that gives a unique value (the antecedent) for each dependent element, in such a way that the antecedent is in the minimal domain of the dependent element (cf. (27)) and the dependent element in the minimal domain of the antecedent (cf. (26)). If this conclusion can be maintained, the list in (13) can be replaced by a simple function that shows a certain degree of symmetry with respect to the notion "locality".
An intriguing question that I will not pursue here is whether there is a counterpart to the notion of domain extension for (26). Recall that one of the most general domain extensions for (27) involves the notion "subject". Under this extension, a dependent element is not accessible in the domain of a subject. If there is full symmetry in this respect, we expect that there are also languages that define their antecedent domain as a similar extension of (26): in such languages potential antecedents are not accessible in the domain of a subject. I have argued elsewhere that it is exactly this situation that we find in languages like Japanese, Korean, and many others, in which only subjects can be antecedents for reflexives: if potential antecedents are not accessible in the domain of a subject, only the subject itself is accessible in the given domain (Koster (1982b)). If this conclusion is correct, then unrestricted c-command, as in English, is the unmarked condition for antecedents, while the subjects-only option for antecedents is a marked extension, not unlike the extensions that we find for anaphors in principle A of the binding theory. This would be a remarkable confirmation of the view that c-command is the antecedent counterpart of locality, as it is usually defined for the dependent element.
In any case, it seems worthwhile to look not only for lists of correlating properties like (13) but also for the deeper structural principles from which these properties follow. The properties in (13) (and the principles from which they follow) define a configurational matrix for almost all grammatical dependency relations. There are surprisingly few relations that are not somehow characterized by the properties of this configurational matrix. In fact, there might be only one major class of exceptions, which I will briefly discuss in a moment. Furthermore, there are anaphoric systems, like the one for the reflexive zibun in Japanese, that seem to be characterized by locality on the antecedent (c-command) but not by locality on the dependent element (as in the case of English anaphors).
The major exception that comes to mind is the class of dependencies
that seem to be characterized by principles of argument structure. Thus, control structures are not generally characterized by the properties in (13). There are control structures without obligatory antecedents (28a), with split antecedents (28b), with non-c-commanding antecedents (28c), and with nonlocal antecedents (28d) (see Koster (1984a) and chapter 3 below):

(28) a. It is impossible [PRO to help Bill]
     b. John proposed to Mary [PRO to help each other]
     c. It is difficult for Mary [PRO to help Bill]
     d. John thinks [S it is impossible [S PRO to shave himself]]
In some cases, the antecedent of PRO must f-command it (in the sense of Bresnan (1982)). Similar observations can be made about anaphor binding in many languages. Even in English, c-command is not always necessary, as was observed by Jackendoff (1972):

(29) A book by John about himself
This does not mean that the configurational binding theory can be replaced for English by a theory based on argument structure. In languages like English and Dutch, possibilities like (29) are limited to certain prepositions, while c-command is much more generally usable. In control structures, principles of argument structure are more prominent in English, but even in the case of control these principles interact with the purely structural notions of (13) (see Koster (1984a) and chapter 3 below).
One might argue that Universal Grammar defines two systems: a system based on argument structure, and a purely structural system. The former system might be the older system, while the latter system might be the result of a later evolutionary development. Whatever the merit of these speculations, it seems to me that nonconfigurational principles have a minority position in most natural languages. Most dependency relations fall within the limits of the configurational matrix characterized by (13). At least the following dependency relations have the form specified by (13):

(30) a. licensing relations
        government
        subcategorization
        θ-marking
        Case assignment
     b. agreement
        subject-verb
        COMP-verb
     c. anaphor binding
     d. movement
        NP-movement
        Wh-movement
     e. obligatory control
     f. predication
     g. gapping
For most of these dependencies, Chomsky (1981b, 1982a) postulates different modules, such as government theory, Case theory, binding theory, bounding theory, control theory, etc. Insofar as each of these subtheories has some characteristics of its own, I agree. But it would be a mistake to consider each subtheory a totally primitive structure. To a large extent, the subtheories are made from the same stuff, namely the properties of the configurational matrix (13).
In many cases, the fact that the construction types in (30) have the properties listed in (13) needs little illustration. It is clear, for instance, that the licensing relations, (30a), have the four properties: a subcategorized element is obligatorily dependent (13a) on a unique head (13b). Furthermore, the head c-commands its complements (13c) in a local domain, i.e. the head does not govern into the domain of another governor (13d). Similarly, the agreement relations, (30b), and the predication relation, (30f), have the four properties in a rather perspicuous manner.
The other relations are interesting in that they seem to contradict the uniformity hypothesis in one way or another. Obligatory control has already been briefly discussed: a well-defined subclass of control structures has the properties listed in (13), as has been argued in Koster (1984a) and chapter 3 below. Anaphor binding and movement are the most problematic from the point of view of a unified theory. Both seem to involve wildly varying domains, within one language, and also across languages. Some of this variation has already been discussed, and I will return to it in the next section.
I will conclude the present section with some nonstandard applications of the configurational matrix. First, I will give a brief review of the properties of the gapping construction, which is constrained by (13) in a nontrivial way.9
One problem with gapping is that it is not quite clear what kind of representation is appropriate for coordinate structures. Often, coordination has been treated in terms of normal tree structures. Accordingly, the gaps in the gapping construction were handled by the usual transformational or interpretive processes. Thus, in Ross (1967) the gap in (31b) is created by deleting the corresponding verb in (31a):

(31) a. John reads a newspaper and Mary reads a book
     b. John reads a newspaper and Mary — a book
Using essentially the same type of representation, others (like Fiengo (1974)) have replaced the deletion transformation by interpretive rules.
More radical proposals do not consider coordinated structures as basic phrase markers but as the derivative product of a linearization rule. One of the earliest examples is Williams (1978), and more recently De Vries (1983) and Huybregts (to appear) have been exploring three-dimensional representations (based on set union of reduced P-markers in the sense of Lasnik and Kupin (1977)). For present purposes, I will assume representations in the spirit of Williams (1978), which is most readily accessible. In this kind of framework, conjuncts before linearization can be represented in columns:

(32)   S'   and   NP  John       reads   NP  a newspaper
                      Mary                   a book
In this representation, elements in the same column have the same function. Thus, both John and Mary have the status of subject, and they receive the same θ-role. The conjuncts each occupy one row, and two conjuncts are properly coordinated if the minimal Xmax containing the column of the two conjuncts contains a conjunction. As before, we assume that S' can function as the minimal Xmax containing the elements governed by V (or INFL). Applied to (32), this means that both John and Mary and a newspaper and a book are properly coordinated. The column with John and Mary, for instance, is accepted by the conjunction and in its minimal S'. The same holds for the column with a newspaper and a book. In coordinate structures, then, the relation R of (11) is interpreted as a relation between conjunctions and columns of type Xi (where Xi is an element from the X-bar system).
A special feature of (32) is that the gap of the second conjunct is not considered a deletion site or an empty V. The properties of the verb read are simply equally distributed over the members of the column to which the verb is related. Thus, in (32) both the book and the newspaper are governed by the verb read.
If we assume that the relation between conjunctions and columns has the properties in (13), many facts about gapping are explained. Particularly, the local properties of gapping are explained if we assume that columns are only possible if they are licensed by a conjunction in the same local domain (in the sense of the Bounding Condition; see Koster (1978c, ch. 3)). For instance, the facts that Neijt (1981) seeks to explain in terms of Hankamer's Major Constituent Condition seem to follow. A relevant contrast is the following:

(33) a. *Peter was invited by Mary and John — Bill
     b. Peter was invited by Mary and John — by Bill
Contrary to (33b), the gap of the ungrammatical (33a) also includes the preposition by.
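Before turning to the underlying representations, the licensing idea can be made concrete with a small sketch. The encoding below is my own and purely illustrative; it is written in Python and does not reproduce the three-dimensional representations referred to above. It simply treats a "column" of conjoined cells as well-formed only if the conjunction occurs inside the minimal maximal projection containing that column.

```python
# Toy sketch of column licensing in gapping (my own encoding, not the
# representations assumed in the text).

MAXIMAL = {"S'", "NP", "PP"}

class Node:
    def __init__(self, label, children=(), column=None):
        self.label = label                 # category or terminal
        self.children = list(children)
        self.column = column               # list of conjoined cells, if any
        self.parent = None
        for child in self.children:
            child.parent = self

    def walk(self):
        yield self
        for child in self.children:
            yield from child.walk()

    def minimal_xmax(self):
        node = self.parent
        while node is not None and node.label not in MAXIMAL:
            node = node.parent
        return node

def columns_licensed(root, conj="and"):
    """Every column must co-occur with the conjunction inside the minimal
    maximal projection containing that column."""
    for node in root.walk():
        if node.column is not None:
            domain = node.minimal_xmax()
            if domain is None or not any(n.label == conj for n in domain.walk()):
                return False
    return True

# (33a) *Peter was invited by Mary and John -- Bill:
# the column [Mary | Bill] sits inside the PP headed by 'by',
# and the conjunction 'and' is outside that PP.
rep_33a = Node("S'", [
    Node("NP", column=["Peter", "John"]),
    Node("and"),
    Node("V"),                                   # 'was invited'
    Node("PP", [Node("by"), Node("NP", column=["Mary", "Bill"])]),
])

# (33b) Peter was invited by Mary and John -- by Bill:
# the column is now a column of PPs ('by Mary' | 'by Bill'),
# licensed by 'and' within the minimal S'.
rep_33b = Node("S'", [
    Node("NP", column=["Peter", "John"]),
    Node("and"),
    Node("V"),                                   # 'was invited'
    Node("PP", column=["by Mary", "by Bill"]),
])

print(columns_licensed(rep_33a))   # False: column unlicensed inside PP
print(columns_licensed(rep_33b))   # True
```

On this toy encoding, the contrast in (33) reduces to where the column of by-phrase material sits relative to the conjunction; the representations discussed below spell out the same point in the framework assumed in the text.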
The explanation is straightforward, if we assume that gapping is constrained by (13). Consider the underlying representation of (33a):

(34) a.   S'   and   NP  Peter      was invited   PP  by   NP  Mary
                         John                                   Bill
This sentence is ungrammatical because Mary and Bill are not properly coordinated, i.e. the maximal column containing these NPs is not licensed by a conjunction in the minimal local domain (which is the PP headed by by). The representation underlying (33b), however, is well-formed:

(34) b.   S'   and   NP  Peter      was invited   PP  by  NP  Mary
                         John                     PP  by  NP  Bill
In this case, Mary and Bill are part of the more inclusive PP column, thanks to the presence of the second occurrence of by. The PP conjuncts are properly coordinated because their column is licensed by the conjunction and in the minimal domain S'. These examples are representative of the local properties of gapping as described by the Major Constituent Condition of Neijt (1981). The facts straightforwardly follow from the Bounding Condition, which also determines all other local dependencies. Various other hitherto unexplained gapping facts follow from the hypothesis that gapping is constrained by the configurational matrix.
So far, it is clear that the list in (30) covers an enormous mass of facts. Many entries are themselves abbreviations for large collections of constructions, "Wh-movement" for instance (see Chomsky (1977)). And yet the list is probably far too short, due to certain arbitrary limitations imposed on the relations considered. One such limitation is the fact that usually only those instantiations of R in (11) are considered in which α does not dominate δ.
As soon as we drop this arbitrary limitation, the scope of the configurational matrix is considerably extended. Consider for instance the vertical relation in the X-bar system, and in phrase structure in general. All sister nodes depend on an immediately dominating mother node. The relation between mother and daughters has the properties in (13): the relation is obligatory (13a), there is always a unique mother to a given pair of daughters (13b), and clearly the relation is local (13d):
(35) [VP V [PP P NP]]
P is the head of PP and not of VP, which (for the P) is beyond the limits imposed by the Bounding Condition. It seems to me, then, that there is a close relationship between the Bounding Condition and X-bar theory. The nodes of a projection form a family within the domain (Xmax) defined by the Bounding Condition. Similarly, our modified concept of c-command applies (13c): not only are daughter nodes determined by the mother node within their minimal Xmax, but also the mother node determines the daughters within its minimal Xmax.
It is somewhat accidental, perhaps, that vertical grammatical relations (like the relations between members of a projection) have hardly been studied from the same perspective as "horizontal" relations like anaphora and movement (an exception is Kayne (1982)). If we abstract away from the distinction related to dominance, it might appear that (13) simply sums up the properties of all local relations of grammar, including both those given in (30) and those implied by the X-bar system. In chapter 2, some applications of this perspective will be discussed.
Henk van Riemsdijk has pointed out (personal communication) that scope relations can be seen as an instantiation of "vertical locality". Normally, quantified NPs are assigned a scope either by (an interpretation of) QR (May (1977)) or by relating the quantified element to an abstract morpheme Q (in the sense of Katz and Postal (1964)). Both procedures have the effect that the properties of the scope relation are given the format of a "normal" dependency relation, in which the dependent element is not dominated by its antecedent. If the dominance/nondominance distinction is irrelevant, we can assign scope to a quantified element without QR or an abstract morpheme. We can simply interpret the scope of a quantified element as a relation between this element and the minimal S that contains (i.e. dominates) it.
I will not pursue further the many intriguing consequences of interpreting (13) also as a property of vertical relations. Apart from the applications discussed in chapter 2, I consider the vertical dimension as a topic for future research.
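The core of section 1.2 can be restated compactly. The following gloss is my own and uses ad hoc notation rather than anything from the text: write Dmin(γ) for the minimal maximal projection containing γ, and let R be the bilocal function discussed above. The configurational matrix then amounts to the requirement that

    R(δ) = α   only if   α is contained in Dmin(δ)     (the Bounding Condition (27))
                and      δ is contained in Dmin(α)     (C-command (26))

Obligatoriness (13a) corresponds to the requirement that R be defined for every dependent element δ, and uniqueness (13b) to the fact that R, being a function, assigns a single antecedent to each δ.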
1.3. Domain extensions

So far we have assumed that purely structural grammatical dependency relations have the same unparametrized form in all constructions in all languages (in the unmarked case). This form is determined by the properties in (13), which include the C-command Condition (26) and the Bounding Condition (27) as universal locality principles. For several constructions in several languages nothing further has to be said.
But in many languages the basic domain as determined by the Bounding Condition can be "stretched" in a certain manner. As mentioned before, domain stretching belongs to the theory of markedness. This conclusion is based on the fact that it is not universal and is subject to parametric variation. A trace of Wh-movement, for instance, cannot be bound across a PP boundary in most languages. This fact follows from the Bounding Condition (27), which entails that a trace must be bound in the minimal PP (an Xmax) in which it is governed. In other words, the domain for Wh-traces cannot be stretched beyond the size of a PP (or any other Xmax) in most languages (with overt Wh-movement). English and the Germanic Scandinavian languages are among the very few languages with preposition stranding, which entails domain stretching beyond PP boundaries. But even in these languages, this marked phenomenon is limited to very narrowly defined conditions, to which I will return in a moment.
Standard Subjacency blocks extraction from complex NPs (in the sense of Ross (1967)), but allows extraction from PPs. This shows that Subjacency, taken as a universal locality principle, is too permissive. It fails to indicate that extraction from PP is something rather exceptional, even in English. In retrospect, we can say that standard Subjacency conflates elements of the unmarked locality principle (27) with elements of the language-particular domain stretching that makes preposition stranding possible in certain contexts.
In my opinion, one of the most interesting developments during the last few years has been the emergence of theories that try to describe exactly under what conditions domain stretching is possible. As mentioned in the first section, two types of domain stretching can be distinguished. According to the first type, a domain can be extended by specifying an extra category that the domain must contain. This option is probably limited to categories like subject, INFL, or COMP. Thus, if a category is governed by a preposition, it must be bound within its minimal governing category (= PP) in the unmarked case. By stipulating that the minimal domain must also contain a subject, the minimal domain PP is extended to the first S containing the PP (this S being the first category up that contains a subject). For English, this is the domain extension chosen for bound anaphors (see Chomsky (1981b, ch. 3) for further details). In languages that do not select this option for certain anaphors, the anaphors in question cannot be bound across PP boundaries. Examples were given in section 1.1 above.
Here, I will limit myself to the second type of domain extension, the one that allows violation of Wh-islands in certain languages, among other things. For this type of extension, the key insight was provided by Kayne (1983): the path from dependent element to antecedent must meet certain conditions (see also Nakajima (1982)). In particular, Kayne observed that the direction in which the successive projections (up to the antecedent) are governed plays a crucial role in domains the size of which exceeds the size
of the minimal X^max. This insight led to some remarkable predictions; for instance, as to the (near) absence of parasitic gaps in SOV languages like German and Dutch (Bennis and Hoekstra (1984), Koster (1983, 1984b), and chapter 4 below).
In addition to some minor modifications necessary for languages like Dutch, my interpretation of the directionality constraints differs somewhat from Kayne's. First of all, it seems to me that directionality plays no role in the assignment of scope (whether it is executed as LF movement or not). Second, directionality constraints belong entirely to the theory of markedness in my view. In the unmarked domain theory (entailed by the Bounding Condition (27)), directionality does not play a role (see chapter 4 for further details).
It seems to me that Kayne's theory of path conditions can also be generalized for types of long distance dependencies other than Wh-movement. Many languages have long distance anaphora, for instance (see Yang (1984)). As in the case of Wh-movement, domain stretching in these cases often depends on the nature of the successive governors. In Icelandic, for instance, long distance reflexivization is possible if all Vs from the reflexive up to the domain of the antecedent are in the subjunctive mood (see Maling (1981) and the literature cited there, and furthermore chapters 4 and 6 below). Possibly, there are very similar conditions on long Wh-movement in certain languages. Alexander Grosu has informed me, for example, that in certain cases of Rumanian long Wh-movement, all verbs of the path from trace to antecedent must take the supine form if the verb of the top domain (containing the Wh-antecedent) has the supine form (see also Georgopolous (1985) for uniform paths of the realis or irrealis). In general, then, long distance dependencies (other than successive cyclic Wh-movement) seem to require certain types of agreement among the successive domain governors. These governors form a chain that we might call a dynasty (Koster (1984b) and chapter 4 below):

(36)
A dynasty is a chain of governors such that each governor (except the last one) governs the minimal domain containing the next governor.
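The agreement requirement imposed on such chains can be pictured schematically. The following Python sketch is not part of the original text and is only an illustration under simplifying assumptions: the structural half of (36) (that each governor governs the minimal domain containing the next) is taken for granted, and the Governor record and the two agreement tests are invented for the example.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Governor:
    category: str               # e.g. 'V', 'N', 'P'
    direction: str              # 'left' or 'right': the direction of government
    mood: Optional[str] = None  # e.g. 'subjunctive', for verbal governors

def uniform_direction(dynasty: List[Governor]) -> bool:
    """Directionality agreement (the Wh-movement case): all governors
    in the dynasty govern in the same direction."""
    return len({g.direction for g in dynasty}) <= 1

def uniform_subjunctive(dynasty: List[Governor]) -> bool:
    """Interclausal verb agreement (the Icelandic reflexive case): every
    governor in the dynasty is a verb in the subjunctive mood."""
    return all(g.category == 'V' and g.mood == 'subjunctive' for g in dynasty)

# An English-type chain (V, N, V all governing rightward) licenses the
# extension; a Dutch-type chain, in which the verbs govern leftward while
# the noun governs rightward, does not.
english_chain = [Governor('V', 'right'), Governor('N', 'right'), Governor('V', 'right')]
dutch_chain = [Governor('V', 'left'), Governor('N', 'right'), Governor('V', 'left')]

print(uniform_direction(english_chain))  # True:  domain extension licensed
print(uniform_direction(dutch_chain))    # False: domain extension blocked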
Thus, the governors that can stretch the domain for Icelandic reflexives must be in the subjunctive form. The governors that can stretch the domain for Wh-traces must govern in the same direction, and so on. Until evidence to the contrary is found, I assume that there are only very few kinds of dynasties, and that their nature is determined by Universal Grammar. In fact, I know of only three kinds of dynasties, determined by the following types of agreement: directionality (for Wh-movement), interclausal verb agreement (subjunctive, supine, etc.), and agreement of lexical category (see below). If dynasties are defined by UG, the nature of domain extensions is not
determined by data, and is not by itself a matter of parametric variation. Dynasties might just be dormant features of all grammars, which become available in certain cases if independent parameters are set. Thus, preposition stranding involves a certain type of domain extension (beyond the minimal PP containing the trace). It is presumably acquired by the language learner if certain data (for instance, stranded prepositions) show that the language under consideration has prepositions among its structural governors (see Kayne (1984, ch. 5)). Even if the domain extension is usually acquired on the basis of data, there is no reason to assume that the same holds for the nature of the dynasty, which determines where prepositions can be stranded and where not. Similarly, long distance reflexivization might be an option for all languages in which interclausal verb dependency is somehow expressed. What is a matter of parametric variation, then, is the nature of the verb-verb agreement, not the fact that it defines a domain extension. Data seem to play a role in the factors that trigger domain extensions, and not in the factors that determine their shape.
If all this is correct, we have the following domain theory. The shape of grammatical domains is entirely determined by UG, by the Bounding Condition (27) in the unmarked case, and by a very limited number of dynasty-governed domain extensions in the marked case. Parameters play a precisely defined and limited role in this theory: they block or open the way to certain domain extensions. In other words, parameters do not play a role at all in the universal configurational matrix (13) that defines the basic shape of dependencies in all languages. In domain theory, parameters are the switches that separate the unmarked domain and its marked extensions. It is not unlikely that parameters play other roles as well, but there can be little doubt that the theory of parameters can develop beyond a mere statement of differences among languages only if the use of parameters is somehow severely limited.
I will now turn to the role and nature of dynasties in island violations. Until Chomsky (1977), generative grammar had a rather simple theory of islands. There were just a few, like the Complex NP Constraint (CNPC) and the Wh-island Condition, which were both explained by Subjacency. This theory was elegant and suggestive, but it was not entirely satisfactory for a number of reasons. Some reasons have already been mentioned, among others the stricter nature of island conditions in a language like Dutch. Other languages, like Italian and the Scandinavian languages, turned out to be more permissive with respect to island violations. But even within English, violations of island conditions vary strongly in acceptability. Some of these differences, such as the subject-object asymmetry in Wh-island violations, were explained in terms of the ECP, but others led to many theories but little agreement among linguists.
One of the controversial theories is the directionality theory based on Kayne (1983), which was briefly mentioned before. So far, it is the only
available theory that explains why Dutch has only stranding of postpositions (not of prepositions), and why parasitic gaps are practically lacking in Dutch. This theory also explains the sharp difference between English and Dutch with respect to violations of the CNPC. Thus, certain violations of this condition are reasonably acceptable in English: (37)
[Which race did you express [NP a desire [to win t]]]
The trace is not bound within its minimal domain (expressed by the innermost brackets). So, it can only be bound in an extended domain, in this case the domain indicated by the outermost brackets. The domain extension is well-formed, because the governors of the dynasty all govern in the same direction: the three relevant governors, express, desire, and win, all govern to the right. This kind of directional agreement is required by the theory of Kayne (1983) and its offspring (like Bennis and Hoekstra (1984), Koster (1984b), and chapter 4 below). The Dutch equivalent of (37) is hopelessly ungrammatical: (38)
*[Welke race heb je [een verlangen [te t winnen]] uitgedrukt]
The explanation is straightforward: the N verlangen 'desire' governs to the right, but contrary to what we see in English, the two verbs govern to the left in an SOV language like Dutch. Since there is no dynasty of governors governing in the same direction, the domain extension is not well-formed. A theory based on directionality, though successful in many cases, does not work as an account for the variable acceptability of Wh-island violations, both within one language and across languages. For example, earlier attempts to explain the relative strictness of Wh-islands in Dutch dealt with examples like the following (Koster (1984b)): (39)
*Welk boek weet je [wie t gelezen heeft]
 which book know you who read has
'Which book do you know who read?'
This fact seemed to be explained by the directionality constraints, under the assumption that the matrix verb governs the clausal complement to the right, while the object in the embedded clause (indicated by the trace) is leftward-governed by the verb. This is in accordance with the fact that tensed complement clauses must occur to the right of the verb, while NP-objects must occur to the left. This explanation is incorrect, as pointed out by Koopman and Sportiche (1985), who have given relatively acceptable violations of Wh-islands in Dutch: (40)
Met welk mes weet je niet hoe je dit brood zou kunnen snijden
 with which knife know you not how you this bread could cut
'With which knife don't you know how you might cut this bread?'

Relatively acceptable Wh-island violations can be found in Dutch after all, contrary to the predictions made by the directionality theory.
The fact that earlier studies claimed a stricter Wh-island behavior for Dutch than for English is probably due to two factors. First of all, Wh-island violations in English are often milder with relative pronouns extracted from dependent questions: (41)
?This is the boy that I know who kissed
In Dutch, such sentences are distinctly worse: (42)
*Dit is de jongen die ik weet wie kuste
This contrast is probably due to an independent factor, namely the fact that Dutch has so-called d-words (like die) in such cases, which are somewhat more difficult to extract, even in non-island contexts. Furthermore, Dutch has only a very limited supply of infinitival Wh-complements. In English, these are among the best examples of relatively acceptable Wh-island violations, while extractions from tensed clauses (like (39)) are often bad in both languages if subjects are crossed. Examples without Wh-subjects in COMP lead to relatively mild violations in Dutch: (43)
a. ?Welke boeken wil je weten aan wie hij gegeven heeft?
    which books want you know to whom he given has
   'Which books do you want to know to whom he gave?'
b. ?Aan wie wil je weten welke boeken hij gegeven heeft?
    to whom want you know which books he given has
   'To whom do you want to know which books he gave?'
Koopman and Sportiche claim a further contrast between examples like (43a) and (43b): extraction of a direct object is supposed to be worse (43a) than extraction of a subcategorized PP (43b). To my ear, however, (43a) and (43b) hardly differ in acceptability. It is really not a contrast to build a theory on.
The directionality theory is of course also insufficient for contrasts within one language. In earlier work, I observed a contrast between the extractability of adjuncts and, for instance, direct objects on the basis of examples like the following (Koster (1978c, 195-198)):
(44) a. What don't you know how long to boil?
     b. *How long don't you know what to boil?
Huang (1982) sought to relate such differences between the extractability of complements and adjuncts to the ECP: complements are properly governed (in the sense of the ECP), while adjuncts are not. Koopman and Sportiche (1985) further developed this type of theory by stipulating that long extraction across Wh-islands is possible if and only if the long-moved Wh-element comes from a θ-position.
An alternative theory has been developed by Hans Obenauer (1984, based on work presented in 1982) and Guglielmo Cinque (1984). According to this theory, extraction beyond the domains defined by Subjacency always involves pro. Since only NPs (and certain designated PPs) have the feature +pro, only these elements can be extracted from Wh-islands. This theory also explains the poor extractability of adjuncts in cases like (44b).
In spite of success in cases like this one, neither the Huang-Koopman-Sportiche theory nor the Cinque-Obenauer theory explains all facts. The former theory, for instance, does not explain Adriana Belletti's observation that extraction of thematic PPs from certain islands is much worse than extraction of NPs: (45)
*With whom did you express [a desire [to talk t]]
For the Cinque-Obenauer approach, such facts and many others (see Koster (1984b)) are unproblematic, because there is no overt pro-form corresponding to the PPs in question. The Cinque-Obenauer theory, on the other hand, does not account for the relative acceptability of (43b). This fact cannot be accounted for by Subjacency, as suggested for similar facts in Spanish by Obenauer (1984). Subjacency would have to be formulated with S' as bounding node for Dutch. But apart from all the other problems with Subjacency (some of which have been mentioned above), this solution would not account for the fact that the following sentence is still relatively acceptable in Dutch: (46)
?Aan wie wil je weten [s' welke boeken hij zegt [s' dat hij gegeven heeft]]
 to whom want you know which books he says that he given has
'To whom do you want to know which books he says that he has given?'
This sentence is (43b) with one embedding added. The fronted PP comes from the most deeply embedded clause. Therefore, it has to pass two S's, which is a violation of Subjacency in the intended sense. And yet (46) is hardly less acceptable than (43b). Subjacency, in other words, cannot be
the factor that governs the extractability of PPs from islands in these cases.
Summarizing, we have the following situation. Many facts, such as the nature of P-stranding in Dutch, the near absence of parasitic gaps in German and Dutch, and the strong contrast between English and Dutch with respect to the CNPC, can only be accounted for at the moment by a theory that incorporates Kayne's directionality constraints in some form. The nonextractability of adjuncts follows from the Huang theory and its further development by Koopman and Sportiche (1985). It also follows from the Cinque-Obenauer theory. The latter theory has the advantage that it also explains Adriana Belletti's observation of the nonextractability of complement PPs in almost all cases, other than (43b) or (46). At least for this reason, the Cinque-Obenauer theory must be accepted as an important supplement to a Kayne-type directionality theory (along with the qualifications made in chapter 4 below, in my opinion).
The Koopman-Sportiche theory has one advantage, however. It is the only theory that does not exclude (46). As we have seen, both the application of the directionality theory to this type of example and the Cinque-Obenauer theory wrongly exclude (46). The question, then, is whether we can save this advantage of the Koopman-Sportiche theory in some form.
In fact, examples like (43b) and (46) were given a special status in Koster (1984b) in a discussion of similar examples from Italian. In one of the well-known examples from Rizzi (1978), a PP is extracted from a Wh-island: (47)
Tuo fratello, a cui mi domando che storie abbiano raccontato t, era molto preoccupato
 your brother to whom I wonder which stories they have told was very troubled
Like (46), this example is incompatible with the Cinque-Obenauer theory as interpreted in Koster (1984b). For this reason, I introduced an extra condition, the Extended Bounding Condition, for examples like (47). According to this condition, the unmarked domain (27) is stretched if there is a dynasty of only Vs. Contrary to the directionality-governed dynasty, which only allows extraction of NPs ( = pro), this V-dynasty would allow Wh-fronting of all categories, just like in the unmarked domain (Wh-movement within a single clause). This view has the consequence that Italian counterparts of examples like (46) are predicted to be relatively acceptable, contrary to what the Subjacency account of Rizzi (1978) suggests. To my knowledge, this prediction is borne out. In spite of this, some other data from Koopman and Sportiche (1985) suggest that this formulation (in terms of the Extended Bounding Condition) is too permissive: the account permits extraction of categories of all types (including adjuncts) in domains determined by a pure V-
dynasty. Adjuncts, however, cannot be extracted from Wh-islands within the domains in question: (48)
*Waarom wil je weten [wat hij t gelezen heeft]
 why want you know what he read has
'Why do you want to know what he read t?'
It appears that the Koopman-Sportiche generalization is exactly right for extended domains with pure V-dynasties: in those domains only θ-marked categories (NPs or PPs) can be extracted. But as soon as we have dynasties with mixed categories, for instance N and V as in the CNPC, directionality constraints become relevant and only NPs can be extracted (in accordance with the Cinque-Obenauer approach). Both the Huang-Koopman-Sportiche approach and the Cinque-Obenauer approach, then, are right, though they concern slightly different domains.
All in all, we have a three-way distinction for Wh-movement, one for the unmarked case (49a), and two for the marked case (49b and c), depending on the nature of the dynasty: (49)
a. all categories movable within basic domain (27) (no dynasty)
b. only complements movable in a domain defined by a dynasty of Vs (no directionality)
c. elsewhere: only NPs moved if there is a dynasty of equally oriented governors
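The three-way split in (49) can be read as a small decision procedure. The sketch below is not from the original text; it simply restates (49a-c) in Python, and it assumes that the inputs (whether the dependency stays inside the basic domain of (27), the categories of the dynasty governors, and whether they govern in the same direction) have been established independently by the locality theory.

def movable_categories(within_basic_domain, dynasty_categories=None, same_direction=False):
    """Schematic restatement of (49): which categories can be Wh-moved to COMP."""
    if within_basic_domain:
        return 'all categories'                      # (49a): no dynasty needed
    if dynasty_categories and set(dynasty_categories) == {'V'}:
        return 'complements only'                    # (49b): pure V-dynasty, no directionality
    if same_direction:
        return 'NPs only'                            # (49c): mixed but uniformly oriented governors
    return 'nothing: domain extension not licensed'

print(movable_categories(True))                           # the basic domain of (27)
print(movable_categories(False, ['V', 'V']))              # a pure V-dynasty
print(movable_categories(False, ['V', 'N', 'V'], True))   # e.g. the English CNPC case (37)
print(movable_categories(False, ['V', 'N', 'V'], False))  # e.g. the Dutch case (38)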
The contrast between (49b) and (49c) is not entirely unexpected: quite generally, the acceptability of extractions from islands is a function of the uniformity and simplicity of dynasties.10 The most important conclusion, however, is that the extraction facts from many languages confirm the reality of the (unmarked) Bounding Condition (27). To the best of my knowledge, the Bounding Condition defines the only domain (in all languages with Wh-movement) in which categories of all types can be moved to COMP. Domain extensions (which lead to Wh-island violations) are only possible under very limited conditions that can be met in some languages but not in others, depending on the fixing of certain parameters. A domain extension can be recognized not only by its dynasty conditions, but also by strict limitations on the type of category that can be moved to COMP.
1.4. Conclusion

In recent years, much attention has been paid to parametrized theories of grammar. On the one hand, this has given linguistic theory the necessary flexibility, but on the other hand, it has led to a rather unconstrained use
of parameters. This is somewhat reminiscent of the earlier unconstrained use of features. Like a theory of features, a theory of parameters must be constrained: it can only contribute to explanatory adequacy, beyond the mere description of differences among languages, if it indicates where parameters play a role and where not. A tentative effort towards this goal is the hypothesis of the previous section that parameters do not play a role in the unmarked core of grammar, but only as switches between this core and the marked periphery.
The most important conclusion, however, is that there is an invariant core of language after all, in spite of the obvious need for parameters at some point in the theory. This invariant core is a configurational matrix, characterized by the four properties listed in (13), which plays a role in almost all local dependencies in (presumably) all languages. A crucial feature of (13) is that it incorporates a universal locality principle, the Bounding Condition (27), that is believed to hold for all constructions mentioned under (30). This locality principle is in a sense the minimally necessary locality principle for all languages in that it defines domains similar to the maximal projections of X-bar theory. Abstracting away from the dominance/nondominance distinction, we concluded that an obvious generalization can be made: the notion "maximal projection" not only defines the domain for vertical dependency relations, it also defines the unmarked domain for all other local dependency relations. Under the crucial assumption that S' (rather than VP) can be the minimal domain of V, the unmarked locality principle (27) characterizes many of the constructions in (30) without further problems.
The real challenge for the hypothesis of a universal unmarked locality principle comes from the fact that many constructions, particularly control, bound anaphora, and movement constructions, seem to require a domain definition that somehow deviates from the Bounding Condition. Control, for instance, seems to allow long distance dependency, and more generally, seems to involve principles of argument structure rather than a purely configurational theory. I have tried to show, however, that a well-defined subclass of control structures — namely, obligatory control in the sense of Williams (1980) — has exactly the properties in (13), including the Bounding Condition (27) (see chapter 3 for further details).
The biggest problem has been the unification of bound anaphora and "move alpha" in terms of the Bounding Condition. The domain statement for bound anaphora, principle A of the binding theory of Chomsky (1981b, ch. 3), deviates from the Bounding Condition in that the minimal relevant X^max must contain a SUBJECT (in the sense defined in Chomsky (1981b)). An even greater discrepancy exists between the Bounding Condition and the standard locality principle for "move alpha", i.e. Subjacency. Contrary to the Bounding Condition, Subjacency does not specify one, but two nodes of type X^max (traditionally NP and S' (or S)).
In short, both bound anaphora and movement seem to require domains
larger than the one specified by the Bounding Condition. The idea that bigger domains must be defined was reinforced by the study of long distance anaphora in languages like Icelandic (and from a different perspective, Japanese) and by reports concerning languages with permissive island behavior, like Romance and Scandinavian.
It is fairly obvious now, I believe, that in many languages with phenomena that seem to require more extended domains, the minimal domain defined by the Bounding Condition (27) can still be detected somehow. In languages with long distance anaphora, different things often happen in the minimal domain. In Dutch, for instance, the two reflexives zich and zichzelf are usually in complementary distribution, but they are bound in the same way in the only minimal domain in which they can have an antecedent, namely the domain of V (= S'). As we saw in section 1.1, this domain is specified by the Bounding Condition (without reference to the notion subject). The notion subject only appears to play a role if the anaphors in question are not bound in their minimal X^max: zichzelf must be bound in the domain of a subject (like English himself), while zich must be free in the minimal domain containing a subject. Similarly, clitics are usually bound in their minimal governing X^max and cannot be bound across major phrase nodes. Again, the domain for these clitics can be defined by the Bounding Condition, without reference to the notion subject. The facts from Dutch suggest that the notion subject does not play a role in the basic domain, but only in an extended domain, which is not universal, as shown by the clitics in many languages.
In short, bound anaphors are universally bound within their minimal X^max. Outside this minimal domain, anaphors are bound in the minimal subject domain, free in the minimal subject domain, or not bound at all. In comparing various languages, we observe that notions like subject, INFL, or COMP do not define basic domains, but only play a role as domain stretchers. Domain stretching is a marked option in this view. Another method of domain stretching, necessary for long distance anaphora and long movement, is based on the dynasty concept. According to this idea, a domain can be stretched if the governors in the path from dependent element to antecedent agree in some fashion (see chapter 6 for further details).
"Move alpha" is the most important case, because its alleged deviant properties have always played a role in the defense of the traditional derivational perspective on grammar. "Move alpha" defines the mapping between various levels of representation. If the properties of "move alpha" cannot be defined, one argument for a particular multilevel approach collapses.11 As we have seen, Subjacency is the only relevant distinguishing property of "move alpha". If "move alpha" is not characterized by Subjacency, but by the universal Bounding Condition, it loses its distinct character.
The evidence that "move alpha" is not characterized by Subjacency but by the Bounding Condition is very strong in my opinion. Even in English, the Bounding Condition — simpler than Subjacency — suffices for almost all contexts. The only exception is a certain class of postverbal extractions. But this context is clearly irrelevant because, on the one hand, Subjacency is both t o o weak and too strong for this context, and on the other hand, in many languages (Dutch, for instance) extraction in this context, just as in the other contexts, is perfectly characterized by the Bounding Condition (see Koster (1978c)). The peculiar permissiveness of movement from postverbal contexts in English and a few other languages derives from the possibility of preposition stranding, together with the uniform direction from which the successive projections from trace to antecedent are governed. T h a n k s to some independent structural features of English, this language allows for a domain extension in the very limited context in question, an extension determined by dynasties of uniformly oriented governors. Strong evidence for the Bounding Condition has come from the study of Wh-island violations in recent years. These violations differ much in strength, depending on the nature of the Wh-category moved t o C O M P . The relevant fact here is that in the domain defined by the Bounding Condition all categories (including adjuncts) can be moved to C O M P , while there are severe limitations both on the type of category moved and on the dynasty conditions if a Wh-element is moved to C O M P in an extended domain. The Bounding Condition, in other words, defines the domain in which all categories can be moved to C O M P , relatively free of further conditions. This distinction between the unmarked domain and the extended domain can be observed in most (perhaps all) languages studied from this perspective, even in Italian, as shown by H u a n g (1982) (see chapters 4 a n d 5 for further details). If all this is correct, the theory of the configurational matrix (which includes the Bounding Condition) is a step in the direction of a unified theory of grammatical dependency relations. T h e theory is not only universal in the sense that it applies to all languages, it is also universal in the sense that it applies to all constructions of a certain type. The hypothesis that the core properties of grammar are constructionindependent, I will refer to as the Thesis of Radical Autonomy (see chapter 7). Needless t o say, a theory with this scope is highly abstract. But the promising aspect of it is that in spite of this degree of abstractness, it makes very concrete predictions about a large number of constructions. It determines the locality properties of constructions as diverse as subcategorization, bound anaphora, control, and gapping. In the chapters that follow, I will demonstrate the reality of the configurational matrix in X-bar structures (chapter 2), control structures (chapter 3), structures involving Wh-movement (chapter 4) a n d N P movement (chapter 5), and also in bound anaphora (chapter 6). If the
configurational matrix can be detected in all these different constructions, the Thesis of Radical Autonomy is confirmed, which ultimately entails that core grammar is not functionally determined but rather based on mental structures without an inherent meaning or purpose (chapter 7).
NOTES

1. Chomsky (1984).
2. See Sportiche (1983) for a lucid development of this idea.
3. See Chomsky (1981b).
4. See Gazdar (1982), for example.
5. See Bouchard (1984) for the fundamental similarities between empty categories and lexical anaphors in this respect.
6. I am assuming throughout this book that S' (rather than VP) is the minimal X^max for V. This assumption is at variance with the usual assumption that the maximal projection of V is VP, and that INFL and/or COMP are the heads of new projections. I have never been quite convinced by this assumption, however. It might be useful to make a distinction between lexical projections (based on the categories V, N, P, and A) and auxiliary projections (based on Q, COMP, and INFL). For some purposes, then, S' might be the minimal domain for V (i.e. VP plus its auxiliary projections based on INFL and COMP), and for others VP might be the relevant domain (i.e. the lexical projection without its auxiliaries). Whatever the ultimate truth in this respect, it seems to me that S' often replaces VP as the minimal domain of V.
7. For earlier accounts, see for instance Koster (1982b) and (1984a).
8. Thus, the binding theory for English has the following form: a bound anaphor must be bound in (i) its minimal X^max, or elsewhere (ii) in its minimal SUBJECT domain. The first part, (i), is the universal Bounding Condition. The second part, (ii), is the language-particular extension for English. The status of (ii) can be derived from the fact that it is either lacking in other languages, or is a dimension of contrast, as we saw in section 1.1 for the Dutch reflexives.
9. The following discussion of gapping is from Koster (1984c), where these and other facts are somewhat more extensively discussed.
10. See Koster (1984b), for example.
11. It should be noted that I am not arguing against multilevel theories in general. Apart from S-structure (with its "D-structure" and "LF" properties), I am assuming LS (lexical structure) and PF. The mapping among these levels, however, does not have the properties of "move alpha".
Chapter 2
Levels of Representation
2.1. Introduction

The construction of levels of representation, like deep and surface structure, connected by movement transformations is the standard solution to a certain reconstruction problem. Thus, there are idiomatic expressions like to make headway, in which the idiomatic connection requires the adjacency of the verb make and the NP headway. Assuming that adjacency is a necessary condition for idiomatic interpretation, the following type of example, in which the idiomatic elements are "scattered", poses the classical problem: (1)
Headway seems to be made
Since the necessary adjacency is lost here, it must be somehow reconstructed. Deep structure was the answer: there must be an underlying level at which make and headway are literally adjacent: (2)
seems to be made headway
The surface structure (1) is derived from the deep structure representation (2) by what is now called "move alpha". This solution was generalized to most situations in which a strictly locally defined relation must be reconstructed. Another example is subject-verb agreement: (3)
a. Mary thinks that the boys have lost
b. The boys think that Mary has lost
The number of the finite verb (have vs. has) is determined by the number of the subject that immediately precedes it. As in the idiom example, an element of the agreement relation (the subject in this case) can be indefinitely far away from the verb: (4)
Which boys do you think that Bill said that Mary thinks have lost
Since it is entirely obvious that number agreement depends on the local
subject of a verb, and since the relevant subject which boys is not occupying the relevant local position, it is again reasonable to reconstruct the deep structure in which the subject and the verb are adjacent: (5)
do you think that Bill said that Mary thinks which boys have lost
These examples, to which many others could be added, illustrate one of the fundamental problems that transformational-generative grammar has sought to solve. The standard solution, constructing a level of deep structure, seems very natural. In fact, it seems to be the only reasonable solution in a framework without traces.
The standard solution to the reconstruction problem has been undermined by two developments. First, it was shown that the proposed solution was not sufficiently general in that there were similar cases that could not be solved by postulating a level of deep structure. Secondly, trace theory came to the fore, which suggested what in my opinion is a more promising alternative.
To illustrate the first point, consider binding of the anaphor himself. Like idiom interpretation and number agreement, anaphor binding is a local relation: (6)
John thinks that the boy admires himself
Both antecedent and reflexive enter into the binding relation if they are within the same local domain. As before, the antecedent can be moved from the necessary local position: (7)
Which boy does John think admires himself
As before, it is clear that the local pattern can be restored by reconstructing the antecedent position of which boy: (8)
does John think which boy admires himself
It is also possible to reorder the reflexive instead of the antecedent: (9)
a. Himself I don't think he really likes
b. What he really likes is himself
It is my claim that in these cases the standard solution does not work. Neither in the case of topicalization (9a), nor in the case of pseudo-cleft (9b) is it possible to literally reconstruct himself in the local domain of the antecedent (the object position of like).
I will return to topicalization in what follows. Here, I will briefly illustrate this point with the pseudo-cleft construction. In accordance with the standard solution to the reconstruction problem, it was originally
thought that the deep structure of (9b) literally has himself in the object position of the verb: (10)
[NP it [s' he really likes himself]] is —
Deriving (9b) from (10) is not easy. Himself has to be moved to the postcopular position indicated by —, and it must be replaced by what (see Chomsky (1970, 209) for a solution along these lines). This way of deriving pseudo-cleft sentences has been universally abandoned. Roger Higgins (1973) convincingly demonstrated that it does not work. In present terms, the movement of himself is impossible because it violates Subjacency. It would also violate the θ-criterion because himself, an argument, would fill a θ-position at D-structure which is filled by the variable (also an argument) bound by what at surface structure. Last but not least, the binding theory that relates himself to its antecedent does apply at S-structure (see Chomsky (1981b)), so that himself can only indirectly be linked to its antecedent.
In short, (9b) is a clear example in which a local relation, the antecedent-reflexive relation, cannot be reconstructed in the standard way by stipulating that there is a deep structure like (10). Apparently, local relations may be reconstructed in a weaker way, namely by the mediating properties of anaphors. In the copular predication (9b), the reflexive himself is interpreted as the value of the pronominal what, which in turn binds a trace at the position where the antecedent-reflexive relation is normally locally determined.
The consequences of the fact that the reconstruction problem cannot be solved by standard means in (9b) should not be underestimated. In fact, we can interpret (9b) as a counterexample to the standard approach if the latter is taken to have the following content: local relations can only be satisfied by elements in situ, i.e. by elements that literally occupy the positions involved in the local relations. It seems to me that this is one of the core ideas of the standard level approach; (9b) shows that the standard approach is untenable as a general solution to the reconstruction problem. A somewhat weaker principle is in order. Suppose that local relations are defined for a local domain β. We then need a principle like: (11)
A dependent element δ and an antecedent α satisfy a local relation in a domain β if α and δ are in domain β, or if α or δ are respectively related to α' or δ' in β.
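Stated procedurally, (11) only requires that each term of a local relation be either in the domain or linked to something in it. The following Python sketch is merely illustrative and not part of the original text; the flat representation of domains and links is an assumption made for the example.

def satisfies_local_relation(alpha, delta, domain, links):
    """Principle (11), schematically: alpha and delta satisfy a local relation
    in `domain` if each of them either is in the domain or is related
    (via `links`, e.g. to its trace) to an element that is."""
    def in_or_related(x):
        return x in domain or links.get(x) in domain
    return in_or_related(alpha) and in_or_related(delta)

# 'Headway seems to be made t': headway is not in the local domain of make,
# but it is related to the trace in object position, which is.
domain = {'make', 'object_position'}
print(satisfies_local_relation('make', 'headway', domain, {'headway': 'object_position'}))  # True
print(satisfies_local_relation('make', 'headway', domain, {}))                              # False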
The standard approach requires "being in" a certain position; the revised approach (11), necessary in view of examples like (9b), says that "being in" the relevant positions is fine, but "being related" to the positions in question is sufficient. It is now clear why (11), in conjunction with trace theory, potentially undermines the standard approach. In a theory with traces, the S-structures of (1) and (4) are (12a) and (12b), respectively:
(12) a. Headway seems to be made t
     b. Which boys do you think that Bill said that Mary thinks t have lost
Headway is interpreted idiomatically if it is in the object position of make, but according to (11) it is also so interpreted if it is related to an element in the object position of make. The trace t in (12) is precisely the "anchor" element to which headway can be related. Similarly, which boys in (12b) is linked to an element t in the relevant local domain, so that which boys satisfies the locally defined agreement relation.
Given the necessity of (11), trace theory is not a complement to the standard approach, but an alternative to it: with traces represented at S-structure, it is not necessary to have a separate level of D-structure. In a sense, deep structure does not disappear, because its relevant aspects are now coded into S-structure. Chomsky (1973, sect. 17) realized that trace theory suggested the alternative just mentioned, but has never accepted it as the better theory.
Since many of the standard arguments lose their force under the assumptions of trace theory, the motivation for a separate level of D-structure, related to S-structure by "move alpha", must be sought elsewhere. In principle, there are two ways to justify D-structure plus movement: either to show that there are properties that are naturally stated only at D-structure (and not at S-structure), or to demonstrate that "move alpha" has properties that cannot be identified as the properties of rules of construal at S-structure.
Note that the second type of argumentation is indirect and weak in principle. The only point of this type of argumentation is that "move alpha" can be reformulated as a rule of construal at S-structure, but that such restatements are unsuccessful if the rules of construal still have the properties of "move alpha", which are distinct from the properties of other construal rules. The theory without "move alpha" would be a notational variant of the two-level theory, at best (Chomsky (1981b)). If "move alpha" has distinct, irreducible properties, the derivational perspective is not really well established, because it is clear that different rules of construal can have different properties at S-structure. Thus, the alleged unique properties of "move alpha" give circumstantial evidence for a derivational approach, at best. If it can be shown, however, that there are no unique principles applying to "move alpha" and not to other rules of construal, a much stronger point can be made: "move alpha" becomes entirely superfluous. This is one of the central theses of this book: the (unmarked) configurational core of "move alpha" can also be found in a subclass of control structures, in bound anaphora constructions, and in many other constructions.
In short, my argument against "move alpha" is essentially an argument of conceptual economy. I agree with Chomsky (1981b, 92) that there is no argument based on conceptual economy if the properties
of "move alpha" are not shared by other rules of construal. But I will show that there is much evidence that there is a common core in "move alpha" and the rules of construal. One of the redundancies of the current G B approach is that it has two indexing procedures: free indexing for construal, and indexing by application of "move alpha". By generating S-structures directly, we can do with only one procedure, namely free indexing. The configurational matrix discussed in chapter 1 can be seen as a definition of possible coindexing configurations: coindexing is only permitted between a dependent element 6 and a unique antecedent a within a local domain (J. As we briefly indicated in chapter 1, coindexing can be interpreted in one and only one way: (13)
share property
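The sketch below is not part of the original text; the class and attribute names are invented for the illustration. It shows one way to read (13) together with the uniqueness property discussed in the following paragraphs: a single optional sharing operation that only transfers a property its recipient lacks, and that therefore yields both the anaphor-binding cases and the "movement" cases.

class Category:
    def __init__(self, name, lexical=None, theta_role=None, ref_index=None):
        self.name = name
        self.lexical = lexical        # lexical content, if any
        self.theta_role = theta_role  # theta-role assigned in situ, if any
        self.ref_index = ref_index    # inherent referential index, if any

def share(a, b, prop):
    """Rule (13): two coindexed categories share a property. A category can
    only take over a property it lacks (the uniqueness property); if both
    already have it, nothing is shared."""
    va, vb = getattr(a, prop), getattr(b, prop)
    if va is not None and vb is None:
        setattr(b, prop, va)
        return True
    if vb is not None and va is None:
        setattr(a, prop, vb)
        return True
    return False

# (14a): himself has no inherent index and borrows John's.
john = Category('John', lexical='John', theta_role='agent', ref_index='i')
himself = Category('himself', lexical='himself', theta_role='patient')
print(share(john, himself, 'ref_index'), himself.ref_index)   # True i

# (15a): John sits in a non-theta position and borrows the trace's theta-role.
trace = Category('t', theta_role='patient')
john_a = Category('John', lexical='John', ref_index='i')
print(share(trace, john_a, 'theta_role'), john_a.theta_role)  # True patient

# (15b): the same transfer is blocked, since John already has a theta-role.
john_b = Category('John', lexical='John', theta_role='agent', ref_index='i')
himself_b = Category('himself', lexical='himself', theta_role='patient')
print(share(himself_b, john_b, 'theta_role'))                 # False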
This mode of interpretation is sufficient for both the antecedent-trace relation and the antecedent-anaphor relation. It is the central interpretive rule of grammar that these two forms of coindexing share with several other relations. Properties are only optionally shared. A category can derive properties from another category only if it does not yet have the properties in question. This is determined by the uniqueness property of the configurational matrix. Thus, an NP can only share the lexical content of another NP if it does not have a lexical content of its own. Similarly, θ-roles and referential indices can only be borrowed by categories that do not have a θ-role or a referential index of their own. Some examples may illustrate this: (14)
a. John_i saw himself
b. John_j saw Bill_j
Suppose that all NPs in a tree except anaphors have an inherent referential index. Suppose furthermore that the indices in (14) do not indicate intended coreference but accessibility of rule (13) for the two elements in question. Then, Bill in (14b) cannot share a referential index with John by (13), because this would violate the uniqueness property: an NP can have one and only one referential index. As a consequence, John and Bill must have a different referential index in (14b), which is ultimately interpreted as "disjoint reference". An anaphor like himself, however, does not have an inherent referential index. This might be seen as the definition of the notion "anaphor". But since all NPs of a certain type must have a referential index, himself must borrow the index from its possible antecedent John, which is brought about by (13). Compare now (15a) with (15b): (15)
a. John_i was arrested t_i
b. John_i saw himself
Again, we have two coindexed NPs in a local relation permitted by the configurational matrix. Again, then, whatever properties are lacking from one of the NPs can be transferred by (13). In the first case, (15a), a θ-role must be transferred. Since John stands in the proper relation to its trace t, it can borrow a θ-role from the trace by (13). Nothing blocks this transfer, because John is not in a position where it is assigned another θ-role. In (15b), we find two NPs that meet the same configurational criteria, but here it is not possible to transfer a θ-role from himself to John. The optional rule (13) allows this transfer, but the result would be filtered out by the uniqueness property (usually referred to as the θ-criterion): since John already has a θ-role it cannot share another θ-role with an element coindexed with it.
In short, optional property transfer (13) in conjunction with independent principles like the uniqueness condition not only gives the results of the construal rules, but also the results of "move alpha". It should be said at the outset that I am not claiming that we find the same relation in (15a) and (15b). There is an obvious difference between the antecedent-trace relation found in (15a) and the antecedent-anaphor relation found in (15b). What I am claiming is something different: both (15a) and (15b) involve the same interpretive rule with the same configurational properties, namely (13). The result of this rule is different in these two cases because of independent factors, namely, the fact that John in (15b) already has a θ-role, while in (15a) John is in a non-θ-position. But clearly, this difference has nothing to do with the interpretive rule involved, which is (13) in both cases. What I am advocating here, in other words, is a more modular approach to the two different relations in (15a) and (15b), respectively: one interpretive rule together with two different antecedents (θ versus non-θ) yields two different relations.
The alternative approach sketched here gives a unified account of the common core of "move alpha" and other rules of construal. It not only accounts for the classical cases discussed at the beginning of this chapter but also for the problematic (9b), which was beyond the scope of "move alpha". Let us briefly consider, then, how these cases are accounted for. Take the S-structure representation of (1): (16)
Headway_i seems to be made t_i
The relevant idiomatic interpretation is forced upon this structure if the complement of made has the lexical content headway. Since the trace t_i does not have inherent lexical properties, they must be borrowed elsewhere. The trace and its antecedent headway meet the conditions of the configurational matrix, so that (13) applies. This entails that t_i has the required lexical properties, which it shares with its antecedent. Thanks to (13), this result can be derived without reconstruction of a level of D-structure in which headway actually occupies the position of the trace.
Similar considerations hold for the agreement fact (12b), repeated here
for convenience: (17)
Which boys_i do you think that Bill said that Mary thinks t_i have lost
The agreement relation requires the feature "plural" on the trace t_i. Traces never have such properties inherently, but thanks to (13) the feature can be borrowed from the antecedent which boys, which is inherently plural.
Let us now have a closer look at the various levels that have been proposed in the literature: (18)
a. D-structure (Chomsky (1981b))
b. NP-structure (Van Riemsdijk and Williams (1981))
c. S-structure (Chomsky (1981b))
d. Logical Form (Chomsky (1981b))
e. surface structure (Chomsky (1981b))
There is some consensus about the idea that S-structure is the most fundamental level of syntactic representation. Given the strong and growing evidence for empty categories with their distinct properties, the existence of this abstract level seems well established. Naturally, surface structure is then also relatively unproblematic. It differs from S-structure by certain marginal deletions, and perhaps by certain stylistic rules. All the other levels are highly problematic. They are interrelated by "move alpha", a ghost device the properties of which have never been successfully identified. This can be seen by inspecting the properties of traces, the products of "move alpha". Chomsky (1981b, 56) gives the following distinguishing properties: (19)
a. trace is governed
b. the antecedent of trace is not in a θ-position
c. the antecedent-trace relation satisfies the Subjacency condition
None of these properties distinguishes traces from other things. Not only traces but also lexical anaphors are governed. There is strong evidence that PRO can be governed in a subclass of control structures (Koster (1984a) and chapter 3 below); pro (Chomsky (1982a)) is also governed, as a subject in pro-drop languages and also as a resumptive pronoun (chapter 4 below).
The second property (19b) is shared by trace and overt resumptive pronouns. It is an error to consider this property the property of a rule ("move alpha"). It is clearly an independent property of certain antecedents. The fact that the subject position of verbs like seem and the subject position of passive constructions is non-θ has nothing to do with "move alpha". The subject positions in question have the same properties without "move alpha", as is clear from structures like it seems that...1 and from passives like it is said that.... It is very unfortunate that an
independent property of the antecedent is confused with a property of the rule itself; as if the fact that anaphors can have plural or singular antecedents entails that there are two entirely different rules of bound anaphora.
The third property (19c) is the only substantial property that has been attributed to "move alpha". It is one of the main theses of this book that Subjacency is not a distinguishing property either. The gaps that we find in movement constructions appear to be divided into two classes with entirely different properties. The dividing line is not Subjacency, but the Bounding Condition, which also characterizes locality in many other constructions (chapter 4 below).
In other words, there is no rule with the properties of (19). Of course, there are relations with these properties. But these relations are not primitive; they are modularly built up from independent elements, such as the properties of antecedent positions, and the all-purpose property-sharing rule (13). This latter rule has the properties of the configurational matrix, which has nothing in particular to do with movement constructions.
If "move alpha" is an artefact, it is hard to imagine what else could justify levels like D-structure, NP-structure, or LF. Apart from "move alpha", the standard approach is to isolate properties that can only be naturally stated at one level or another. But as noted before, such arguments are weak in principle because the relevant aspects of D- or NP-structure are represented at S-structure as subparts. Arguments for levels come down, then, to the idea that subparts of S-structure can be distinguished which have their own properties. This conclusion seems hardly controversial. Let us nevertheless have a closer look at the properties that are supposed to characterize the various levels.

2.2. D-structure

It is not easy to find out exactly what D-structure is. In Chomsky (1981b, 39) we find the following characterization: (20)
D-structure lacks the antecedent-trace relation entirely. At D-structure, then, each argument occupies a θ-position and each θ-position is occupied by an argument. In this sense, D-structure is a representation of θ-role assignment — though it has other properties as well, specifically, those that follow from X-bar theory and from parameters of the base (e.g. ordering of major constituents) in a particular language.
There are two aspects here: (i) D-structure has no traces, and (ii) it is a pure representation of GF-θ (among other things). Note that these two aspects are independent of one another. In practice, D-structure is interpreted as a level without traces, but its significance is obviously based on the second aspect, i.e. its being a pure representation of GF-θ. That the
two aspects are not interrelated can be seen from an example like the following: (21)
What_i did he see t_i?
If D-structure is defined as a level without traces, (21) is of course not a D-structure, but if it is only defined as a level at which each argument occupies a θ-position, (21) does qualify as a potential D-structure. In GB theory, a Wh-trace is considered a variable, i.e. an argument. So, the representation (21) contains two θ-positions that are both filled by an argument (he and t, respectively). If the essence of D-structure is the pure representation of GF-θ, movement to A'-positions is irrelevant: before and after the movement the A-chains have exactly one element, which is typical of D-structures.
In practice, (21) is not interpreted as a D-structure, but this then depends on the extra stipulation that D-structure contains no traces, neither NP-traces nor Wh-traces. For Wh-traces, this has nothing to do with the essence of D-structure (its being a pure representation of GF-θ). If we drop the unmotivated stipulation, we can maintain the essence of D-structure and consider (21) a D-structure (which falls together with its S-structure, as in so many other cases). This is a welcome conclusion, because there are independent reasons to assume that Wh-phrases must be base-generated in COMP in certain cases. This is so in languages with overt resumptive pronouns (which are very marginal in English). I will show below that English has empty resumptive pronouns that cannot be related to their Wh-antecedent in COMP by "move alpha". So, the Wh-phrase in COMP in (21) is in one of its possible base positions, and its trace is an argument with a function chain of one member, which is in accordance with the definition of D-structure.
Since it is not possible to exclude (21) as a D-structure on the basis of the argument-θ-role distribution, and since the Wh-phrase is also in a possible D-structure position, I see only one argument — apart from arbitrary stipulation — against its D-structure status: the properties of "move alpha". If "move alpha" is a condition on derivations with specific properties, and if the antecedent-trace relation in (21) has these properties, then (21) is not a plausible D-structure. Chomsky has argued recently, however, that there are reasons to consider the traditional characteristic of "move alpha", Subjacency, a property of S-structure (LF movement does not obey Subjacency; class lectures, fall 1983, and Chomsky (1986a)). But if Subjacency is a property of S-structure, there are no significant reasons left to deny D-structure status to structures with only Wh-traces. This is a fortiori true for the theory presented here, according to which "move alpha" has no characteristic properties at all.
We must conclude, then, that the D-structure/S-structure distinction is practically meaningless for the many constructions that only involve Wh-movement (see Chomsky (1977) for the scope of this rule). If the D-
structure/S-structure distinction is significant at all, it must be based on NP-movement, because only this rule creates A-chains with more than one member. But here we meet other problems.
If Wh-movement (for instance in (21)) exists, there must be a distinction between a category as a functional position in a structure and the lexical content of that category. This is clear from the fact that the alleged D-structure of (21) has the Wh-phrase in the position of the argument, the trace: (22)
COMP he PAST saw [NP what]_i
The θ-role can be assigned to the object NP only in abstraction from its lexical content. The reason is that this lexical content is moved to COMP, where it does not have a θ-role (Chomsky (1981b, 115)). The θ-role is left behind at the now empty NP position (the trace). It is therefore not necessary for "move alpha" to carry along θ-roles. What (22) and (21) have in common from the point of view of the θ-criterion and the Projection Principle is that in both cases there is one θ-role assigned to one argument position, i.e. the object position. In (22), this position has lexical content, and in (21) the lexical content has been moved. What remains constant is the θ-role assigned to the NP position, which then has this θ-role in abstraction from its lexical content. This is not what we see in the case of NP-movement: (23)
a. NP was arrested [NP John]_i
b. John_i was arrested [NP t_i]
This case has been treated in different ways. One way is to assign the θ-role to the NP John in (23a); when John is moved to the subject position, the θ-role is carried along. The θ-role is then not assigned to the object position, in abstraction from its lexical content, as in (22). This is hardly a fortunate result, because θ-role assignment would be more or less dependent on the content of NPs: if the NP contains a (quasi-)quantifier, the θ-role is assigned to the position (22), and if the NP contains a referential expression, the θ-role is assigned to that expression (i.e. not to the position but to the content of the position: (23a)).
The problem can be circumvented by assigning θ-roles to chains, which is more or less standard now (see Chomsky (1982a)). But this is also problematic, because now John no longer has a θ-role itself in (23b). At S-structure, then, the only way to see whether the conditions of the θ-criterion are met is by inspecting the chain. But this algorithm, which checks whether John is connected to a θ-position, practically mimics "move alpha".
In short, both methods of transmitting a θ-role to a derived A-position lead to problems: either Wh-movement and NP-movement get a different treatment, or "move alpha" is duplicated. But even if these problems can
Levels of Representation
41
be solved, the biggest conceptual problem remains: the derived structure (23b) seems to contain two arguments, a name (John) and an anaphor (the NP-trace). GB theory explicitly states that anaphors are arguments, which is only reasonable (Chomsky (1981b, 35)). Since NP-traces are anaphors for the binding theory (Chomsky (1981b, ch. 3), a structure like (23b) contains two arguments. This is at variance with the 8-criterion and the Projection Principle, which require a one-to-one relation between 9-roles and arguments at all levels. In practice, therefore, NP-traces are supposed to be non-arguments in structures like (23b). This does not follow from the 0-criterion, which only entails that (23b) contains one argument, without telling which of the two NPs is the argument If not only names, but also all anaphors are arguments, (23b) is in fact ruled out by the 0-criterion, unless it is guaranteed somehow that some anaphors (NP-traces) are non-arguments. This must be done by stipulation: (24)
Anaphors are arguments unless they are non-0-bound in a nonCase-position
Even with this stipulation of the worst possible sort, the contradiction remains, because NP-traces must be arguments for binding purposes: (25)
a. b.
They^ seem [ij to like each other(] Theyi were confronted t, with each otherl
In both cases, each other is A-bound by a trace of NP-movement But if NP-traces can enter into a chain of coreference, they must be capable of some referential function themselves, and are therefore arguments by definition. There is also another reason to consider both they and its trace to be arguments in (25a). Both are followed by a VP; if the notion argument makes sense at all, it is reasonable to say that each N P in the predication relation par excellence, the [NP VP] relation, is an argument It seems to me that the ugly stipulation and the contradiction that we observed form strong counterevidence against the second part of the 0criterion (in bold type) (Chomsky (1981b, 36)): (26)
Each argument bears one and only one 0-role, and each 0-role is assigned to one and only one argument
If both the antecedent and the trace (after NP-movement) are arguments, we have one 0-role distributed over two arguments. This is a welcome conclusion, because, as we discussed in chapter 1, the configurational matrix requires a unique antecedent but not a unique dependent element In other words, the core relations of grammar are not biunique. But this fact throws a new light on the 8-criterion (26). As mentioned in chapter 1, licensing relations meet the conditions of the configurational
42
Domains and Dynasties
matrix. If this is the case, the first part of the 0-criterion need not be stipulated. It simply follows from the general uniqueness property of the configurational matrix: the 9-roles can depend on one and only one antecedent, the licensing governor in this case. This fact is completely analogous to what we observe for bound anaphors: they cannot have split antecedents: (27)
*John confronted Mary with themselves
A dependent element like a reflexive can receive only one referential index from one antecedent. Similarly, an argument can receive only one 9-role from one licensing category. But if the second part of the 9-criterion is false, the licensing relation is also in this respect like other core relations. Anaphors must have a unique antecedent, but a given antecedent can take more than one anaphor (28)
They talked with each other about each other
All in all, it appears that the theory of grammar is considerably simplified if we drop the second part of the 9-criterion. It is no longer necessary at all to stipulate the 0-criterion, if licensing is a core relation. Together with the empirical evidence given earlier, this forms very strong evidence for the idea that NP-traces are in fact arguments. Consider now a relevant example: (29)
John; seems [t, to go]
If this S-structure contains two arguments (to one 9-role), its D-structure, by the Projection Principle, also contains two arguments. But then it becomes senseless to postulate a D-structure which is different from its Sstructure for (29). For NP-movement, then, we come to the same conclusion as for Wh-movement: it does not make sense to remove traces from D-structure ( = S-structure). In other words, it does not make sense to distinguish D-structure from S-structure. We have now also located our main difference with the standard GB theory. According to the standard approach, the 9-criterion is a biuniqueness condition that states that the relation between 9-role assigners and arguments is one to one. According to the present approach, the relation between 9-role assigners and arguments is one to one or one to many. As we have seen, this leads to three disadvantages for the standard approach: (i) part of the 9-criterion has to be stipulated, (¡7) it must be stipulated that some anaphors are not arguments, (ii'i) this latter stipulation leads to a contradiction. I will now try to sketch the outlines of a theory without these three disadvantages. As already mentioned, the 9-criterion disappears, because its empirically relevant part follows from the general properties of core
Levels of
Representation
43
relations (in particular from the uniqueness property of the configurational matrix). Although it does not make sense to distinguish Dstructure from S-structure in the alternative theory, the Projection Principle still makes sense. This is so because the existence of Lexical Structure, distinct from S-structure, is not disputed. Thus, if a verb selects an object, this object must always be represented at S-structure. In structures with fronted Wh-objects then, the gap in object position must contain an empty category (a trace in the standard theory). Nevertheless, I would like to slightly modify the Projection Principle, or rather its scope. Much of the standard theory is inspired by the desire to define syntactic structure as a projection from the lexicon. This has not been entirely successful, because of the obligatoriness of subjects. This has led to the Extended Projection Principle: syntactic structures consist of projections from the lexicon plus subjects (Chomsky (1982a, 10)). These are also the 0-positions. In the same spirit, I would like to define the possible 9-positions (argument positions): (30)
0-roles are assigned by: a. b.
heads (for complements) (to direct 0-positions) predicates (for subjects) (to indirect 0-positions)
The first part (30a) is in accordance with the standard Projection Principle. The second part (30b) is an extension that goes slightly beyond the standard extension of the Projection Principle. The standard extension concerns subjects in the sense of Chomsky (1965), i.e. subjects defined as [ N P , S]. It seems to me that this is not sufficient, and that the extension must cover all subjects of subject-predicate relations in the sense of Williams (1980) and subsequent papers. According to this conception, a subject is an N P in the configuration [p N P XP], where X P stands for any maximal projection (including S'). The N P subject in this sense may receive a 0-role by indirect 0-marking (Chomsky (1981b, 38)), but also by binding an element in the predicate XP. Some possibilities are exemplified by (31): (31)
a. b. c.
John broke his arm John; [ y p seems [t; to g o ] ] John; [s- Oj [I don't really like t j ]
In all three cases, the argument John is followed by a predicate. In (31a), John receives a 0-role by indirect 0-marking in the usual sense. In (31b), John receives a 0-role by binding an open place in the following predicate. The 0-role of the open place is transmitted by the property-sharing rule (13).
It seems to me that the subject-predicate relation is the only extension we need: it is the only place where direct projection of 0-roles from the
44
Domains and Dynasties
lexicon fails. Ultimately, all 0-roles come from the lexicon, but they are only indirectly assigned to subjects. Since we gave up the one-to-one requirement between 0-roles and arguments, this indirect 0-marking by "property sharing" with another argument is unproblematic. Topicalized constructions like (31c) have always been very problematic for the standard approach. The open sentence is predicated over John in (31c) (Chomsky 1977)), so that John must be an argument according to any reasonable definition of this term. But if John is an argument, it must have a 9-role. Under the property-sharing approach, this is not a problem, because John is linked to the trace in (31c) by a construal chain. This trace, an argument, has a 9-role that may be shared by the other argument, John. A movement analysis, on the other hand, is impossible for topicalization. John would originate in the trace position, moved to COMP, and from there it would be lifted to the topic position by Vergnaud-raising (see Van Haaften et al. (1983)). But as I will argue below, Vergnaud-raising is impossible for topicalization. In Dutch, topicalization may look like English topicalization, but it may also involve a so-called d-word in C O M P position (Van Riemsdijk and Zwarts (1974), Koster (1978a)): (32)
Die man, die ken ik that man that know I
In this case, not only a 9-role is transferred, but also Case. In languages with rich overt Case-marking, like German, agreement in Case is normal (see Van Riemsdijk (1978)): (33)
Den Hans (acc.), den (acc.) mag ich nicht the John him like I not 'John, I don't like him'
This example shows once again that in general there is no one-to-one relation between antecedents and dependent elements. There is always a unique antecedent (the Case assigner in this example), but there may be more than one dependent element (Case-bearing NPs). The Dutch and German cases definitely do not involve Vergnaudraising, which would create the d-word with its Case ex nihilo (see also Cinque (1983a) and section 3 below for more arguments). So, here we have a crucial example: Case and 9-role assignment to the topic by movement is impossible, while the property-sharing rule may use the construal chain through the anaphoric d-words to transfer to the topic the licenses it needs. The examples with the d-words are particularly interesting because dwords do not usually link idiom chunks to their licensing position, as shown by Van Riemsdijk and Zwarts (1974):
Levels of (34)
Representation
45
a.
Ik geloof er de ballen van I believe there the balls of 'I don't believe any of it' b. *De ballen, dat/die geloof ik er van the balls that believe I there of
Usually, "move alpha" can transfer at least three things: a 9-role, Case, and lexical content. If we compare (32) and (33) to (34), we see a discrepancy: in the first two examples, it appears that d-words can transmit a 0-role and Case, but from (34) it is clear that lexical content cannot be transmitted. This difference does not come as a surprise. As I argued before, the property-sharing rule transmits whatever properties can be transmitted. Normally, the uniqueness condition works as a filter. Thus, Case cannot be transmitted to N P s that already have Case. Similarly, lexical content cannot be transmitted to an N P position that already has lexical content. Thus, the representation of (34b) is as follows: (35)
*De ballen; die; geloof ik t\ er van
A Case-marked trace must have a unique lexical content as antecedent (antecedents are always unique). Die in (35) qualifies as the lexical content of the trace, but then it is impossible for the idiomatic N P de ballen to also qualify as the lexical content of the trace position. Die; cannot be skipped, because according to the configurational matrix, an antecedent is obligatory within a local domain. The transfer of 0-roles and Case is unproblematic, however, in such cases. F o r those, the licensing element (the assigner) is the antecedent So, the trace tt in (35) has a unique antecedent within the local domain, the verb geloof. As noted before, the number of dependent elements is not constrained by a uniqueness condition, so that both the topic and the dword may depend on the assigner of Case and 9-role. So, the rule "share property" works selectively, since its scope is "filtered" by independent principles, such as the uniqueness property of the configurational matrix. This approach solves a paradox about easy-to-please constructions (Chomsky (1981b, 308-314)): (36)
Johnj [ V p is [AP easy [ O j [ P R O to please f;]]]]
John seems to be in a non-8-position because it can be replaced by it (it is easy to please John). Traditionally, it has also been assumed that John has its D-structure position in the trace position, from where it is moved to the matrix subject position (see Lasnik and Fiengo (1974), however, for a deletion approach, and also Chomsky (1977) for a similar approach). A movement analysis for (36) leads to a paradox, as noted by Chomsky (1981b, 309). The problem is that idiom chunks cannot be moved to the
46
Domains and
Dynasties
matrix subject position, as one might expect under a movement analysis: (37)
a. b.
*Good carej is hard to take t\ of the orphans * T o o much; is hard to make i; of that suggestion
It seems t o me that this paradox cannot be solved under the standard assumptions. Chomsky (1981) assumes that the examples in (37) show that a movement analysis is not possible. I agree, but it must be concluded then that the standard assumptions are seriously undermined, because the standard approach crucially assumes that 0-roles are assigned directly, and not by linking. Moreover, Chomsky (1981b, 313) observes that a nonmovement analysis creates a new problem. If John is inserted in D-structure, the Projection Principle requires that its position be a 0-position, which it is not. Chomsky therefore weakens the assumptions about lexical insertion by assuming that John is inserted in S-structure in (36) (while such names are inserted in D-structure elsewhere). This is even interpreted as an argument in favor of D-structure, because the solution of the paradox crucially involves the distinction between S-structure and D-structure (Chomsky (1981b, 346, point (e))). It seems reasonable, however, to interpret the paradox as an argument against D-structure and the standard assumptions. Clearly, John is an argument in (36), which must receive its 0-role directly, if the standard assumptions are correct F o r the alternative approach, however, (36) is unproblematic. John is inserted at S-structure like all other lexical items (the simplest theory) and it may receive a 0-role because it is a subject. Particularly, it must receive a 0-role from its predicate according to (30b). Since there is a construal chain (indicated by the indices in (36)), this 0-role may be shared with the trace coindexed with it, a trace within the predicate as required. As we saw in the Dutch case, idiomatic lexical content is not necessarily transferred in construal chains. It is only transferred if the chain does not contain other lexical material. It is reasonable, however, to assume that the operator O; in (36) has features. Intermediate links in C O M P d o not necessarily have content, but a C O M P - t o - C O M P chain always ends in an operator position, usually marked by the feature + WH (see Chomsky (1977)). It seems appropriate to assume, then, that the feature that makes a C O M P position an operator is also present if the operator is not phonetically realized, as in (36). We can also consider these lexical features of the operator position the realization of the Case assigned to the trace. Under the alternative theory, there is nothing paradoxical about (36). There is a construal chain as indicated, and property sharing is filtered by the uniqueness condition as usual. A 0-role is transferred to John, because it is not in a direct 0-position. Case is not transmitted, however, because John is already in a Case position. Similarly, lexical content is not transmitted, because the lexical content of the trace position is already
Levels of
Representation
47
satisfied by the features of the operator position. But since lexical content is not transmitted, idiom chunks cannot appear in the matrix subject position, as shown by (37). I will now give a brief review of all arguments in favor of D-structure that can be found in Chomsky (1981b), and that are summarized there on page 346. There is some consensus that S-structure is the basic level of syntactic representation. Chomsky notes that the arguments for Dstructure (as a level distinct from S-structure) are "highly theory-internal". In particular, " [ t ] h e existence of a level of D-structure, as distinct from Sstructure, is supported by principles and arguments that are based on or refer to specific properties of this level, which is related to S-structure by the rule Move-a." The arguments in which D-structure plays a role are summarized as follows (page numbers of Chomsky (1981b) added): (38)
a. b. c. d.
e.
asymmetric properties of idioms (ch. 2, note 94) movement only to non-9-position ( . . . and discussion . . . of the distinction between NP-trace and PRO) (p. 46ff.) restriction of an operator to a single variable (p. 203) the requirement that AGR-subject coindexing be at D-structure, as distinct from government by AGR at S-structure, with its various consequences (p. 259) the possibility of inserting lexical items either at D- or Sstructure (p. 312)
We have just discussed argument (38e) and concluded that the facts in question form arguments against D-structure. We can therefore limit our attention to the first four arguments (38a-d). The idiom argument hinges on the fact that some idioms can be "scattered" at S-structure (good care\ was taken t; of the orphans), while others cannot (*the bucketj was kicked tj). In other words, idioms of the first type can undergo movement (bind traces), while idioms of the second type cannot. The argument deserves to be quoted in full (Chomsky (1981b, 146, note 94)): Thus idioms in general have the properties of non-idiomatic structures, and appear either in D-structure or S-structure form, but not only in S-structure or L F - f o r m D-structure, not S-structure or LF, appears to be the natural place for the operation of idiom rules, since it is only at D-structure that idioms are uniformly not "scattered" and it is only the D-structure forms that always exist for the idiom (with marked exceptions), S-structures sometimes being inaccessible to idiomatic interpretation. Thus at D-structure, idioms can be distinguished as subject or not subject to Move-a, determining the asymmetry just noted.
It is true that there are idioms that only exist in their D-structure form, but there are also idioms that only exist in S-structure form (the marked exceptions mentioned in the quotation). Bresnan (1982), for instance, gives passive idioqis like x's goose is cooked (meaning, x is in trouble and there
48
Domains and Dynasties
is no way out). But it is irrelevant whether there are many or few such examples, because the logic of the argument is unclear. What is an idiom rule? Presumably it is a rule that says that a V + N P combination, among others, has an idiomatic interpretation (make + headway, kick + the bucket, etc.). It seems to me that the most natural place for such interpretation (e.g. kick the bucket = 'die') is not D-structure but the lexicon. The crucial fact, then, is that some idioms can be scattered and some cannot. But of course, the most natural place for that information is also the lexicon. The question is how this information must be coded. It should be noted that the fact to be accounted for is not that no element of certain idioms can be moved. The N P part of certain V + N P idioms cannot be moved, but there is no direct evidence from English that the V part is also immobile. A language like Dutch has some obligatory V-movement rules, Vsecond (Koopman (1984)), and V-raising (Evers (1975)). It appears that the V part of all V + N P idioms in Dutch undergoes these rules, including idioms of the type kick the bucket. An example is de pijp uitgaan ('to die', lit. to go out of the pipe): (39)
a. b. c.
dat hij de pijp uit ging that he the pipe out went hij ging de pijp uit t dat hij de pijp t scheen uit te gaan that he the pipe seemed out to go
(non-root order) (root order after V-second) (after V-raising)
I conclude from these facts that the non-scattering of idioms is a fact of the NP, not of the V, in V + NP idioms. The question now is what the nature of this fact is. Chomsky (1981b, 146, note 94), assumes — and that is the crux of the argument — that the NP must be marked as not undergoing "move alpha". This marking can of course be done in the lexical specification of the idiom, but it remains a fact about certain idioms which cannot be moved, and therefore can only be inserted in D-structure. But note that under this interpretation the argument tacitly assumes what it must prove, namely that the crucial fact about certain idiomatic NPs is plus or minus "move alpha". It is not only possible but presumably even necessary to code the properties of the idiomatic NPs in the lexicon in a different way. The fact to be explained is that the bucket in kick the bucket cannot bind a trace at S-structure. Suppose now that we code this in the lexicon as follows: (40)
[ v kick] [ N P the bucket] = 'die' [ — antecedent]
Idioms like care (to take care) and headway are not marked with
Levels of
Representation
49
[ — antecedent], a marking which presumably follows from a more general property, e.g. the property of being nonreferential in some sense. The marking with [ — antecedent], as in (40), now no longer blocks insertion at S-structure, but the result is filtered out if the bucket binds something at Sstructure, for instance a trace. This solution is presumably better than the marking with [—move alpha] at D-structure, because (40) also blocks (41) at S-structure: (41)
*He kicked the bucketj before he had paid for iti
The bucket cannot be the antecedent for the pronominal it either, a fact about binding stated at S-structure. Parts of idioms like care can sometimes be antecedents at S-structure (Chomsky (1981b, 327)): (42)
Carex was taken ij of the orphans, but
was sometimes insufficient
All in all, it can hardly be concluded that the idiom argument supports Dstructure. Idioms surely differ from one another, a fact that is naturally expressed in the lexicon. But the differences in question are best interpreted as differences in S-structure behavior. The second argument (38b), "movement only to non-6-positions", has to do with the 9-criterion. Again, we see that "movement" is already presupposed. But since part of the 9-criterion is preserved in the alternative account, the fact in question receives an explanation that does not substantially differ from the standard account: (43)
N P j , . . . , NP; ^
9
If two N P s are coindexed, property sharing, including sharing of the 0role, is possible. But as we saw before, property sharing is filtered by the uniqueness condition: the second N P in (43) can transmit a 9-role to the first only if it does not have a 9-role of its own. This fact has nothing to do with D-structure, but is explained by the uniqueness property of the configurational matrix, which is a property of S-structure relations. Note that it is also guaranteed under the alternative account that in a function chain G F j , . . . , G F n , it is always G F n that is directly licensed. Suppose it were otherwise, i.e. that a 9-role were indirectly assigned (transmitted) to the last N P in a chain: (44)
...NPn_1,...,NPn e
J
Because of the c-command requirement, each link in a chain c-commands the next link; therefore, N P n _ [ c-commands NP n . Suppose now that N P n
50
Domains and Dynasties
is not directly 0-marked, but that it receives its 0-role from N P n _ j . According to (30), indirect 0-marking goes only from predicates to subjects. Consequently, N P n _ j must be contained in the predicate of which N P n is the subject. But this is only possible if N P n _ ) does not ccommand N P n (the predicate itself c-commands the subject, so that the material contained by the predicate does not c-command the subject). But if N P n _ i does not c-command N P n , these two NPs do not form a link of a chain. Therefore, it is impossible for the last element of a chain to get a 0-role indirectly. The last element must always be in a direct 9-position, and the other elements must be in non-0-positions because of the uniqueness condition. The difference between trace and PRO will be the topic of the next chapter. The third argument (38c) concerns examples like (Chomsky (1981b, 203)): (45)
*WhOj did you give [pictures of t{] to t¡?
This example is supposed to be ungrammatical because of the fact that it contains two variables. The idea is that D-structure cannot contain traces and w/ioj can fill only one variable position at D-structure, so that the Dstructure for (45) always contains a non-argument, [ n p e], at D-structure. This argument is without force, because, as we saw before, the definition of D-structure does not exclude a base-generated Wh-phrase binding two variables (unless it is stipulated that D-structure does not contain Wh-traces). More importantly, the intended explanation is completely overruled by the discovery (or rediscovery) of parasitic gaps: 2 (46)
Which bookj did you return tj before reading e{!
This structure contains two variables that cannot both be filled at Dstructure by which book. It is therefore not surprising that the earlier explanation for the ungrammaticality of (45) is not maintained in Chomsky (1982a). The fourth argument (38d) has to do with the ungrammaticality of the following Italian sentence (Chomsky (1981b, 259)): (47)
*NP; AGRj sembra [s Giovanni leggere i libri] seems to read the books
The intended explanation is based on the idea that assigning nominative Case involves a mechanism with two components: (48)
a. b.
AGR is coindexed with the N P it governs nominative Case is assigned to (or checked for) the N P governed by AGR
Levels of
Representation
51
Clearly, (48b) applies at S-structure (as Chomsky notes), because Case must be checked after Raising. The argument, then, crucially involves the assumption that (48a) applies at D-structure (and not at S-structure). If this assumption is plausible, we might have some confirmation for Dstructure. According to Chomsky, (48a) must apply at D-structure for the following reason. If it is assumed that in pro-drop languages the rule R (which adjoins AGR to V) applies in the syntax, A G R will govern Giovanni in (47): "If AGR could be coindexed with Giovanni by [(48a)], then both conditions for nominative Case assignment would be fulfilled: Giovanni would receive nominative Case in [(47)] and raising of the embedded subject would not be obligatory. But if the agreement phenomenon is determined at D-structure, then the structure [(47)] is barred as required" (Chomsky (1981b, 259-260)). It seems to me that this argument is based on questionable assumptions. A much simpler analysis of (47) assumes that the nominative Case assigner AGR (with or without the rule R) or I N F L is not accessible to Giovanni. Nominative Case is assigned under government by AGR (or INFL), but in this case Giovanni cannot be governed by AGR because Giovanni is already in the domain of another governor, namely the verb sembra. As we discussed in chapter 1, government is determined by minimal c-command, i.e. a governer y cannot govern an element 5 in the domain of another governor y'. Consequently, Giovanni is not governed by AGR in (47), so that nominative Case is not assigned to it and (47) is ruled out by the Case filter without Raising. 3 There are various ways to account for pro-drop phenomena, as indicated by Chomsky (1982a, 78ff.), but there is no evidence that an account involving D-structure is somehow superior, or even plausible. We must conclude, then, that none of the arguments given in Chomsky (1981b) and summarized in (38) supports D-structure. Let us now turn to some direct arguments against D-structure. The first argument is based on a phenomenon analyzed by Obenauer (1984). Obenauer has shown that there is a phenomenon in French which he calls "Quantification at a Distance", the binding of the empty ex by beaucoup in (49b): (49)
a. b.
II a rencontre [_beaucoup de linguistes] II a beaucoupi rencontre [_e\ de linguistes]
The interesting property of beaucoup is that it can also occur in these contexts without binding an empty element, with a similar meaning, codetermined by the verb: (50)
II a beaucoup rencontre Jean
There are similar quantifiers, like combien,
that can undergo
Wh-
52
Domains and Dynasties
movement (51)
a. b.
[Combien de linguistes]; a-t-il rencontré t, [Combien]; a-t-il rencontré [e t de linguistes]
It appears that this type of Wh-movement is not possible across a quantifier like beaucoup: (52)
*[QP Combien]; a-t-il beaucoup rencontré [NP[QP eli de linguistes]
Obenauer calls this pseudo-opacity. This phenomenon is problematic for a movement account, because nothing blocks the movement of combien in (52) (cf. (51b)), and beaucoup occurs in the preverbal position where it can normally occur without having a movement source (see (50)). Obenauer argues persuasively that beaucoup necessarily becomes an A'binder of the trace, if there is a trace at S-structure. Under this assumption, (53) ( = (52)) is straightforwardly ruled out: (53)
*Combienx a-t-il beaucoup; rencontré [ t j de linguistes]
In the theory presented here, this structure is ruled out by the uniqueness principle at S-structure: a trace can have only one antecedent. The point is that this analysis is based on the fact that the relation between beaucoup and the trace is of the same nature as the relation between combien and the trace (the uniqueness condition constrains relations of a given type). But then the relation in question has nothing to do with "move alpha", because the relation between beaucoup and the trace is not created by "move alpha". In terms of a movement account, beaucoup only becomes a trace-binder after the unrelated movement of combien. In other words, (52) clearly involves a relation of the antecedent-trace type that is not created by "move alpha". As Obenauer rightly concludes, such examples favor a representational view of the antecedent-trace relation over a derivational view. A similar argument against the derivational approach can be based on island violations. In some languages, like Swedish, islands can relatively easily be violated (see Allwood (1976)), which might have to do with the productivity of the resumptive pronoun strategy in this language (see Engdahl (1984)). But even in English, relatively acceptable violations of the Complex N P Constraint can be produced (as we briefly discussed in chapter 1): (54)
Which race; did you express [NP a desire [ s to win e;]]
This is not a universal phenomenon, because the Dutch equivalent is totally ungrammatical:
53
Levels of Representation (55)
**Welke race; heb je het verlangen (ora) te which race have you the desire C O M P to uitgedrukt? expressed
e\ winnen win
If Subjacency were discovered on the basis of Dutch, we could simply say that this sentence is ruled out by Subjacency (or some equivalent of it; see chapter 4). It would be a reasonable conclusion, then, that (54), which as a Subjacency violation does not show the characteristic property of movement, cannot be derived by movement. If this conclusion is correct, it must be possible to generate Wh-phrases in C O M P (as in (54)) without "move alpha". Since there is nothing in the definition of D-structure that precludes base-generation of variables (apart from stipulation), (54) must be a possible base structure. Despite the fact that (54) does not show the property of "move alpha", it is usually assumed that it is derived by "move alpha", perhaps under the further assumption that Subjacency is not an absolute principle, but an expression of the unmarked case. The Dutch example, however, shows that Subjacency is an absolute principle: if it is violated, the resulting sentence is very unacceptable. I will show in chapter 4 that relatively acceptable island violations do not involve "move alpha" in the traditional sense, but a resumptive pronoun strategy, as discovered by Cinque (1983b) and Obenauer (1984). A violation of island constraints (as in (54)) is not simply "movement" with less strict Subjacency. Gaps in such structures appear to have properties that are entirely different from the properties of standard traces. Standard traces can be of all categories (NPs, PPs, adjuncts, etc.). Gaps in islands (like ex in (54)), parasitic gaps among them, are usually exclusively of the category NP. Moreover, they can only be found in structures in which certain directionality constraints are met (global harmony, see chapter 4). The directionality constraints in question can be met in SVO languages like English, Romance, and Scandinavian, but not in SOV languages like Dutch or German; hence, the acceptability of (54) in the SVO languages, and the total unacceptability in the SOV languages (see 55)). Gaps in islands, in other words, show a surprising clustering of properties: 4 (56)
a. b. c.
Subjacency is violated only N P s are possible directionality constraints must be met
Gaps that are strictly locally bound, like "traces", lack these three properties. What this indicates, is that the violation of Subjacency in (54) is not an accidental property, but a characteristic property of the construction in question (i.e. the construction with the antecedent-resumptive
54
Domains and
Dynasties
pronoun strategy): the properties (56b-c) are found only if Subjacency is violated. If these conclusions (worked out in chapter 4) are correct, (54) cannot be derived by "move alpha". There is, therefore, strong evidence that Whphrases can be generated in CO M P without involvement of "move alpha". Like Obenauer's argument, this argument favors the representational view over the derivational view for the Wh-phrase-gap relation. We can now summarize our objections against D-structure and "move alpha". D-structure is essentially a reconstruction level for the direct assignment of 9-roles. As we have seen, it is not possible to construct such a level (distinct from lexical structure). Even if D-structure is assumed, 0role assignment must be extended to subjects (indirect 9-marking in the sense of Chomsky (1981b, 38)). This is not sufficient, however, because arguments in topicalization and easy-to-please constructions are neither complements nor subjects that qualify as indirect 9-positions in the sense of the Extended Projection Principle. Indirect 9-role assignment must be extended to all subjects in subject-predicate constructions. If we d o so, the biuniqueness property of the 0-criterion must be dropped, which is a welcome move for principled reasons. As soon as we d r o p biuniqueness (the one-to-one relation between 9-roles and arguments), the distinction between D-structure and S-structure becomes meaningless. " M o v e alpha" is essentially a transfer mechanism. Its status as a unique transfer mechanism, distinct from other construal rules, is not supported by the facts. First of all, it has the same configurational properties as other construal rules (see chapter 1 and the following chapters). And, secondly, it is a transfer mechanism filtered by the same uniqueness condition as other construal rules. This second aspect is very important, because in the standard G B approach much weight is given to the fact that movement is t o non-9positions. This is, however, totally determined by the uniqueness condition that also filters the transfer potential of other construal rules. 9-roles cannot be transmitted to positions that already have a 0-role (or that cannot have one in principle), which is just like the fact that referential indices cannot be transmitted to N P s that already have a referential index (such as nonanaphors), or that cannot have a referential index in principle (like A'-positions). That "move alpha" is just an instance of the general property-sharing rule (13), filtered by uniqueness and other independent factors, is clearly demonstrated by the fact that "move alpha" transmits different things in different circumstances; (i)
"move alpha" transmits Case (and lexical content), but no 9-role, as in Wh-itiovement
(57)
Whom; did you see (,?
Levels of Representation
55
This follows from (30): indirect 0-role assignment is only to arguments (subjects). This fact is significant because it instantiates a general aspect of the property-sharing rule: only those properties are transmitted that the "receiving" category can take independently. In this case, a 0-role is not transmitted, because non-arguments art inherently unable to take a 0-role. (ii)
"move alpha" transmits a 0-role (and lexical content) but no Case, as in NP-movement
(58)
John; was arrested i;
Naturally, a 0-role can be transmitted to a non-0-position. Transfer of a 0role to a 0-position would violate the uniqueness condition. Similarly, John in (58) receives Case independently (from INFL). Again, the uniqueness condition prohibits transport of a second Case to such positions. What Wh-movement in (57) and NP-movement in (58) have in common is the transfer of lexical content. This is not necessary, however: (iii)
"move alpha" transmits a 0-role but no lexical content:
(59)
John; wants [PRO; to be arrested i j
Again, we see that what "move alpha" transmits is dependent on the inherent properties of the "landing site". If the landing site cannot have lexical content for independent reasons (like the P R O position in (59)), no lexical content is transmitted. It must be concluded, then, that "move alpha" cannot be functionally defined: what it does is contextually determined. Since "move alpha" cannot be configurationally defined either, there is no evidence for the idea that it deserves an independent existence in the theory of grammar. It is just an artefact, the result of the combined properties of the propertysharing rule (13) (i.e. the properties of the configurational matrix) and the independent properties of the landing sites. Crucial evidence in favor of this view, apart from its obvious advantage in terms of conceptual economy, can be found in topicalization and other left dislocation structures, and in easy-to-please constructions: (60)
a. b.
Bi/Zj, I don't like /iim1 Johni is easy [O, [to please i j ]
As a transfer mechanism of 0-roles and Case, "move alpha" is neither necessary nor sufficient. We have already seen that "move alpha" does not always transmit a 0-role (57) or Case (58). The examples in (60) show that Case and 0-role can also be transmitted by other construal rules. What is crucial is that the transfer is dependent on the same filtering mechanism in these nonmovement construal rules. Thus, in (60a) both a 0-role and a
56
Domains and
Dynasties
Case are transmitted to Bill, because Bill is an argument in a position to which Case and 0-role are not assigned independently. In (60b), John is in a position with inherent Case (assigned by INFL), but without a direct 8license. As a consequence of the uniqueness condition, Case can be transmitted in (60a) b u t not in (60b). A 0-role must (and can) be transmitted in both cases. Nonmovement construal, in other words, is subject to the same functional selectivity as movement. Where movement and nonmovement construal sometimes differ is with respect to the transfer of lexical content (for instance, idiom chunks): (61)
* G o o d care; is hard to take tj of the orphans
But again, this is not a difference in the nature of the transfer mechanism itself, but the consequence of an independent filter mechanism. If an idiom chunk is related to its licensing position in a construal chain that involves other lexical material (like him in (60a) and the lexical features of the operator Oj in (60b)), the lexical content of the idiom chunk cannot be fully transmitted to (shared by) the licensing position. Again, this is a result of the uniqueness condition: a given position has one and only one lexical content. N o wonder, then, that idiomatic lexical content is only optimally transferred in construal chains with no other lexical features. The difference in grammaticality between (60b) and (61) is usually interpreted as an indication that "move alpha" is not involved in the 0-role transfer of the trace i; in (60b) to John. The implicit assumption, then, is that the transfer of a 0-role and the transfer of lexical content always correlate in "move alpha". As we have seen, however, this assumption is false. In (59), for instance, a 0-role is transferred from t; to PRO;, while lexical content is clearly not transferred. Given the contextually determined functional selectivity of "move alpha", the discrepancy in grammaticality between (60b) and (61) is not a sufficient argument against the involvement of "move alpha" in the 0-transfer from t\ to John; in (60b). But there are other arguments against a movement analysis. That (60a) does not involve movement is quite uncontroversial. We explained the ungrammaticality of (61) by the hidden lexical features of the operator Oj (cf. (60b)). If John in (60b) inherits a 0-role by "move alpha" through the operator position in C O MP, lexical features must be created ex nihilo, and we would have a chain with two different Cases (of John and the trace, respectively), which is not possible for chains in general (Chomsky (1981b, 334)). A structure like (60b), then, forms a strong counterexample to the usual assumptions concerning D-structure and S-structure as a pair related by "move alpha". In any case, (60b) shows a structure with two arguments (John and the trace) sharing only one 0-role. There is n o corresponding Dstructure for such cases, according to standard assumptions. We can give (60b) a D-structure only by dropping the assumption of a one-to-one relation between 0-roles and arguments at D-structure. This brings us
Levels of
Representation
57
once again to the essence of our argument: dropping the one-to-one relation in question comes down to giving up the idea that there is a significant distinction between D-structure and S-structure. It appears, then, that one of the oldest examples presented in favor of a level of deep structure, John is easy to please, forms one of the strongest arguments against what is now called D-structure. In conclusion, it is useful to give a summary of the kinds of positions that we find at S-structure: (62)
a. b.
c.
positions that are directly projected from the lexicon (basic positions) positions that are related to (and share properties with) basic positions: i. Wh-positions in C O M P ii. subjects in non-0-positions iii. topics adjuncts
It has never been controversial that there are positions like the basic positions of (62a), distinct from the positions summed up in (62b). I will maintain this aspect of D-structure by sometimes referring to basic positions as D-structure positions. In chapter 4,1 will show that directionality constraints (global harmony) are computed from D-structure positions only. In this sense, D-structure survives as a substructure of S-structure with specific properties. This fact is compatible with a theory like the one presented in Koster (1978c). What was denied in that theory, and what seems even more untenable now, is that S-structure with the positions mentioned under (62b) and its lexical substructure (the D-structure positions) exist as two different levels that can be bridged by "move alpha".
23.
NP-structure
Van Riemsdijk and Williams (1981) have proposed a level of representation distinct from both D- and S-structure. This level is situated between the application of NP-movement and Wh-movement (63)
D-structure
->
move N P
NP-structure
-»
move Wh
S-structure
The arguments for NP-structure are very much like the standard arguments for D-structure: certain facts are treated most elegantly, in the most revealing way, if certain elements are in certain positions, rather than related to certain positions. It should be noted that it is essentially an argument of elegance, because there is a certain consensus that the
58
Domains and Dynasties
arguments in question are not absolutely compelling if there are traces at S-structure. The point can best be illustrated with an example of predication that Van Riemsdijk and Williams give (p. 205): (64)
a. b.
John ate the meat; raw; How raw, did John eat the meat r,?
Both sentences represent a subject-predicate relation with the meat as subject and (how) raw as predicate. If we call the level at which the subjectpredicate relation is fixed "predicate structure" (as in Williams (1980)), the c-command condition on the predication relation is presumably stated as follows: (65)
In predication structure, a subject must c-command its predicate or a trace of its predicate
Van Riemsdijk and Williams argue that if we assume that predicate structure is in fact the pre-Wh-movement NP-structure, the statement (65) can gain in elegance by dropping the reference to the trace of the predicate (in bold type in (65)). An argument of this type is weak in principle because the full representation of (64b) is as follows: (66)
[AP how raw]; did John eat [NP the meat]; [AP t]i
Since the predication relation is defined for pairs [NP; AP;], it is simply false that the statement of the c-command relation has to refer to the notion "trace" at S-structure. The theory presupposed by (65) is typically a theory without traces, in spite of the fact that (65) mentions the notion "trace". With traces, the c-command condition can be given at S-structure without reference to the notion trace: (67)
A subject NP; in a predicate structure NP; XP; must c-command its predicate XP;
Both in (64a) and (64b) ( = (66)), this simple condition is fulfilled at Sstructure. So, there is no element of elegance in this case. The argument is apparently based on the mistaken assumption that the whole predicate has been moved in (66). What has been moved in (66), however, is not the whole predicate but only the lexical content of the predicate. A given lexical content has a syntactic function (such as "being a predicate") only with respect to a functional position. The Wh-position in C O MP is not a functional position in this sense. The situation is analogous to what we observe when a Wh-phrase is moved from an argument position:
Levels of (68)
Representation
59
What; did you see fj
It is generally assumed that what in C O M P is not an argument in this case; only its "original" position, the position of the trace, is an argument position. Similarly, how raw is not a predicate in (66); only the trace position is. Basically, all the arguments that Van Riemsdijk and Williams give are of this "elegance" type, and basically all of these arguments show the same weakness, namely, that a functional position is not distinguished from its lexical content. I will return to the arguments in detail. But first I will show that predicate structure cannot be NP-structure in the intended sense. Consider a topicalization structure like (69): (69)
Bill; [ s . Oj [I don't like t j ]
Clearly, this is a predication structure with Bill as subject and the open sentence as predicate (see Chomsky (1977)). The topic Bill inherits its Case from the trace position t „ and binding conditions are also transferred from this position: (70)
[Pictures of each otherJj [ O j [they; don't like ij]]
This reveals an inconsistency in the NP-structure model: according to Van Riemsdijk and Williams, NP-structure is the level at which the binding theory applies and at which Case is assigned. This means that the topic must have its NP-structure position in the position indicated by the trace. It is only here that Case is assigned directly and that the binding theory applies without extra transfer. It is for this reason that the NP-structure model derives topicalization by so-called Vergnaud-raising, a rule proposed by Vergnaud (1974). Applied to topicalization, this analysis assumes movement of the topic from the trace position tj to the operator position Oj in (70), followed by Vergnaud-raising to the topic position. In this analysis, the predicate structure in the optimally elegant sense that a lexical subject c-commands a lexical predicate, is only formed after Whmovement of the topic from ij to Oj. I will show that Vergnaud-raising for topicalization is impossible for independent reasons. Here, the example suffices to establish that it is impossible to construct a level at which both Case assignment and predication apply in the intended sense. More generally, the arguments against NP-structure are of the same type as the arguments against D-structure the properties of the mapping, "move alpha", cannot be isolated and established, and the properties attributed to NP-structure itself are not exclusive properties of that level. Let us therefore have a closer look at the properties in question. Van Riemsdijk and Williams (1981) give the following four properties of NP-structure:
60 (71)
Domains and a b. c. d.
Dynasties
the opacity condition (ultimately the binding theory) applies at NP-structure (abstract) Case is assigned at NP-structure contraction (i.e. fo-contraction) operates at NP-structure (certain) filters apply at NP-structure
I will now briefly discuss these arguments, beginning with the idea that the binding theory applies at NP-structure. That Wh-movement does not affect the binding possibilities in the same way as N P - m o v e m e n t is not very surprising, given the fact that the binding theory concerns the relations between arguments, i.e. elements in A-positions. Rules that move material to A'-positions, naturally, have no effect on relations between A-positions. But apart from this, (71a) cannot fulfill its promises. As we saw in (70), the binding theory can only apply at NP-structure if we assume Vergnaud-raising for topicalization. This cannot be right because, as I will show, Vergnaud-raising is impossible for topicalization. If this conclusion is correct, (71a) must be false. One aspect of (71a) deserves special attention. Van Riemsdijk and Williams see it as an advantage of their model that the binding theory does not have t o refer to Wh-traces: at NP-structure the future Wh-traces are still filled by Wh-phrases, which, as n o n a n a p h o r s (and nonpronominals) cannot be bound. This would explain the alleged binding properties of Wh-traces in other models without stipulating a difference between Wh-traces and other empty categories (1981, 174). I strongly agree with the spirit of this explanation, as I will argue below. But the proposed execution of the idea, with the Wh-phrases physically filling the future trace positions, again does not give what it promises. The reason is that there are gaps, like parasitic gaps, that are identified by Wh-phrases but that cannot be literally filled by these Wh-phrases at NP-structure. As I will show in detail in chapter 4, the relation between Wh-phrases and parasitic gaps is definitely not characterized by Whmovement. Here we might add that in languages with rich overt Casemarking, like Finnish, the parasitic gaps can have a Case different from the Wh-phrase, which always agrees with the locally bound gap (the trace; see Taraldsen (1984)). This confirms the idea that the relation between a Wh-phrase and a trace has the properties of what is usually called Whmovement, but that parasitic gaps have different properties. In spite of the fact that parasitic gaps cannot "physically" be filled by their binding Wh-phrases, they have the properties of Wh-traces with respect t o the binding theory (they must be A-free in every governing category; see chapter 6 for more details). This fact undermines the execution that Van Riemsdijk and Williams give to a certain explanatory idea with which I agree, i.e. the idea that the behavior of certain gaps is derived from the n a t u r e of their antecedent. In particular, the facts about parasitic gaps undermine the idea that NP-structure is necessary or even possible in carrying out the intended explanation.
Levels of Representation
61
In short, (71a) is not supported by the binding facts. It is, on the contrary, incompatible with the binding facts if a larger class of facts is considered, in particular the facts about topicalization and parasitic gaps. The second property, (71b), will be my main target for a somewhat more elaborated critique, so I will postpone its discussion until after a brief discussion of (71c) and (d). These two arguments are very similar and show the same weakness: they overlook the fact that the behavior of a position is the joint result of its functional status and its lexical content. The contraction facts on which (71c) is based are well known. A Whtrace blocks contraction (72), while an NP-trace and P R O do not block contraction ((73) and (74), respectively): (72) (73) (74)
a. b. a. b. a. b.
Who; do you want f, to beat Nixon *Whoj do you wanna t; beat Nixon John; is supposed tj to leave John; is sposta tt leave Ij want PRO; to leave I; wanna PRO; leave
The argument is that it is already known that material that is "physically" present between want and to blocks contraction, and that it is only natural to assume that "physically present" material blocks contraction at PR (Phonetic Representation). If contraction applies at NP-structure, before Wh-movement, the Wh-phrase is literally present between want and to in (72), so that contraction is blocked in the most natural way. It seems to me that this approach adds nothing to the standard approach, which explains the difference by the fact that Wh-traces differ from NP-traces and PRO in that Wh-traces have Case. Case is what Wh-traces have in common with lexical NPs, which explains the similarity in contraction behavior. Since there are Case-marked gaps identified by a c-commanding Whphrase that are not the result of Wh-movement, namely parasitic gaps and gaps in islands, it is possible to give a crucial test. We have seen before that gaps in islands differ from traces (see (56) above); particularly, they miss the characteristic Subjacency property of the traces of Wh-movement. If these gaps are not created by Wh-movement (see chapter 4), the NPstructure theory and the standard theory make different predictions. Since the gaps in question are Case-marked, the standard GB theory predicts that contraction across these gaps is just as bad as in (72b). The NPstructure approach, however, predicts that contraction is possible because the gaps are not created by Wh-movement; particularly, the gaps have not been physically filled at any level by the Wh-phrase. Consider now the following data: (75)
a.
??Which man; did you express [a desire [that you want e, to succeed Reagan]]
62
Domains and b.
Dynasties
* Which man; did you express [ a desire [that you wanna ej succeed Reagan]]
Naturally, such island violations yield less acceptable sentences. But to the extent that these d a t a are clear, it seems t o me that we find the same contrast as in (72). If this conclusion is correct, the standard approach is confirmed: Case suffices to block contraction, and "physical presence" of the Wh-phrase — as required by the NP-structure model — is not necessary. T h e fourth argument, (7 Id), is based on a filter proposed for certain Italian d a t a by Longobardi (1980). In Italian, adjacent infinitives are bad under certain circumstances (Van Riemsdijk and Williams (1981, 177)): (76)
*Giorgio comincia ad amare begins
to like
studiare to study
Essentially, Longobardi's filter has the following form: (77)
*Vinf[VpVmf
Again, Van Riemsdijk and Williams observe, the filter is blocked by a Whtrace (in an argument position) but not by an NP-trace or PRO. Moreover, the filter still seems t o apply if the second infinitive is preposed, for instance by clefting (78a) or topicalization (78b): (78)
a. b.
*E [ a n d a r e a Roma]j che potrei desiderare t; it-is to-go to Rome that I-might wish * [Andare a Pisa]j potrei preferire t; to-go to Pisa I-might prefer
Since the configuration of the filter (77) seems to be destroyed in these structures, Van Riemsdijk and Williams assume that there must be a preWh-movement level, NP-structure, at which the two infinitives are still adjacent In this case, it might seem that it is less easy to dismiss the argument than in the predication case (66). In the predication case, the relevant information, the category AP, was still present after Wh-movement. In (78) something seems to be definitely lost, namely the internal structure of the trace. T h e categorial structure of the trace (presumably S') is irrelevant. What really matters is the internal structure of S', i.e. the fact that it contains an infinitive. One solution, proposed by Longobardi, would be layered traces, so that the S'-trace can have an empty V with the feature inf. This solution is not particularly attractive, according to Van Riemsdijk and Williams. It seems t o me that the solution of NP-structure is unattractive for the same reason: it does not seem very plausible that the
infinitives in (78) directly bind the trace, just as it is implausible that cleft sentences and topicalizations are derived by Vergnaud-raising, as Van Riemsdijk and Williams assume. It is not unlikely that the construal chain involves an extra step, as in: (79)
Whatᵢ I prefer tᵢ is to go to Rome
As mentioned before, Higgins (1973) has argued that such pseudo-clefts cannot be derived by movement. (79) is a good example, because the trace is of type NP, while the focus constituent, the infinitive, is of type S'. Similarly, the cleft and topicalization structures might involve a Wh-trace of an NP, bound by an operator in COMP, which is in turn linked to a fronted S'. If an analysis along these lines is correct for (78), then the trace tᵢ is an NP, and both the solution based on layered traces and that based on NP-structure must be incorrect, because these solutions presuppose a trace of the categorial type of the infinitive, i.e. an S'. Later on, I will present a solution that avoids this problem in particular, and the impossible Vergnaud-type solution for topicalization in general.

But first I will analyze (71b), the claim that NP-structure is the level at which Case is assigned. Case assignment is for NP-structure what θ-marking is for D-structure. Whereas the postulation of D-structure is inspired by the desire to construct a level where all θ-roles are directly assigned, NP-structure is postulated by a desire, among other things, to construct a level at which Case is directly assigned. The underlying assumption is the same in both cases: direct licensing is more natural than indirect licensing through traces and other links of construal chains. As we have seen in the case of D-structure, it is not possible to construct a level at which all θ-marking is direct. But at least it could be maintained that all θ-roles are ultimately derived from the properties of lexical items. For Case, not even that can be true. Consider first an example in which the Case of a topic is derived from the complement Case of a verb. A relevant example is the German example given earlier (Van Riemsdijk (1978, 167)) (see (33) above): (80)
Den Hansᵢ (acc.), denᵢ (acc.) mag ich tᵢ nicht
In a way, this example is already problematic for the idea that NP-structure is the level of Case assignment. The problem is that (80) contains two Case-marked NPs. It is not possible to create a level at which both NPs receive their Case directly in the object position of the verb. It is therefore again necessary to derive (80) by Vergnaud-raising. But note that this would be a very undesirable kind of chain formation in this case. First of all, "move alpha" would have to be complicated by giving it the ad hoc power of being able to create an extra lexical position. This would result in a chain with two Cases, which is an anomaly (cf. Chomsky (1981b, 334)).
Secondly, it is hard to maintain that den in (80) is a kind of "visible" trace of the moved topic den Hans. The point is that den can also occur independently (this is essentially Dougherty's anaporn principle): (81)
Denᵢ mag ich tᵢ nicht
that one like I not
'That one, I don't like'
Den is just an independent pronominal. So, we could already interpret (80) as a counterexample to the idea that there is a level at which all Case is assigned directly. Things become more problematic if we consider examples in which the Case of the topic is not derived from a complement (or subject) position. Thus, Van Riemsdijk (1978, 168) gives examples like the following: (82)
Der Hans (nom.), mit dem (dat.) spreche ich nicht mehr
the John with him talk I not more
'John, I don't talk to him any longer'
Here, the topic has obligatory nominative Case, while the bound d-word in COMP has dative Case. Case agreement leads to an ungrammatical sentence: (83)
*Dem Hans (dat.), mit dem (dat.) spreche ich nicht mehr
In (82), then, the nominative Case of the topic is not derived from a clause-internal position. This shows two things. First, it appears that not all Cases are ultimately derived from positions projected from the lexicon, as θ-roles are. Secondly, (82) shows that Vergnaud-raising is not a general solution to the Case-transfer problem. What we see in (82) is the selectivity of transfer that characterizes rules of construal in general. If a topic already has a Case (82) for some independent reason, Case is not transmitted. If the topic has no independent Case, it must be transmitted (80). Apparently, nominative Case is optionally assigned to topics (it is also possible as an option in (80)), as a default Case (as suggested by Jan Odijk (personal communication)), or as a generalization of the Case assignment to subjects (the topics in question are subjects in a subject-predicate relation). If the nominative option is not chosen, Case must be transmitted by a c-commanding d-word, as in (80). If the d-word does not c-command the topic, no Case is transmitted, and the nominative is obligatorily chosen because of the Case filter (cf. (82) and (83)).

The idea that NP-structure is the level of Case assignment is motivated by examples like the following: (84)
Whomᵢ did you see tᵢ
The idea is that only the trace position is a position of direct Case assignment. Since direct Case assignment is the most natural Case assignment, and since whom is only in its natural Case position before Wh-movement, i.e. at NP-structure, NP-structure is the natural level of Case assignment. As in the case of θ-marking and D-structure, this view only makes sense if it can be established that there is a one-to-one correspondence between Case positions and Case-bearing NPs. If Case can be transmitted by (nonmovement) construal, so that the one-to-one correspondence breaks down, there is no evidence for NP-structure. It must be shown, in other words, why it is implausible that the Case of whom in (84) is derived from the trace position at S-structure.

The view that Case can only be assigned to NPs in direct Case positions and not be transmitted by construal rules is plainly false. Practically all construal rules that connect NPs can transfer Case. We have already seen examples like (80), but also ordinary (non-d) pronouns transmit Case. Van Riemsdijk (1978, 175, note 27) gives examples like: (85)
Den Hans (acc.), ich habe ihn (acc.) gestern gesehen
the John I have him yesterday seen
'John, I saw him yesterday'
There is almost full consensus that such cases of ordinary Left Dislocation are not transformationally derived by "move alpha" (see Van Riemsdijk and Zwarts (1974) for arguments). One reason is that epithets can also transfer Case (see Cinque (1983a)): (86)
John, Mary doesn't like that little bastard
Also, in many other examples of nonmovement construal, it appears that Case can be transmitted: (87)
a.  Whatᵢ he really likes tᵢ (obj.) is himselfᵢ (obj.)
b.  He saw something awfulᵢ (obj.): himselfᵢ! (obj.)
c.  Whatᵢ did he see tᵢ (obj.)? Himself! (obj.)
d.  John saw Billᵢ (obj.) and Peter himselfᵢ (obj.)
Another clear case is Sluicing, discussed by Van Riemsdijk (1978, 231ff.). Van Riemsdijk convincingly argues that Sluicing structures are not derived by deleting the whole context as in: (88)
a.  Someone has done the dishes, but I am not sure who has done the dishes