The Case for Lexicase
Linguistics: Bloomsbury Academic Collections
This Collection, composed of 19 reissued titles from The Athlone Press, Cassell and Frances Pinter, offers a distinguished selection of titles that showcase the breadth of linguistic study. The collection is available both in e-book and print versions. Titles in Linguistics are available in the following subsets:
Linguistics: Communication in Artificial Intelligence
Linguistics: Open Linguistics
Linguistics: Language Studies
Other titles available in Linguistics: Open Linguistics include:
Into the Mother Tongue: A Case Study in Early Language Development, Clare Painter
The Intonation Systems of English, Paul Tench
Studies in Systemic Phonology, Ed. by Paul Tench
Ways of Saying: Ways of Meaning: Selected Papers of Ruqaiya Hasan, Ed. by Carmel Cloran, David Butt and Geoffrey Williams
Semiotics of Culture and Language Volume 1: Language as Social Semiotic, Ed. by Robin P. Fawcett, M. A. K. Halliday, Sydney M. Lamb and Adam Makkai
Semiotics of Culture and Language Volume 2: Language and Other Semiotic Systems of Culture, Ed. by Robin P. Fawcett, M. A. K. Halliday, Sydney Lamb and Adam Makkai
The Case for Lexicase An Outline of Lexicase Grammatical Theory Stanley Starosta
Linguistics: Open Linguistics BLOOMSBURY ACADEMIC COLLECTIONS
Bloomsbury Academic An imprint of Bloomsbury Publishing Plc 50 Bedford Square London WC1B 3DP UK
1385 Broadway New York NY 10018 USA
www.bloomsbury.com
BLOOMSBURY and the Diana logo are trademarks of Bloomsbury Publishing Plc
First published in 1988 by Pinter Publishers Limited
This edition published by Bloomsbury Academic 2015
© Stanley Starosta 1988
© Aleli Starosta 2015
Stanley Starosta has asserted his right under the Copyright, Designs and Patents Act, 1988, to be identified as Author of this work.
All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage or retrieval system, without prior permission in writing from the publishers.
No responsibility for loss caused to any individual or organization acting on or refraining from action as a result of the material in this publication can be accepted by Bloomsbury or the author.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
ISBN: HB: 978-1-4742-4672-9; ePDF: 978-1-4742-4671-2; Set: 978-1-4742-4731-3
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
Series: Bloomsbury Academic Collections, ISSN 2051-0012
Printed and bound in Great Britain
The Case for Lexicase
The Case for Lexicase
An Outline of Lexicase Grammatical Theory
Stanley Starosta
Pinter Publishers
London and New York
© Stanley Starosta, 1988
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted by any means without the prior written permission of the copyright holder. Please direct all enquiries to the publishers.
First published in Great Britain in 1988 by Pinter Publishers Limited, 25 Floral Street, London WC2E 9DS
British Library Cataloguing in Publication Data
A CIP catalogue record for this book is available from the British Library
Library of Congress Cataloging-in-Publication Data
Starosta, Stanley. The case for lexicase. (Open linguistics series) Bibliography: p. Includes index. 1. Generative grammar. 2. Lexicology. 3. Grammar, Comparative and general—Case. I. Title. II. Series. P158.S78 1988 415 87-32794
ISBN 0-86187-639-3
Printed in Great Britain
Typeset by Joshua Associates Limited, Oxford. Printed by Biddles Limited, Guildford, Surrey.
Contents

Foreword by Richard Hudson
Features and abbreviations
Preface and acknowledgements
1 Formal properties of lexicase theory
  1.1 Attributes
  1.2 Domain
  1.3 Constraints
2 Pan-lexicalism
  2.1 Grammar as lexicon
  2.2 Grammar as rules
  2.3 Lexicon
  2.4 Rules and organization
3 Formalization
  3.1 Organization
  3.2 Rule types
  3.3 Rule ordering
  3.4 Representations
4 Roles and relations
  4.1 Division of labor
  4.2 Case relations
  4.3 Macroroles
  4.4 Verbal derivation
5 Case forms and case marking
  5.1 Case forms
  5.2 Case markers
  5.3 Conjunction reduction
6 Constructions and classes
  6.1 NP structure and noun subcategorization
  6.2 PP structure and preposition subcategorization
  6.3 Clause structure and verb subcategorization
  6.4 Coordination
  6.5 Discourse
7 Conclusion
Bibliography
Index
Foreword
By Richard Hudson

Lexicase has been developing since about 1971, but this is the first book about it; in fact, it is near to being the first published presentation of the theory in any form. This is a pity because the theory has a lot to offer. Most obviously, it defines a range of analytical categories which seem helpful in typology and description, as witness the large number of descriptive linguists who have already used it in analysing a wide range of exotic languages. However, lexicase also addresses some fundamental questions about grammatical theory. Few theoretical linguists have so far paid any attention to its answers. This is perhaps hardly surprising, considering the lack of publications; under the circumstances what is surprising is the quantity of descriptive work. With the publication of this book it is to be hoped that the theory will become better known among theoretical linguists.
Let me pick out three general questions to which lexicase gives interesting answers. In each case the lexicase answer is in tune with developments which seem to be taking place in other theories, while still being sufficiently controversial to challenge standard assumptions. (And in each case, as it happens, Starosta's answers are the same as those which I would give, within the theory of Word Grammar.) The three questions are as follows, and the lexicase answers, in a nutshell, are given in parentheses.
Q.1 How many syntactic levels are there? (Answer: one)
Q.2 Are dependency relations basic or derived? (Answer: basic)
Q.3 Are the rules of grammar formally distinct from subcategorization facts? (Answer: no)
The lexicase answers command our attention not only because of the theoretical arguments given in this book, but also because they have been tried out on a lot of non-European languages as well as on English. The same cannot be said of many of the currently popular theories.
Q.1 How many syntactic levels are there? Like several other current theories, lexicase allows only one syntactic structure per sentence (barring ambiguity, of course), namely a completely surface one (without even empty nodes). The claim is that such structures can be generated directly, but it is interesting to note that in order to permit this the structures generated are made far richer than typical phrase-markers, by the addition of several different kinds of relations and categorizations—'case relations' (roughly, theta-roles), 'case forms' (roughly, abstract 'case'), 'case markers' (surface markers of abstract case) and 'macroroles' (roughly, 'subject' versus 'object' in languages of the accusative type). All these are part of the syntactic structure (which is not
distinguished from semantic structure), in addition to quite a rich system of classificatory and subcategorizational features. The question how well it all works is a matter of judgement, but the lexicase approach does seem to illustrate a very general conclusion of recent work in syntactic theory: the fewer levels of structure there are, the richer each level has to be. Hardly surprising, maybe, but worth establishing for all that.
Q.2 Are dependency relations basic or derived? Since the introduction of X-bar theory it has become uncontroversial to make a formal distinction between the head of a construction and its other daughters. However, this is only true of the rules; when it comes to sentence structures all nodes have the same status, so the only way to pick out the head of a construction is to go back to the rule which generated it. Thus the dominant tradition in grammatical theory is one in which the dependency relations between heads and non-heads are not basic but derived. Lexicase links this tradition to a much older one, in which dependency relations are basic and are just as much part of the structure of a sentence as other categorizations such as the word classes and morpho-syntactic features. Of course, main-stream theories are based on the assumption that dependency relations are not in fact basic in sentence structure, so we have to decide between this assumption and the contrary assumption behind lexicase (and a number of other current theories). It is also a matter of debate whether constituent structure has any part to play in grammar once dependencies are taken as basic; according to lexicase it does have a residual role, but others have denied this. The debate will take some time to resolve, but the lexicase answer is a serious contribution to it.
Q.3 Are the rules of grammar formally distinct from subcategorization facts? It has been pointed out repeatedly that some phrase-structure rules are redundant if strict subcategorization facts are included in the lexicon, because the rules simply summarize all the possibilities permitted by the subcategorization. If each subcategorization frame defines a distinct construction, then we might conclude that each such frame is in fact a rule, equivalent to a phrase-structure rule—in which case the hitherto watertight boundary between the rules of grammar and the lexicon would of course no longer exist. A different conclusion would be that the phrase-structure rules should be renamed 'lexical redundancy rules', implying that they are part of the lexicon (whatever that might mean). The trouble with both of these views is that they are true only of a certain type of phrase-structure rule, namely one with a lexical head; the rules responsible (under standard assumptions) for subjects and adjuncts are unaffected, because they do not correspond to strict subcategorization facts. In lexicase (and other theories in which dependency is basic), every construction has a lexical head, so subjects and adjuncts can be treated in the same way as complements. This allows lexicase to define all constructions by means of information attached (in the form of features) to lexical items. Of course, some of these features have to be assigned by rule, and lexicase offers a fairly rich typology of such rules, but these rules can be seen as each
contributing to the properties of a lexical item. The consequence is that virtually all of the grammar is concerned with the definition of lexical items, and the boundary between rules and the lexicon disappears. It remains to be seen whether this is nearer to the truth than the standard view, but at least it is a coherent view which is spelled out clearly enough to be assessed.
None of the lexicase answers are self-evidently true, nor are any of them obviously false, so they need the same critical consideration as the more familiar answers. The only way to be sure that one's favourite answer is right is by elimination of all serious alternatives; thanks to the present book another serious alternative is now available for scrutiny.
Richard Hudson
University College, London
Features and abbreviations

Acc   Accusative
adbl  audible
Adj   adjective
adrs  addressee
Adv   adverb
AGT   Agent
anmt  animate
APS   antipassive
assn  association
bstr  abstract
clct  collective
cnjc  conjunction
Det   determiner
dfnt  definite
dgre  degree
djct  adjectival
dmcl  domicile
dmmy  dummy
dprv  deprived
drcn  direction
Dtv   Dative
fint  finite
fore  fore
goal  goal
humn  human
lctn  location
linr  linear
mnnr  manner
motn  motion
N     noun
ndpd  independent
Nom   Nominative (subject)
ntrg  interrogative
ntrr  interior
P     preposition
past  past
pasv  passive
PAT   patient
plrl  plural
posn  possession
prdc  predicate
prnn  pronoun
prpr  proper
prsn  person
prtr  operator
ptnl  potential
qlty  quality
rfrn  referential
rtcl  article
srce  source
spkr  speaker
srfc  surface
topc  topic
trmn  terminus
trns  transitive
V     verb
vrtc  vertical
xlry  auxiliary
xpls  expulsion
Preface and acknowledgements

The roots of lexicase go back to a class handout prepared for a syntax course at the University of Hawaii in 1970. Since that time, work has been done in the lexicase framework on the morphology, syntax, lexicon, and/or referential and intensional semantics of more than forty-five different languages, some of it on English but most of it on non-Indoeuropean languages of Asia and the Pacific Basin. The current list of lexicase references runs to ten pages of articles, dissertations, and working papers (cf. lexicase references 1985). Some of the lexicase dissertations have come out as books, and there is a book on the philosophical background of lexicase theory (Starosta 1987). However, in the seventeen years since that first handout, no book has been written to summarize and codify the lexicase theory in a form which would make it accessible to students and professionals outside Hawaii. This volume is intended to be such a book.
Although the bulk of lexicase descriptive work over the years has been done on South, Southeast, and East Asian and Pacific languages, most of the illustrative examples for this volume will be taken from English to make the book more generally useful. Examples from other languages, especially non-Indoeuropean ones, will be introduced when this is necessary to provide evidence for an analysis which is difficult to motivate on the basis of English evidence alone, or when English has no instances of the construction in question.
This work draws substantially on material previously available only in semi-published form, especially 'Affix hobbling' (Starosta 1977), 'The end of phrase structure as we know it' (Starosta 1985e), 'Lexicase and Japanese language processing' (Starosta and Nomura 1984), and on several long reports by Louise Pagotto (Pagotto 1985a, Pagotto 1985b) and by Pagotto and Starosta (1985a, 1985b, 1986), and also to some extent on 'Lexicase parsing: a lexicon-driven approach to syntactic analysis' (Starosta and Nomura 1986).
I would like to dedicate this book to my past, present, and future students, and especially to those adventurous individuals who have done lexicase dissertations with me. I hope they derived as much benefit out of their experiences as I did, and as the lexicase enterprise did. Although there is only one person listed as the author of this work, each of these students has contributed to its completion, and I would hereby like to gratefully acknowledge all their insights, inspirations, criticisms, encouragement, and hard work.
Finally, apropos of hard work, my special gratitude goes to two people who contributed more materially to the completion of this volume: Louise Pagotto
and Nitalu Sroka, who spent long hours at their keyboards completing the reference section and the graphics respectively after I had flown off for a summer of relaxation and linguistics on the mainland. Mahalo nui loa!
1. Formal properties of lexicase theory
1.1 ATTRIBUTES

Lexicase is part of the generative grammar tradition, with its name derived from Chomsky's lexicalist hypothesis (Chomsky 1970) and Fillmore's Case Grammar (Fillmore 1968). It has also been strongly influenced by European grammatical theories, especially the localistic case grammar and dependency approaches of John Anderson (cf. Anderson 1971), his recent and classical predecessors, and the word-oriented dependency approaches of Richard Hudson (cf. Hudson 1976, Hudson 1984). Like Chomskyan generative grammar, it is an attempt to provide a psychologically valid description of the linguistic competence of a native speaker. It can be described in a nutshell as a pan-lexicalist monostratal dependency variety of generative localistic case grammar, as sketched below.

1.1.1 Panlexicalist
Lexicase is intended to be a pan-lexicalist theory of grammatical structure (cf. Hudson 1979a, 1979c: 11, 19); that is, the rules of lexicase grammar proper are lexical rules, rules that express relations among lexical items and among features within lexical entries. Sentences are seen as sequences of words linked to each other by hierarchical dependency relations. To take an analogy from chemistry, each word is like an atom with its own valence,1 an inherent potential for external bonding to zero or more other atoms. A sentence then is like a molecule, a configuration of atomic words each of which has all of its valence bondings satisfied. This means that a lexicon by itself generates the set of grammatically well-formed sentences in a language: each word is marked with contextual features which can be seen as well-formedness conditions on trees, and a well-formed sentence is any configuration of words for which all of these well-formedness conditions are satisfied. For example, a singular common count noun would be marked for the contextual feature [+[+Det]], indicating that the noun must cooccur with a determiner as a dependent sister, and any tree which failed to satisfy this requirement would be flagged as ungrammatical. Consequently, a fully specified lexicon is itself a grammar, even if it is not associated with a single grammatical rule.
A grammar composed only of a list of lexical entries achieves a kind of descriptive adequacy in the sense of Chomsky's Standard Theory (Chomsky 1965: 24): it generates the sentences of a language with their associated structural descriptions. However, it is a rather unsatisfying kind of grammar for people accustomed to thinking of a grammar, or of any scientific theory, as minimally consisting of a set of generalizations. To create a grammar in this more conventional sense from a 'lexicon grammar', it is only necessary to make a list of the generalizations that can be extracted from the internal regularities among contextual and non-contextual features within items and from the inter-item relationships among lexical entries (cf. Starosta and Nomura 1984: 5). This is in fact the approach which is taken in the lexicase framework.
A conventional lexicase grammar then is a grammar of words. It is a set of generalizations about the internal compositions, external distributions, and lexical interrelationships of the words in the language. Proceeding from this basic idea, it is possible to construct a formal and explicit grammatical framework of limited generative power which is capable of stating language-specific and universal generalizations in a natural way. There are no rules in this framework for constructing or modifying trees, since ('surface') trees are generated directly by the lexicon: the structural representation of a sentence is any sequence of words connected by lines in a way which satisfies the contextual features of all the words and does not violate the Sisterhead or One-bar Constraints or the conventions for constructing well-formed trees. There are also no rules which relate sentences to each other by mapping one sentence representation onto another, or by deriving both from a common underlying representation. Instead, two sentences are related to each other to the extent that they share common lexical entries standing in identical or analogically corresponding case or dependency relations to one another. Regular patterns of correspondence, such as the symmetric selectional relationships which originally motivated the passive transformation, are stated in terms of lexical derivation rules: the relationship between passive sentences and active sentences is formalized in terms of a lexical rule which derives the head word of a passive clause from the head word of the corresponding active clause. This rule matches the Agent of the active verb with the Means actant of the passive verb, and thereby formally establishes the connection between the subject of the active clause and the by-phrase of the passive clause without any need for a transformation.
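The mechanism just described is concrete enough to sketch in a few lines of code. The following fragment is only an illustration, not anything from the lexicase literature: the names (Word, well_formed) and the feature spelling are invented for the example. A word carries non-contextual features plus contextual features; a dependency tree counts as well formed exactly when every word's contextual demands are met by its dependent sisters, so the lexicon alone does the generating.

from dataclasses import dataclass, field

@dataclass
class Word:
    form: str
    features: frozenset                  # non-contextual features, e.g. {"+N"}
    requires: tuple = ()                 # contextual features: each must be
                                         # carried by some dependent sister
    dependents: list = field(default_factory=list)

def well_formed(word):
    # A tree is well formed iff every word's contextual features are
    # satisfied by its dependents, checked recursively down the tree.
    for needed in word.requires:
        if not any(needed in dep.features for dep in word.dependents):
            return False                 # e.g. a count noun without a determiner
    return all(well_formed(dep) for dep in word.dependents)

# 'dog' is marked [+[+Det]]: it demands a [+Det] dependent sister.
the = Word("the", frozenset({"+Det"}))
dog = Word("dog", frozenset({"+N"}), requires=("+Det",), dependents=[the])
barked = Word("barked", frozenset({"+V"}), requires=("+N",), dependents=[dog])

print(well_formed(barked))                                     # True
print(well_formed(Word("dog", frozenset({"+N"}), ("+Det",))))  # False

Nothing here constructs or transforms trees; the only question the checker ever asks is whether each word's valence is saturated, which is the sense in which a fully specified lexicon is itself a grammar.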
1.1.2 Monostratal
Because lexicase accounts for the systematic relationships among words in sentences by means of lexical rules rather than transformations, a lexicase grammar has only one level of representation (the surface level). Thus it differs from the various Chomskyan grammatical frameworks in power, since there is no distinct deep structure and no transformational rules to relate two levels. This means that it is not possible in a lexicase description to create some arbitrary and perhaps linguistically unmotivated representation for a sentence and then map it onto the actual words by means of transformations; lexicase is less powerful because there is a much smaller class of structural descriptions which can be assigned to a sentence, and therefore a much smaller class of grammars which can be associated with human language. In this as in some other respects it is similar to Generalized Phrase Structure Grammar (GPSG), which also adopts the view that grammars should be monostratal: 'our work . . . has convinced us that the ready acceptance of multistratal syntactic descriptions in earlier generative linguistics was thoroughly undermotivated. The existing corpus of work in GPSG shows that highly revealing systematizations of complex linguistic phenomena can be achieved with the restrictive framework that we adhere to' (Gazdar, Klein, Pullum, and Sag 1985: 10-11).
1.1.3 Dependency
Lexicase is also a type of dependency grammar, a system of grammatical representation which, like valency, originated with Lucien Tesnière (cf. Tesnière 1959). Dependency grammar has been implemented in the generative framework by linguists such as David Hays (cf. Hays 1964), Jane Robinson (cf. Robinson 1970) and John Anderson (cf. Anderson 1971), and has been used for example in the analysis of Japanese syntax by Shinkichi Hashimoto (cf. Hashimoto 1934, 1959) and Hirosato Nomura (Starosta and Nomura 1984). Lexicase dependency tree notation derives from John Anderson's dependency case grammar (Anderson 1971) and Engel's dependency valence grammar (Engel 1977), and thus ultimately from the work of Tesnière, while some of the terminology and constraints are adapted from Chomsky's X-bar theory.
1.1.4 Generative
Lexicase is generative grammar in the traditional Chomskyan sense. Generative grammar is simply the result of Chomsky's proposal to apply the hypothetico-deductive paradigm of the physical sciences to the study of language. As such, it views grammars as theories which can be falsified on the basis of a confrontation between explicit characterizations (generative grammars) of speakers' linguistic knowledge and the reactions of speakers to predictions made by these grammars. Lexicase shares with the Chomskyan tradition a conception of a grammar as a formal system that characterizes (generates) the infinite set of sentences of a language by describing that part of a speaker's knowledge (linguistic competence) which enables him or her to recognize structurally well-formed sentences of his or her language. That is, it requires that a grammar have psychological reality, but only to the extent of accounting for a speaker's passive knowledge of his linguistic system (competence), not his use of that system (performance). Lexicase also follows Chomsky in viewing the task of writing grammars as a means to the end of constructing a universal theory of innate human linguistic knowledge. It differs from Chomsky's current practice, however, in that it takes 'generativity' seriously in actually requiring grammatical rules and representations to be expressed formally and explicitly.
1.1.5 Case
Lexicase is case grammar in the Fillmorean tradition. It analyzes every nominal constituent (except for predicate nominatives) as bearing a syntactic-semantic relationship to its regent. However, it has evolved away from other Fillmorean case approaches to some extent in its feature-based formalization, in its emphasis on syntactic over semantic criteria for the definition and identification of case relations, and in the requirement that every verb have a Patient in its case frame.

1.2 DOMAIN

1.2.1 Competence and performance
I am assuming with Chomsky that a generative grammar is an attempt to represent the speaker's linguistic competence, all the information which he must have access to in order to distinguish between well-formed and ill-formed sentences of the language. That is, a grammar is a model of something which is inside a speaker's head; it is intended to have psychological reality.
This assumption raises several questions. One is of course the old mind-body problem: is mind distinct from body? From the fact that I used the phrase 'inside a speaker's head' in the preceding paragraph, the reader can perhaps guess where my personal sentiments lie: I see no justification for positing a 'mind' as distinct from 'body' if 'mind', like 'soul', is intended to be an entity with no physical reality. (Perhaps my physics apprenticeship is showing through here.) If I use the term 'mind' in the present work, it is rather a term for a system of electrochemical linkages and processes in the brain.

1.2.2 Language, situation, and semantics
One of the most difficult questions to be decided by any syntactic theory is that of the boundary (if any) between syntax and semantics. Lexicase draws this boundary between syntax-cum-intensional-semantics on the one hand and situational semantics and pragmatics on the other. Lexicase is grammar, by which I mean that it characterizes the properties of words and the sentences in which they occur, not of the real-world situations to which they may correspond. This does not mean that lexicase renounces meaning, but rather that it is concerned with the meaning directly signaled by the sentence itself, meaning which is characterized by the words, dependencies, and coreference relationships symbolized in the single-level lexicase representation. As stated by Gazdar et al. (Gazdar, Klein, Pullum, and Sag 1985: 10):
In a very real sense, the syntactic representations we construct are their own 'logical forms'. Insofar as there are structures defined by our syntax to which no meaning is assigned under the semantics we specify, we claim that those structures describe well-formed sentences that do not mean anything coherent, not that grammaticality is defined by reference to the overall predictions of the syntax and semantics combined.
The result of this minimalist approach is that lexicase grammatical representations tend to be quite spare and semantically unproductive. Consequently, colleagues with more baroque tastes have accused me of pushing all the interesting matter under the semantic rug. They contend that by treating syntax and semantics together, the overall system will be simpler and more compact, while treating them separately as I have will result in two systems which add up to much more machinery. Of course, that speculation may ultimately turn out to be true, but the decline of generative semantics, which made exactly that assumption, leads me to believe that it will not. The decision to separate situational and linguistic semantics is an empirical hypothesis about proper scientific domains. If carving nature at this particular joint produces new generalizations in both of the sundered subdomains, as it has recently begun to do, then lexicase was right; and if doubters can resurrect generative semantics and make it generative enough to compare with lexicase, then we can find out which approach produces better generalizations.
The study of semantics in generative grammar has generally based itself on truth values: two sentences have the same meaning, and hence the same semantic representation, if they have the same truth values.2 Yet we know that truth values are only relevant to declarative sentences (cf. Gazdar, Klein, Pullum, and Sag 1985: 7). How can we give a truth-value definition of the meaning of a command, a question,3 a speculation, or a statement about a logically impossible situation?4 Nevertheless these all have meaning for the speakers, and I know of no linguistic reason to believe that this kind of meaning is any different from the kind of meaning that is associated with declarative sentences. It is this broader kind of meaning which must be the subject matter of linguistic semantics, not some artificial and arbitrary system which has only a partial overlap with the information content of natural language. I will try to show, especially in Chapter 4, sections 4.2 and 4.5, that what language actually encodes is not particular situations, but rather speakers' perceptions of real or imagined situations (cf. Grace forthcoming), which I will refer to (following Fillmore) as PERSPECTIVE.
As philosophers learned long ago, natural language is not entirely logical, and since lexicase is an attempt to represent natural language, it should not be surprising that the linguistic meaning of a sentence as characterized by a lexicase grammar may sometimes not match the meaning assigned by logicians on the basis of truth values. It is therefore possible and normal for two sentences with the same truth values to have different lexicase analyses and thus different linguistic meanings, for example (1a) and (1b), or for a sentence such as (2) with a single lexicase representation and thus a single linguistic meaning to have two distinct interpretations, which could be paraphrased as either (2a) or (2b). This constitutes a claim that logic (in both the mathematical and pre-scientific intuitive senses) has no favored place in human linguistic competence as such, and that the ability to make logical inferences or recognize two sentences as having the same or different truth values is independent of the ability to recognize well-formed sentences in a given language (which is of course why formal logic was devised in the first place).
(1a) Mary gave the strudel to John.
(1b) Mary gave John the strudel.
(2) Mary doesn't think John likes his coffee black.
(2a) Mary thinks that John doesn't like his coffee black.
(2b) It is not the case that Mary thinks that John likes his coffee black.
1.2.3 Language and society
Although lexicase grammars are grammars of competence, the linguistic competence characterized by a lexicase grammar is individual rather than group knowledge; a strict lexicase grammar is the grammar of an idiolect. It thus has a much more intimate and immediate psychological reality than integrated linguistic descriptions which profess to describe simultaneously the competence of some arbitrarily chosen group of speakers. Although actual lexicase practice has frequently deviated from this precept, strictly speaking it is not possible to write a psychologically real grammar of a whole language such as English, or of any subvariety of it which has more than one speaker. Where would we look to find the integrated physical system corresponding to, say, 'the grammar of English'? If a grammar must be a model of some real isomorphous physical system, as assumed by lexicase, then only the grammar of an idiolect located in the brain of a single human being is able to characterize a physically specifiable configuration. If we are looking for physical linguistic reality, we have to turn to the individual native speaker. Where the speaker is, there also is language. Thus lexicase, as a linguistic paradigm which views language as a physical phenomenon, should be focusing on some physical part of an individual speaker, and neurological evidence tells us that the part we want is the human brain.5
Only the linguistic competence of the individual is psychologically (and ultimately physically) real. 'English' and 'Chinese' are abstractions over ranges of the grammars of arbitrarily selected individual speakers, and as a consequence, a description of 'English' or 'Chinese' is going to be just as arbitrary as the process by which the exemplary speakers of 'English' or 'Chinese' were selected for study. Just to take the question of dialects, for example: which social, regional, and national dialects should be covered in a grammar of English? Is there any non-capricious way of deciding this? Is there any single human being who has internalized all and only the dialects we decide to include? It seems to me that the way to avoid this arbitrariness is to describe the individual's knowledge. The individual is where the next level of linguistic explanation is going to connect up, and an arbitrary non-physical
intermediate level between individual competence and its physiological—and social—bases will merely get in the way. In this respect, lexicase differs from both Chomskyan grammar and most sociolinguistic approaches (but not all; see Hudson 1980c: 183).
The philosophical issue involved is the nature of the reality modeled by a theory. In physics, for example, mass and thermal energy inhere in physical objects, and have no existence apart from them. However, what is the locus of 'the grammar of English'? For Chomskyan grammar, if the question makes sense at all, the answer seems to be that the locus of grammar is the ideal native speaker, while for sociolinguists it must be the community of speakers. But where would we go to look for an ideal native speaker? Whose brain contains all the linguistic information characterized by a 'grammar of English' and no other linguistic information? If it has no physical locus, I maintain that it has no physical reality, and thus is not a legitimate object of study for a hypothetico-deductive science. Richard Hudson has a very nice discussion of this point (Hudson 1984: 31-3) from which I will only quote an excerpt here:
Language is a property of the individual . . . the facts only 'exist' in the minds of individuals . . . the place where you look for the data of linguistics is in the individual human being . . . it would be quite wrong to build a general theory of language on the assumption that standard languages are typical . . . Larger aggregates, whether linguistic ('language X', 'dialect X', 'register X') or social ('speech community X') are popular fictions . . . In relation to the goals of transformational linguistics . . . What I reject . . . is the apparent attempt to have it both ways, by preserving the belief that language is a property of the community . . . individuals have minds, but communities do not . . . Accordingly, it makes no sense to advocate a mentalist approach to language if language is a property of a community.
If we replace Hudson's 'mind' with 'brain', then I concur completely. Note that I am certainly not objecting here to all idealizations per se. They are after all ubiquitous and necessary in physics, which I take as the prototype and paragon of the hypothetico-deductive sciences. However, the idealizations employed in physics, such as 'ideal black body' or 'ideal gas', still refer to real invariant properties of real physical objects, which have a locus in space and time. Thus they differ from disembodied idealizations such as 'speech community' and 'ideal speaker-hearer', which do not.
1.3 CONSTRAINTS

1.3.1 Why constraints?
Why does a grammar need constraints? Constraints are Good, because they are the content of a theory. An unconstrained theory implies that all things are possible. Such a theory can never be falsified by observations, and thus has no empirical content. A constrained theory on the other hand asserts that certain things are impossible, and can in principle be falsified by finding an instance of the supposedly impossible phenomenon; because a constrained theory can be falsified, then, it has empirical content. The more constraints we can
successfully impose, the more narrowly we can circumscribe the invisible system we are modeling, and the more clearly we are able to perceive its true form. For these reasons, it is not necessary in an individual case to show that the constrained theory leads to a superior solution to a given problem of analysis (although in a number of instances it has), but only to show that it is a real constraint, that is, that it excludes some class of possible grammars without adding any new grammars to the set, and that the analysis it provides is just as good as the ones allowed by more powerful frameworks. If this is so, then the constrained solution is preferred at the level of explanatory adequacy, since it is the one selected by the metatheory which more narrowly characterizes the phenomenon under investigation. Any additional benefits of this constraint, such as those listed in connection with the analysis of gerunds and VPs below, are frosting on the cake, adding additional confirmatory evidence for the empirical validity of the constraint. In order to refute the imposition of a constraint, on the other hand, it would be necessary to show not merely that it is possible to construct some analysis which is compatible with, say, Jackendoff's three-bar notation (a fact which should come as a surprise to no one, given the overwhelming power of this notation), but that the added power must be invoked in order to capture some genuine syntactic generalization which cannot be accommodated by the constrained version of the theory.
In order to reduce the power of a grammatical framework, it is necessary to constrain it so that the number of possible grammars compatible with the theory is decreased, and the probability of falsifying the theory is thereby correspondingly increased. In recent years, a number of revisions have been made in Chomsky's original 'Standard Theory', and several alternative approaches to generative grammar have been proposed. Part of the motivation for some of the innovations has been the attempt to reduce the power of the model to the point that it could be held to make interesting claims about the nature of human language. One of the most radical of these proposed variants of the Standard Theory is lexicase. The lexicase approach to the problem of excessive power, and the approach to be followed in this book, has been one of attrition. Rather than attempting to create a whole new metatheory from the ground up, lexicase started off with a Chomsky-Fillmore basic framework, then removed various excessively powerful mechanisms one at a time and tried to see how far the theory could get with what was left. Since the set of possible grammars generated by the revised metatheory at each stage was a proper subset of the set generated by the previous version, the result of the process was a progressive decrease in the power of the metatheory. At each stage, the result was tested in the description of a different natural language, to ensure that it was not so weak that it was unable to provide an adequate account of the grammar of the language being investigated.
In the following paragraphs, I will describe the main constraints that have been imposed in this way on lexicase grammars since 1970. All of the constraints proposed here are of course tentative hypotheses, not arbitrary decrees. They are confirmed to the extent that the grammars and articles
incorporating them have succeeded in describing observed linguistic generalizations, and they can be falsified if it can be demonstrated that there are general statements which clearly belong in an account of grammatical competence but which cannot be accommodated within a grammar incorporating the constraints.
Among the constraints which have been imposed on the initial framework to produce lexicase are the following:
(1) Distinct underlying representations and transformations are disallowed.
(2) Phrase Structure Rules are eliminated as redundant.
(3) Five of Pullum's seven MOLLUSC conditions on X-bar representations (Pullum 1985; see 1.3.2.2) are maintained, one of them in a more stringent form than the original version.
(4) Nodes in a tree do not bear features, and node labels may be omitted as redundant.
(5) The domain of grammatical relatedness is limited to sisterheads, that is, to the heads of the Comps of X in X-bar terminology.
(6) The inventory of major lexical classes (parts of speech) is limited to a small set of atomic categories.
(7) All lexical features are binary, and rule features and double contextual features are eliminated.
These constraints will now be considered individually under three headings: constraints on possible grammatical representations, constraints on possible lexical features, and constraints on possible rules of grammar.

1.3.2 Constraints on grammatical representations

1.3.2.1 Deep structures and strata
In a lexicase grammar, there is no distinction between deep and surface structures, levels, strata, or other simultaneous levels of syntactic representation. This constraint is the most radical of the nine discussed in this paper, and is sufficient to rule out almost all of the 'generative' grammars written between 1957 and 1975. As an example, if multistratal representations are allowed in a grammar, any of the following seven underlying structures could be assigned to the sentence 'John loves Mary' without violating any general metatheoretical principles (see Figure 1.1).6
Since the beginning of transformational grammar there have been some minimal constraints on possible phrase structure trees, such as the requirement that lines cannot cross, but recently even this limitation has been weakened to allow discontinuous constituents (Aoun 1979, cited in Chomsky 1981: 128; cf. O'Grady 1987), so that even trees such as the example in Figure 1.2 and its notational equivalents would be allowed. What this means is that the metatheory in effect claims that such a simple N-V-N sentence conceivably could be ambiguous in eight ways in some language.
(3) [Figure 1.1: seven candidate underlying structures (a)-(g) for 'John loves Mary' — (a, b) verb-initial analyses, (c, d) abstract-verb analyses with S as NP (one with an abstract PRES verb over 'love'), (e) a performative analysis with an abstract verb DECLARE taking 'I', 'you', and the embedded clause, and further abstract variants (f, g)]
[Figure 1.2: (h) a discontinuous-VP analysis of 'John loves Mary']
Actually, of course, it could be ambiguous in more than eight ways, since (a)-(h) do not begin to exhaust the possible underlying representations which could be assigned to this sentence by a theory which allows a deep-surface distinction and relatively unconstrained phrase structure representations and transformations. There has always been at least an unspoken assumption that such vacuous ambiguity is not acceptable, and that a grammar should provide one and only one analysis for each sentence with a single reading. However, all multistratal frameworks do allow such multiple vacuous representations. That is what all the grammatical arguing is about. It is because of this failure to constrain the theories, and of the equally serious failure to write formal rules, that transformational grammarians and adherents of other multistratal theories are forced to justify their analyses by argumentation rather than by empirical testing. I contend that any linguistic metatheory which allows so many different possibilities and offers no guidelines to decide between them has failed in its task to delimit the form of a possible grammar. In order for a linguistic theory to make a testable claim about the nature of human language, it needs to be constrained so that only one analysis is possible for one sentence. That single analysis is the prediction that needs to be tested and if possible disconfirmed, and is the measure by which the theory stands or falls. By rejecting analyses with more than one level or stratum, lexicase has made a major step in the direction of a constrained theory. GPSG has imposed the same constraint, and other frameworks, such as Lexical Functional Grammar and Government and Binding Theory, are finally moving in that direction.

1.3.2.2 X-bar constraints
X-bar theory is a set of conditions on Phrase Structure representations which was proposed by Chomsky in 'Remarks on nominalizations' (Chomsky 1970), and developed in later works, especially Jackendoff 1977 and Chomsky 1981. The prime motivations were (i) to provide a means of systematically representing the notion 'head of a construction' in PS rules and representations, (ii) to limit the class of possible base systems (Chomsky 1981: 11), and (iii) to facilitate the capture of cross-categorial generalizations.
However, in spite of frequent references in Chomskyan linguistic literature to 'X-bar theory' as if it were a coherent set of principles, in practice the principles remain programmatic rather than precise, and seem to be honored primarily in the breach.7 Geoffrey Pullum (Pullum 1985) has gone through the literature attempting to extract what Chomskyan linguists might mean by X-bar theory, and has educed the following 'MOLLUSC conditions' defining a prototypical X-bar theory, which I will take as my point of reference in the following discussion: Maximality Optionality: Lexicality: Lexical disjointness: Uniformity: Succession: Centrality:
every non-head daughter in a rule is a maximal projection. every non-head daughter in a rule is optional. all nonterminals are projected from preterminals. ifX° -* t and Y° -* t ' are both in R, andX¥"Yy then t£t'. the maximum possible bar level is the same for every preterminal. the bar level of a head is one less than the bar level of its mother; * k - . . . Xk~x the start symbol must be the maximal projection of some preterminal.
Optionality: every non-head daughter in a rule is optional. 'Optionality' is an expression of the classic definition of 'head' in Phrase Structure grammar (cf. Wilson 1966): the head of a construction is an obligatory term on the right side of the arrow in a Phrase Structure rule. Unlike conventional versions of dependency grammar, however, lexicase does not require that every construction have a single head, just as Phrase Structure grammar has no requirement that there be only one obligatory element in the expansion of a given category. Rather, it distinguishes between endocentric constructions, those that have a single obligatory term, and exocentric constructions, those that have more than one. An obligatory element which is a lexical category (X⁰) is a LEXICAL HEAD, and one which is a phrase (X′) is a PHRASAL HEAD. Two terms which are heads of the same construction are CO-HEADS. To date, two kinds of 'exocentric' constructions have shown up in lexicase grammars: prepositional phrases (including complementized embedded sentences; see 6.2) and coordinate constructions (see 6.4).
The lexical leaf constraint: all terminal nodes are words. In classical Phrase Structure grammars, the entities introduced by a PS rule could be almost anything: morphemes, words, word classes, phrases, dummy symbols, or even bare matrices. In more recent GB analyses, PS representations still allow dummies ('empty categories' such as t and PRO) and sublexical morphemes such as Tense and AGR. The lexicon in a lexicase grammar on the other hand consists not of morphemes (Halle 1973) but of actually occurring words (cf. Aronoff 1976: 46, Hudson 1979c: 19) and possibly also of stems, i.e. words minus inflectional affixes (but cf. Ladefoged 1972 for some preliminary neurolinguistic evidence suggesting that this may
not be the case), and only full actually occurring words can be 'inserted' in trees. Consequently, lexicase syntactic representations are hierarchically structured sequences of words, and all terminal nodes in the syntactic representation of a sentence are required to be words, not strings of disembodied morphemes (cf. Gazdar, Klein, Pullum, and Sag 1985: 104). By allowing sublexical morphemes to appear as separate nodes in a tree, Chomskyan linguistics abolished the structuralist distinction between morphology and syntax, and in requiring all terminal nodes to be words, lexicase has reintroduced it.
The associated claim is that the morphological structure of words is irrelevant to the syntax. We could replace every word form, including inflected and derived forms, by an arbitrary sequence of 1s and 0s, keeping the associated lexical matrix, then apply the rules of the grammar, list the generated sentences, and then put the word forms back in, and we would still have the correct sentences; information about internal morphological structure is not needed in identifying the set of well-formed sentences (though it may of course be useful in decoding them), and normally the speaker is not aware of internal morphological structure, and does not need to be.
Since all morphemes in a lexicase representation are already combined into words, there are no 'percolation rules', 'Spell-Out Rules' (cf. Akmajian and Heny 1975: 116) or 'readjustment rules' (Aronoff 1976: 5) outside the lexicon which put morphemes together to form words. In fact, as we shall see, a lexicase core grammar is 'pan-lexicalist' in Hudson's sense (Hudson 1979a, Hudson 1979b; cf. Chapter 2), since there are no syntactic rules outside the lexicon at all. As a consequence, there can be no sub-lexical T (Tense; cf. Robinson 1970: 63) or INFL (Chomsky 1981, Stowell 1981) constituents acting as the head of the whole sentence, and no phonetically null nodes such as the Ø node proposed by Anderson (cf. e.g. Anderson 1971: 43), place-holding Δs, or the 'empty categories' PRO or t proposed in the Government and Binding framework. The lexicase representation thus sticks quite close to the lexical ground, accepting as possible grammatical statements only those which can be predicated of the actual strings of lexical items which constitute the atoms of the sentence.
Again, this constraint plus the constraints of earlier sections limit the class of possible grammars by excluding otherwise plausible analyses and deciding in favor of equally plausible analyses formulatable within the constrained lexicase framework. For example, it rules out such familiar-looking structural descriptions as (4), since Pres is not a word (see p. 15). In a lexicase representation, the structural analysis of this sentence would be a monostratal tree such as (5) with the tense-inflected verb loves as a terminal node.
To invalidate this constraint, it would be necessary to demonstrate that syntactic rules must apply to morphemes represented as nodes in a tree rather than as lexical features. As an example of such a rule which might be supposed to apply to morphemes rather than words, consider article-noun-adjective agreement in a language such as Spanish, where Postal (Postal 1964: 47-8) assumed that an affix construction dominated by the head noun and dominating sublexical gender and number constituents is copied under
Article and Adjective nodes. However, as Chomsky proposed in Aspects (Chomsky 1965: 170ff.), agreement can be accounted for quite handily in terms of features rather than nodes, and a constrained version of this feature approach is in fact the one adopted in lexicase. The Affix Hopping and spell-out rules which would be required in a generative version of Government and Binding theory in order to detach TENSE from INFL and incorporate it into the verb are also rejected by this constraint. In this case there is no lexicase counterpart to such rules, since tense is never separated from the finite verb in the first place.
The one-bar constraint: Each and every construction (including the sentence) has at least one immediate lexical head, and every terminal node is the head of its own construction. In X-bar terms, this means that only single-bar phrase designations are possible node labels in a lexicase tree, that every terminal node has its own one-bar projection, and that every non-terminal node is an X′ which is a maximal projection of its head X. This requirement incorporates Pullum's LEXICALITY, and entails UNIFORMITY (all projections have one bar), SUCCESSION
(all non-terminals have one bar and their lexical heads have none), and CENTRALITY (S must also be a projection, like every other non-terminal node). It also assumes MAXIMALITY: since no item ever occurs without a projection, and since every projection is maximal, every non-terminal node will be a maximal projection. Every node X under another node Y will be either a lexical item, in which case it must be the lexical head of Y, or it will be a phrase node, in which case it will by definition be a maximal projection. The one-bar constraint being proposed here is intended to rule out X-double-bar constructions, and all constructions which do not include an immediate lexical head (see (6)). In other words, no lexicase grammar can contain a construction in which one or more non-lexical nodes intervene between a phrasal node and its lexical head or heads. Thus the construction whose lexical head is a verb (traditionally S) can only be a V′ (V-bar), not V″ or even V‴ as it would be according to Jackendoff's 'Uniform Three-level Hypothesis' (Jackendoff 1977: 36). Similarly, a construction whose lexical head is a noun (traditionally NP) must be an N′, not an N″ or N‴, while the construction whose lexical head is a preposition (traditionally PP) will be a P′, etc.
Another consequence is that a topic-comment construction cannot be structurally binary (Chomsky 1977: 91, Radford 1981: 105), for example (7), since this is a double violation of the constraint. Imposing the one-bar constraint eliminates the VP node and any other node such as Predicate-Phrase which intervenes between the V and its commanding S (see (8)). This structure violates the one-bar constraint, as can be seen by comparing a conventional representation of 'The steed bit her master', Figure 1.3, with a conventional X-bar transposition, Figure 1.4. To meet the one-bar limitation, (8) would have to be replaced by something like (8b) (see Figure 1.5). This structure in an X-bar notation would satisfy the one-bar limitation, as shown in (8c), since no constituent label has more than one bar (see Figure 1.6).
(4) [tree: conventional analysis of 'John loves Mary' with a separate sublexical Pres node]
(5) [tree: monostratal analysis of 'John loves Mary' — S dominating NP, V, and NP, with the inflected verb loves and the nouns John and Mary as terminal nodes]
(7) [tree: binary topic-comment structure, with TOP and an S′ containing COMP as the two branches]
(8) [Figure 1.3: conventional representation of 'The steed bit her master' — S dominating NP ('the steed') and Predicate-Phrase, whose VP dominates V ('bit') and NP ('her master')]
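The constraint lends itself to a short recursive check. The tree encoding and function below are my own illustration, not part of the lexicase formalism: a phrase node X' passes only if it immediately dominates a lexical head of category X, so a VP or Predicate-Phrase layer between S (= V') and the verb fails automatically.

def one_bar_ok(tree):
    # tree is ('Cat', 'word') for a lexical item, or ("Cat'", [children]).
    label, body = tree
    if isinstance(body, str):               # a word: nothing to check
        return True
    cat = label.rstrip("'")
    # the phrase must immediately dominate a lexical head of its category
    has_lexical_head = any(isinstance(b, str) and lbl == cat for lbl, b in body)
    return has_lexical_head and all(one_bar_ok(child) for child in body)

# 'The steed bit her master' as a flat, VP-less one-bar tree:
flat = ("V'", [("N'", [("Det", "the"), ("N", "steed")]),
               ("V", "bit"),
               ("N'", [("Det", "her"), ("N", "master")])])

# The same sentence with a VP layer: the top V' has no immediate V head.
with_vp = ("V'", [("N'", [("Det", "the"), ("N", "steed")]),
                  ("VP", [("V", "bit"),
                          ("N'", [("Det", "her"), ("N", "master")])])])

print(one_bar_ok(flat))      # True
print(one_bar_ok(with_vp))   # False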
THE NP-VP SPLIT: The VP-less one-bar analysis required by lexicase constraints has important metatheoretical advantages over the Chomskyan S → NP VP analysis:
(i) If we don't distinguish between VP and S, we also don't need to distinguish between constituency and dependency. In a lexicase analysis, that is, the subject is both a sister and a dependent of the verb, whereas in a Chomskyan analysis it is dependent on the verb (it receives its theta-relation from the verb, for example) but not a co-constituent of the verb.
[Figures 1.4-1.6: X-bar versions of 'The steed bit her master' — (8a) the conventional two-bar transposition with Spec positions, (8b) the flat VP-less alternative, and (8c) the same flat structure in X-bar notation, in which no constituent label carries more than one bar]
(ii) Because the subject is a Comp of the V, lexicase does not need to allow for exceptions to the sisterhead constraint: only Comps are relevant to the subcategorization of the verb; there are no 'external' arguments (Williams 1981).
(iii) Since there is no VP, a lexicase grammar never has to posit discontinuous VPs, and the metatheoretical ban against lines crossing in a tree can be made absolute. No special coindexing mechanisms (Aoun 1979, cited in Chomsky 1981: 128), underground cables (O'Grady 1987), or 'scrambling' transformations (Saito 1985) are needed to account for VSO and 'non-configurational' languages, since all languages are 'non-configurational'.
(iv) Without a VP, there is no need to posit an abstract INFL to do all the things to the subject that the verb would have done if it were the head of the sentence. The verb is the head of the sentence, and so can agree directly with the subject, assign Case to it, and so on without positing otherwise unmotivated conventions and categories such as INFL and AGR.
(v) Positing a VP in the GB framework forces the loss of generalizations at every turn (Starosta 1977; Starosta 1985f). Anyone who cares to can verify this claim by (a) going carefully through Chomsky's 'Lectures on Government and Binding' (Chomsky 1981), including the footnotes, and making a list of the conventions, constraints, principles, and constituents, including the INFL node, which owe their existence solely to the NP-VP split and could be eliminated without it; and then (b) counting the number of generalizations for which an exception must be stated solely because the subject is not the sister of the verb. Almost all of these onerous difficulties vanish when the S-VP distinction is eliminated and the subject becomes the sister of the verb.
Some other current syntactic frameworks outside the Chomskyan tradition are also edging away from the VP vs. S distinction. In GPSG for example, 'VP and S can be analyzed as having the same bar level but differing in a feature that corresponds to the property of containing a subject' (Pullum 1985: 26). All the non-theory-specific advantages Pullum cites for this rapprochement also accrue to theories such as lexicase and Hudson's Word Grammar which go one step farther and eliminate the distinction completely. Similarly, in Lexical-Functional Grammar, VP is just an idler in c-structure (as shown by the invariant ↑=↓ equation), and as Wedekind has shown (Wedekind 1986: 487, 488) it plays no role at all in f-structure.
THE N′ ANALYSIS: A noun phrase such as 'the fanatic lexicalist' can be analyzed in terms of PSRs and conventional X-bar grammar as including a NOM or N′ node between the NP and the N (see (a) in Figure 1.7). In a lexicase grammar, however, because of the one-bar constraint, there can be no node between the NP and the head noun, so only a one-bar analysis such as (b) in Figure 1.7 is possible. It has been suggested (cf. Radford 1981: 104 and the references cited there, especially Hornstein and Lightfoot 1981) that (i) such intermediate nodes are required, and that (ii) because there may be more than one Adj preceding a noun, we must give up the Succession constraint and allow N′ → . . . N′ . . . recursion. This analysis is based on the assumption that sequences such as 'the big bad wolf' have the structure [NP Det [N′ big [N′ bad wolf]]]. The primary evidence for this assumption is the fact that one can substitute either for 'bad wolf' or for 'big bad wolf'. At the end of the chapter, however, Radford presents some exercises which show that the one-substitution test gives contradictory results, and which in fact call into
[Figure 1.7: two analyses of the fanatic lexicalist: (a) an X-bar tree with an intermediate node between NP and N (ART the, then an N-bar over ADJ fanatic and N lexicalist); (b) the flat one-bar lexicase analysis, with the, fanatic, and lexicalist all immediate constituents of the NP]
question the use of any such substitution test as a means to determine constituency. The other type of evidence adduced for an N′ analysis is coordinatability, but this is subject to an alternate explanation in terms of gapping (see 6.4), so the case for N′ is not very strong.

In the lexicase analysis of NPs, there can be no N′ distinct from NP. The one-substitution data can be accounted for by analyzing one as an anaphoric noun which allows no inner (subcategorizing) attributes, and the appearance of more than one Adj before a noun is iteration rather than recursion. No special relaxation of Succession is required to account for this. Part of the rule stating the allowable attributes will look something like RR-22, and if nothing further is said, the rule automatically allows any number of adjectives to precede the noun.
RR-22. [+N] → [*([+Adj]) ___ ]
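Read as a well-formedness condition, RR-22's starred optional [+Adj] slot is a purely iterative (finite-state) pattern. The following is a minimal sketch, using a regular expression as a stand-in for the contextual feature; the category strings are illustrative assumptions, not lexicase notation:

import re

# Iteration, not recursion: any number of adjectives before the noun
# is a regular pattern, so no N-bar self-embedding is needed.
NP = re.compile(r'(Det )?(Adj )*N$')

for s in ['N', 'Det N', 'Det Adj N', 'Det Adj Adj Adj N', 'Adj Det N']:
    print(f'{s!r:24} {bool(NP.match(s))}')
# the first four strings are admitted; 'Adj Det N' is correctly rejected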
GERUNDS: To account for the externally nominal and internally clausal properties of gerunds, transformational grammarians have often proposed analyses which assume rules such as PSR-3 (cf. Horn 1975: 338), and Schachter's NOM → VP rule (Schachter 1976b: 206) fits the same pattern. Such rules can be accommodated as one of Jackendoff's allowed exceptions to Succession (Jackendoff 1977: 34, 235, cited in Pullum 1985: 10, 24-5) covering rules of the form Xⁿ → t Yⁿ, where X ≠ Y and t is a grammatical formative. Pullum cites the rule shown here as PSR-4 for gerunds as an example (ibid., p. 24). However, none of these analyses is allowed in a lexicase approach. Instead, gerunds must be treated as derived nouns, just like Chomsky's 'derived' nominalizations.8
(9) PSR-3. NP → S
(10) PSR-4. N² → V² ing
COMPLEMENTATION: The one-bar constraint also of course eliminates analyses in which an X′ category dominates no lexical item X at all. Such an analysis is sometimes proposed for complementation structures, especially in generative-semantic-type analyses, e.g. Figure 1.8 (McCawley 1968: 77), for infinitival complement sentences such as John wants to kill Harry. Here the Ss dominated exhaustively by NPs violate the one-bar constraint. Halliday's claim that 'All other embedding in English [other than comparatives] is a form of nominalization, where a group, phrase or clause comes to function as part of, or in place of (i.e. as the whole of), a nominal group' (Halliday 1985: 187) is also ruled out by this constraint. By this constraint, there can also be no node such as COMP which appears in configurations such as those shown in Figure 1.9, since COMP in (13) lacks a lexical head.
[(11) Figure 1.8: a generative-semantic tree for John wants to kill Harry, in which S dominates NP (John) and VP (V want + NP), and that NP exhaustively dominates another S (John kill Harry)]

[Figure 1.9: (12) a COMP node directly dominating whether; (13) a COMP node dominating the NP whose nose (Det whose, N nose)]
[Figure 1.10: a Fillmorean case structure for 'John opened the door with a key': the V open with sister case-relation nodes Modality, Agent (K by + NP John), Object (K ∅ + NP the door), and Instrument (K with + NP a key)]
CASE NODES: The one-bar limitation also eliminates Fillmorean case analyses in which case relations are marked as nodes in a tree,9 for example Figure 1.10. Such an analysis, and any analysis in which a semantically defined category replaces a syntactic node, is precluded because (i) the S does not have a lexical head, and (ii) the case relation nodes Agent, Object, and Instrument are not X′ projections of their lexical head Ks. If we were to insert a K′ between the K and the case relation, in the way that Chomsky assumes
in Aspects (cf. Starosta 1986b), there would still be a violation of the one-bar constraint above the K', since the case relation labels would not have any lexical head at all (see Figure 1.11).
[Figure 1.11: the same sentence with K′ nodes inserted between each K (by, ∅, with) and its case relation label; the case relation labels above the K′s still dominate no lexical head]
The sisterhead constraint: lexical items are subcategorized only by their dependent sisters. The sisterhead constraint is a restriction on the domain of subcategorization of lexical items: contextual features are marked only on the lexical heads of constructions, and refer only to lexical heads of dependent sister constructions. As I noted in my paper, 'A place for case' (Starosta 1976: 3-5), the Standard Theory leaves the matter of subcategorizational domain open; according to Chomsky's Aspects, selectional restrictions apply between items which are grammatically related, and grammatical relations are where you find them. Allowing phrasal nodes with more than one bar violates the X-bar requirement that an X be subcategorized only by its sister Comps. To demonstrate this, it is sufficient to cite the most egregious offender, VP. In GPSG (Gazdar, Klein, Pullum, and Sag 1985: 34) as well as in Government and Binding theory, an attempt is made to limit the domain of subcategorization to the Comp of an X, but because of the insistence on maintaining a VP node, this has to be weakened and circumvented in order to allow the statement of selectional restrictions applying between X (the main verb) and its Specifier (the subject). If we are looking for a case in which there is a clear grammatical relationship obtaining between two syntactic categories, however, it would be hard to find a more unequivocal case than subject and verb. Verbs almost always impose selectional restrictions on their subjects, and if the verb agrees
with anything, it will at least agree with the subject, etc. Logically, if anything should be a Comp of the V, the subject should. Yet if we posit a VP node (15), it can't be, whereas if we dispense with the VP (16), it is.
(15) [S [NP John] [VP [V love] [NP Mary]]]
(16) [S [NP John] [V love] [NP Mary]]
At this point, some X-bar-knowledgeable reader is certainly thinking, 'But the subject is a Spec, S, a Specifier of the S! That's different from a Comp!' Of course it is different, dear reader, but why should it be different? By maintaining a difference between Specifier and Comp, which of course cannot be done in a one-bar theory such as lexicase, we lose the generalization that Comps subcategorize Xs, and that anything that subcategorizes an X is a Comp. What do we gain? A VP. But what good is a VP? Time after time, it turns out to require single generalizations to be stated as two different ones in GB (cf. Starosta 1985f). It is hard to avoid the dark suspicion that having a VP has become an end in itself, and that the whole awkward Spec analysis of subjects and INFL analysis of clauses is an artifact of the determination to maintain a category VP at all costs. It may not be an exaggeration to state that the main obstacle that is now standing between the Government and Binding framework and a genuinely generative monostratal theory is VP.

On the other hand, a grammar meeting the sisterhead constraint clearly commits itself about the domain of such features: contextual features can be met or violated only by the lexical heads of sister attributes within the same construction. In X-bar terms, an X is subcategorized only by its Comp and by nothing else. In a lexicase grammar, since all grammatical relations are stated in terms of contextual features on lexical items, this amounts to a constraint on the domain of possible grammatical relatedness: given the canonical X-bar form of a construction as X′ → X, Compⁿ, contextual features on the head X of a construction X′ can be satisfied or violated only by the head of an associated Compᵢ. Thus all grammatical relationships can be stated in terms of X (the REGENT) and Compᵢ (the DEPENDENT or ATTRIBUTE), and all pairs of
items standing in a regent-dependent affiliation are by definition grammatically related.10 The regent-attribute relation can also be referred to as GOVERNMENT, and we may refer to all the words standing in a direct or indirect dependent relationship with a given word X as the (syntactic) DOMAIN of X. The sisterhead constraint functions to place an empirically motivated limitation on possible grammatical representations. Thus when the verb and the head of its subject NP are found to be grammatically related in terms of subcategorization, agreement, government, and/or selection (as they always are11), if a given syntactic analysis (such as the VP-type binary structure
assumed by GB and Categorial Grammar) does not show them as being in a regent-dependent relationship, that analysis is wrong (see Figure 1.12).12 As a consequence of the sisterhead constraint, whether or not two words are grammatically related can be immediately read from a lexicase tree. For example, by looking at (17), we can immediately determine that a noun cannot be subcategorized with respect to any constituent outside of the NP, and the verb can't be subcategorized with respect to any constituent inside an NP except for the lexical head noun (see Figure 1.13).
[(8d) Figure 1.12: the binary S → NP VP structure, in which the subject NP and the V are not in a regent-dependent relationship]
Richard Hudson has claimed that dependency constraints have to be weakened in order to allow a lexical item to impose constraints on its regent (not his terminology). Thus adjectives such as enough require their regents to precede them (Hudson 1984: 124), and verbs in French determine whether their regents can be avoir or etre (Hudson 1984: 123). However, this does not logically follow. In the case of auxiliaries in Romance and Germanic, for example (to oversimplify), we can divide non-auxiliary verbs into two classes, say [±motn] 'motion' for German, then mark haben 'have' as [-[+motn]] and sein 'be' as [-[-motn]], rather than marking motion verbs as requiring a regent sein and non-motion verbs a regent haben. The appropriate cooccurrence facts are still accounted for without allowing attributes to impose requirements on their regents as Hudson does.
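A minimal sketch of this alternative, with made-up feature names and the same deliberate oversimplification: the exclusions live entirely on the auxiliary regent, and nothing is ever stated about the dependent verb's regent.

# The auxiliary, as regent, carries the exclusion; the participle is merely
# classified as [+motn] or [-motn]. Feature encoding is illustrative.
AUX = {
    'haben': '-[+motn]',   # haben excludes a [+motn] dependent participle
    'sein':  '-[-motn]',   # sein excludes a [-motn] dependent participle
}
PART = {'gelaufen': '+motn', 'gekauft': '-motn'}

def cooccurs(aux, participle):
    # ill-formed only if the auxiliary's exclusion matches the participle's class
    return AUX[aux] != f'-[{PART[participle]}]'

print(cooccurs('sein', 'gelaufen'))   # True:  ist gelaufen
print(cooccurs('haben', 'gelaufen'))  # False: *hat gelaufen
print(cooccurs('haben', 'gekauft'))   # True:  hat gekauft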
Features are marked only on lexical items, not on non-terminal nodes. In the original Chomskyan version of X-bar theory, phrasal nodes were to be replaced by complex symbols composed of features projected from the lexical heads of the phrases. However, although this requirement was originally intended to limit the power of the grammar, 'it is clear that the use of syntactic features on phrase nodes multiplies the total [number of categories in a grammar] further an indefinite number of times' (Pullum 1985: 6). The lexicase constraint against phrase-level features rules out analyses in which features are marked on higher nodes. This includes the GKPS analysis in which nodes have features which may even contradict features on the head daughter (cf. Gazdar, Klein, Pullum, and Sag 1985: 94-5) as well as Anderson's localistic case dependency grammar (Anderson 1971; Anderson 1977; cf. Starosta 1981), at least the earlier versions of Hudson's Daughter Dependency Grammar (Hudson 1976; Hudson 1979b), and all X-bar type analyses that adopt Chomsky's suggestion (Chomsky 1970: 208) that tree nodes be replaced by feature matrices. Chomsky originally made the latter suggestion in order to be able to express cross-categorial generalizations, especially with regard to the distributional similarities between verbs and predicate adjectives and their respective nominalizations. However, attempts to apply this consistently have resulted in many clumsy and counterintuitive analyses. It has been shown within the lexicase framework (Starosta forthcoming a: 103-5; Lee 1972: 65-6; Starosta, Pawley, and Reid 1981: 47-8, 50-62) that if we eliminate this awkward revision of phrase structure grammar, the resulting simplified framework is still able to account for the data which originally motivated it in terms of contextual features which carry over in the process of lexical derivation, so that the generalization is captured in a lexical rule rather than in the syntactic representation itself.

Another reason which has been suggested for requiring some features to be marked on phrases rather than on their heads is the claim that it is the whole NP which has a referent rather than the head noun itself (Dahl 1980, cited in Hudson 1980a: 8). This is, however, based on a confusion of representation and interpretation: it may well be that a hearer cannot identify the referent of a noun without access to the attributes (the question of interpretation). However, this does not invalidate a grammar in which a noun is represented as being marked for the referential properties of the whole NP of which it is the head (cf. Hudson 1980a: 9).

PRETERMINAL NODES: Interpreted as conditions on PSRs, the constraints in this section require that at least one symbol on the right side of the arrow be an obligatory lexical category symbol such as N or V. This category is the lexical head of the construction, and the phrasal category to the left of the arrow is a projection of that category. In PSR-1 below, for example, the verb is the immediate lexical head of a sentence, since all the rest of the categories are optional, and the noun in PSR-2 is the immediate lexical head of the NP, since the minimal NP is a single noun, for example an indefinite mass noun, proper noun, or pronoun (see (18)). The one-bar constraint, however, prevents word-class and word-subclass node labels such as N, Pron, Proper,
(18) PSR-1. S  → (NP) V (NP)
     PSR-2. NP → (Det) (Adj) N
V, V-intr, Det, Art (as distinct from Det), Adj, Adj-degr, and so on from appearing in a lexicase representation. Both of these types of category labels are incompatible with a strict interpretation of the X-bar requirement (cf. Pullum 1985) that X be subcategorized only by Comp, where X′ → X, Compⁿ. If we abide strictly by the one-bar constraint, word subclass nodes may not appear in trees because if they did, there would be no legal way to exclude sentences such as (19). This sentence cannot be marked as ill-formed using the contextual features of lexical entries, since in a strict X-bar theory contextual features could only refer to Comps, that is, to dependent sisters, whereas if the lexical entries are dominated by word-class symbols, they will have no sisters to be subcategorized by (see (19a)).
(19) *Every me expired the arrogant Nelly
[(19a), (19b): trees for *Every me expired the arrogant Nelly in which word-class and word-subclass nodes (Det, N, DetP, AdjP, Adj) dominate the individual words every [+Det], me [+N], expired, the [+Det], arrogant [+Adj], and Nelly [+N]; dominated by such labels, the lexical items have no sisters to be subcategorized by]
We could make a special exception for major class X⁰ labels, as is conventionally done without comment in X-bar analyses, but in lexicase this would be pointless and redundant, since the information conveyed by a node label such as N is already present in the form of the lexical category feature in the matrix of the noun, as shown in (19b). When these redundant node labels are eliminated, it is possible to maintain a strict interpretation of the Comp
subcategorization condition, known in lexicase since 1979 as the Sisterhead Constraint: a lexical item X is subcategorized only by its sisterheads, that is, by the lexical heads of its dependent sisters or 'Comps' (see (19c)). Then example (19c) is doubly ill-formed because (i) me and Nelly are marked by [-[ ]] as excluding all Comps, that is, all dependent sisters, but each has a sister, that is, another node dominated by the same mother; and (ii) expired is marked by [- ___ [+N]] as not allowing a right sister whose lexical head is marked [+N], yet it does have a right sister, and the lexical item directly dominated by the right sister node is marked [+N].

Note that once we have gone this far, we can go one step farther. Given the fact that every lexicase node is a one-bar projection of its head lexical item, node labels are predictable and therefore redundant as well. That is, assuming that an S, as a one-bar projection of V, is a V′,13 that the node dominating a [+N] item is an N′, i.e. an NP, etc., (19d) conveys exactly the same information as (19c).

Of the seven MOLLUSC constraints listed in Pullum 1985, lexicase does not implement one:

L: Lexicality: all nonterminals are projected from preterminals.

This constraint is meaningless in lexicase because, due to the one-bar constraint, there is no difference between nonterminals and preterminals. It also rejects the peripherality constraint:

Peripherality: lexical heads must be phrase-peripheral.

This constraint, proposed for example in Stowell (1981: 70), seems to be an artifact of the Chomskyan determination to maintain a VP constituent at all costs and an almost equally firm categorial-grammar-like predilection for binary constituent structure. If a construction has two immediate constituents and one of them is the head, then plainly the head will either be at the left end of the construction or at the right end.
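The reasoning about (19c) can be mechanized. What follows is an illustrative sketch rather than lexicase machinery: words carry their category features and absolute contextual features, dependency links stand in for dependent sisters, and the checker reports each sisterhead violation in (19).

from dataclasses import dataclass, field

@dataclass
class Word:
    form: str
    feats: set                   # non-contextual features, e.g. {'+N', '+prnn'}
    no_comps: bool = False       # [-[ ]]: excludes all dependent sisters
    excl: list = field(default_factory=list)  # ('+N', 'right') ~ [- ___ [+N]]

def sisterhead_violations(words, deps):
    """deps maps a regent's index to the indices of its dependent sisters."""
    found = []
    for r, ds in deps.items():
        regent = words[r]
        if regent.no_comps and ds:
            found.append(f'{regent.form}: [-[ ]] violated, has dependent sisters')
        for feat, side in regent.excl:
            for d in ds:
                on_side = d > r if side == 'right' else d < r
                if on_side and feat in words[d].feats:
                    found.append(f'{regent.form}: excluded {feat} sister ({words[d].form})')
    return found

# (19) *Every me expired the arrogant Nelly
words = [
    Word('every',    {'+Det'}),
    Word('me',       {'+N', '+prnn'}, no_comps=True),
    Word('expired',  {'+V'}, excl=[('+N', 'right')]),   # intransitive
    Word('the',      {'+Det'}),
    Word('arrogant', {'+Adj'}),
    Word('Nelly',    {'+N'}, no_comps=True),
]
deps = {1: [0], 2: [1, 5], 5: [3, 4]}  # me<-every; expired<-me,Nelly; Nelly<-the,arrogant
for v in sisterhead_violations(words, deps):
    print(v)

Run on (19), the checker flags me and Nelly (each has a dependent sister despite [-[ ]]) and expired (a right sister headed by [+N]), exactly the double ill-formedness described above.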
1.3.2.3 Constraints and expressive power

After filling some twenty-eight pages with lucid clarifications and cogent criticisms of X-bar theory as practiced by its progenitors, Pullum's conclusions and the consequences he draws for his own theory are rather curious (Pullum 1985: 27):

The GKPS theory does not observe a particularly stringent set of conditions on the X-bar system. Minor categories are left out of the bar system (Weak Lexicality), and minor categories can be introduced as non-heads (a version of Weak Maximality), for instance. But as has been shown in section 2, such distinctions are devoid of effect on the class of languages that can be described ... X-bar theory does not limit expressive power to any proper subset of the CFLs, nor does it have other magical effects like cutting down the class of grammars to finite size. It is not a powerful and explanatory component in a theory of the human capacity for language, and should not be represented as such. It has hardly any claims to make at all on its own ...

In this evaluation we need to be aware that Pullum is measuring power mathematically in terms of CFLs, that is, 'context-free languages', and that these are in fact not natural languages at all, but sets of unstructured strings. That the study of such objects has any direct significance for natural language investigations remains to be demonstrated. As Robert Wall states (Wall 1972: 290),

Worse still, however, is the fact that nearly all work on formal grammars deals exclusively with the sets of strings they generate (called the weak generative capacity of the grammar) and has little to say about the kinds of structural descriptions (constituent structure trees) assigned to the grammatical strings (strong generative capacity) ... [Results in mathematical linguistics] permit very few inferences about the grammars of natural languages.

A quote in the final paragraph of Pullum's paper (op. cit.) seems far more pertinent to the lexicase purpose for developing a more constrained system of X-bar constraints: 'what we need most are proposals for actual limitations on what structural descriptions of sentences can be like, rather than further richness in the array of descriptive devices assumed in addition to the theory of phrase structure ...' Precisely. That is what the constraints in this section are designed to provide.

1.3.3 Constraints on lexical representations

To ensure that a new and more powerful monster did not arise from the ashes of the old, lexicase has started from something close to Chomsky's feature notation for lexical matrices as proposed in Aspects and added new constraints where possible. Constraints on lexical representation and the use of features at the present time include the following:
1.3.3.1 Abstract lexical entries

The lexicon contains only actually occurring words. The lexicon contains only full words, free forms with a syntactic category, a meaning, and a pronunciation. There are no affixes, derivational morphemes, abstract verbs such as CAUSE, abstract underlying forms such as *ostrac for ostracize, and no phonetically null elements such as Δ, PRO, or t.

1.3.3.2 Feature inventory

Syntactic categories are limited and atomic. Every word in a grammar is a member of one and only one of a restricted set of syntactic word classes or 'parts of speech', probably limited to the following: N, V, Adj, Adv, Det, P, cnjc ('conjunction'), and possibly SPart ('sentence particle'). Every lexical entry is a member of one of these seven or eight classes. Each syntactic class is defined in terms of a single positive atomic feature drawn from this set of categories, such as [+N], [+Adj], etc., and no lexical item is marked positively for more than one of these features. All the categories are defined primarily in distributional terms, that is, in terms of sequential and non-sequential dependencies which they contract with other items in a phrase. Major syntactic categories are divided into syntactic subcategories based on differences in distribution. Thus nouns are divided into pronouns (no modifiers allowed), proper nouns (no adjectives and typically no determiners allowed), mass nouns (not pluralizable), etc., and similarly for the other syntactic classes. The contextual features associated with the words in these various distributional classes determine which words are dependent on which other words. They are in effect well-formedness conditions on the dependency trees associated with the words in a sentence. Morphological properties and derivational potential are of secondary importance and semantic or 'notional' criteria of tertiary importance.

The requirement that major class features be atomic and mutually exclusive is meant to exclude analyses such as Hudson's (Hudson 1979c: 16; Hudson 1984: 98) in which gerunds are simultaneously marked positively for membership in the verb and noun classes, as well as elastic X-bar type GB analyses in which lexical categories are treated as feature complexes. For example, it rules out definitions of N, V, A, and P as [+N, -V], [-N, +V], [+N, +V], and [-N, -V] respectively (Stowell 1981: 21; Gazdar, Klein, Pullum, and Sag 1985: 20-1), Jackendoff's treatment of nouns for example as (Jackendoff 1977: 3):

[+Subj]
[-Obj]
[+Comp]

and Chomsky's cutesy analysis of passive participles as simply [+V], a neutralization of verbs [+V, -N] and adjectives [+V, +N] (Chomsky 1981: 55). It also excludes systems allowing feature matrices with internal structure,
such as that developed in GPSG (Gazdar, Klein, Pullum, and Sag 1985: 21), Bresnan's proposal to define categories as ordered pairs of an integer (representing bar level) and a bundle of feature specifications (Bresnan 1976, cited in Gazdar, Klein, Pullum, and Sag 1985: 21), and analyses involving ordered pairs of sets of agreement feature specifications (Stucky 1981 and Horrocks 1983, cited with apparent approval in Gazdar, Klein, Pullum, and Sag 1985: 21).

1.3.3.3 Lexical disjointness

A word can be a member of only one syntactic category. This constraint seems to be what is meant by Pullum's lexical disjointness condition (Pullum 1985: 6), formulated as: If X⁰ → t and Y⁰ → t′ are both in R, and X ≠ Y, then t ≠ t′. Pullum refers to this condition as 'unrealistic', but doesn't elaborate. Perhaps he is concerned about the fact that an element like run can appear as either a noun or a verb. However, in this situation we are talking about a ROOT run, which has a pronunciation and a meaning but not a syntactic category. In a lexicase grammar, a word (as opposed to a root) has a grammatical category as well as a pronunciation and a meaning. Thus if we find that run appears as a noun and as a verb, then we are dealing with two lexically distinct though homophonous words, run₁ [+V] and run₂ [+N]. The regular relationship between such pairs of words is captured in lexicase by means of derivation rules rather than by complex lexical entries. In a lexicase grammar, a word is defined by three elements—sound, meaning, and distribution—rather than just the Saussurean semiotic sound and meaning. None of these three elements is predictable from the others, so all must be listed in the lexicon.

This constraint rules out single entries which have subparts, say, a [+N] part marked for certain idiosyncratic features, a [+V] part with others, and the rest of the features held in common (Chomsky 1970: 90), with 'redundancy rules' (ibid.) accounting for whatever is predictable across classes. However, the term 'redundancy rule' is inappropriate, because the information contained in such rules is not redundant: the existence, form, and meaning of a derived item is not completely predictable from its derivational source, and the need for rule features to make everything come out right at the end is just an admission of this fact. A rule feature is merely an address or a pointer, a clue in a treasure hunt game telling the searcher where to look next. Instead of positing combined entries and rule features, a lexicase grammar must establish separate lexical entries, one for the verb and one for the noun, and write a derivation rule to account for the regular relations between them.

Because I take this position, I have been accused of disturbing 'the unity of the word'. However, I think the combined entry approach was never anything more than a lexicographic convention based on a neglect of the distributional dimension of the linguistic sign, a mystical belief in 'the unity of the word', and the availability of the alphabet. My experience with unwritten languages suggests that this 'unity' has little psychological justification. There is no more
linguistic reason to list items together according to their common roots or (derivational) stems (that is, according to their etymological sources) than there is to group them according to their meanings (as in a thesaurus) or their distributions (as in grammar lessons or reference works for the preparation of teaching materials). Except for inflectional paradigms,14 combined entries are just an arbitrary,15 awkward, and formally unworkable16 alternative representation for separate independent entries, in my opinion, and this constraint makes the potentially testable claim that such composite representations have no psychological validity.

1.3.3.4 Binary and implicational features only

All non-contextual features are binary, and contextual features are positive, negative, or implicational. Lexicase allows no n-ary features such as employed in Chomsky 1965: 170-2, and no features which take other feature specifications or even whole categories as their values, a proposal developed and elaborated in GPSG (Gazdar, Klein, Pullum, and Sag 1985: 22-3), or sets of values (Pollard 1984, cited in Gazdar, Klein, Pullum, and Sag 1985: 41). Given this kind of powerful notation, it is just too easy to miss important observations. For example, in a system which allows numbers as feature values, it is easy to be influenced by traditional terminology and English-specific properties and treat the category of person as a feature like PER with values 1, 2, or 3 (Gazdar, Klein, Pullum, and Sag 1985: 22-3), and thereby miss semantic insight and the generalizations about the inclusive-exclusive distinction which can be achieved when the notation forces a binary analysis in terms of [±spkr] 'speaker' and [±adrs] 'addressee':
              -speaker            +speaker
+addressee    'second person'     'first person inclusive'
-addressee    'third person'      'first person exclusive'
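As a worked example based on the table above (the encoding is an ad hoc assumption, not lexicase notation), the binary analysis derives the four traditional person labels, and the Ilokano dual, from the two features:

# Person labels derived from two binary features rather than one 3-valued one.
PERSON = {
    ('+spkr', '+adrs'): 'first person inclusive',
    ('+spkr', '-adrs'): 'first person exclusive',
    ('-spkr', '+adrs'): 'second person',
    ('-spkr', '-adrs'): 'third person',
}
# The Ilokano 'first-person dual' is just [-plrl, +spkr, +adrs]:
# a singular inclusive, a cell the binary analysis predicts for free.
print(PERSON[('+spkr', '+adrs')])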
From an anglocentric point of view, the inclusive-exclusive distinction is a troublesome anomaly, but a binary analysis accounts very simply for the vast number of languages which make such a distinction, and explains the otherwise anomalous Ilokano first-person dual pronoun: it is simply a first-person singular inclusive [-plrl, +spkr, +adrs], a possibility predicted by the binary analysis. Similar considerations apply for example to case: it is so easy to treat case inflection categories in Sanskrit fashion as case 1, case 2, etc., and thereby miss the whole localistic system which a binary analysis forces the investigator to discover, or to set up a class of 'minor categories' which makes complementizers, determiners, conjunctions, etc., exceptions to the X-bar system (Gazdar, Klein, Pullum, and Sag 1985: 25, 35, 96) and thereby misses
a solution to the problem of Xⁿ → Xⁿ recursion which forced GPSG to give up Pullum's Succession constraint (Pullum 1985: 25; see section 1.3.2.2).

Non-contextual features in a lexicase grammar are binary. At the moment there is no particular reason to believe that this restriction is correct, but since it does constrain the theory and has not as yet forced the adoption of any obviously bad analyses, I will assume as a hypothesis that it is correct, and that analyses adopting multiple-valued features in syntax are wrong. Although at first such a constraint may seem to be quite arbitrary and devoid of empirical content, it does have some potential for falsifiability, in the following sense: it claims in effect that there can never be two lexical entries which are identical except that one is specified by binary features and the other by n-ary features (n > 2), so it can in principle be disproved by demonstrating that such items do exist in some attested language. More practically, we might look instead for external confirmation at least from psycholinguistic or neurolinguistic investigations, although I have no particular experiments to propose at this point.

In the last two years, linguists working in the lexicase framework have come to recognize that there are two different kinds of contextual requirements marked on lexical items, absolute and implicational (Khamisi 1985: 341; Pagotto and Starosta 1986: 4-5; cf. Blake 1983: 167). Absolute contextual features mark an item as requiring or excluding a particular kind of dependent sister category in order for the sentence to be well formed. For example, in English transitive verbs must have overt objects, and finite verbs must have overt subjects. If either of these requirements is not met, as in (20) and (21), the sentence is clearly ungrammatical. Similarly, pronouns may not cooccur with determiners, and the presence of a determiner violating a negative absolute contextual feature (see (22)) results in ungrammaticality.
(20) *John resembled.
         [+ ___ [+N]]

(21) *Resembled a gross-breasted rosebeak.
         [+ [+N] ___ ]

(22) *The     I           love  the     you.
     [+Det]   [-[+Det]]         [+Det]  [-[+Det]]
Some positive contextual restrictions can be violated without necessarily resulting in ungrammaticality. One kind of example is the requirement that finite verbs have subjects. In English this is almost absolute (except for 'headless relative clauses' and probably for imperatives), but in Swahili (Khamisi 1985: 340-2), Spanish, and other 'pro-drop' languages it is not. In this situation we need to say that the finite verb EXPECTS a subject ([⊃[+Nom]]) rather than that it REQUIRES one ([+[+Nom]]). The other
common place where this comes up is in 'wh-movement' situations (cf. Pagotto 1985a: 30 ff). Lexicase does not allow empty categories, so if a constituent is not there, it is not there; yet a missing 'wh-moved' category does not necessarily result in ungrammaticality, as shown in (23). To account for these two different kinds of contextual requirements, then, lexicase grammar now distinguishes absolute from implicational features. Violation of an absolute polar feature is sufficient grounds for ungrammaticality, whereas an implicational feature results in ungrammaticality only if the implied constituent has not been identified by a rule of interpretation at the end of the derivation. (23)
What did you [S put it on ___ ]?
                     [⊃ ___ [+N]]
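A sketch of the division of labour between the two feature types, under an assumed toy encoding: a violated implicational feature is tolerated so long as a rule of interpretation later identifies the missing constituent, while a violated absolute feature is fatal.

# '+' and '-' are absolute; '>' (standing in for the horseshoe) is implicational.
def well_formed(contextual_feats, dependents, identified):
    """dependents: categories of overt dependent sisters;
    identified: categories supplied by rules of interpretation."""
    for mode, cat in contextual_feats:
        if mode == '+' and cat not in dependents:
            return False                 # absolute requirement unmet
        if mode == '-' and cat in dependents:
            return False                 # absolute exclusion violated
        if mode == '>' and cat not in dependents and cat not in identified:
            return False                 # expectation never discharged
    return True

# 'on' in (23) expects a following [+N]; interpretation links the gap to 'what'
print(well_formed([('>', '+N')], dependents=set(), identified={'+N'}))  # True
# 'resembled' in (20) absolutely requires a [+N] object; nothing can save it
print(well_formed([('+', '+N')], dependents=set(), identified={'+N'}))  # False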
1.3.3.5 No double contextual features

No lexical entry may contain a double contextual feature. A double contextual feature is one in which a contextual feature is included within another contextual feature. For example, in order to maintain a GB-type analysis of infinitival complements, a feature might be proposed for verbs such as want which required the subject of an embedded infinitive to have a preceding accusative NP:
want
+[ ___ [-fint
        -[+Acc] ___ ]]
Such an analysis would be ruled out by this constraint, since it requires a contextual feature nested within another contextual feature. It is important to impose this constraint, since without it the sisterhead constraint can be evaded and dependencies can be stated between any two arbitrarily chosen categories. Unfortunately, this constraint is also easy to evade even within the lexicase notational system. For example, instead of the feature [-[+Acc] ___ ] in the above matrix, we could substitute some dummy non-contextual feature [+dmmy]:
want
[+V
 +[ ___ [-fint
          +dmmy]]]
and then mark infinitives one way or the other for this feature (SR-1), which is subsequently interpreted as a contextual feature (RR-12). Thus this constraint needs to be modified to prevent this kind of evasion as well as more direct violations. Unfortunately, as of this writing, the lexicase analysis of 'headless relative clauses' such as whatever bit John uses exactly this kind of mechanism, and it is not clear how to analyze this construction without it.
SR-1.  [-fint] → [±dmmy]
RR-12. [±dmmy] → [-[+Acc] ___ ]
1.3.3.6 No rule features

No lexical entry may contain a rule feature. This constraint requires that a grammar contain no lexical diacritic feature of the form [+Rᵢ] or [-Rᵢ], where 'Rᵢ' is the number or address of a particular rule in a grammar, and where [+Rᵢ] indicates that the item on which it is marked must undergo rule Rᵢ in every derivation, while [-Rᵢ] indicates that its host item may never undergo Rᵢ in any derivation for which the structural analysis of that rule is met. Rule features are not nice, and I don't want them in my nice clean grammar.

Rule features are lexical specifications marking a lexical item as an exception to some grammatical rule. They can be used to camouflage a leaky rule as a general and productive rule, and their ready availability has had the effect of making it easy for syntactic-rule-oriented syntacticians to state transformations or relation-changing rules as if they were true productive structural generalizations, and disregard the fact that most 'syntactic rules' are lexically governed. This constraint decreases the power of the metatheory in a very salutary way: it weeds out some otherwise possible but unnatural and ad hoc analyses. Such a constraint immediately excludes all possible grammars which adopt the kind of notation and machinery proposed in Lakoff's Irregularity in Syntax (Lakoff 1970: 22-6).

A direct consequence of the stricture against rule features is the elimination of all transformations or relation-changing rules which have any arbitrary lexical exceptions, since there is no way to block the application of such rules if there are no rule features to refer to. Interestingly, this turns out to be most of the transformations in the Chomskyan literature. The (partial) generalization previously captured by such rules will then have to be captured somewhere else, and the only other place to do it is the lexicon. As an example, consider the 'Dative Movement' rule. As is well known, there are lexical exceptions to this rule, so the absence of rule features means that it cannot be stated as a transformational rule or a Relational Grammar promotion rule. Then where can this property of English grammar be stated, or can't it? It should come as no surprise that it ends up in the lexicon. Thus the class of verbs which occurs in the 'moved' environment (+[ ___ NP NP]) for 'Dative Movement' is treated as a separate syntactic class from the set of verbs which can occur in the 'unmoved' environment (+[ ___ NP to/for NP]),
and the relationship between the two classes is accounted for in terms of a lexical derivation rule that derives the +[ ___ NP NP] verbs from their +[ ___ NP PP] counterparts or vice versa (Starosta and Nomura 1984: 30). Since a derivation rule is a semi-productive analogical process of word formation, gaps are allowed and expected. The psychological claim made by this constraint then is that there is not a single class of verbs, some of which allow themselves to participate in certain external reshufflings and others which do not. Instead, there are two distinct classes of verbs, the 'moved' and the 'unmoved' verbs. They are learned and stored separately, but because of a high degree of overlap in the class membership, the speaker is able to move items from one class to the other by analogy, and to understand new examples produced by other speakers in the same way. This claim is a consequence of the constraint against rule features, and one which in principle is subject to empirical testing (see the sketch at the end of this section).

To cite another example, because of the unavailability of rule features, predicate adjectives cannot be lexically identified with adnominal ones, marking some items as positive or negative exceptions to Lakoff's WH-DEL and ADJ-SHIFT transformations (Lakoff 1970). Instead, predicate adjectives must be listed in the lexicon as a separate class from adnominal adjectives. The intersection of the two sets is accounted for again in terms of a lexical derivation rule, and the relative complements of the intersection (the words that occur in only one of the two sets but not the other) are considered lexical gaps.

Finally, this constraint also excludes the lexical specification of irregular morphology allowed in GPSG (Gazdar, Klein, Pullum, and Sag 1985: 34) and Hudson's morphological cross-referencing system (Hudson 1980b: 2, 12, 15). Hudson groups irregular inflectional forms such as write : wrote : written into single entries consisting of a common root, e.g. /rait/, plus a specification of vowel alternations dependent on morphological contexts. This can be seen as either a notational variant of separate lexical entries or as the old structuralist replacive morph analysis, a way of marking a single lexical item to undergo its own private morphophonemic rule, with the implied claim (which I consider highly dubious) that the vowel changes reflect some psychologically real morpheme division. When other irregular sets such as drive : drove : driven exhibit the same alternations, Hudson allows them to refer to the same set of vowel changes and contexts (Hudson 1980b: 14-15). At this point we are surely dealing with the equivalent of a minor rule in Lakoff's sense (Lakoff 1970), with the vowel changes and contexts considered to be independent of any particular lexical entry (Hudson 1980b: 12). The cross-referencing notation is a rule feature triggering the application of the minor rule, although Hudson explicitly denies this (Hudson 1980b: 15; cf. Hudson 1984: 58). As such, it is excluded by the anti-rule feature constraint, which thus provides an answer to his question (Hudson 1980b: 4) as to where to draw the line between reasonable and unreasonable analyses.
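Returning to the 'Dative Movement' case above, the derivation-rule treatment can be sketched as follows; the verb lists and the particular gaps are illustrative assumptions, and the real rule is of course stated over contextual features rather than bare verb lists.

# Two separate verb classes, learned and stored separately; the derivation
# rule is analogical and leaks, so lexical gaps like 'donate' are expected.
UNMOVED = {'give', 'send', 'tell', 'donate', 'explain'}   # +[ ___ NP to/for NP]
GAPS = {'donate', 'explain'}                              # items the analogy skips

def derive_moved(unmoved, gaps):
    """Derivation rule: +[ ___ NP PP] verbs -> +[ ___ NP NP] verbs, minus gaps."""
    return unmoved - gaps

MOVED = derive_moved(UNMOVED, GAPS)
print('give' in MOVED)     # True:  gave Mary a book
print('donate' in MOVED)   # False: *donated the library a book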
1.3.4 Constraints on rules

1.3.4.1 No transformations

The original generative grammars made extensive and crucial use of descriptive devices called 'transformations' to relate overt 'surface structures' to meaning-bearing 'deep structures', and to account explicitly for the systematic relation between corresponding sentence patterns such as active and passive structures. Among the people who have been seriously concerned about generative power and what to do about it, there is a broadly held view that such transformations are primarily to blame for the limited empirical content of most versions of generative grammar, and that something must be done to limit this power. A first step in this direction was Emonds' structure-preserving constraint (Emonds 1976). This was an attempt within the Standard Theory tradition to limit the power of transformational rules in order to 'clear the way for a generative model of a language which will be both formally manageable and empirically significant' (op. cit., endflap).

By 1968, one faction of the MIT school had developed the transformational machinery into an extremely abstract though quite inexplicit logic-oriented system which came to be known as 'generative semantics'. Partly as a reaction to the excesses of generative semantics, perhaps, Chomsky (Chomsky 1970) proposed that some kinds of grammatical relatedness, in particular certain kinds of nominalization, should be accounted for in terms of lexical rules rather than transformations. This proposal, which is referred to as the lexicalist hypothesis, has since spread to other areas of grammar and has been important in the genesis of new grammatical frameworks within and beyond the bounds of MIT linguistics, such as Bresnan and Kaplan's lexical-functional grammar (cf. Bresnan 1982a), Brame's Base Generated Syntax (Brame 1978), and Hudson's Daughter Dependency Grammar (Hudson 1976) and Word Grammar (Hudson 1984), all of which reject the use of transformations altogether, although in many cases they seem to have created new devices of unknown power to replace them.17

Lexicase was apparently the first generative framework to propose the total abolishment of transformations, so that there is no distinction between deep and surface structures, levels, strata, 'functional descriptions', or other simultaneous levels of syntactic or semantic representation, and there are no grammatical rules that map one sequence or hierarchy onto another; that is, there are no rules that adjoin, delete, permute, relabel, or copy parts of one structure to produce another structure.

To invalidate the stricture against transformations and similar stratum-onto-stratum mapping rules, it would be necessary to find some valid fact of grammar which cannot be accounted for without invoking transformations or other equally powerful descriptive devices. Much of the work in lexicase since 1970 has thus been devoted to showing that facts formerly handled by transformations are either not facts of grammar proper or can be handled in the lexicon without resorting to transformations. For example, facts about discourse-related phenomena such as clause-external anaphora, conjunction reduction, short answers to questions, etc., must be considered part of
performance, at least until someone can show such phenomena to be amenable to a formal and explicit description. On the other hand, grammatical processes such as Subject-Aux Inversion in English have been shown (Starosta 1977) to be susceptible to a purely lexical analysis, with no resort to transformations necessary.

1.3.4.2 No Phrase Structure rules

There are no Phrase Structure rules (PSRs) as separate from lexical rules. A final constraint is found to follow from those proposed above: Phrase Structure rules are redundant and can be eliminated, and all rules of grammar proper can be reformulated as generalizations about relations among lexical entries or relations between features within lexical entries. Phrase Structure rules are used in a grammar to describe the ways in which words are allowed to combine into phrases, clauses, and sentences. Given the constraints proposed in this section, however, this information can be provided by means of contextual features marked on the lexical heads of constructions. (This consequence will be discussed in detail in section 2.4.3.)

Eliminating PSRs is nice in terms of simplicity, but more important, it constrains possible syntactic representations. To demonstrate the difference in power between ordinary PS rules and their lexicase counterparts, we need to find tree structures which can be generated by ordinary PSRs but not by lexical rules constrained as proposed in this section, and then show that all the structures generated by lexical rules are also generable by PSRs. It is easy to show that PSRs can generate many kinds of structure which are not possible in lexicase. The following common transformational grammar-type rules for example generate trees which violate the one-bar constraint, since there is no lexical category on the right side of the arrow:

*S″ → TOP S′
*S′ → COMP S
*S → NP INFL VP
*NP → DET NOM
*NP → NP S
*NP → S

The elimination of Phrase Structure rules is not only motivated by questions of expressive power, but can be justified in other ways as well. For example, Jappinen et al. (Jappinen, Lehtola, and Valkonen 1986: 461) observe that PS rules are not appropriate for free word order languages such as Finnish, since such rules are heavily dependent on configuration, and hence on fixed-order concatenation (Nelimarkka, Jappinen, and Lehtola 1984: 169). Similar points have been made within GPSG as justification for separating linear precedence from immediate dominance. In retrospect, it is probably true that the invention of PSRs in the first place was heavily influenced by the fact that the inventors were speakers of languages with relatively fixed word order, and that 'Scrambling' transformations had to be brought in solely to
remedy the damage done by adopting an unsuitable PSR notation in the first place. As will be seen below, the lexicase counterpart of constituent structure rules is Inflectional Redundancy Rules, which can either control or ignore linear precedence, and thus have the best of both worlds.

Whenever a grammatical framework imposes the elimination of some previously accepted but excessively powerful formal mechanism, there is always the danger that the Frosch Air Mattress Principle will be found to apply: whenever you push an air mattress down in one place, it bulges up somewhere else. This is for example how Pullum (Pullum 1985: 23-4) evaluates the formal proposals made by Stowell 1981:

In Stowell's theory, particular grammars provide the lexical entries that trim this universal CFL down to a particular language, and its entire content hangs on the nature of those devices . . . As so often within current linguistics, particularly GB, a well understood device (PS rules) is 'eliminated', and in its place appear a dozen devices for which no one has ever given definitions, let alone specified their expressive power when they are used in concert in full-scale rule systems.

The situation in lexicase differs from Pullum's characterization of Stowell in that lexicase formalisms are a refined and developed subset rather than a vague and programmatic superset of formalisms proposed in Chomsky 1965. Lexicase 'Derivation Rules' have no clear counterparts in Aspects, but they too have their origins in the Revised Standard Theory, being essentially a more rigorous conceptualization and formalization of Jackendoff's 'redundancy rules' (Jackendoff 1972: 41, Jackendoff 1975).

In spite of the limited generative power of lexicase grammars incorporating the constraints discussed above, it has turned out to be possible to describe quite complex phenomena without going beyond the formal limitations imposed here. In the nearly two decades since its inception and in the course of accounting for a broad range of grammatical phenomena from close to fifty languages, lexicase has (with one or two exceptions in the area of unbounded dependencies) become more tightly constrained than it was at the beginning. In spite of this, even grammatical constructions such as clefts, pseudo-clefts, and gerundive nominalizations can be generated directly without losing generalizations, and the entire English auxiliary system, including passivization, subject-aux inversion, 'affix-hopping', etc., has been described in terms of a single set of explicit lexical rules obeying the constraints (Starosta 1977). Thus there seem to be grounds for thinking that the theory is on the right track.

NOTES

1. The idea of applying the concept of 'valence' to linguistics originated with Lucien Tesniere (cf. Tesniere 1959), and has given rise to an important school of European linguistic thought. (Cf. for example Engel 1977 and Helbig and Schenkel 1975.)
2. One consequence of this is that all contradictions 'mean' the same thing, and all tautologies 'mean' the same thing. Thus 'a rose is a rose is a rose' means the same thing as 'a dollar is a dollar'.
3. Note however, as Pat Lee has pointed out to me, that for non-declaratives there exist satisfaction values in erotetic logic and the logic of commands, parallel to the felicity conditions in speech acts.
4. See Gazdar, Klein, Pullum, and Sag 1985: 8 for further problems entailed by identifying a semantic theory for natural language with 'a recursive definition of truth in a model'.
5. That is, brain rather than mind, since the latter concept has no physical existence. 'Mind' is merely an epiphenomenon of the electrochemical processes taking place in the brain.
6. These examples are constructed assuming a grammatical framework which expresses strata of analysis as phrase structure trees, but corresponding examples could be given in other frameworks, for example Relational Grammar.
7. The discussion in this section draws copiously on Pullum's excellent critique of X-bar theory, 'Assuming some version of the X-bar theory' (Pullum 1985), although as will become apparent, I disagree with certain of the author's conclusions.
8. For a justification of this analysis, see Starosta 1985e: 14-18.
9. Chomsky argued in Aspects against such a use of nodes as relational labels (Chomsky 1965: 68-74), but in fact he egregiously violates his own constraint later in the same chapter (cf. Starosta 1976: 6-10). Although I am a case grammarian myself, I am not sad to see these labels jettisoned. Since case relations can also be marked on head nouns, it was redundant anyway to have two ways of doing the same thing (op. cit.: 27-8), and it is nice to have an additional justification for the standard lexicase practice of marking case as a feature of the noun itself.
10. Cf. Hudson 1979c: 9-10 for a very similar approach to the domain of subcategorization.
11. In fact, this remains true even if there is disagreement in the analysis of an ergative language about which of the nominal constituents in a sentence should be regarded as the subject.
12. It must be pointed out here, though, that relations of control and coreference are not defined in terms of regent-dependent relationships, which is one major reason for treating them as different in kind from the purely grammatical connections we are discussing here.
13. This definition will have to be adjusted a bit below to allow for sentences with NP and PP predicates.
14. And maybe even including inflectional paradigms.
15. In order to maintain this fictional unity in a real generative grammar, it will be necessary to list many essentially separate entries together in a clump under a single phonological form when their semantic properties are subjectively felt to be close enough to justify this stratagem (cf. Chomsky 1970: 190; Hudson 1980b: 6).
16. Of course anything, including combined cross-categorial entries, can be made to work with rule features, which is one reason why rule features are bad.
17. For other references, see Hudson and McCord n.d.: 1.
2. Pan-lexicalism

2.1 GRAMMAR AS LEXICON
Grammar is lexicon. A word class is established on the basis of the potential cooccurrence and interdependency of words in constructions, and a construction is a set of word sequences each of which is characterizable in terms of one or more head words and zero or more non-head words whose appearance, sequencing, and grammatical marking are governed directly or indirectly by the head word. Information about the makeup of constructions is internalized and represented in a grammar in terms of classes of lexical items, since it would be extremely uneconomical to account for such collocations in terms of lists of sequences of arbitrary elements. Word sequences whose head words are members of the same lexical class (or classes, in the case of exocentric constructions) are tokens of the same construction type, and conversely, the structure of each construction type is characterized in terms of the class of words appearing in the head position and of the word classes, construction types, and relative positions of the head word's attributes. All the grammatically relevant properties of a word are specified in its lexical matrix in terms of contextual and non-contextual features. These features characterize the possible environments in which each item can occur, and 'environment' here means not just sequential environment but also hierarchical dependency environment. Contextual features are statements of sequential and non-sequential dependency relationships, i.e. 'immediate domination' and 'linear precedence' in the GPSG sense. Given the lexicase X-bar constraints, especially the sisterhead constraint and the one-bar constraint, the contextual features on the words in a sentence allow the appropriate dependency structure to be assigned without the adduction of any rules, e.g. given the sequence of lexical items in Figure 2.1, there is only one way, Figure 2.2, of drawing a dependency tree which satisfies all the contextual features.
[Figure 2.1: four lexical matrices in linear order (wwwww [+W], xxxxx [+X], yyyyy [+Y], zzzzz [+Z]), each specified for positive and negative contextual features such as +[+W] and -[+Y]]

[Figure 2.2: the unique dependency tree over wwwww, xxxxx, yyyyy, and zzzzz that satisfies all of the contextual features in Figure 2.1]
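The parsing idea behind Figures 2.1 and 2.2 can be made concrete with a small sketch; it is not part of the lexicase formalism, and the three-word lexicon and feature encoding are illustrative assumptions. Contextual features act as node admissibility conditions, and a brute-force search finds every dependency structure they admit.

from itertools import product

# word -> (category, required dependent categories, excluded dependent categories)
LEXICON = {
    'the':   ('+Det', set(),    {'+V', '+N'}),
    'dog':   ('+N',   set(),    {'+V', '+N'}),
    'slept': ('+V',   {'+N'},   {'+Det'}),
}

def dependency_trees(words):
    """Enumerate head assignments (exactly one root, no self-dependency)
    under which every word's contextual features are satisfied."""
    n = len(words)
    for heads in product([None] + list(range(n)), repeat=n):
        if sum(h is None for h in heads) != 1:
            continue
        if any(h == i for i, h in enumerate(heads)):
            continue
        ok = True
        for i, w in enumerate(words):
            _, required, excluded = LEXICON[w]
            dep_cats = {LEXICON[words[j]][0] for j, h in enumerate(heads) if h == i}
            if not required <= dep_cats or dep_cats & excluded:
                ok = False
                break
        if ok:
            yield heads

print(list(dependency_trees(['the', 'dog', 'slept'])))
# -> [(1, 2, None)]: 'the' depends on 'dog', 'dog' on 'slept', 'slept' is root

For this toy lexicon the search returns exactly one head assignment, mirroring the uniqueness claim made for Figure 2.2.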
Thus constituent boundaries are in effect already incorporated into an analysis employing appropriately constrained contextual features. This suggests the possibility that rules of linguistic competence stated in terms of such features might have direct counterparts in an account of performance. In production terms, the speaker chooses words and places them in hierarchical structures that will not violate the contextual features, and in perception terms, the hearer parses the sentence in such a way that no contextual restrictions will be violated (cf. Starosta and Nomura 1986).

2.2 GRAMMAR AS RULES

If a well-formed tree structure is defined as one in which each lexical item is in an acceptable environment, then the lexicon is a set of well-formedness conditions on possible tree structures (cf. McCawley's 'Node Admissibility Conditions', McCawley 1968: 247ff). That is, the lexicon itself is a generative grammar, a formal device which generates the sentences of the language with their associated structural descriptions.

If the grammar of a language is a set of lexical items, what about rules? Do we still need any? The answer is, 'Yes and no.' It is quite true that (setting aside unbounded dependencies for the moment) the lexicon itself generates the sentences of the language without referring to rules at all, so that strictly speaking no rules are necessary. However, it is also true that such a grammar is not optimal: it leaves something to be desired, namely, generality. A lexicon is a list of exceptions, an inventory of all those elements of a language which cannot be predicted (cf. Bloomfield 1933: 274). If we are content with treating a grammar as a list of exceptions, then we need go no further than writing an explicit lexicon. There are, however, general facts about a language that as scientists we would like to present to the world, and since the grammar inheres in the lexicon, these facts turn out to be generalizations about the properties of the lexicon. To take one example, it turns out to be the case that in a lexicon of English, every lexical matrix containing the feature [+prnn] (pronoun) will also contain the feature
[-[+Det]] ('does not cooccur with a dependent sister Determiner'). This generalization can be stated as an implication:
RR-1. [+prnn] → [-[+Det]]
thereby becoming our first grammatical rule. Each time we find a way to predict a feature of a lexical entry, that feature becomes redundant and can be removed from the entry, since it is predictable by rule. We go on extracting such generalizations until there are no more left to extract. When all redundant features have been removed and stated as rules, what remains is by definition a lexicon in the Bloomfieldian sense: a list of all the non-predictable elements of the language. At that point, we have a grammar in the familiar sense of a lexicon plus a set of rules.

The fundamental principle of lexicase grammar, then, is that all grammatical rules are to be viewed as generalizations about the lexicon. This postulate, referred to by Richard Hudson as 'pan-lexicalism' (cf. Hudson 1979a; Hudson 1979c: 11, 19), evolved in lexicase at about the same time as Hudson's daughter-dependency grammar. Within lexicase, it crucially depends on the lexicalist hypothesis (Chomsky 1970), the one-bar constraint, and Pagotto's work on the analysis of unbounded dependencies in terms of rules of interpretation (Pagotto 1985a). Lexicase is lexicalist in the sense of Chomsky 1970, in that it accounts for the systematic relations among sentence types in terms of lexical rules. It is lexicalist in a more radical sense as well, however: there are no phrase structure or transformational rules at all, and with the exception of rules of grammatically conditioned anaphora and sentence-level phonological processes, every statement in the grammar is a generalization about the relations obtaining among the internal features of words or about analogical patterns relating sets of words.

This result may seem paradoxical: we are interested in writing rules to account for syntactic structures, but the rules we have been talking about in this section operate on lexical matrices, and the output of the grammar, as shown in the flow chart (Figure 3.1), is not sentences but 'fully specified lexical items'. This situation is, however, not as paradoxical as it may seem. Syntax is (minimally) the syntagmatic and hierarchical combinations of words in sentences. Sentences are composed of constructions, and constructions are ultimately made up of words. The possible combinations of words within a construction are determined by the contextual features marked on the lexical head of the construction, and the distributions of constructions within higher constructions such as S is specified in terms of contextual features on the lexical heads of the higher constructions. The words that emerge from the IRRs thus generate their own sentences: a well-formed phrase is any sequence of words for which we can find a dependency tree for which all contextual features of the words are met, and a (verbal) sentence is a phrase with a [+V] lexical head. Thus in accordance with the standard notion of generative grammar, the grammar plus the lexicon still generate the sentences, but in a less direct way: the rules plus the minimally specified lexicon generate the fully specified lexical entries, that is, the words, and the words themselves generate the set of possible well-formed sentences.
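The extraction loop just described is easy to picture concretely. A minimal sketch, with toy feature strings and an invented second rule purely for illustration: each rule is an implication over feature sets, and running the rules to a fixed point turns a minimally specified entry back into a fully specified one.

# Rules as implications over feature sets: if the left side is present,
# add the right side. Toy notation; the real IRRs also handle word order.
RULES = [
    ({'+prnn'}, {'-[+Det]'}),   # RR-1
    ({'+prnn'}, {'+N'}),        # assumed for the toy: pronouns are nouns
]

def specify(entry):
    """Expand a minimally specified entry to a fixed point of the rules."""
    feats = set(entry)
    changed = True
    while changed:
        changed = False
        for condition, consequence in RULES:
            if condition <= feats and not consequence <= feats:
                feats |= consequence
                changed = True
    return feats

print(sorted(specify({'+prnn'})))   # ['+N', '+prnn', '-[+Det]']

Conversely, any feature the rules supply is redundant and can be deleted from the stored entry, which is exactly the sense in which the rules are generalizations about the lexicon.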
A descriptively adequate generative grammar is a formal and explicit system which generates (characterizes) the set of sentences in a language, including not only the sequences of words but also what the speaker knows about those sequences, including the hierarchical and dependency structure of the string and at least that much of the meaning of the string which is constant for all contexts in which it is used. If in the 'distribution' portion of the lexical representation of each word we can include information about hierarchy as well as sequence, and if in the 'meaning' part we can include information about the semantic content of inter-word relationships, then all the information we require of a generative grammar will be resident in the lexicon. This is in fact what the lexi- and -case components of the lexicase framework are respectively designed to do. Then writing a 'pan-lexicalist' lexicase grammar will just be a matter of extracting the generalizations from the lexicon.

In the remainder of this book, I will be proceeding on the assumption that a lexicase grammar is pan-lexicalist in Hudson's sense. That is, a grammar will be taken to consist of the lexicon, that is, a list of all the elements of the grammar which are not predictable, and a set of generalizations about the lexicon. It should be pointed out at the outset that there is a fair possibility that this assumption is mistaken. It presupposes (i) that it is the mission of a grammar to capture generalizations, and (ii) that a grammar is a model of some kind of homomorphous internal reality. However, we should be aware that these two assumptions may be in conflict.

The issue relates to the question of how grammatical information is internally represented in the brain. It was mentioned earlier that a generative grammar can be represented either (i) solely as a fully specified lexicon, with no general rules involved at all, or alternatively (ii) as a set of minimally specified lexical entries plus a set of general rules. If a grammar is a model of some physical reality in the brain, as I assume here, then the two different formal systems, (i) and (ii), should correspond to two different physical realities, and only one can be correct. It should be clear that the choice of the correct solution is not one which can be made on the basis of which one generates the correct set of sentences, since both do. In spite of this, I believe that it is an empirical question, but one which can only be answered by going outside of grammar proper to enlist the assistance of psychology.

Looked at from a psychological perspective, the question to be answered is whether speech processing relies more on processing (alternative (ii)) or memory (alternative (i)). This is a question which can in principle be answered by psychological experimentation, and in fact the preliminary results are already in (cf. Kintsch 1974, Pawley and Syder 1983). They indicate that memory is far more important in actual speech processing than most generative grammarians had hitherto believed possible, and that the lexicon bears a much greater burden in this respect than the rules which many of us have dedicated our professional lives to constructing.

If this is true, what are the implications for the way a lexicase grammarian writes a grammar? I am inclining to the view that the implications are for how we interpret our grammars rather than how we write them: our grammar provides the information that a speaker at least potentially has access to, but
whether a given speaker uses that information in prepackaged form (alternative (i)), reconstructs it as needed (alternative (ii)), or does some of each will have to be left open. The maximally general grammar tells us what generalizations are in principle available to the speaker, but some of these generalizations at least may never be recognized and employed by any speaker. An example is the elaborate demonstrative system of Palauan, a real morphologer's delight: a detailed morphological analysis can discover a consistent meaning for almost every phoneme in every (non-suppletive) form, but it has so far proven impossible to get a native speaker to show any awareness of the existence of this charming system.
2.3 LEXICON
At this point it is not yet clear whether the lexicon is represented as (i) a list of words (fully specified lexical entries) or (ii) a list of fully specified words (those lexical entries which are not predictable members of a paradigm) plus a list of stems (partially specified lexical entries which are expanded into inflectional paradigms by inflectional subcategorization rules; cf. Scalise 1985). Richard Hudson (Hudson 1984: 27-28) has a nice discussion of exactly this point. He opines that:

The only reasonable conclusion seems to be that some predictable information is stored, and some is not; and at present we are not in a position to decide which is which (a similar conclusion is expressed in McCawley 1977). This leaves the linguist in a difficult position, as a searcher after psychological reality . . . I shall assume the minimum of storage, as a general policy. As far as psychological reality is concerned, then, my grammars will provide a base-line below which we must assume that mature speakers of the variety concerned will not sink. . . . I shall assume the maximum of generalization. . . . But it would be quite possible to speak English fluently without ever having made this generalization [regarding certain subcategories of auxiliary verbs] in one's grammar. . . . So for such speakers, my grammar is predicting less stored information than they actually have in their heads.

Like Hudson, then, I will generally assume the maximum amount of generalization, at least in the area of inflection, to make sure that I have done the whole job; but I will be prepared to put more of the burden on storage rather than processing when there is evidence to support such a conclusion.

In the area of derivation, however, it is necessary to reverse that emphasis. This is because of the non-productivity of lexical derivation: a derivation rule (DR) tells us what COULD be a word in the language but not what is a word. As veteran field workers know, a consultant may tell you either that a sentence is (i) OK, (ii) bad, or (iii) bad but possible. A typical response of this sort would be, 'Well, you COULD say that, but nobody says that.'1 The third reaction occurs when a sentence contains a word which is possible according to the DRs, but which has not yet actually entered the speaker's lexicon. He knows it is a possible word and more or less what it might mean because it fits the patterns of his DRs, but he also knows that it isn't a word YET.
To represent this kind of intuition, a lexicase lexicon lists actual rather than potential words, all the words stored by a given speaker at a given point of time. And in fact, there is no other way this could be, because a lexicon is stored in a finite space (the brain), but the list of potential words is infinitely long and thus could not possibly be stored in a finite brain. New words and/or stems are added to the lexicon by acquisition from external sources or by applying internalized Derivational Rules, analogical patterns which each speaker constructs for himself based on regular patterns he happens to notice among the items in his lexicon at any given point.

There are no non-word entries in the lexicon. Everything that occurs in the lexicon is a word (minimal free form; Bloomfield 1933: 178), including so-called 'function words'. Elements which are not minimal free forms, i.e. inflectional and derivational affixes (e.g. -ed, -ing, -s), are not listed in the lexicon and do not have their own branches in a syntactic representation tree. These affixes exist in the grammar only (i) as parts of stems and words or (ii) as elements introduced by DRs or MRs, e.g. -li and -d in the following rules:

DR-1.  [+Adj, αFi]  >-->  [+Adv, +mnnr, αFi]    ] --> li]

MR-1.  ]  -->  d] / [+past]

DR-1 allows the creation of new manner adverbs ending in -ly from adjectives, and MR-1 adds the suffix -d to verbs marked for the inflectional feature [+past]. Thus neither Manner nor Past needs to be listed as a separate entry in the lexicon.
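These two rule types can also be given a rough computational paraphrase. The sketch below is an illustration under invented representations (feature names as dictionary keys, suffixation as plain string concatenation), not the book's own formalism: DR-1 is an optional word-coining operation, while MR-1 merely spells out an inflectional feature on an existing word.

    # Sketch: DR-1 as an optional derivation rule, MR-1 as a morphological
    # spell-out rule. The encodings are illustrative assumptions.

    def dr_1(adj_form, adj_features):
        """DR-1: [+Adj, aFi] >--> [+Adv, +mnnr, aFi], with ] --> li]."""
        adv = {f: v for f, v in adj_features.items() if f != "+Adj"}
        adv.update({"+Adv": True, "+mnnr": True})   # copy aFi, change class
        return adj_form + "ly", adv

    def mr_1(form, features):
        """MR-1: ] --> d] / [+past]: add -d to a word marked [+past]."""
        return form + "d" if features.get("+past") else form

    print(dr_1("calm", {"+Adj": True, "+qlty": True}))
    # -> ('calmly', {'+qlty': True, '+Adv': True, '+mnnr': True})
    print(mr_1("bake", {"+V": True, "+past": True}))   # -> 'baked'

Note that dr_1 returns a new entry (which may or may not be added to the lexicon), whereas mr_1 leaves the inventory of entries untouched; this asymmetry is the computational face of the inflection/derivation distinction developed below.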
2.3.1 Words

Lexicase is a grammar of words, not of morphemes, formatives, etc. It claims that there is a fundamental and psychologically real distinction between words on the one hand and phrases or morphemes on the other. The lexicon stores words, sentences are structured strings of words, and grammar is a statement of the internal and external properties of words. Some linguists now deny that words exist at all (Thomas n.d.), but given the Bloomfieldian definition of 'word' as a minimal free form, it is trivially easy to prove that they do: there are too many sentences in a language for anyone to memorize them all, so we must memorize an inventory of smaller units and combine them in various ways to make full sentences. The smallest syntactically indivisible units we have to memorize are words, by definition. QED. If conventional orthography fails to reflect a maximally efficient syntactic analysis based on this principle, then that is the orthographer's problem, not the syntactician's.
Right at the outset it must be pointed out, lest the reader do it first, that things are not quite this simple, even at this stage. The problem involves clitics. Firstly, are clitics words, affixes, or something in between? The answer to that is easy: they are not something in between, because lexicase makes no provision for anything in between. Next question: given that clitics are minimal, are they free? The first point to make here is that not everything that is called a clitic really deserves the title. Clitics are words that are on their way to becoming affixes, and some of them have already made it, thus becoming a problem for morphology rather than syntax. For syntactic purposes, the relevant distinction is between words (syntactic units) and affixes (morphological units), and within the words between phonologically free words and phonologically and/or positionally bound words (clitics). Separating the new affixes from true clitics can be done by standard structuralist morphological analysis techniques. The stock assumption in lexicase is that a clitic is a word that is phonologically bound but syntactically free, so the true clitics must be listed in the lexicon and treated within the framework of dependency relationships. This raises certain problems for case frames and semantic interpretation, which will be touched on later, and for phonology, which won't be touched at all.

2.3.1.1 Signs and lexical entries

In the lexicase view, the Saussurean sign ignored a crucial aspect of linguistic knowledge, with serious consequences that are still being felt in linguistics up to the present time. For Saussure, the sign represented the arbitrary combination of sound and meaning, the unit that had to be memorized individually for each lexical item.
Figure 2.3 [the Saussurean sign: sound and meaning paired in a single unit]
What this conception ignores, from the lexicase point of view, is distribution. When a speaker learns a language, he must learn for each lexical item not only how it is pronounced and what it means, but also its distribution, that is, how it is allowed to cooccur with other lexical items. The grammatical properties of a lexical item cannot be accurately predicted from its meaning. For example, house /haws/ is 'the name of a person, place, or thing', and can also function (in the form /hawz/) as a verb, as in:

(1) How are you going to feed and house all those bureaucrats?

Apartment /əpartmənt/ too is 'the name of a person, place, or thing', but (as far as I know) there is no *apartment /əpartmənd/ as in:
(1a) *How are you going to feed and apartment all those bureaucrats?

If a linguistic field worker came to your village and asked you if you could say sentence (1a), you would probably tell her, 'I know what you mean, and maybe somebody could say that (for example some Washington bureaucrat), but we never say it that way.' But what could you say if she asked you, 'Why not?' House [+V] is simply in your lexicon but apartment [+V] (probably) is not. Yet there is nothing in the meaning of these two words which would allow us to predict this distinction. It is simply a gap, and really needs no explanation, other than perhaps a sociohistorical one (houses have been around longer than apartments). To capture this kind of intuition, it is necessary to store distributional information along with the traditional Saussurean sound and meaning; apartment [+N] is in the lexicon, but apartment [+V] is not. In effect, the lexicase sign has three parts: sound, meaning, and distribution (Figure 2.4), and the English2 lexicon will have to contain three separate entries, something like those in Figure 2.5.
Figure 2.4 [the lexicase sign: sound, meaning, and distribution]

house /haws/      house /hawz/      apartment /əpartmənt/
[+N   ]           [+V   ]           [+N   ]
[+dmcl]           [+dmcl]           [+dmcl]
[+ndpd]           [+ndpd]           [+ndpd]
                  [+trns]

Figure 2.5
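The three-part sign can be pictured as a record with three independently memorized fields. The following sketch is an illustration, with invented feature abbreviations standing in for the matrices of Figure 2.5; it is not a representation from the original text.

    # Sketch: the lexicase sign as a triple of sound, meaning, distribution.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Sign:
        sound: str
        meaning: frozenset
        distribution: frozenset

    lexicon = {
        Sign("/haws/", frozenset({"dwelling"}),
             frozenset({"+N", "+dmcl", "+ndpd"})),
        Sign("/hawz/", frozenset({"dwelling"}),
             frozenset({"+V", "+dmcl", "+ndpd", "+trns"})),
        Sign("/@partm@nt/", frozenset({"dwelling"}),
             frozenset({"+N", "+dmcl", "+ndpd"})),
    }
    # house occurs with both a noun and a verb distribution, so it is TWO
    # signs; the absence of any Sign(..., {'+V', ...}) for apartment is
    # all that the lexical gap in (1a) amounts to.

Nothing in the meaning field predicts which distributions are attested, which is exactly the point of the house/apartment contrast.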
The general principle being enunciated here, then, is that if a form occurs in two distinct syntactic environments, it must be represented as two distinct lexical entries. None of the three parts is predictable from the other two, so all must be memorized together as a unit. If two signs differ in any one of these parts, the units are separate lexical entries, and must be memorized separately.3 Even if there is a regular correspondence between pairs of items in terms of sound, meaning, and distribution, as there is for example in quality adjectives and manner adverbs ending in -ly, the forms must still be
memorized separately unless the correspondence is 100 per cent productive and uniform, which in this case it is not. Any such regular correspondence between lexically distinct sets of items can be captured by a derivation rule. One might propose that when sets of items are related by a derivation, we can save space and capture a generalization by leaving one set out of the lexicon and predicting its members by regular rules applied to the members of the other set. However, such correspondences are almost never 100 per cent productive and uniform (Salkoff 1983); one member of a pair may not exist at all, and if it does, the sound or more commonly the meaning may not be exactly analogous to other similar patterns. Thus a derivation rule, an analogical rule which states a perceived correlation among pairs of lexical items, will not be able to predict one set from the other, and there will be a difference between the set of possible words, which are generated by all the DRs, and the set of actual words which exist in the lexicon at any given time. There will also be a difference among speakers as to which DRs they recognize and how they perceive the analogy. Both sets are part of a speaker's competence, so both the list of actual words and the rules which generate potential words must be included in a grammar.

The occurrence of adjectives in noun environments illustrates the point that different distributions require separate entries. A form like poor appears in adjectival as well as nominal environments, e.g.

(2) [NP The poor slobs] can't even remember their account numbers.
(3) [NP The poor] ye always have with you.
The poor in (2) must be analyzed as an adjective, since it occupies the normal position for English adjectives. However, the poor in (3) must be a noun. This follows from the fact that the poor is a noun phrase (that is, an N'), and since lexicase allows no deletions and no phonologically null elements, one of its two constituents must by definition be the head N. Since poor in this sentence occupies the normal position for the head noun of a noun phrase, and since the never occurs as the sole representative of a noun phrase, this poor must be the head noun. Accordingly, there must be two separate lexical entries for the form poor4 (see Figure 2.6), and a derivation rule to relate them.5 The following reasons can be given to motivate this analysis:

poor1       poor2
[+Adj ]     [+N   ]
[+dprv]     [+dprv]
            [+dfnt]
            [+humn]
            [+clct]
            [+plrl]

Figure 2.6
(i) poor2 occurs as the head of a noun phrase in ordinary English expressions exemplified by The Good, the Bad, and the Ugly, without any preceding context needed to supply an antecedent.

(ii) The source adjectives may be able to refer to a broad range of noun types, but the derived nouns seem to refer only to collections of humans. Thus good can refer to anything, but Good in the Good, the Bad, and the Ugly seems to refer only to people. Since this narrowing of reference is not predictable from the meaning of the adjective alone, it must be memorized and stored separately.

(iii) poor1 and poor2 have different distributions with respect to preceding modifiers such as working, which would be awkward to explain with a zero head or head deletion (decapitation) analysis:

(a) the poor1 working slobs
(b) *the working poor1 slobs
(c) the working poor2

(iv) Not all adjectives can occur as collective nouns. Thus there is no:

(4) *The utter made fools of themselves.
(5) *The late (= 'deceased') will rise again on the latter day.

If we assume a transformational decapitation analysis, only rule features could 'account' for these gaps, but in a lexical derivation analysis these are just the kind of gaps one expects, since lexical derivation is prototypically non-productive.

(v) The lexicase analysis also accounts for the specialization of meaning that may occur with these constructions, which again is a typical situation with lexical derivation. Thus, for example, poor can refer to economic deprivation (the poor house), inferiority with respect to a standard (poor accommodations), or pitiability (poor little fool), but the poor as a noun phrase refers only to the economically deprived, not the pitiable or those objects which are substandard. To take a similar example from German, why does the noun Junge have only the meaning 'boy' (or young animal) when the corresponding adjective jung means simply 'young' or 'recent' and has no restrictions as to sex or even animacy? Why can't a Junge be a girl, a young man, a young woman, a Junggrammatiker, or a recent election? And speaking of German, why do the Germans consistently capitalize the counterparts of English poor in the poor? In German, only nouns are conventionally capitalized, so the German orthographers must think that Junge is a noun.

(vi) A similar situation obtains in Spanish, where many words are conventionally listed twice in the dictionary, once as adjectives and once as nouns. As in English, the adjectival entry pobre 'poor' has a broader meaning, while the noun pobre is glossed as 'poor person, beggar'. In addition, there are a number of noun entries with diminutive suffixes, all with glosses related to the other meaning of the adjective pobre:
pobrecita, pobrecito, pobrecico, pobrecica, pobrecillo, and pobrecilla: 'poor little thing'. Diminutives are prototypically derived from nouns,6 which suggests the existence of a noun pobre 'pitiable person' to serve as the source for the derivation.

Possible arguments against this analysis, with responses, are:

(vii) Q: If poor2, good, etc. are plural nouns, why don't they have the plural suffix -s? A: Not all English plural nouns have an -s suffix. In particular, there is a set of underived human collective nouns including police, clergy, intelligentsia, and faculty which have exactly the properties stated by the rule: they are definite plural collectives with no necessary final -s. The DR above can thus be seen as a mechanism for adding new members to this class.

(viii) Q: If poor is a noun, why does it allow adverbial modifiers like very? A: In a sentence like:
(6) The very poor are an embarrassment in a welfare state.

the phrase the very poor is clearly a noun phrase, that is, an NP. By lexicase law, an NP is an N', and an N' must have an overt N immediately under it. Poor is in the syntactic position of a noun, so it is a noun. Consequently, the argument turns out to be circular: poor is not a noun because it is modified by an adverb, and adverbs do not modify nouns because poor in (6) is not a noun.7
Similar arguments apply to demonstratives. Thus the form this occurs in two different syntactic slots and so must have two distinct lexical entries:

(7)  This argument is ridiculous!
     [+Det]

(7a) This is ridiculous!
     [+N   ]
     [+prnn]

The same argument applies to possessive determiners, for example:

(8)  John's students are sneakier than Mary's students.
     [+Det]                           [+Det]

(8a) John's are sneakier than Mary's.
     [+N   ]                  [+N   ]
     [+prnn]                  [+prnn]
and the separate-entry analysis is supported by the fact that there are forms which occur in one environment but not the other, e.g.
(7b) The argument is ridiculous!
     [+Det]

(7c) *The is ridiculous!
      [+N   ]
      [+prnn]
and by the existence of entries in both the [+N] and [+Det] classes which because of their morphological shapes could not be treated as Det's with their head nouns missing (cf. Hudson 1979c: 17):

(8b) My/*mine students are sneakier than her/*hers students.
     [+Det]                             [+Det]

(8c) Mine/*my are sneakier than hers/*her.
     [+N   ]                     [+N   ]
     [+prnn]                     [+prnn]
Again we can say that there are two separate lexical classes, [+Det] and [+N, +prnn], each of which contains some underived members, and that there is a derivation rule deriving demonstrative and possessive determiners into the pronoun class (cf. Starosta 1971a, Starosta 1985f) (see Figures 2.7 and 2.8).
Figure 2.7 [partially recoverable: lexical entries for mine, my, hers, her, the, this1, this2, John's1, and John's2; the legible matrices are the [+Det, -posn, +dfnt, +rtcl]; this1 and John's1 [+Det, +posn, +dfnt, -rtcl]; this2 and John's2 [+N, +posn, +dfnt, +prnn]]

DR-2.  [+Det ]        [+N   ]
       [-rtcl]  >-->  [+prnn]
                      [-prsn]

Figure 2.8
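DR-2 can be paraphrased in the same style as the earlier sketches. The encoding below is mine, following the feature values only as far as they are legible in Figure 2.8; what it is meant to display is that the rule is category-changing and affixless, and that articles, being [+rtcl], fall outside its input class.

    # Sketch: DR-2 deriving non-article determiners into the pronoun class.

    def dr_2(matrix):
        """DR-2: [+Det, -rtcl] >--> [+N, +prnn, -prsn]; no affix is added."""
        if matrix.get("+Det") and not matrix.get("+rtcl"):
            return {"+N": True, "+prnn": True, "-prsn": True}
        return None   # inapplicable, e.g. to the article 'the'

    assert dr_2({"+Det": True, "+rtcl": False}) == \
           {"+N": True, "+prnn": True, "-prsn": True}   # this, John's
    assert dr_2({"+Det": True, "+rtcl": True}) is None  # cf. (7c) *'The is ridiculous'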
2.3.1.2 Classes and 'parts of speech'

'Parts of speech' have been defined traditionally in notional terms ('a noun is the name of a person, place, or thing'), morphologically in structural grammar (adjectives are words with a comparative and a superlative form), functionally ('an adverb modifies a verb, an adjective, or another adverb'), or distributionally ('a pronoun takes the place of a noun'). There are difficulties with all these definitions, though.

(i) Notional definitions: what about sentences like 'We went for a long run on the beach'? Most words that fit in the slot 'a long ___' are nouns, and it will be awkward for the grammar to treat 'run' here as a verb just because it describes 'an action or a state of being'.

(ii) Morphological definitions: one problem is that morphology is often defective, and while it may help us define prototypical members of a word class, it can't be the only criterion for establishing syntactically valid word classes, or even the primary one. 'Squishes' for example (cf. Ross 1972, Comrie 1981: 103) are the result of applying morphological criteria of word classification in situations where morphology is defective. A 'squish' is a grammatical class which forms a continuum, with some of the members of the class looking like category X, others more like category Y, and the rest spread out along a cline between the two extremes, a situation which is irreconcilable with the lexicase atomic analysis of lexical classes. As an example, Chinese 'coverbs' (DeFrancis 1963: 83) are a category of elements which are identical to verbs in form but which function like prepositions. Several even have verbal morphology. For example, weile 'for' looks like the obsolete verb wei 'do, make, cause' with a following perfective verbal suffix -le. The suffix however is frozen and inseparable from the stem, and syntactically weile behaves like any other Chinese P. If morphology is allowed as a primary criterion for word class identification, this is a squish, but from a syntactic standpoint, there is no problem: if it has the distribution of a P, it is a P,8 and the verbal connections are just historical residue. Another problem with morphological criteria is that syntactically distinct word classes may be inflected for the same categories. In Latin, for example, determiners and nouns are inflected for gender, number, and case, yet probably no one would deny that these are different 'parts of speech'. Chomsky (Chomsky 1970: 199) discusses and rejects the reasoning that if two categories such as nouns and adjectives cross-classify for a particular feature, those categories both belong to the same super-category.

(iii) Functional definitions: when people talk about 'adverbial clauses', they are making a typical error of confusing what something does with what it is; adverbial clauses may modify a verb, but they have nothing to do with the lexical class of adverbs; they are PPs with an S phrasal head.

(iv) Distributional definitions: this is basically the structuralist method, and comes close to the lexicase method. In fact, the selection of this as the paramount lexicase criterion for word classification is really a foregone
conclusion: if we are interested in syntax, that is, the way words link together, then the only relevant classification will be one based on distribution, that is, the way words link together in constructions. Notional or functional or morphological criteria will be useful only to the extent that they coincide with distributional ones.

In lexicase, 'syntax' is dependency and constituency relations among word classes. Thus definitions of 'parts of speech' will be stated in dependency and constituency terms, and especially in the terminology of strict X-bar notation, which is really a fusion of dependency and constituency notions. Thus for example a noun is defined as the lexical head of an NP (where NP = N'), a verb is the lexical head of a sentence (where S = V'),9 etc. From another perspective, words are classified according to their relationships to other words; to say that a word can act as the head of an N' is to say that it allows a certain range of word classes as dependents and that it is allowed as the attribute for another class of words. This does not exclude supplementary definitions in terms of function (after all, dependency is a function of attribution) or morphology (there will be a high correlation between syntactic class and inflectional category, almost by definition) or even semantic/notional definitions, as long as we recognize that we are talking about prototypical members of a class rather than necessarily about all members.

2.3.1.3 Definitions

The current definitions for the parts of speech assumed in lexicase grammars are as follows:

N: a noun is the lexical head of a noun phrase. Nouns bear case forms and case relations.

V: a verb is the lexical head of an endocentric clause. In languages which allow NP and PP predicates (Starosta, Pawley, and Reid 1982: 150), verbs differ from predicate Ns and Ps in that: (i) [+prdc] 'predicate' is a lexical feature of Vs but an inflectional feature of Ns and Ps; (ii) V-bars are endocentric but [+prdc] P-bars are exocentric; and (iii) Vs don't allow Adj or Det attributes but [+prdc] Ns do.

Adj: an adjective is the head of an adjective phrase, an endocentric non-predicational attribute of a noun.

Adv: an adverb is the head of an adverb phrase, an endocentric attribute of a verb, adjective, or adverb.

Det: a determiner is the head of a determiner phrase. It is similar to an adjective except that it usually or always occurs at the boundary of an NP and is radically endocentric: it has no attributes of its own at all.

P: a 'preposition' or 'postposition' is the head of a PP. Ps are a class of words characterized by the syntactic property of cooccurring as the lexical head of a binary exocentric construction with a phrase, for example:
[ P  phrase ]        [ phrase  P ]
where the cooccurring phrase can be NP, S, or PP. This lexical class thus includes not only traditional prepositions such as English at, with, etc. (P + NP), but also 'complementizers' such as to and that and 'subordinating conjunctions' such as after, while, when, because, and that, which occur in exocentric construction with verbal rather than nominal co-constituents (P + S; Emonds 1976: 172-3). The term 'postposition' is sometimes used very loosely to refer to any element that comes at the end of a nominal-looking phrase. For example, the term is used in Mandarin Chinese grammar to refer to elements which turn out to belong to at least three grammatically quite distinct sets (Starosta 1985c: 249-53, 256-9), only one of which qualifies for the term 'postposition' in the sense in which I am using it here.10 On the other hand, too much theory can also result in a bad analysis. For example, Hong (Hong 1985, cited in Choi 1986: 26) analyzes Korean nominal constituents ending in -eykey as PPs, but constituents ending in -lul as NPs, in spite of the fact that the morphological characteristics and basic syntactic environments of -eykey and -lul are identical. The reason given is to make the c-command relationships work out in such a way as not to violate the locality principle of Chomsky's Binding Principle. This type of classification of morphemes is purely 'functional' in a strongly theory-dependent sense (it functions to make the GB theory appear to work), and clearly has no necessary association with any morphological or syntactic facts.

cnjc: a coordinating conjunction forms an n-ary (n ≥ 2) exocentric construction with two or more phrases from the same major phrase type (NP, S, etc.) and zero or more other conjunctions:

[ phrase  cnjc  phrase ]
SPart: a sentence particle is the clause-level counterpart of a determiner; it is the head of a SPart phrase, and is similar to an adverb except that it usually or always occurs at the end of a clause and is radically endocentric: it has no attributes of its own at all. This category is marginal or non-existent in English11 but ubiquitous in East, South, and mainland Southeast Asian languages, for example.

2.3.2 Features

In a lexicase analysis, lexical features are the basic descriptive mechanism.12 The properties of individual lexical items are represented by features, generalizations are expressed as relationships among sets of these features or among sets of lexical entries identified by particular features, and constraints on expressive power are also mostly stated in terms of features. Colleagues and students sometimes react against what they perceive as an excessive use of features in lexicase grammars. Are these really all necessary? The answer is yes, not only for lexicase but for your grammar too, dear reader. Grammatical generalizations are statements about classes of items, and features are simply indicators of class membership, and thus necessary in
some form or other to any grammar. In order for a grammar to be general, it must refer to lexical subsets as classes rather than as lists. This means that each lexical item must be marked somehow for its membership in the grammatically relevant classes to which it belongs, so that a rule can refer to it by class membership rather than by listing. Features are simply a mechanism for indicating class membership, so they have to be there. Whatever notational devices are used to represent the class and sub-class membership of individual lexical items are features. Even if a grammar contained no rules at all, but just marked individual lexical items for valence, that is, potential cooccurrences with other lexical items (which is in fact perfectly feasible), there would still have to be features, since the contextual specifications on the regent and dependent items would have to refer to classes and subclasses rather than lists of individual words. To cite the old transformational grammar argument, it is impossible that speakers memorize all the regent-dependent pairs they have heard because (i) our memories are not that good, and (ii) we know that speakers can produce, understand, and render consistent judgements on combinations they have never heard before.

The features involved in lexicase lexical representations are established by means of componential analysis, contrast, and syntactic consequences (Sloat 1975). Features in a lexicase grammar are of one of two types: contextual or non-contextual. Contextual features specify ordering and dependency relationships among major syntactic categories ('parts of speech'), agreement and government requirements, case frames, and 'selection', semantic implications imposed by head items on their dependents. Non-contextual features characterize class memberships, including membership in purely syntactic categories such as [+N] and [+V], purely semantic ones, for example [+dgre] 'degree', [+soft], etc., or in-between ones, for example [+mass], [+prnn] (pronoun), or [+plrl] (plural). The 'in-between' features are semantic features which have external syntactic consequences, either in terms of entering into agreement relationships, such as [+plrl], or in restricting the range of elements allowed to cooccur in the same construction, such as [±mass] or [±prnn]. They include all inflectional features, including person, number, gender, and tense features as well as localistic case form and case relation features.

Whether features should be binary, as they have been in lexicase until recently, allow integer values as assumed in Chomsky's Aspects (Chomsky 1965: 175), or permit a much more complex range of values, as assumed in Generalized Phrase Structure Grammar (Gazdar et al. 1985: 17-40), remains an open question. Lexicase currently assumes that only binary features and implicational contextual features are possible, since this limits the class of possible grammatical descriptions. The choice of binary features rather than atomic predicates to represent this semantic decomposition is a reaction against the excessive power introduced in Generative Semantics (Starosta 1982a: 297), and the difference in power between binary features and the abstract predicates used in generative semantics is a consequence of the fact that atomic predicates are hierarchically ordered with respect to each other, and can be repeated significantly, whereas features are unordered;
consequently repeating features has no significance. Thus assuming a single value for the components cause, become, and alive, only one distinct feature matrix can be constructed (see Figure 2.9), but (assuming the minus feature value on alive to be a predicate) an unlimited number of distinct atomic predicate configurations are possible (see Figure 2.10). An atomic predicate analysis claims in effect that kill, illk, klli, and kkill and many other configurations should all be possible semantic representations for words in some language, but a binary feature analysis predicts that only one of them should be possible.
kill            kill            kill            kill
[+cause ]      [+become]       [-alive ]       [+cause ]
[+become]  =   [-alive ]   =   [+cause ]   =   [+cause ]
[-alive ]      [+cause ]       [+become]       [+become]
                                               [-alive ]

Figure 2.9

Figure 2.10 [tree diagrams, partially recoverable: atomic-predicate configurations for kill, i.e. CAUSE(BECOME(NOT ALIVE)), alongside spurious rearrangements such as kkill in which CAUSE, BECOME, NOT, and ALIVE are hierarchically reordered]
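The contrast in expressive power can be displayed directly if binary feature matrices are modeled as unordered sets and atomic predicates as nested, ordered structures. The sketch below is an illustration of the argument, not anything in the original text.

    # Sketch: unordered binary features vs. ordered atomic predicates.

    # Feature matrices: sets, so order and repetition are meaningless.
    kill_1 = frozenset({"+cause", "+become", "-alive"})
    kill_2 = frozenset({"-alive", "+cause", "+become", "+cause"})
    assert kill_1 == kill_2        # only ONE distinct matrix (Figure 2.9)

    # Atomic predicates: nested tuples, so each ordering is a new object.
    kill  = ("CAUSE", ("BECOME", ("NOT", "ALIVE")))
    kkill = ("BECOME", ("CAUSE", ("NOT", "ALIVE")))
    illk  = ("NOT", ("ALIVE", ("CAUSE", "BECOME")))
    assert len({kill, kkill, illk}) == 3   # spurious distinctions (Figure 2.10)

The feature analysis thus excludes the unwanted representations by construction, rather than by stipulation.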
The ultimate justification of this choice will consist in showing that a system constrained in this way is still capable of capturing all the grammatical and lexical semantic generalizations about a given language.

2.3.2.1 Non-contextual

In lexicase, the meanings of lexical items are stated in terms of distinctive features. This reflects the view that an attempt to state what something means is misguided, since (i) the set of descriptors we would need to represent everything we know about a word seems open-ended, and (ii) it is not clear that people really know what exactly they are talking about (cf. Hudson 1985: 35). Rather, lexical meanings are fuzzy around the edges, subject to a lot of individual variation, and seem to work in terms of prototypes (Hudson 1985) rather than precisely delimited concepts. However, even if speakers cannot tell you exactly what something means, they are usually able to tell you if two words mean exactly the same thing or not. It is this kind of intuition which distinctive features are very well suited to representing. As a minimum goal, and in fact a practical one,13 the semantic analysis of some set of vocabulary items must ensure that all pairs of items which are perceived as non-synonymous are shown as differing in at least one feature. This can be
seen as an answer to the question about how a grammar should distinguish between dictionary knowledge and encyclopedic knowledge: by requiring that every entry be distinct from every other (non-synonymous) item in at least one binary semantic feature, we have established a relatively non-arbitrary practical answer to the question as to the minimum amount of semantic information to be included in a dictionary entry. As a further requirement, we can ask that all lexical items which are perceived as similar to each other in some way should agree in at least one feature, and a third requirement, or perhaps a heuristic, is that symmetric feature breakdowns are more highly valued. When all the distinctive and shared elements have been extracted and represented, the lexical item has been 'decomposed' into its basic elements, and the analysis is complete. This method of lexical description, which is known as componential analysis, has an extensive literature; cf. for example Bendix 1966.

2.3.2.1.2 Inflection

Lexicase accepts the traditional criterion of maximum differentiation (Williams 1984: 644) for recognizing a distinct inflectional category: there has to be an overt distinction SOMEWHERE to talk about 'inflection' in the class at all. However, this distinction can be morphologically minimal, as it is for the [±Nom] category for English nouns (where only a subset of the pronouns show a stem difference), or it can even be manifested by word order alone, as [±Nom] is in Mandarin Chinese (Starosta 1985c: 218), especially if the same category is manifested morphologically in other languages.14

Class membership is logically and temporally prior to the formation of inflectional paradigms. Words are inflected for a particular category because they are in the same syntactic class, and in fact that is one of the defining characteristics of inflection: if a given affix remains with a stem when that stem appears in two different syntactic classes, then that affix is by definition not inflection. For example, although traditionally the comparative and superlative adjectival endings are regarded as inflection,15 that cannot be correct, because they carry over in zero derivation, for example:

(9)  Harry was a big disappointment.
                 [+Adj]
(10) Their hot fudge sundaes are really big.
                                        [+V]
(11) Brunhilda was a bigger disappointment.
                     [+Adj]
(12) The banana split is even bigger.
(13) John chose the bigger of the two.
                    [+N]
(14) Michiko was the biggest disappointment.
                     [+Adj]
(15) The Hog Trough was the biggest of all.
                            [+N]

2.3.2.2 Contextual

The lexical entry of each word includes contextual features which specify the subordinate attributive words with which it may cooccur. These positive and negative contextual features in turn determine the well-formedness of the phrases in which the items occur, where 'sentence' is simply a type of phrase with a finite verb as a head and no external syntactic connections (upward dependencies). Contextual features are the part of the lexical representation which makes phrase structure rules unnecessary. A contextual feature is a kind of atomic valence, stating which other words may attach to a given word as dependents to form the molecules called 'sentences'.

Contextual features may function syntactically, morphologically, or semantically. For example, the feature [-___[+Det]] on English nouns states that English determiners may not follow their nouns, and another feature, [+[+Det]___], is marked on definite common nouns to show that they must cooccur with determiners; the features [-[+Nom, +plrl]], [-[+Nom, -spkr]] mark a Spanish verb as first person singular and trigger the addition of an -o suffix; and [⊃[+PAT, +anmt]] on a verb such as die imposes an animate interpretation on its subject. Contextual features may refer to dependents occurring on the left or on the right, or they may be non-directional, referring to sister dependents on either side when the presence of some category is important but the order varies (as in English subject-auxiliary inversion; cf. Starosta 1977) or is irrelevant (as in free word-order languages). Thus they allow the grammar to state grammatical relationships in terms of either immediate dominance or linear precedence as appropriate, just as Generalized Phrase Structure Grammar does, but without the necessity of setting up two distinct kinds of phrase structure rules, Immediate Dominance Rules and Linear Precedence Statements (Gazdar, Klein, Pullum, and Sag 1985: 46).

Selectional features are also contextual, but they differ in function from grammatical contextual features. Thus a verb like love may impose an animate interpretation on its subject by means of the following selectional feature: [⊃[+AGT, +anmt]] (AGT = 'Agent'). This feature states that the verb love interprets its Agent to be animate, regardless of the actual semantic features of the agent (cf. McCawley's citation of Fillmore, McCawley 1968: 267). This provides a straightforward account of metaphor; for example, in the sentence in (16):
(16) Nasturtiums    love                rain.
     [+N   ]        [+V             ]
     [-anmt]        [⊃[+AGT, +anmt] ]
     [+AGT ]
Nasturtiums is lexically inanimate, but when it occurs as the subject of love, it gets an animate interpretation because of the selectional expectations of love.

Contextual features in a lexicase grammar are similar in form to inherent features except that they specify an environment, either preceding, following, or both (see Figures 2.11, 2.12).

(a) [+[+F]___]        required in the preceding environment
(b) [+___[+F]]        required in the following environment
(c) [+[+F]___[+F]]    required in both the preceding AND following environments; or
(d) [+[+F]___]        required in both the preceding AND
    [+___[+F]]        following environments
(e) [+[+F]]           required in either the preceding OR following environment (or both)
(f) [+([+F])]         ALLOWED in either the preceding OR following environment; blocks c-e in Figure 2.12

Figure 2.11

(a) [-[+F]___]        excluded from the preceding environment
(b) [-___[+F]]        excluded from the following environment
(c) [-[+F]___[+F]]    excluded from both the preceding AND following environments; or
(d) [-[+F]___]        excluded from both the preceding AND
    [-___[+F]]        following environments; or
(e) [-[+F]]           excluded from both the preceding AND following environments

Figure 2.12
A third type of contextual feature is an implicational feature, for example [⊃[+F]] 'expected in the preceding or following environment'. This latter type of feature is needed to account for what in Government and Binding type transformational grammars are called 'empty categories', for example subjects of infinitives ('PRO'), missing pronominal subjects in languages such as Swahili and Spanish ('PRO drop'), and gaps corresponding to wh-constituents ('wh trace'), plus 'subject surrogates' in pseudo-impersonal constructions and missing constituents in potential ('tough movement') constructions (Pagotto 1985b).
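The checking of directional contextual features can be paraphrased computationally. The following sketch is mine: it assumes a (polarity, side, feature) encoding invented for illustration, it checks only positive and negative features against the heads of a word's dependent sisters, and it omits the implicational type, which triggers interpretation rather than rejection.

    # Sketch: directional contextual features as in Figures 2.11 and 2.12.

    def satisfies(word_features, left_heads, right_heads):
        context = {"left": left_heads, "right": right_heads,
                   "either": left_heads + right_heads}
        for polarity, side, f in word_features:
            found = any(f in head for head in context[side])
            if (polarity == "+") != found:
                return False
        return True

    # An English noun carries [-___[+Det]]: no Determiner may follow it.
    noun = [("-", "right", "+Det")]
    assert satisfies(noun, [{"+Det"}], [])        # 'the dog'
    assert not satisfies(noun, [], [{"+Det"}])    # *'dog the'

A parser built on this idea accepts exactly those dependency trees in which every word's contextual features are satisfied, which is the sense in which the lexicon itself generates the sentences.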
The ways in which words can combine together are strongly restricted by the Sisterhead Constraint, which states that a word can contract a grammatical relationship only with the head of a dependent sister construction, and the One-bar Constraint, which requires every construction to have at least one lexical head. The result is syntactic tree representations which are flatter, since there are no intermediate nodes between lexical entries and their maximal projections, and more universal, since there are only a very limited number of ways in which languages can differ in their grammars.

Lexicase contextual features combine properties of the two types of contextual features introduced in Chomsky's Aspects, selectional features and Strict Subcategorization features (Chomsky 1965: 95). Like selectional features, they refer to contexts stated in terms of other lexical features rather than constituent labels, and like Strict Subcategorization features, their range of sensitivity is limited to sisters, a property incorporated in a lexicase grammar by means of the Sisterhead Constraint. This constraint requires that the [+F] in the above examples refer only to features marked on the lexical heads of sister attributes. In the case of exocentric attributes (see the following section), the [+F] refers to a 'virtual matrix' composed of the combined features marked on all the lexical heads of the construction (Acson 1979: 67).

By the Sisterhead Constraint, contextual features refer to the features marked on the heads of dependent sisters (that is, in X-bar terms, to the lexical heads of Comps). However, if a construction is exocentric, it has more than one head, and consequently the features of ALL the co-heads of the exocentric construction jointly subcategorize the regent. This situation can arise when the Comp is either a coordinate construction or a PP (see Figure 2.13).
Figure 2.13 [tree diagram, not recoverable: Jack and Jill went up the hill, with virtual matrices over the exocentric constituents Jack and Jill and up the hill]

Because coordinate constructions are exocentric, they take their features from all their heads. The constituent Jack and Jill has three co-heads, the conjunction and and the two NPs [NP Jack] and [NP Jill]. Thus the construction as a whole takes the feature [+cnjc] ('conjunction') from the lexical head word and [+cnjc], and [+N] from the lexical heads Jack [+N] and Jill [+N] of the two phrasal co-heads [NP Jack] and [NP Jill]. The
resulting combined matrix is represented in a lexicase tree diagram as a VIRTUAL MATRIX under the constituent, and when the regent of such a construction looks down at it, it 'sees' the virtual matrix. Thus when the verb went 'looks' to the left at Jack and Jill, it 'sees' [+N, +cnjc], and if it has, say, a requirement that it be preceded by a constituent whose head is marked for the feature [+N], that requirement is satisfied.

The same kind of representation is used for Prepositional Phrases, which are also exocentric. The regent of an exocentric construction 'sees' the virtual matrix containing the features of the P and of the lexical head of the P's phrasal sister. Thus when the verb went 'looks' to the right at up the hill, it 'sees' [+P, +N], and if it expects either [+N], [+P], or both to its right, it is satisfied. Note that a contextual feature [+___[+P, +N]] is different from a feature [+___[+P][+N]]. The former requires a single dependent right sister whose head contains BOTH [+P] AND [+N] (which in lexicase could only be an exocentric PP), while the latter requires that TWO dependent sisters occur to the right, the first one marked by [+P] and the second one by [+N].

A virtual matrix is only apparent to an outside regent of an exocentric construction looking down on it from the outside. Within the exocentric construction itself there is no such matrix. We can think of the virtual matrix as formed by copying the (non-contextual) features of all the lexical heads of phrasal co-heads onto the matrix of the lexical head (and this is in fact the way it was implemented in the first lexicase parsing algorithm (Starosta and Nomura 1986)).
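The copying operation just described has a simple computational rendering. The sketch below is an illustration under my own encoding (feature sets abbreviated), not the Starosta and Nomura implementation itself.

    # Sketch: a regent 'sees' an exocentric dependent through its virtual
    # matrix, the union of the features of all the construction's co-heads.

    def virtual_matrix(co_head_features):
        out = set()
        for feats in co_head_features:
            out |= feats
        return frozenset(out)

    # 'up the hill': co-heads are the P 'up' and the N 'hill' heading the NP.
    pp = virtual_matrix([{"+P"}, {"+N", "+dfnt"}])
    assert {"+P", "+N"} <= pp   # satisfies a regent expecting [+___[+P, +N]]

    # 'Jack and Jill': co-heads are 'and', 'Jack', and 'Jill'.
    np = virtual_matrix([{"+cnjc"}, {"+N"}, {"+N"}])
    assert {"+N", "+cnjc"} <= np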
2.4 RULES AND ORGANIZATION

2.4.1 Lexical rules versus transformations

Lexicase claims that it is unnecessary to use transformations in a generative grammar, and that all grammatical rules are generalizations about the lexicon. The obvious question which then arises is: how does lexicase do the things that transformations used to do?

In a transformational grammar, transformations have two basic functions, the historically original one and a more recent derived one. The original function was to capture perceived relationships among different sentence types, either by mapping one 'surface' structure directly onto another structure (the earliest conception, originated by Zellig Harris) or by deriving related sentences from identical or similar abstract 'deep structures' (Chomsky's approach). In a lexicase grammar, this same function is handled by lexical derivation rules. Every syntactic construction is characterized by the contextual features of its lexical head; a derivational rule relates one kind of head, say transitive verbs, to another kind of head, say passive verbs (cf. Starosta 1971b, 1977), and thereby indirectly relates the two structural configurations in which these heads occur (see section 2.4.3.3).

One unintended and unfortunate function of transformations is to protect grammars from empirical falsification by making available an instant 'account' for
any real or imaginable linguistic phenomenon. This of course greatly decreases the empirical content of any grammar containing such mechanisms, and consequently a major emphasis of lexicase theory has been to avoid any powerful formal mechanisms which would have this same effect.

There has been increasing awareness in the last fifteen years that much of the descriptive burden in a grammar belongs in the lexicon, and reliance on lexical rules at the expense of transformations has grown steadily in linguistic theory and in such areas as computational linguistics and language acquisition. As Adjemian 1983: 253 observes:

Research so far has identified several types of lexical rules, including word formation rules . . . and compound rules (Roeper and Siegel 1978, Jackendoff 1975). In addition, certain structural relations that in the past were analyzed syntactically are now being handled lexically by many linguists (see Hoekstra et al. 1980): certain passives (Wasow 1977, Bresnan 1978), dative constructions, certain complement structures (Baker 1979), etc.

and further:

Lexical rules, for example, differ from syntactic transformations in being obligatorily structure-preserving, in being able to shift syntactic categories, in involving only information available from an item's strict subcategorization, in not being extrinsically ordered with other rule types, in permitting idiosyncratic exceptions, in involving semantic compositionality, and in not being fed by syntactic rules (Aronoff 1976, Wasow 1977, Roeper and Siegel 1978).

2.4.2 Morphology

2.4.2.1 Words and paradigms

Since lexicase grammars are descriptions of the internal and external properties of words, and because they generate inflected forms as paradigms rather than as sequential arrangements of morphemes, they belong to the class of 'word and paradigm' grammatical models of linguistic structure (Hockett 1954: 386, Robins 1959). As Hudson comments (Hudson 1984: 44), 'The discrepancies between morphological and syntactic (or semantic) analyses are well known, and point fairly uncontroversially to some version of the "word-and-paradigm" analysis advocated by linguists such as Matthews (1974) and S. R. Anderson (1977).' Robins states in his article that 'WP can be formalized just as fully as IA or IP' (op. cit.: 119) and that 'WP when reworked in terms of current formal criteria deserves proper consideration as a means of stating and analyzing grammatical systems' (op. cit.: 144). Lexicase aspires to provide such a formal reworking and to merit such consideration.

2.4.2.2 Word structure

Lexicase does not include internal morpheme boundaries in its lexical representations. In this respect it differs from Scalise's analysis (Scalise 1984: 82, 90) and parallels Hudson's word grammar: '. . . a word grammar will give a general definition for "word", and also one for "sound segment", but will make no generalizations about "morpheme"' (Hudson 1984: 56). The internal
morphological structure of words is irrelevant to their grammatical properties (though morphology may be useful in parsing). Speakers may not even know the internal morphological structure of words, and it won't affect their use of the language. This goes not only for such residual cases as:

fall  : fell
grass : graze
glass : glaze
brass : braze

but probably also for more common things, such as strong verbs and irregular plurals. Based on my experiences teaching introductory linguistics and morphology courses, I have concluded that speakers vary significantly in the extent to which they have analyzed the internal structures of the words in their lexicons. Because internal word structure is irrelevant to syntax, it will be hard to detect this difference in people's ordinary use of language. How can we tell if someone has recognized the partial phonological-semantic similarity between drive and drove or see and saw? And if we are interested only in syntax, what does it matter?

2.4.2.3 Inflection versus derivation

Morphology can be defined as modifications of word forms introduced in the process of deriving new words into the lexicon (derivational morphology) or assigning phonological exponents to the various members of inflectional paradigms (inflectional morphology). Lexicase makes the classical distinction between inflection and derivation, though in many cases it draws the boundary in different places. The basic intuition reflected by this distinction is that each derived form is a distinct word, a separate independent lexical entry with its own history, whereas inflected forms are ideally alternate forms of a single lexical entry, typically all derivable by regular rules from a single representative stem. Inflection can be distinguished from derivation in accordance with the criteria in Figure 2.14 (cf. Li 1973: 234). The following examples illustrate these distinctions:
Derivation:

DR-1.  [+Adj]        [+Adv ]
       [αFi ]  >-->  [+mnnr]    ] --> li]
                     [αFi  ]

Inflection:

a)  SR-2.  [+V  ]  -->  [±past]
           [+fint]

b)  MR-1.  ]  -->  d] / [+past]
Inflection applies to existing lexical entries*   |   Derivation creates new lexical entries

(i)   Inflection: forms PARADIGMS, i.e. sets of mutually exclusive variants of a lexical entry which occur in the same syntactic slot.
      Derivation: forms WORD FAMILIES, i.e. 'chains' of lexical items linked together in the lexicon by partially shared form and meaning.

(ii)  Inflection: productive (existence may occasionally have gaps; meaning and distribution predictable; form typically predictable).
      Derivation: non-productive (existence typically has gaps; meaning and distribution not predictable; form may vary widely, and alternate affixation patterns may exist).

(iii) Inflection: applies obligatorily.†
      Derivation: applies optionally and sporadically when coining or interpreting new words.

(iv)  Inflection: can't change pre-existing features; all inflected forms belong to the same syntactic class.
      Derivation: any pre-existing phonological, syntactic, or semantic feature may be replaced, especially major class features.

(v)   Inflection: affixes are unique to a particular syntactic class of words, and don't carry over when the word is derived into a different major class; i.e. inflection bleeds derivation.
      Derivation: affixes become an inseparable part of the new lexical representation, and the new entries will be inflected in accordance with their new class membership; i.e. derivation feeds inflection.

*Or perhaps does not 'apply' at all, if inflected forms are stored as fully specified items.
†See footnote above.
Figure 2.14

DR-1 states that it is possible to add a new word to the lexicon which has the same feature representation [αFi] as some adjective except that the new form is an adverb [+Adv] with the additional feature 'manner' and ends in /li/. SR-2 on the other hand states that every finite verb matrix [+V, +fint] is an abbreviation for two matrices which are identical to each other except that one is marked [-past] and the other is marked [+past]. MR-1 adds a final /d/ to any word which received the feature [+past] from SR-2.

Lexical derivation is based on analogical patterns which individual speakers may establish as they recognize phonological-syntactic-semantic relationships among pairs of lexical items in their lexicons. Such analogies may be represented in the classical analogical pattern, that is:
w : x :: y : z
Where the original pairs w : x may have come from is immaterial to the operation of the process of derivation; they could be learned from other speakers, borrowed directly or indirectly from other languages, or be created by other derivational processes.16 A derivation rule (DR) is the formal statement of a perceived analogical relationship between pairs of words in the lexicon (see Figure 2.15), that is, for every quality adjective with the semantic features αFi, there may exist an adverb with the same semantic features plus the feature 'manner', differing in form from the adjective in the suffix -ly. Such analogies however, are almost never completely productive, for example:

good [+Adj]               :  *goodly [+Adv]
friendly [+Adj]           :  *friendlily [+Adv]
late [+Adj] ('deceased')  :  *lately [+Adv]
pregnant [+Adj]           :  ?pregnantly [+Adv]
fuchsia [+Adj]            :  ?fuchsialy [+Adv]
perturbed [+Adj]          :  ?perturbedly [+Adv]
and when superficially corresponding forms exist, they may not have completely analogical semantic properties, often because they preserve a meaning from an earlier time which is no longer present in the synchronic 'source', for example late : lately. e.g. calm
+Adj
+Adv
+qlty > — *
+mnnr
cool
coolly
ocR
red
redly
ocR
1
>—•
calmly
iyl
Fi H+Adv][+N] " -[+NH+N1
a.
j-[+Adv] [+Adv]
c.
b
Figure 2.17
These latter two features are unnecessary when a broader range of data is taken into consideration, since sentences such as (16) and (17) will also have to be generated directly by the grammar: (16)
Okra I never eat.
(17) John rarely completely finishes a meal. Once again, the corresponding PSRs would have to be complicated rather than simplified in order to account for these two examples. RR-9 states that the main verb of a clause may be followed by a complement sentence; that is, it may be followed by a sister constituent whose head is a verb: RR-9.
[*V]
—>
•([•V])
a. b.
It may also, by RR-4a-b, be followed by a preposition. We know that the complement sentence and the P must form a single constituent because RR4c-d prevent the cooccurrence of distinct PP and S in any order as sisters of V. They must form a single exocentric construction because RR-4e implies a sister containing the features [+P, +V], and matrices containing two positively specified major category features can only appear as virtual matrices associated with exocentric constructions; see Acson 1979: 67-71. RR-4f-g prevent multiple occurrences of a following PP or complement sentence (see Figure 2.18). RR-4.
[+V]1 — * "+ (I+P } 1
-H3] -I+PH+VJ
1
-[+VI+P]
a. b. c. d.
~ e. -f+p][+p]
f.
_4+V][+VL
g-
Figure 2LI B
In a lexicase grammar, case forms (which correspond to some extent with the semantically empty 'grammatical relations' of relational grammar and
PAN-LEXICALISM
66
Case of GB) and case roles (similar to GB's Thematic Relations, and recently incorporated into RG as 'syntactic-semantic grammatical relations' (Bell 1981; cf. Hou 1979: 2) are also represented as features of lexical items (Starosta 1976: 26-31), and many of the kinds of cooccurrence restrictions that appear in rules such as RR-2-4 would be stated in a full grammar in terms of case. Generalizations about major constituent order and syntactic typology, for example, are easily stated in this way. Thus if [4-Nom] 'nominative'; represents grammatical subjects (Is) and [+Acc] 'accusative' represents 'objects' (2s; actually a 'direct object' is an accusative constituent bearing the Patient case relation, not just any accusative NP), then various syntactic typologies can be represented through RRs applying to main verbs, as illustrated in Figure 2.19. S V RR-5.
O
[+V]—>
• - _
_ [+Nom ] "
H+ A c c l V S RR-6.
—>
•-[+N]_ .-
RR-7.
[+V]
b.
0
[+VJ
S 0
a.
[+Acc ] [+Nom ] ^
V —*
•-_[+N] -{+Acc ] [+Nom ]
I
e.
J
Fig ure 2.19 For SVO languages, a. and b. guarantee that the subject if any precedes the verb and the object if any follows it. For strict VSO and SOV languages, one feature guarantees that NPs occur on the correct side of the verb, and the other prevents the subject and object from ever appearing in the wrong order. In more adequate grammars, these rules can be simplified by the omission of c. and e., since VSO languages such as Tagalog do allow preposed topics, and SOV languages such as Japanese do allow postverbal nominals. In fact, even d. and f. can be omitted for these two languages at least, since the SO order is not absolute, and necessary word order constraints will be handled in terms of features such as [prnn] 'pronoun', [±AGT] 'Agent', and [±topc] 'topic' (Tagalog) or [±topc] and [PAT] 'Patient' (Japanese). For languages with truly free order of major constituents, if such there be, no ordering features at all need be specified, a treatment which seems intuitively18 preferable to the very dubious 'Scrambling' transformation proposed by Ross (Ross 1967: 75).
PAN-LEXICALISM
67
2.4.3.3 Relatedness One of the duties of a syntactic description since the beginning of generative grammar is to account for the interrelationships among sentences, and the primary mechanism for accomplishing this has been transformations. In Harris's original formulation, transformations formalized the relationships between sentences by deriving one sentence from another one. Chomsky modified this approach by establishing a level of deep structure, so that two sentences are related if they share a common or similar deep structure. Most recently, this mechanism has dwindled away to accounting only for movement phenomena. A lexicase grammar can be seen as a Chomskyan grammar in which transformations have disappeared completely, so that the burden of relating non-movement sentences is carried solely by the lexicon: two sentences are related to the extent that they share identical or related lexical items standing in regular relations to each other. A lexical rule may relate sentences because a lexical item determines its possible dependents, and a rule which establishes a relation between pairs of lexical items also indirectly relates their dependents (see Figure 2.20). DR-0.
r +x
=>r +w2 i
>—»
+Y D[+Z1
1]
! [PFj J =r+z21 Figure 2.20
This rule states that Ys may be derived from Xs, and it establishes a correspondence between W, and Z2, and between W2 and Z, by matching up their selectional features: if X expects its Wt to have the features aFj, then the corresponding derived Y will expect the same set of features on its Z2 (see Figure 2.21). This is of course exactly the same kind of relationship for which the passive transformation was created, but it is formalized completely in the lexicon. Thus an active and corresponding passive sentence can be related as shown in Figure 2.22.
2.4.3.4 Missing constituents, 'movement \ and 'PRO' The missing constituent or 'functional deviance5 phenomenon is the one major area of syntax which cannot be treated completely in terms of lexical rules alone. In the pairs of sentences 18/18a and 19/19a for example, most of the relationships between interrogatives and declaratives and between finite verbs and infinitives can of course be shown by lexical rules.
68
PAN-LEXICALISM V
X*
W.
W„
+x
[oF. ] 1
] -
LL
- -—'—f
•W1]
3
_
IPRI
[oF.J
z>+W2l
*\
+Z1
•Z2 aF :
k JJ Figure 2.21
subject "+Nom"
active verb '•AGTT
«F, J
+AGT •
J
i
subject +Norn
passive verb +MNS1
+PAT •pFi
•'FM
agent "+Abl" +MNS
- aF i +PAT]
•
object ^+Acc +PAT|
*-aFi
Figure 2.22
(18)
Martha
can beat John.
[+Nom] [-ntrg] (18a) Who can
Martha beat /\?
[+ntrg] f+Nom] (19)
Sheila lassoed a wallaby. r+fint]
69
PAN-LEXICALISM (19a)
Sheila t r i e d to [ / \
lasso a wallaby. [-fint]
The connections between the missing As and NPs elsewhere in the sentence cannot be specified in terms of lexical rules alone, however, and require interpretive rules which are triggered by 'functional deviance' (Brame 1978: 35) and apply to the entire clause. These rules have been discussed in detail in several important papers by Louise Pagotto (Pagotto 1985a;, Pagotto 1985b), and won't be further discussed in this volume. One of the primary functions of transformations has been to account for the relationship between constructions, such as simple declaratives and corresponding subject-aux inversion clauses, which contain identical or similar constituents occurring in different orders. But how can a non-transformational theory permute constituents? It cannot, and in fact it does not have to: it just has to account for the regular syntactic correspondences between such constructions, and transformations are not the only way of doing this. Order in these constructions is specified in terms of directional contextual features, and correspondences between the constituents in such constructions will simply be stated in terms of rules which relate directional contextual features to each other. For subject-aux inversion, for example, the rules are quite simple (Starosta 1977: 119,126) (see Figure 2.23). (a)
SR-1.
•••fint
~>
:±ntrg]
.•xlry. (b)
RR-8.
| ccntrg]
—>
a
[+Nom] l
-a[+Nom]
j
Figur e2.23
SR-1 states that finite auxiliaries may be interrogative or declarative, and RR-8 states that interrogative verbs require following subjects and declarative verbs require preceding subjects. Compare the examples in Figure 2.24. These two sentences are related in that they have identical lexical items in identical case and dependency relationships. They differ only in the feature of [±ntrg] 'interrogative', and in the order of the subject and auxiliary, and this correspondence between order and interrogativity is stated by a lexical rule, not a transformation. Other relationships are similarly statable in the lexicon by means of Derivation Rules such as DR-1 or Inflectional Subcategorization and Redundancy Rules such as SR-1 and RR-8. 'Scrambling' rules are of course not necessary at all. They are simply an artifact of describing constituent structure in terms of Phrase Structure rules rather than lexical rules. In a lexicase grammar, free word-order constructions are simply generated in
70
PAN-LEXICALISM
(20)
(20a)
is
The Pope
Catholic.
is
[+Nom] +xlry
+xlry
•••fint
•fint
-ntrg
•ntrg
-
+
[+Nom]
+[+Nom]
[+Nom]
-[•Nom]
the Pope Catholic?
J
[+Nom]
Figure 2.24
random order by not including order restrictions in the entries for the lexical heads of the constructions. NOTES 1. This sort of response could also result from an odd collocation, and further checking will have to be done to find out which cause is responsible for the uncertain response. 2. Actually, as stated earlier, there is no 'English lexicon' as a psychologically real construct. There are only lexicons of particular English speakers. Rather than repeating this point every time it arises, I will speak of 'English' with the understanding that I am referring to some idiosyncratic version of English. 3. Hudson (Hudson 1984: 3) notes the difficulty of determining the boundaries of a lexical entry, and decides that such a distinction 'corresponds to no kind of reality', and states as a virtue the fact that his network approach 'allows us to sidestep the issue completely'. The lexicase approach in contrast solves the problem instead of sidestepping it: different sound, meaning, or distribution not traceable to inflectional distinctions means different lexical entries. 4. In accordance with the lexical disjointness constraint, these are not fused into one single combined entry. 5. The analysis of poor as a noun in such examples has been explicitly rejected by Hudson (1979c: 17), who simply states that 'we must accept that words like poor in the poor and biggest in the biggest are not zero-derived nouns, but are genuine adjectives'. The necessity of accepting this position is not at all obvious, however. 6. Deborah Masterson informs me that diminutive endings may also appear on adjectives in Spanish. 7. The analogous situation arises with gerundive nominalizations such as: (21)
[^pjohn's always teasing Olga] started to get annoying. [•Det] [Mdv]
[+N]
[+N]
where teasing is syntactically a noun, even though [+Det Adv NP] is not a possible environment for non-gerundive nouns. 8. The rationale being invoked here is the DUCK DOCTRINE: if it looks like a duck, if it walks like a duck, and if it quacks like a duck, it's a duck. 9. Where we are temporarily ignoring the question of non-verbal sentences; cf. section 5.3.2.2.
PAN-LEXICALISM
71
10. A very similar analysis is assumed by O'Grady (O'Grady 1987;, esp. pp. 269). O'Grady however gives no explicit justification for the distinction he makes between 'case suffixes' (op. cit. p. 255) such as -ka NOM, -lul ACC, -uy GEN, -eykey DAT, and -eyse LOC, and 'postpositions' (op. cit. pp. 259-60) such as -ey 'at', -hanthey 'by', and -eyse 'in' (sic), though there is a fairly strong correlation between his Korean case suffixed NPs and English bare noun phrase translational equivalents, and between his Korean postpositional phrases and English PP translational equivalents. 11. Possible candidates include the 'O!' of 'among the leaves so green-o', the American urban blue-collar confirmatory tag word 'Aina?' ('Ain't it?'), and the famous Canadian confirmatory tag word 'Eh?' 12. The lexicase approach to lexical representation and lexical rules is a direct outgrowth of Chomsky's lexical analysis in Aspects of the Theory of Syntax (Chomsky 1965). The incorporation of 'case frames' obviously derives from the work of Fillmore (Fillmore 1968). 13. Mathematically speaking, the formula for the minimum number of binary features / needed to specify uniquely every item in a vocabulary of L items is log2 L — / . This means that if you used your features with maximum efficiency (which no one does), a distinctive feature semantic analysis of a vocabulary of 32,448 words would require onlyfifteendifferent features. 14. There is logical support for this kind of zero-inflectional analysis. First of all, morphology doesn't come from nowhere. A category must already exist before it can acquire a phonological exponent, so every inflectionally marked category must have evolved from a state in which a category was present but not yet morphologically marked. Second, if a category exists in language X, we know that it is a possible category of human language and therefore a possible category for language Y even if there is no overt affixation to mark it; and third, if recognizing it in language Y even without morphological evidence fills a gap in an otherwise universal pattern, then this may be an acceptable extension of the maximum differentiation principle into the domain of universal grammar. Obviously there are dangers in carrying this procedure too far, to the point where all grammars turn out to be encrypted versions of, say, English, but linguistic imperialism can in principle cut both ways, and several features of the lexicase analysis of English have their origins in earlier lexicase analyses of Western Austronesian and Australian aboriginal languages. 15. The Ohio State Language Files also treats participles as inflected forms (Godby et al. 1982: 60-2), in line with traditional grammar but again incompatible with generally accepted criteria for distinguishing between inflection and derivation: participles have quite a different syntactic distribution from other verbs, and so do not belong to the same syntactic subclass of verbs; participle stem formation is the derivational affixation that marks the verbs' entry into a new class. 16. In fact, they can even come about accidentally. Phonesthemes such as English//on words associated with light or sudden motion, e.g. flee, fleet, flicker, fly, flit, float, flood, flow, fluid, flush, flutter, flop, fleeting, flick, flare, flash, flip, flirt, floe, flounder come about by extensions of phonetic-semantic associations which may originally be accidental. 17. A similar problem was raised during the Lexicalist Wars in the early seventies about aggression vs. 
*aggress, but aggress now seems to be back in the language for a while; perhaps because we heard it so often we decided to keep it in. 18. Although of course this consideration of 'linguist's intuition' does not carry much weight, especially with respect to transformational competence models which make no claim to reflecting a speaker's actual internal language processing.
3.
Formalization
3.1 ORGANIZATION Figures 3.1 and 3.2 characterize the various kinds of lexicase rules and the representations they generate. Lexicase is a grammar of competence in the Chomskyan tradition, and the rules in the two figures are all rules of grammatical competence. Figure 3.1, the portion of the flow chart covering syntactic competence proper, has been quite stable for the past fifteen years or so. The Phrase level anaphoric rules and the Performance component in Figure 3.2 are newer, and are based on the work of Louise Pagotto and Frances Lindsey Jr. which was carried out under two artificial intelligence projects, the NTT Lexicase Project and the PICHTR Mentat project, at the University of Hawaii during the past three years. The lexicon is assumed to be a list of triune signs with all predictable information extracted and stated as rules (the minimalist position), though it may turn out that people store some or all of the predictable information (the maximalist position). According to Hudson (1984: 27-8): If we want our grammar to mirror the structure of the speaker/hearer's knowledge (as I do), then we need to know which of two assumptions is correct: that inherited properties are calculated each time they are needed, or that they are calculated as soon as they are available for calculation, and then stored . . . I take it that the question is ultimately an empirical one, and we are concerned here with two different claims regarding the structure of linguistic knowledge... At least some experimental evidence seems to support the claim that some inflected forms are calculated (e.g. Mackay 1974) . . . The only reasonable conclusion seems to be that some predictable information is stored, and some is not; and at present we are not in a position to decide which is which (a similar conclusion is expressed in McCawley 1977)... As far as psychological reality is concerned, then, my grammars will provide a base-line below which we must assume that mature speakers of the variety concerned will not sink. However, even this is too much to take for granted, in a sense, because I shall assume the maximum of generalization .. . But it would be quite possible to speak English fluently without having ever made this generalization [about auxiliary verbs] in one's grammar, but having learned all the right facts about the individual verbs. RRs, SRs, MRs, and IRRs are rules whose structural descriptions refer to features within a matrix, and which add features to all matrices satisfying the structural descriptions. The DRs are Derivation Rules, analogical rules of varying degrees of productivity by which new lexical items can be added to the
FORMALIZATION Lexicon 4Derivation Rules DR's Redundancy
t
Rules RR's
T
Subcategorization Rules SR's
Morphological Rules MR's Inflectional Redundancy Rules IRR's
T
Fully specified lexical items
I
Simple syntactic representation: fully specified phrases, (including sentences)
Figure 3.1 lexicon in accordance with patterns established by the similarities and differences among lexical items already in the lexicon (cf. Starosta 1971b; Taylor 1972; Aronoff 1976; DeGuzman 1978), and by means of which hearers may attempt to understand unfamiliar derived forms. The immediate output of the lexical rules, which in effect constitute the syntactic component, is not sentences but well-formed phrases, that is, hierarchically structured sequences of words which are exhaustively commanded by a single word, and in which all positive and negative contextual
FORMALIZATION Simple syntactic representation:
Phrase level
Phrase level
phonological
anaphoric
rules
rules
Augmented syntactic representation Competence Performance
Long-term storage: knowledge base
Short-term storage context of situation
Interpreted sentences
Figure 3.2 features of every word are satisfied. Sentences are merely a proper subset of these phrases in which the highest regent is a finite predicator [+fint,+prdc], typically a verb: As Brame (1979) has very sensibly argued (and cf. the assumptions of Montague's work), the grammar of a language should define the structure of all expressions in it, and can do this without giving any special status to those expressions that can be used independently as utterances. [Pullum 1985: 13] Lexicase was apparently the first generative framework to propose the total abolishment of transformations (Starosta 1971b), so that there is in effect only one level of representation (see section 1.1.2), and no transformations, 'functional structures', or other such rules or subsystems provided to state correspondences among distinct 'logical', semantic, functional, syntactic, and/or phonological levels (Gazdar, Klein, Pullum, and Sag 1985: 9).
75
FORMALIZATION 3.2 RULE TYPES 3.2.1 Word level
The Redundancy Rule, Subcategorization Rule, and contextual and noncontextual lexical subcategorization notation used in the statement of lexical generalizations in a lexicase grammar are identical to or slightly adapted from the formalisms introduced in Chomsky's Aspects of the Theory of Syntax (Chomsky 1965). The notations for inflectional Morphological Rules and for Derivational Rules were developed within the lexicase tradition, but can be seen as respective formalizations of 'spelling rules' and 'derivation rules', mechanisms within the Standard Theory which are often referred to and relied on but rarely formalized. 3.2.1.1 Redundancy rules (RRs) Lexical features are not randomly distributed. Instead, implicational relations apply among features within particular sets of items, and these implicational relations can be stated in a grammar as lexical Redundancy Rules (RRs). For example, it will always be true that every lexical item marked positively for the feature prnn (pronoun) will also be marked [+N]. This fact can be stated as a rule: RR-13.
[+pmn]
-->
[+N]
which has the effect of adding the feature [+N] to all matrices containing the feature [+prnn]. Thus the feature [+N] can be left out of all pronoun matrices in the lexicon, since it is predictable and therefore redundant, and reconstructed as needed by RR-13.1 The formal objects that remain after all possible implicational statements have been made and all redundant features have been extracted are the lexemes. The syntactic consequences of the presence of certain lexically specified syntactic features can also be captured by redundancy rules. For example, the fact that common nouns allow determiners while proper nouns generally do not will be formalized in terms of a redundancy rule showing the implication relationship between the non-contextual feature and a contextual one, for example: RR-14.
[-prprj — > [+([+Det])]
where [+([+Det])] means 'may occur in the environment before or after a sister attribute whose lexical head is [+Det]'. Non-permissible environments are ultimately marked at the end of the grammar by a general Inflectional redundancy rule called the Omega Rule (IRR-OMEGA; see section 3.2.1.3), which states essentially that any major-constituent context which is not explicitly allowed is forbidden. RR-14 thus states an exception to the Omega Rule: common nouns may cooccur with determiner sisters, since the feature [+([+Det])] introduced by RR-14 blocks the addition of [-[+Det]] by the
FORMALIZATION
76
Omega Rule, but no other category may take [-f Det] dependents, since all other feature matrices will contain the feature [—[+Det] ] introduced by IRROMEGA. RRs are really marking conventions. They state the normal value for a feature in a matrix given the presence of one or more other features. The exceptions, the marked specifications, are indicated by 'marking', by specifying the feature in the lexicon or inserting it in an earlier rule, and RRs are not allowed to change any feature which is already specified in the matrix for that item.2 3.2.1.2 Subcategorization Rules (SRs) All subcategorization of lexical items in a lexicase grammar is accounted for by Subcategorization Rules (SRs) in the lexicon rather than by separate Phrase Structure Rules. Subcategorization Rules characterize choices that are available within a particular category. These rules are of two subtypes, inflectional and lexical. For example, one inflectional Subcategorization Rule states that English count nouns may be marked as either singular or plural. The other type of Subcategorization Rule, lexical SRs, does not allow an actual choice, but rather characterizes binary subcategories of a lexical category. For example, there is a lexical Subcategorization Rule which states that English non-pronominal nouns are either proper or common. Lexical SRs divide lexical categories into binary sets, for example: (a)
SR-4.
[+N]
—>
[±prnn]
(b)
SR-5.
[-prnn]
—>
[±prpr]
which can be read as, Nouns are divided into pronoun and non-pronoun subsets, and the non-pronoun subset can be further subdivided into proper and common subclasses (see Figure 3.3). dog "+N prnn
Dan "+N
-prnn
-prnn
_-prpr _
+prpr
ma "+N
1
+prnn
Figure 3.3
In one sense, lexical Subcategorization Rules such as SR-4 and SR-5 are unnecessary, since they will never apply non-vacuously: all nouns will already be marked one way or the other in the lexicon for [±prnn], and all nonpronouns will be lexically marked as [—prpr] or [+prpr]. However, they still seem to belong in a grammar intended to capture the maximum number of generalizations, since they are part of the specification of the notion 'possible lexical item in English', and they may turn out to have a function in constraining the application of lexical derivation rules, leveling exceptional items, or naturalizing borrowings.
77
FORMALIZATION
Given that SR-4 is required in the grammar, we might ask whether RR-13 ([•fprnn] -* [+N]) is also required in the same grammar, since it provides some of the same information. The answer is affirmative: RRs add predictable features to minimally specified lexical entries, but lexical SRs (as opposed to inflectional SRs; see Figure 3.4) can't perform this function because all the features they try to specify are already present in the matrices that come to them (see Figure 3.4). It should be noted that this is not a systematic redundancy, since such rules may not always be exact converses of each other; for example, if we have the SRs 6 and 7: a) SR-6.
[+N]
b)
[+Det] — > [±dfnt]
SR-7.
— > [±dfnt]
[+prnn] — * [+N]
RR-13. SR-4.
[+NJ
—• [±prnn]
SR-5.
[-prnn] -—» [±prpr]
+N -dfnl
+Det +dfnt
-dfnt
+dfnt
dog
[+prnn, -prpr]
dog
[+prnn, -prpr, +N] \
dog
x
[+prnn, -prpr, +N]
dog
Figure 3.4
I
[+prnn, -prpr, +N]
then it will not be possible for a grammar to contain either of the two Redundancy Rules in Figure 3.5, since that might result in adding the wrong feature to the wrong lexeme. the [+dfnt]
i
Bill [+dfnt]
a)
RR-15. [+dfnt] — >
[+N]
'the
[+dfnt, +N]
b)
RR-16. [+dfnt] — >
[+Det]
'the
[+drnt, +N, +Det]
i
Bill 'Bill
[+dfnt, +N) [+dfnt,+N,+Det]
Figure 3.5
Traditional inflectional categories such as person, number, gender, case, tense, and so on are treated in lexicase as freely variable features which are not stored in their lexical entries (except in the cases of unpredictable forms), but are rather added by an inflectional Subcategorization Rule (see, for example, Figure 3.63). That is, Vs are either finite or infinitives; and finite verbs are SR-8.
[+V] — > [±fint]
SR-9.
t+fint] — > [±past]
Figure 3.6
78
FORMALIZATION
either past or non-past. Although identical in form to the Subcategorization Rules 4 and 5 discussed earlier, these inflectional SRs are distinct in function: (i) they do not apply vacuously, and (ii) the features they add have grammatical import; they are realized phonologically by inflectional Morphological Rules (MRs) and/or syntactically by Inflectional Redundancy Rules, for example: MR-1.
]
—>
IRR-3.
[ocfint]
—>
[•past] [- [ ± f i n t ] (b)
-fint walk /wok/
SR-9.
walk /wok/
MR-1
] —> d] / f+past]
(d)
IRR-3.
[afint] --> [-ot[+Nom]]
•fint walk /wok/
r+v [-fint.
O f i n t ] —> [ ± p a s t ] (c)
walk /wok/
walk /wok/
+V
'+V
1
+f int
+f int
-past
+pastj
walk /wok/ walked /wokd/
r+v
+V
•+v
[-fint..
+fint
+fint
-past
+pastj
walk /wok/
walk /wok/
|
walked /wokd/
[+V
+V
'+V
-fint
+ f int
+ fint
[-[+Nom]
-past
•past
+[+Nom]
+[+Noin]J
Figure 3.7
1
79
FORMALIZATION
Regular exceptions to inflectional SRs will be handled by blocking their application by means of features added by earlier rules. Thus RR-17 will capture the generalization that mass nouns are singular: RR-17.
[+mass]
~>
SR-10.
[+N] « >
[-plrl ]
[±plrl]
since the [—plrl] feature introduced by this rule will prevent the assignment of [H-plrl] to words like mud by SR-10. Irregular exceptions must be handled by some other mechanism; if they could be handled by rules, after all, they wouldn't be irregular. In a transformational grammar, inflectional irregularities are typically ignored or consigned to 'late rules' or 'spelling rules'. Such rules are practically never seen in daylight, but the few specimens which have been observed are reported to look something like the following: go + past
—> went
If this specimen is representative, it should be obvious that it is a misnomer to call such entities 'rules' at all. A rule states a generalization, and the 'rule' above is not general. Rather it states an isolated fact about a single lexical item: the past tense of go is went. It is in fact not a rule but the statement of an exception to the regular rule. Each such 'rule' must be memorized separately, just as if it were an independent lexical entry, and in a lexicase grammar that is exactly what it is. Instead of an entry for go and a rule to derive went from go, there are two entries, one generalized entry for go and one entry for went (see Figure 3.8). In a lexicase grammar, rules state generalizations. Thus a lexicase grammar of English will have one general rule for past tense: MR-1.
] —> d] /
[+past]
went
go
+V
[+V]
+ fint +past Figure 3.8
that is, every verb marked positively for the feature past will have a -d at the end (with the [t] and [id] allophones supplied by phonological rules). However, as we all know, there are exceptions to this rule: the past tense of go is not *goed but went, etc. Following Bloomfield, I consider the repository of exceptions to be the lexicon. However, because of the constraint against rule features, it is not possible to mark a word as an exception to a particular morphological rule, as is tacitly or explicitly assumed in most versions of generative grammar. As a consequence of this constraint, all 'diacritic'
80
FORMALIZATION
features (rule features) customarily used to designate arbitrary lexical classes in the morphological analysis of verbal and nominal inflection (cf. Chomsky and Halle 1968: 373-80) must either be replaced by syntactically or semantically motivated features, such as gender and case features (cf. Acson 1979), or eliminated in favor of grammatically conditioned phonological rules applying to a characteristic lexically listed 'normalized' stem form (cf. Hudson 1980b: 16-19).4 For example, the declension class of a Latin noun would be marked by means of a gender feature and/or a characteristic vowel on the end of the basic stem (or stems). The usual assumption in Item and Arrangement morphology, as well as in Chomskyan grammar (cf. Scalise 1984: 74), is that this base form must be a 'stem', that is, a form with all its inflectional affixes removed, and it is this assumption which has led to positing declension class features and other rule features to predict the correct output. However, this assumption is not a logically necessary one. Rather, from the word-and-paradigm point of view adopted in lexicase, we can choose as our base form any member of the paradigm which allows the rest of the paradigm to be predicted, and this could perfectly well be a single fully inflected form such as the nominative singular or the ablative plural for example, or even two or more 'principal parts' of the paradigm, as is done in traditional grammar. That is, exceptions to general statements which cannot be motivated in terms of phonological shape or semantic features must be indicated by listing exceptional members of an inflectional paradigm as separate lexical entries. Thus a 'word' with suppletive forms such as the notorious Latin fero^ferre, tuli, latus 'bear, carry, endure' would have the infinitive plus three separate stems listed, with two specified for tense and/or aspect but unspecified for person and number (see Figure 3.9). All unpredictable inflected words will be separately listed in their fully specified forms (see Figure 3.10). Similarly in English, all exceptions, including strong verbs and suppletive forms, will be listed as separate and independent lexical entries, as in Figure 3.11, along with the basic stem forms for the rest of the paradigm. ferO
ferre
[+V -fint
tult +V
latus +V
-pssv
+pssvj
1
[+prfc. Figure 3.9 Analogously, most English nouns will be listed as number-neutral entries and will have their plural forms supplied by regular lexical rules. Irregular plurals such as women, men, and feet however, will be independent [+plural] entries to which the plural-formation rules do not apply. According to Hudson (1984: 62), The analysis for/oo//feet must be different from that for go/went, because foot and feet are not totally unrelated in form: they both have the same consonants, and it is only the vowels that are different. This partial similarity must be shown . . .
81
FORMALIZATION fers
ferO
fert
+v
+V
+V
— +Nom 1
— +Nom 1
— +Nom
-spkrj
+spkr]
— +Nom 1
— +Nom 1
+addr]
-addrj
— +Nom ]
— +Nom 1
+plrlj
+p1rl|
+spkr
— •Nom 4addr
— +Nom +plrl
-pssv
-pssv
-pssv
-prfc
-prfc
-prfc
-f>ast
-past
.tpast
Fiigure 3.1 (0 saw
knew
[+V
1
did
went
was
+V
•V
+V
'•V
+f int
+f int
+f int
+ f int
+ fint
+past
+past
+past
••-past
+past
ccF.
fip
•^k
.»,
i
J
j
eF
1
m J
see
know
do
go
be
[+V '
+V
+V
+v
+V 1
8F
^k
SF,.
eFJ
k
J
Figure 3.11
However, the necessity for having a separate grammatical statement in the grammar to 'show' this is not obvious, and the cost in terms of rule features is prohibitive. We can determine that speakers know that feet is the plural counterpart of foot, but I know of no evidence that they treat this kind of alternation any differently than the difference between go and went, and in fact lexicase makes no distinction between the two cases. Finally, zero-inflected words such as sheep will also have two entries, one permanently marked as [-fplural], and pants will have only one entry, marked with a permanent [4-plural] feature, so that *pant [—plural] will be a lexical gap (Starosta 1971a: 167-8; see Figure 3.12). Completely suppletive pairs such as go and went will of course also have to have separate entries, identical in semantic content features but differing in inflectional features and phonological representation (see Figure 3.13).5
82
FORMALIZATION woman
women
man
men
-prnn
[-prnn]
f-prnn
-prnn
-prpr
-prpr
[-prpr
-prpr
L+plrlJ
+plrl
sheep
sheep
pants
-prnn
[-prnn]
f-prnn
-prpr
-prpr
-prpr
[+plrlj
[+plrl Figure 3.12 went
go
r+v
"I
+V
|
+motn
+motn
Ui
r
J
•past «•aF.
|
Figure 3.12\ Noun pairs such as cow and cattle or person and people might also be analyzed in this way, but here we have contrasting regular plurals cows and persons respectively, so that we are probably really dealing with separate collective plurals, one of which differs from the corresponding unmarked form in sex as well as collectivity (Figure 3.14); here ' a F / and '/JFj' indicate the other semantic features the two entries have in common.
cow
cattle
-prnn
people [-prnn
-prnn
-prnn
-prpr
-prpr
-prpr
-prpr
ccFi
+plrl
-mscl
•plrl
+clct
*Fj
•clct
person
[ocFi
l*Fj Figure 3.14
FORMALIZATION
83
Since all the irregular entries are independent of the corresponding unmarked forms, there is no need to make a commitment about, say, which word is the plural or singular form of which other word. Words are related semantically to the extent that they share semantic features.6 Thus people may or may not be more related to person than cattle is to cow or brethren is to brother, but these are questions for a lexicographer, not a grammarian. Further, because of this independent listing, it will not be surprising to find that a particular singular form has more than one plural or vice versa, as in person versus persons versus people. The slight semantic difference between the two plurals can be attributed to the Tive Clocks' principle (Joos 1962): why have two different plurals if they are both going to mean the same thing? There is a formal problem with inflectional morphological rules: they apply to the output of SRs, but if fully specified irregular forms follow the same route as minimally specified stems, then words like put [+past] and went [+past] will go through the IMRs and come out as *putted and *wented respectively (see Figure 3.15). How can this result be avoided? After strenuous good-faith efforts to do so, I have tentatively concluded that it cannot be: lexicase is telling us, as a good empirical theory should, that this is a purely performance problem: Mary goed to Disneyland is a grammatically well-formed sentence, and the reason adults do not use it is because they are lazy and have bigger lexicons which make fully specified prepackaged items like went easier to serve. I will refer to this as the PREPACKAGING STRATEGY: if a fully specified inflected form such as put [+V,+past, a F J already exists in the lexicon, use that rather than running the uninfected stem iovmput [+V, a F J through the rules ('Selective inheritance principle', Hudson 1984: 17-18). From the case of suppletive forms, we can see that the semantic identity is the primary criterion for triggering the invocation of this strategy: the presence of went [+V, -fpast, /?Fk] blocks the construction of* goed from go [+V, /?Fk] because the semantic features [/JFk] are identical, regardless of the lack of any phonological similarity. Intuitively this seems to make sense: the speaker has a meaning he wants to express, and chooses his lexical entries accordingly. If he finds a single entry matching the meaning he wants to express, then that is the item chosen. If not, more work has to be done either to construct such an item or find an appropriate phrase. Note that this strategy applies regardless of whether we take (a) the maximalist position, that all words, inflected and otherwise, are remembered individually, or (b) the minimalist position, that every recognizable generalization is extracted from lexical entries, leaving the absolute minimum of non-redundant information specified for each item, or (c) an intermediate position, that more common items are stored and used in fully specified form and less common items are reconstructed from stems and rules as needed. In this analysis, the relationship between the inflected form and the stem form is shown by the shared semantic features (represented by a F ; etc.), and the similarity in form is incidental. That is, irregular forms are treated exactly in the way in which suppletive forms are treated. This constitutes a claim that they are independent of each other, and must be learned separately, as opposed to a rule feature approach, which treats the irregular form as
FORMALIZATION
84
-»Lexicon
90
[+V] [+V] [+V]
put
[+V, +past ]
went
[+V, +past ]
walk put
DR's
~r+NM -aplrl
FORMALIZATION
87
In fact non-directional features are simpler and more general than directional ones, since there is no need to establish an artificial and ad hoc deep structure 'basic order' first, then apply an agreement transformation, then randomly 'scramble' the constituents. THE OMEGA RULE: SRs, R R S , and lexical representations can be stated
most neatly and efficiently, and perhaps can only be fully formalized at all if every grammar written in the lexicase framework contains one possibly universal IRR, the 'Omega Rule'. This rule is the last one in the lexicase grammar of every language, and states quite simply that nothing can cooccur with anything (see Figure 3.17). (The 'one per Sent' constraint required by case grammars could also be stated as part of this rule.) The empty matrix to the left of the arrow refers to all lexical categories, and the features to the right are non-directional, or equivalently, bi-directional, contextual features (see 2.3.2.2) which have the effect of preventing any category from having any other category as a sister. IRR-OMEGA.
[ ] —> -[+V] -t+Adj] -[+Adv]
-r+p] -[+Det] -[+Sprt] Figure 3.17 The Omega Rule is in essence an elsewhere condition, or a statement of default values, and the positively marked contextual features in the grammar, whether introduced by preceding rules or permanently specified on the lexemes, function to mark exceptions to it. That is, every general context which has not been explicitly allowed by the introduction or presence of some positively stated contextual feature will be marked as an impossible collocation 'elsewhere' by the Omega Rule. The positively marked contextual features may be obligatory, for example IRR-1, or optional, for example RR-14: (a) IRR-1.
[+[+Det]l
-prpr +dfnt
(b) RR-14.
[-prpr]
—>
[+([+Det])].
Optional positive contextual features only function to block the application of later negatively specified contextual features, especially those introduced at the end by the Omega Rule. RR-14 by itself only says that a common noun
88
FORMALIZATION
MAY cooccur with a Determiner. The Omega Rule subsequently marks all categories, including verbs, prepositions, pronouns, and so on as never cooccurring with dependent Determiner sisters by assigning the feature [—[+Det]] to their matrices. However, it cannot overrule the [+([+Det])] previously assigned to common nouns by RR-14, so only (non-pronominal) nouns are allowed (though not required) to cooccur with Determiners. RR-11 tells us that in English, nouns cannot precede their Determiners: RR-11.
[+N]
—>
K_[+Det]]
so the end result is comparable to the Phrase Structure Rule NP -• (Det) N plus some kind of feature-copying transformation in a Standard Theory analysis. It should not be concluded from the above paragraphs that there will be no IRRs other than agreement rules and the Omega Rule that introduce negative features. Certain other sub-regularities will also be stated in terms of such features, for example: IRR-6.
->
•mass
[-[+rtcl]]
-dfnt i.e. indefinite mass nouns do not allow articles, thus ruling out *0 mud (cf. Starosta 1971a: 192-6). (Note again that by IRR-4, the non-article determiners that they do allow must be indefinite, for example some mud.) 3.2.1.4 Morphological rules (MRs) As noted in the preceding section, the minimalist position assumes that morphologically regular inflection is not part of the lexical specification of a lexeme. Instead, it is introduced arbitrarily by an inflectional Subcategorization Rule, for example: SR-10.
[+N]
-->
[ctplrl]
which has the effect of replacing the original [+N] lexeme by two identical lexical entries differing only in the specification of the feature [±plrl]. The actual inflectional affixes for a particular inflectional paradigm are added to the stem by Morphological Rules (MRs), which depend on the prior application of SRs and add a predictable element to the lexical specification. As an example, the following rule would assign the correct affix to regularly formed English plural nouns: MR-2.
]
—>
z]
/
-prnn [+plrl
Again, exceptional forms would be listed as separate lexical entries, since rule features are not available to camouflage their unpredictable character.
FORMALIZATION
89
By convention, morphological rules do not apply to words whose inflectional features were not introduced by earlier SRs. This convention is intended to prevent MR-2 from adding -s to cattle [—prnn, +plrl] to yield *catties. This convention has not yet been formally implemented in any fullscale lexicase analysis. (For an alternative approach to this particular problem stated in terms of zero-filled slots, see Hudson 1980b: 13-14a.) Lexicase morphology is word-and-paradigm rather than item-andarrangement or item-and-process (Robins 1959; Matthews 1974). Thus it easily avoids some of the paradoxes and adhocieries of these segmentationoriented models. For example, consider the problem with the morphological analysis of fusional inflectional systems such as verbal inflection in most Romance languages. In Latin, for example, first-person singular present tense verbs end in -o. In an IA analysis, this might be analyzed as a sequence of the morphemes Pres + First + Singular; however, which part of the o is First, which is Singular, and which is Present? For this situation, structuralists resorted to exceptional 'portmanteau morphs' which conflated several morphemes. In lexicase terms, however, this turns out to be an artifact of the IA or IP methodology, a consequence of treating morphological differences as necessarily segmental. For a WP analysis, we merely have to use a cluster of inflectional features to identify a word by its position in the paradigm, and then make the phonological modifications appropriate to that position. Thus only one rule is required, and the question of which part of the o corresponds to which feature makes no sense (see Figure 3.18). Here -[+Nom, +plrl] means 'cannot cooccur with a plural subject5, that is, 'singular', and [—[+Nom, -spkr] means 'cannot cooccur with a non-speaker subject', that is, 'first person', -o then just marks the cooccurrence of those three features. The English third-person singular present tense is treated in exactly the same way (see Figure 3.19). MR-3.
]
—>
o] / [-past
1
- +Nom 1 •plrl 1 - +Nom 1 -spkr|1 Figure 3.18 Another nasty problem for a segment-type analysis is stem-internal modifications, such as infixes, ablaut, umlaut, reduplication, or Semitic triconsonantal stems. However, a lexicase-type WP analysis has no problem with these processes. Thus if perfective in Tunisian Arabic is marked by a between C t and C 2 and between C 2 and C 3 in a triconsonantal stem C^CjCg (Hudson 1984: 65, simplified), the lexicase MR is just: MR - 5
[CiVCtVCj]
—>
ICiadaCj]
/
[+prfc]
FORMALIZATION
90 MR-4.
-->
7] /
-past — +Nom +plrl E
+
en
J*
2
E
[aXAVz)
Figure 3.20 A fletched arrow, >-->, is used to distinguish lexical derivational rules from other lexicase rules. Occasionally this is replaced simply by a colon, to emphasize the fact that these rules are analogical patterns which can operate in either direction. [a¥i9 /?Fj] on the left refers to a set of syntactic and semantic features which characterize the class of words eligible to undergo the rule and which are eligible to be changed by the rule. [OL¥{] on the right refers to the features which carry over in derivation if any, and [yFk] indicates the new features added in the process of derivation if any. Lexicase derivation rules, like Aronoff s regular word-formation processes (Aronoff 1976: 21), apply to words, not stems or roots; they apply to the output of the lexical redundancy rules, and these are redundantly specified but uninflected free lexical entries. In Figure 3.20, [XY] represents the phonological representation of the source item and 0, ft, and c are the phonological modifications differentiating the derived stem' from the source stem. These changes are typically prefixes (a), infixes (ft), or suffixes (c), but can be other
FORMALIZATION
91
kinds of changes as well, including tone change, truncation (Starosta 1986a, Scalise 1984: 89,97), or, in the limiting case, no change at all, that is, zero derivation. As an example, DR-6, a rule which provides a means of associating English 'quality' adjectives with corresponding -ly manner adverbs, is shown as Figure 3.21. The rule is normally assumed to apply in the direction from phonologically less marked to more marked, though other kinds of markedness considerations could enter in as well, reflecting an intuition that when people make up new words, they usually add something to an item already present in their lexicons. However, a DR is an analogical pattern, and analogies are intrinsically non-directional. Thus it is in principle possible for any DR to operate in either direction, and when a DR is used in the markedto-unmarked direction, it is referred to as BACK-FORMATION. Sometimes when one affix replaces another, for example [aX] -* [X6] (Starosta 1986a), the appropriate direction may be difficult or impossible to determine nonarbitrarily. DR-6.
>—>
+Adj +qlty
J
1 i]
Figuire 3.21
Unlike the other lexical rules, DRs can and always do 'change' some features.9 The convention is that any feature mentioned on the left but not on the right is missing from the new item, and that any feature mentioned on the right is added to the new item. Such rules may or may not change syntactic class (for example, see Figure 3.2210). [)R-4. dress DR-4.
[+NJ >--> dress
[+N oF.
DR-5.
DR-5. [+VJ >—> undress
f+V |«F,
\>
f+V
kj >—>
r+v
[oF.l '
+rvrs| ccF. —>
[un Figure 3.22
[+V]
FORMALIZATION
92
The lower part of a DR specifies the phonological consequences of the rule if any; [XY] is the original phonological representation, and a, b, and c represent any modification made to that form. Contrary to usual assumptions (Selkirk 1972; Siegel 1974; Booij 1974; Scalise 1984: 78), no internal boundaries are created or allowed in the process: whatever 'phonological5 processes happen to such forms happen at the time the word is formed, and only their residues are left as part of the lexical entry. No subsequent synchronic rule can refer to any word-internal boundary. Possible modifications of form induced by the lower part of a DR include affixation, full or partial reduplication, internal modifications such as consonant modification {house [+N] : house [+V]), stress change {reject [+V]: reject [+N]), tone change, umlaut, truncation, or, in the case of 'zero derivation', no change at all. Bound morphemes such as a, b, and c exist in a grammars only as parts of such rules; there are no bound morphemes stored as separate entries in the lexicon. DRs are intrinsically ordered, in that the application of one rule may create an item which has the form and features required to make it eligible to be the input of another rule. After each application, a new word enters the lexicon, so there are no hypothetical intermediate stages, but in fact none are needed. It is of course true that an intermediate link in a chain of derivations could be missing (for example, see Figure 3.23), but DRs are analogies constructed by speakers on the basis of whatever is in the lexicon at a given time, so a speaker can just as easily construct a new pattern of derivation based on the endpoints of the chain,11 that is [W] >--> [Wai]. Moreover, as analogies these patterns are reversible. For example, a colleague once invented the verb 'process' /proses/ to describe what the faculty were doing into the amphitheater at a new university president's inauguration. We agreed that 'proceed' would not have meant the right thing,,and could not think of any other single verb which would have the desired meaning. This is a clear example of a back-formation, that is, a derivation in the marked—unmarked direction, based on the analogy of confession: confess:: digression: digress:: procession: ?. [W]>—> [Wa] >—> [Wab]
I
1 Xa
Xab
Y
Ya
Yab
X
z
i
DRs Lexicon
Zab
Figure 3.23 DRs do not have much of a role to play in active linguistic performance. They are only used to (i) interpret a novel word, or (ii) far less frequently, to make up a new word. They are however very important in a lexicase grammar in that they are the primary mechanism for stating the interrelationships of sentence constructions such as active and passive clauses. That is, the lexical
FORMALIZATION
93
head of a clause determines the structure of that clause; active and passive verbs determine the structures of their respective clauses, and the derivation rule which relates active and passive verbs thereby indirectly relates active and passive clauses, without the need for a transformation to relate the clauses directly (see 2.4.3.3). Lexical derivation differs from inflection in degree of productivity. For a particular derivational process, given some underived word, there may or may not be a corresponding derived word, and if the derived word exists, it may or may not exactly correspond semantically or phonologically to the general pattern of derivation. In coining a new word, the speaker necessarily follows some regular pattern, since if he did not, other people would not understand what he meant. However, he is not necessarily bound very tightly to the rule. The expected semantic properties of a new formation are based on the semantic analogies existing in the lexicon at any given point, and each new deviation from the pattern adds to the base of examples for establishing such patterns. Each word is made up to fit a particular need in a particular situation, and its assigned meaning will be partly a function of the pattern by which it was constructed and partly a function of the purpose for which it was coined. This meaning may not be completely predictable from the meaning of the source item plus the affix if any, and a hearer's interpretation of a new formation could be far off the mark in the absence of a disambiguating context. Each coinage is a single externally motivated historical event, and the new word temporarily enters the lexicon of the speaker as a nonce12 word which may or may not become permanent in that speaker's speech and/or spread to other speakers. Whether a possible form has been derived, and if so what the derivative looks like, or whether a derived word's derivational source has been lost are really questions about the history of the language (cf. Hudson 1984: 74), and should not be treated as if they were synchronic facts marked on some other lexical entry. Each application of a Derivation Rule (with the possible exception of completely productive derivational rules such as gerundive nominalization) must be registered in the lexicon by separately listing the new stem that is produced in addition to its derivational source, with the perceived relatedness of the items captured by the relevant lexical Derivation Rule. This applies to 'transparent' as well as 'lexicalized' formations. Thus it cannot be the case, as Scalise suggests (Scalise 1984: 93), that only semantically obscure 'lexicalized' compounds for example are listed in the lexicon, since even if the meaning of a particular compound is predictable from its components, the existence of the compound is not, and that fact must be registered by listing it in the lexicon. Speakers can tell whether a particular compound actually exists in their own lexicon (e.g. doghouse or birdhouse) or does not but could (*mousehouse) because they can scan their internalized lists and see if it is there. It is this sort of information which lexicographers are attempting to capture when they list doghouse and birdhouse but have no entry for mousehouse, though their listings are of course unions of various arbitrarily selected individual lists, and therefore have very limited psychological validity.13
FORMALIZATION
94
Once a word has been created, it is subject to changes independent of its source (or sources). The complete independence of the derivative is demonstrated by the fact that a derived form may continue to exist even after its derivational source has disappeared, as for example in the case of unruly and uncouth, or may never have existed in the language at all, as in the case of romance loans such as receive, distract, and derive and derivation, which were borrowed into English only in derived forms. This is of course an embarrassment to a rule feature analysis, by which a derived form is dependent on lexical properties of the corresponding underived form. As a separate and independent lexical entry, the derived entry may undergo its own independent phonological and/or semantic shifts subsequent to its derivation. For example, few speakers would recognize the semantic connections between comb and unkempt, grass and graze, glass and glaze, brass and brazen, or house, wife, and hussy because of the semantic and phonological shifts which the derived forms have undergone. Similarly Junge in German means 'boy' (or less frequently 'the young of an animal'), though as a derivative of jung 'young, recent' it might be expected to refer to any entity characterizable as young or recent. Pobre as a noun in Spanish refers only to an economically deprived person and pobrecito is a sarcastic way of referring to a person as an object of pity, though both of these senses are present in the source adjective pobre (see 2.3.1.1). These points are illustrated in the examples in Table 3.1 involving derivation with the suffix ~ize. Table 3.1 Derivation with the Suffix -/ze N + -ize >—> V
Table 3.1 Derivation with the suffix -ize

Pattern: N + -ize >—> V, e.g. terror [+N] + -ize >—> terrorize 'affect with terror'

(a) terror >—> terrorize; energy >—> energize; scrutiny >—> scrutinize
Both forms exist; meaning and form fairly predictable.

(b) minimum >—> minimize; maximum >—> maximize; stigma >—> stigmatize; (Tantalus) >—> tantalize; *hydro >—> hydrolyze
Loss of -um, insertion of -t-, -l- not predictable phonologically.

(c) lion >—> lionize; winter >—> winterize; (Bowdler) >—> Bowdlerize; (anode) >—> anodize; (Mesmer) >—> mesmerize; (canon) >—> canonize
Meanings not predictable from the meaning of the source item; for the items in parentheses, the source [+N] is non-existent.

(d) drama >—> dramatize
Neither the meaning nor the -t- of the derived item are predictable from the meaning and form of the source.

(e) tiger >—> --; spring >—> --; modicum >—> --; cathode >—> --
Non-existence of the derived item not predictable, especially given the existence of lionize, winterize, maximize, and anodize.

(f) -- >—> catalyze; -- >—> ostracize; -- >—> analyze (back formation from analysis); -- >—> cognize (back formation from cognizance)
Non-existence of source item not predictable.
The data in Table 3.1 illustrate several of the characteristics of lexical derivation which we have been discussing. The process of lexical derivation cannot be completely described in terms of a precise rule, since some of the items in the table have unpredictable phonological forms or meanings. Instead, the rule states a general pattern of derivation which may be followed more or less exactly depending on the situational need which the new word is being created to fill; and since its form and meaning are not precisely predictable from the form and meaning of the source word, both words must be recorded separately. Every word has its own existence, and a derived word may exist without a coexisting synchronic source (cf. Hudson 1984: 71); a DR tells us whether a given word COULD exist, but the only way to know if it does exist is to look it up in the lexicon; that is, both source and derived words must
be recorded separately, and either the source or the derived form can exist independently of its derived or underived counterpart. Derivational rules are necessarily optional, since if they applied obligatorily, the lexicon would have to store an infinite number of words in a finite space. Every word has many potential derivatives, but the application of lexical derivation processes is typically sporadic and unpredictable. In contrast, all other grammatical rules are obligatory. A given sentence or a given inflected form is either in the language or not in the language, but a given derived form is either in, out, or potentially in. This is taken as a clear indication that deriving or interpreting a new word is a matter of performance rather than competence, since lexicase does not need or in fact allow rule features or any other device for marking grammatical rules as optional or obligatory: all grammatical rules are obligatory, and whatever the grammar generates is in the language, otherwise it would not be testable. Anything which is optional is performance.14 In word grammar, derivation is handled 'by positing an abstract word-pair with the relevant relation, and linking decide/decision, along with other pairs, to this abstract pair as a model' (Hudson 1984: 70). This is very close to the lexicase DR formalism, in that the left and right matrices of a DR could very well be regarded as an 'abstract word-pair' which serves as a 'model'. The difference is the treatment of productivity: As we all know, derivational patterns like the one that links decide and decision are irregular, and do not apply completely generally ... There is no problem in showing this in a word grammar, because the abstract entry only applies to those words which are shown as instances of it [Hudson 1984: 71]. The 'is-an-instance-of' mechanism is of course the notational equivalent of a rule feature, and thus not allowed in a lexicase grammar. RULE FEATURES: rule features are anathema in lexicase, so they cannot be used to code information about one word in the entry of another, or to derive an existing form such as hydrolyze or ostracize from a synchronically coexisting hypothetical underlying form such as *hydro or *ostra. As a consequence of this requirement, for example, all English complex words which depend on the presence of the rule feature [+Romance] (Chomsky and Halle 1968: 373) or [+Greek] (Scalise 1984: 76) for their derivation cannot be produced by means of a synchronically productive rule, but must be listed individually in the lexicon. Eschewing rule features amounts to the requirement that the lexicon list stems rather than roots. This has had salutary consequences for linguistic analysis, as can be seen from the fact that (i) it explains how a derivational source can be lost without affecting the derivative; in a rule feature analysis, which projects the derivative from the source, we would have to say that forms such as *ruly are still there but have gone underground; alternatively we would have to find some non-arbitrary way to distinguish between some derived words which have underlying forms which exist but are never realized, for example perhaps receive, and others which are truly on their own, such as unruly,
and to determine at what point the ancestor ghosts disappear in fact as well as in form;15 (ii) it allows for independent phonological or semantic shifts in derivationally related words, for example grass versus graze and destroy versus destroyer, and (iii) it was directly responsible for the discovery of a neat and regular morphological analysis of the Tagalog verb (DeGuzman 1978), an area of Tagalog grammar which in earlier analyses had been handled merely by listing verb roots with a large number of phonologically similar but grammatically unrelated affixes for each root. The requirement of separate listing of derived forms may appear at first to count against the approach proposed here by the criterion of economy, since it requires the separate listing of both underived forms and their derivatives. If the only alternative is the use of rule features, the separate listing is preferable, since it puts the information where it belongs and does not camouflage the irregular nature of the rule. Excluding rule features makes an empirical claim about psychological reality: it reflects the claim that words which are exceptions to regular rules are individually memorized and that awareness of derivational relatedness is not a part of anyone's synchronic grammar, a claim for which I think a considerable amount of supporting data will be found. ZERO DERIVATION: General lexicase constraints require that if a form occurs in two distinct syntactic environments, it corresponds to two distinct lexical entries, for example:

(1) They always run to school in the morning. [+V]
(2) We took the dogs for a long run on the beach. [+N]

Since there are many such pairs of examples, and since speakers are able to produce and understand new ones, the grammar needs to capture the relationship in terms of a derivation rule (see Figure 3.24).
    [+V ]        [+N   ]
    [αFi]  >—>   [-mass]
                 [αFi  ]

Figure 3.24
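Informally, the rule in Figure 3.24 can be read as a recipe for creating a new entry by analogy. The Python fragment below is a hypothetical illustration only, assuming lexical entries are modeled as simple feature dictionaries; it is not part of the lexicase notation.

    def dr_verb_to_noun(entry):
        # [+V, aFi] >—> [+N, -mass, aFi]: a NEW entry is created by analogy;
        # the source entry itself is not altered in any way (cf. note 9).
        derived = dict(entry)        # carry over the alpha-features aFi
        derived["V"] = False
        derived["N"] = True
        derived["mass"] = False
        return derived

    run_verb = {"form": "run", "V": True, "N": False}
    run_noun = dr_verb_to_noun(run_verb)
    # {'form': 'run', 'V': False, 'N': True, 'mass': False}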
Since there is no phonological change corresponding to the difference in syntactic category, there is no phonological subrule. This type of rule is referred to as 'zero derivation'. Such rules pose problems not of principle but of practice, in particular: how great must the distributional difference
be in order for two forms to count as two separate classes requiring two separate entries? How do we decide in which direction the rule operates? In answer to the first question, I will assume that two distributional patterns, A and B, will count as distinct if we can find a root W which occurs in A but not in B, and another root Y which occurs in B but not A. In this situation, a root Z which occurs in both patterns A and B will be analyzed into two distinct but derivationally related lexical entries, Z1 and Z2. Adopting the zero-derivation analysis makes life vastly simpler for the syntactician and not any more difficult for the morphologer. We can state the distributions for nouns and adjectives and verbs separately, in accordance with X-bar constraints and in an intuitively satisfying way, and capture generalizations which would otherwise remain forever elusive if we treated every occurrence of a given form as the same word with its own unique gerrymander-like distribution, as Hudson seems to be advocating (Hudson 1984: 90-1). In answer to the second question, a derivation rule is an analogy, and an analogy can apply in either direction. Intuitively, it is more natural for a derivation rule to apply from less marked to more marked. Where there is no phonological marking to assist in this determination, we may need to have recourse to prototypes and notional category definitions to choose between alternate directions; for example run can be a noun or a verb, but it seems natural to assume that the verbal usage is less marked because verbs are statistically more likely to be actions or states of being. Thus the rule is taken to apply from verb to noun, rather than vice versa. Since all DRs are reversible, no great harm will befall if we get the direction wrong. COMPOUNDING: Compounding is by definition derivation, since it adds new items to the lexicon. It differs from derivation by affixation, internal change, and zero derivation only in that two or more of its components are free morphemes. Compounding derivation rules are of the canonical form shown in Figure 3.25. That is, a compounding derivation rule typically takes two independent (uninflected) lexical entries as input and derives a new single lexical item as output. The lexical category of the derived item is typically the same as the category of one of the input items (the head of the compound), and the meaning of the derived form is typically a function of the meanings αFi and βFj of the input items, with some additional arbitrary component γFk which depends on the purpose for which the word is created.
    [+X, αFi, /a/] + [+Y, βFj, /b/]  >—>  [+Y, αFi, βFj, γFk, /ab/]

Figure 3.25
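The schema in Figure 3.25 can be rendered as a toy procedure. The following Python sketch is an illustration under invented assumptions (feature sets of strings, English head-final order), not a piece of the formalism itself.

    def compound(head, dependent, gamma_features):
        # [+X, aFi] + [+Y, bFj] >—> [+Y, aFi, bFj, cFk]: the output keeps the
        # category of the head, unions the input meanings aFi and bFj, and adds
        # the arbitrary increment cFk fixed by the purpose of the coinage.
        return {
            "form": dependent["form"] + head["form"],  # plain concatenation, no internal boundary
            "cat": head["cat"],
            "features": head["features"] | dependent["features"] | gamma_features,
        }

    house = {"form": "house", "cat": "N", "features": {"shelter"}}
    dog = {"form": "dog", "cat": "N", "features": {"canine"}}
    doghouse = compound(house, dog, {"for a dog"})  # an N meaning roughly 'house for a dog'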
The phonological form of the derived output is typically just a concatenation of the phonological forms of the inputs, with some kind of sandhi phenomena operating perhaps, but there may also be additional phonological increments, such as the Tagalog ligature -na-/-ng- or truncation and/or stem-forming vowel insertion, as in forms such as Indo-Pacific and Greco-Roman. There are again no internal boundaries introduced by such rules. As expected with derivational rules, compounding has the following properties: (i) It sometimes changes word class (though it is more common for the compound to retain the same part of speech as its head word); (ii) It is non-productive; of all the combinations compatible with the compounding DRs, only a very small subset are actually present in the lexicon at any given time, and the meanings of compounds (except, to a great extent, in technical domains) are often not precisely predictable from the meanings of their constituents; (iii) Compounds can be further derived, sometimes resulting in multiply layered forms (Mark Twain's notorious German example, Oberdonaudampfschiffahrtsgesellschaftskapitänsjackenknopf 'jacket button of a captain of the Upper Danube Steam Navigation Company', comes to mind), whereas, as we have noted, inflection is the outer layer of affixation, and (with very rare exceptions) closes off the form to further word formation processes. PSEUDOCOMPOUNDS AND LEARNED BORROWINGS: There are intermediate
cases which resemble compounding except that one stem may not occur freely even though it is 'lexical' in some intuitive sense. This is especially common in English with 'learned borrowings' and latinate or hellenic word-coinage,16 but it also occurs in languages such as Sora (Starosta forthcoming a) and Mandarin Chinese (Starosta 1985c: 249-55) in which a historically monosyllabic language has evolved into a disyllabic one and in the process developed a class of bound and semantically word-like suffixes used partly to fill out the disyllabic canonical form (Hashimoto 1978). CAMEL-BELCHING: Language is a flexible thing. People can talk about anything, and one of the mechanisms that makes this possible is the Camel-Belching rule (Taylor 1972: 170, 205, 322-3), which states that a noun can be made out of any humanly utterable sound, including foreign words or phrases, animal sounds, or any other imitatable sound. The name comes from Q. P. Dong's discussion of this effect (Dong 1971: 8-9), in which he cited the examples:
(3) John said, Arma virumque cano.
(4) John said, (imitation of camel belching).
In a lexicase grammar, this phenomenon is accounted for by a rule which derives a quote noun out of anything:

DR-3. [ ] >—> [+N, +quot]
which accounts for the fact that there are no linguistic constraints on quotes, and that quotes function as nouns. PHRASAL DERIVATION: It will perhaps be necessary to provide for derivation
rules which take phrases as input, i.e. PHRASE >—> WORD. Such rules would be needed not only for relatively infrequent fixed formations such as Jack-in-the-pulpit, snake in the grass, and dyed-in-the-wool, and for nonce expressions such as (5) and (6),17 but also for the extremely common rule which produces possessive determiners of the his class by adding the suffix 's to whole NPs, e.g. [Det John's ] other wife, [Det this old woman's ] canaries, and [Det the child on the horse's ] teacher.

(5) He's a 'damn the torpedoes' kind of guy.
(6) The salesman little-old-lady-from-Pasadena'd me into buying this Maserati.
These are grammatically single words in every respect, including their ability to undergo further derivation as possessive pronouns, for example (7) and (8).

(7) [NP John's ] all kept fluttering around the room, but [NP this old woman's ] just sat in the cage with its eyes closed.
(8) The child in the pigpen's parents both showed up, but [NP the child on the horse's ] didn't attend.

Historically this situation seems to be a result of a sequence of stages such as (9)-(9a). The step from (9) to (9a) could be handled by a regular compounding rule taking a sequence of Det + N [+title] + of + N [+country], but structures such as (10) cannot be handled in this way because the pattern is too open to iterate all the possibilities.
(9) the King's hat of England
(9a) the King of England's hat
(10) the little kid in the trunk's piercing screams.
It does not seem to be classifiable as a subtype of Camel Belching, either, because (i) Camel Belching takes any sound, human or otherwise, as input, while possessive Det derivation takes only NPs, and (ii) Camel Belching produces nouns as output, but this rule produces Dets. Unfortunately, current lexicase notation does not allow for DRs with generalized phrasal input. For the time being, this type of derivation remains an open problem.
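Although the notational problem is open, the intended effect of the possessive pattern is easy to state procedurally. The following Python fragment is a hypothetical sketch only, assuming the NP is given as a list of its words; it stands outside current lexicase notation precisely because its input is phrasal.

    def possessive_det(np_words):
        # NP + 's >—> Det: a whole phrase becomes a single lexical item.
        return {"form": " ".join(np_words) + "'s", "cat": "Det"}

    possessive_det(["the", "child", "on", "the", "horse"])
    # {'form': "the child on the horse's", 'cat': 'Det'}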
3.2.2 Phrase level rules

3.2.2.1 Semantic interpretation rules (SIRs)

Phrase-level phonological rules and anaphoric rules are the only non-lexical rules in the lexicase system. The latter mark pronouns, 'gaps' or 'holes', and other anaphoric elements as coreferential or non-coreferential, and so are a very important component of an adequate parsing system. Lexicase distinguishes grammatically conditioned and situationally conditioned coreference relationships, and only the former are considered to be within the purview of a competence grammar. Most of the work in the area of coreference has been done by Louise Pagotto (Pagotto 1985a, Pagotto 1986b, Pagotto and Starosta 1985a) and Frances Lindsey Jr. (Lindsey 1985). For a full exposition of this area of the grammar, see Lindsey, Pagotto, and Starosta (in preparation).

3.2.2.2 Phonological rules

Nothing has been done as of this writing on phonology within a lexicase framework. Since a phonological specification is one of the three components of the lexicase triune lexical representation, we may expect that some of the same kinds of lexical rules will be used in capturing generalizations about that representation; there should be redundancy rules to tell us that vowels are non-strident (with [æ] in British RP listed as an exception), subcategorization rules expressing the fact that stops may be voiced or voiceless and vowels stressed or unstressed in some language, and IRRs stating that, say, unstressed vowels do not allow preceding aspirated stops. It is not yet clear whether phrase-level and sentence-level phenomena and external sandhi can be handled in terms of contextual features marked on the head of the phrase, or whether instead a separate set of phrase-level phonological interpretation rules of some sort will be required. All these questions remain open for future investigation.

3.3 RULE ORDERING

Do lexical rules need to be ordered? Certainly the various components of the grammar have to be ordered with respect to each other, at least if we assume the minimalist position in which lexical items are stored without predictable features. Thus for example DRs and SRs must follow RRs, because they are triggered by features such as [+V] which would be introduced by RRs, and IRRs and MRs must follow SRs for the same reason. It would be possible to reformulate the rules so that they were triggered only by non-predictable features stored permanently in the lexicon, but this would require awkward and complex disjunctions. If we assume the maximalist position, of course, no ordering is required: all features are already specified in the lexicon, so no subsequent rules can alter them.
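The feeding relationships among components, which the following paragraphs spell out, can also be pictured procedurally. The Python fragment below is a toy sketch under invented assumptions (suffixation modeled as string concatenation): derivation writes its output back into the stored lexicon, where it can feed further rules, while inflection produces word-forms that never return.

    lexicon = {"nation"}

    def derive(stem, suffix):
        # DR output re-enters the lexicon, so it can feed further derivation.
        new_stem = stem + suffix
        lexicon.add(new_stem)
        return new_stem

    def inflect(stem, suffix):
        # An inflected word-form leaves the lexical cycle: it is not stored,
        # so nothing can derive from it (inflection bleeds derivation).
        return stem + suffix

    stem = derive(derive("nation", "al"), "ize")  # nation -> national -> nationalize
    word = inflect(stem, "d")                     # 'nationalized', the outer layer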
The organization of the components of the grammar explains the relationship between inflection and derivation, in that the differences fall out from the interrelationships among the components of a lexicase grammar as follows:18 (i) the output of the DR component feeds back into the lexicon, since derivation creates new lexical entries. Any phonological modification introduced by a DR becomes a permanent part of the stored entry which can in turn serve as the input to the RRs and then to either the SRs or to new DRs. SRs can only operate on input from the lexicon, and feed into the syntactic representation, but lexical derivation is recursive: a simple or complex item emerging from the RRs can go through the DRs to create a new entry, typically a morphologically more complex one, and this new entry can in turn go through the RR-DR-lexicon cycle again, to create a new and morphologically even more complex lexical entry. A new item can then enter the SRs and ultimately receive inflectional affixes on top of derivational ones; that is, derivation feeds inflection. (ii) The first layer of inflectional affixes is necessarily also the last layer, and thus the outer layer: inflected forms cannot be further modified by derivational affixes because once an item enters the SRs, there is no track back to the lexicon. Some kinds of internal inflectional modification, such as umlaut or infixation, might give the impression that an inflectional rule had applied before a derivational one; however, in the cases I am aware of it is still possible to state the MR as applying to the output of the DR rather than vice versa, with no loss of generality or increase in expressive power. Inflectional affixes are characteristic of one single class because they apply to a form which already has its class and subclass features specified, and no subsequent rules are allowed to change such features. There is no arrow leading from the MRs directly or indirectly back to the lexicon, and in particular there is no arrow from the MRs to the DRs, so derivational affixes cannot be added to inflectional ones: inflection bleeds derivation. (iii) Inflection forms paradigms: a given stem emerging from the RRs will be operated on by the SRs and subsequent rules in regular and predictable ways. The set of expansions of the original stem is a paradigm. Every item coming out of the RRs will be marked for only one major category feature and possibly one or more subclass features, since in accordance with the triune sign definition, an entry has one meaning, one pronunciation, and one syntactic class. Since SRs and IRRs add but don't change features, all the members of a paradigm, the matrices projected from a single entry, will remain in that same syntactic class: inflection does not change classes. (iv) Derivation on the other hand is not regular: a DR may or may not apply to an output of the RRs which meets its structural description, and if it does apply, it may apply in an irregular way, depending on the purpose for which the item was created. Derivation typically creates a word of a different syntactic class from that of the model it applies to, while inflection never changes syntactic class. Thus class-changing is a sufficient condition for identifying derivation, though not a necessary one. All the items derived
directly or indirectly from the same root constitute a 'word family' of completely independent and autonomous entries whose mutual relationship is etymological rather than grammatical. Derivation does not expand matrices, but rather creates new matrices by analogy with old ones; and once these new matrices have entered the lexicon, they can serve as the input to new DRs; thus derivation feeds inflection and derivation, and derivational affixes form the inner layer of word formation. Within the grammar as currently conceived, RRs must be ordered with respect to each other. Because of their function as marking conventions, a more specific rule which adds an exceptional marked feature may be ordered so that it precedes and bleeds a later rule specifying an unmarked feature. Within the DR and SR components, rules are intrinsically ordered by feeding relationships and do not bleed each other, so extrinsic ordering is just a convenience. MRs however must be ordered with respect to one another, since each rule may add an affix, and if the affixes in an agglutinative system are ordered, the MRs must be ordered accordingly, to attach the affixes in the right sequence. Within the IRR component, the Omega Rule must come at the end. It is the default value for contextual environments, and all positive major-category contextual features, such as [+([+Det])] on nouns, must be introduced prior to the Omega Rule, either in the lexicon or by earlier RRs or IRRs. Otherwise the Omega Rule might get there first and exclude things that are supposed to be there. This ordering may reflect the process of language acquisition: the child starts off with the Omega Rule, which states that no word can have a dependent. He progresses beyond this one-word stage by learning exceptions to it, marking them first on individual lexical entries and then gradually constructing general rules.
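The role of the Omega Rule as a default ordered last can be made concrete. The following Python sketch is a toy illustration with invented feature names, not part of the formalism: earlier IRRs introduce positive contextual features, and the Omega Rule then supplies the empty default to whatever remains.

    def apply_irrs(entry):
        if entry.get("N"):
            # an earlier IRR: nouns are licensed to take a Det dependent
            entry.setdefault("takes", []).append("+([+Det])")
        # Omega Rule, ordered last: if nothing has licensed a dependent by
        # now, the word allows none at all
        entry.setdefault("takes", [])
        return entry

    apply_irrs({"form": "dog", "N": True})    # 'takes': ['+([+Det])']
    apply_irrs({"form": "the", "Det": True})  # 'takes': [] -- no dependents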
3.4 REPRESENTATIONS

3.4.1 Levels, structures, and functions

Grammatical frameworks differ in the number of distinct levels of representation which they allow for a single sentence. Generalized Phrase Structure Grammar for example has only one (Gazdar et al. 1985: 10), Generative Semantics has two (semantic representation and surface structure), lexical-functional grammar has three (c-structure, f-structure, and logical representation), the Standard Theory has three (deep, surface, and semantic), Government and Binding Theory four or maybe more (D-structure, S-structure, Logical Form, and possibly others; van Riemsdijk and Williams 1986: 173; Radford 1981: 390), and relational grammar has an unlimited number of distinct strata, though they are written in a single composite graph. Lexicase, like Generalized Phrase Structure Grammar, has only one stratum, or two, depending on how you count: a single dependency representation, the SIMPLE SYNTACTIC REPRESENTATION. This structure plus supplementary information regarding grammatically determined coreference
relationships constitute the AUGMENTED SYNTACTIC REPRESENTATION. I would
contend that the existence of these two kinds of representation does not make the framework bistratal. The augmented syntactic representation is identical to the simple syntactic representation except that it has added information. No word order has changed, no nodes or words or morphemes have been added or subtracted, and most importantly, no grammatical relations have been changed. This system is quite different in power from a framework which allows a single surface constituent to bear two distinct and mutually exclusive GRs in a single derivation, as is done in relational grammar and classical and contemporary transformational grammar. Government and Binding theory for example allows a single NP to be an object at one level and a subject later (passive), or a lower subject at one level and a higher subject at another (subject to subject raising), and relational grammar allows even more radical correspondences, such as 'ascensions' in which a possessor functions as an attribute of an N in one stratum and a clause-level constituent in a later one. This is not allowed in a lexicase representation. According to the Frosch Air Mattress Principle (Helmut Frosch, personal communication), when a grammatical framework is pushed down in one place, it normally bulges up somewhere else. Consequently, we might expect that lexicase grammatical representations and lexical rules will be far more complex than, say, their Government and Binding Theory counterparts. In the case of lexical rules, this expectation is to some extent fulfilled, since lexicase currently posits five types of lexical rules to generate its monostratal grammatical representations (Redundancy Rules, Subcategorization Rules, Inflectional Redundancy Rules, Morphological Rules, and Derivation Rules), while the Standard Theory needs only three (Phrase Structure Rules for deep structures and transformations, spelling rules, and maybe filters for surface structures). Whether this constitutes an overall complication with respect to GB and RG cannot however be determined until the latter two theories have reached comparable levels of formalization and coverage.

3.4.2 Dependency and constituency

A well-formed syntactic structure within the lexicase framework must satisfy all the requirements on possible trees (for example that the structure is connected, has one 'root' node, no crossing branches, and has a full word at the end of each branch). As an example, see Figure 3.26. Every word in a sentence is the head of its own construction, and every lexical item in a sentence but one, the main verb (or non-verbal predicator), is dependent on one and only one other lexical item, its REGENT.19 The head of a construction carries semantic and grammatical information for the construction as a whole, and attributes modify (constrain the potential reference of) the heads of their constructions. A lexicase representation can be viewed as a network of dependencies obtaining between (actual or virtual) pairs of lexical items in a sentence. Each word is specified for the kinds of dependents it is allowed to take, including in the limiting case (for example Determiners) none at all.
Figure 3.26 [dependency tree for the sentence The bear saw my Dad and Rufus: saw is the root; bear [+N, +AGT] heads the subject NP; the coordinate NP my Dad and Rufus is an exocentric construction headed by and [+cnjc], with the co-heads Dad and Rufus [+N, +PAT]]

A word decides which
classes or subclasses of words may, may not, or must occur as dependents, whether the dependents appear to the right or left of it, how they are ordered among themselves, whether they must appear in a particular inflectional category (government) or in the same inflectional form as the head (agreement), and how they are to be interpreted semantically (case relations and selectional interpretation), and the positive and negative contextual features in each lexical matrix must be met by the lexical items which depend on it in that tree. Lexicase grammatical representations neutralize the distinction between phrase structure grammar and dependency grammar, and lexicase representations incorporate the information carried by the three different kinds of tree structure contrasted by Winograd (Winograd 1983: 80), dependency (head and modifier), role structure (slot and filler), and phrase structure (immediate constituents): a head is any item attached under a vertical line, and an attribute is any item attached under a slanting line (dependency); the case role of a constituent such as the bear in Figure 3.26 is the case role of its lexical head, in this instance [+AGT] 'Agent' (role structure); and a constituent is any word plus all its direct or indirect dependents (phrase structure; cf. Hudson 1984: 92). A construction can be defined as a set of functionally equivalent constellations of words all of which have a head word drawn from the same syntactic class. Conversely, the head of a construction can be defined as the indispensable representative of that construction (Wilson 1966). Obviously, then, the concepts 'head', 'attribute', and 'construction' are very important in such a grammar, and syntactic representations must minimally be able to distinguish among these three notions (Hudson 1979c: 4). However, conventional Phrase Structure representations cannot. For example, consider the PSRs in Figure 3.27. From these rules themselves, we can tell that PP is exocentric and NP is endocentric, since NP has only one obligatory immediate constituent and PP has more than one (Wilson 1966).
(a) PSR-2. NP --> (Det) (Adj) N
(b) PSR-5. PP --> P {NP / PP / S}
Figure 3.27

In PP, the first constituent is the lexical head of the construction, and in NP, it is the last. These distinctions are lost, however, when the usual tree representations are drawn (see Figure 3.28).
Figure 3.28 [the NP and PP rules of Figure 3.27 drawn as conventional trees]

The lexicase conception of dependency theory is a rigorous and restricted version of Chomsky's X-bar analysis (Chomsky 1970; cf. Pullum 1985) and it employs X-bar terminology. The ways in which lexicase representations differ most strikingly from typical X-bar tree notation are (i) the omissibility of node labels, which given lexicase constraints are redundant, and (ii) the 'flatness' of the trees, and especially absence of a distinction between VP and S, since given lexicase constraints, such a distinction is impossible. It is in fact the reverence for the binary NP-VP division at the sentence level in frameworks within the Chomskyan tradition which necessitates making a distinction between dependency and constituent structure in the first place. A subject is clearly dependent on the verb, as shown by subcategorization, selection, theta-role assignment, and/or agreement, yet in a Chomskyan NP-VP constituent structure, the subject is not in the same constituent as the verb. Therefore, dependency must be different from constituency. However, the lexicase one-bar constraint on syntactic representations eliminates the distinction between VP and S, and when the VP node is eliminated, the subject is within the syntactic domain of the verb, so there is no need to separate constituent structure and dependency structure.

3.4.3 Centricity

Starting from the formal definition of head and attribute introduced above, it is possible to characterize the notions of endocentric and exocentric construction in a way which is somewhat more workable and theoretically productive than the traditional structuralist characterization in terms of substitutability
(Bloomfield 1933: 194). Thus if a construction has only one head, that is, one obligatory constituent, it is defined as endocentric, and the head must be a lexical item rather than a construction by the one-bar constraint. The items in parentheses on the right side of the arrow in a conventional phrase structure rule are the attributes of the head, and must by the one-bar constraint also be constructions rather than individual lexical items, since every word must have its own one-bar projection (Pullum's 'Maximality' condition).20 The lexical head of an endocentric construction is written on a vertical line, directly under the label for the construction as a whole if this redundant label is retained to facilitate readability, and attribute nodes are written one step below the level of the head word on lines slanted to the right or the left. Noun phrases and (verbal) clauses for example are endocentric constructions, headed by nouns and verbs respectively. An exocentric construction is any construction with more than one obligatory member (Bloomfield 1933: 194), and by the one-bar constraint, at least one of its heads must be a lexical item rather than a construction. The other co-heads may be individual lexical items (from the same syntactic class) or phrases. Following a convention proposed by Cathy Hine, the dominating node for the construction is stretched into a horizontal line, and the two or more co-heads of an exocentric construction are written on vertical lines descending from the horizontal line representing the node dominating the exocentric construction as a whole. For example, a PP is an exocentric construction21 which has two co-heads, a preposition and an NP, PP, or S, and a coordinate NP such as my Dad and Rufus in (12) has three or more co-heads: one (or more) coordinating conjunctions (the lexical head of the construction) and two or more coordinated NPs (the phrasal co-heads). The lexical head of an exocentric construction, such as and in (12), is dominated by a vertical line of unit length, and the lexical heads of the phrasal co-heads (the 'secondary heads' of the exocentric construction), in this case the Ns Dad and Rufus, are dominated by vertical lines of two-unit length. The use of horizontal bars to characterize exocentric constructions makes it possible to reconcile (i) the requirement that heads be represented by vertical lines with (ii) the fact that exocentric constructions have more than one head and (iii) the eternal verity that parallel lines never meet (except at infinity). A side benefit is that exocentric constructions are easier to spot in a tree. Note that the constituent my Dad and Rufus in Figure 3.26 is formally a cnjc' ('conjunction-bar'), since a phrase label is a one-bar projection of its lexical head, and the lexical head of a coordinate construction is a coordinating conjunction. However, my Dad and Rufus is also an NP in function, because a coordinate construction is exocentric, and so the virtual matrix associated with my Dad and Rufus contains the feature [+N] as well as [+cnjc], making the construction for external subcategorizing purposes simultaneously a cnjc' and an N', that is, an NP. Thus lexicase solves the old problem of how to put coordinate constructions in the same category as their phrasal heads (Pullum 1985: 24-6) without giving up the succession principle as GPSG does (op.cit., p. 25). See section 6.4 for a more detailed discussion.
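The composite 'virtual matrix' just mentioned can be pictured as a simple union of the co-heads' feature sets. The following one-function Python sketch is an informal illustration only, with features represented as strings.

    def virtual_matrix(co_head_features):
        # A composite of the (non-contextual) features of all lexical co-heads.
        merged = set()
        for feats in co_head_features:
            merged |= feats
        return merged

    # my Dad and Rufus: lexical head 'and' [+cnjc] plus the Ns Dad and Rufus
    virtual_matrix([{"+cnjc"}, {"+N"}, {"+N"}])
    # {'+cnjc', '+N'}: simultaneously a cnjc' and an N' for external purposes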
3.4.4 Command and the domains of relatedness and subcategorization

The primary functions of constituent structure in a lexicase grammar (and in grammars in general, I think) are to characterize the notion of grammatical relatedness, to delimit the domain of lexical subcategorization, and to establish the direction of modification. In lexicase, tree diagrams are graphic representations of dependency and constituency relationships holding among pairs of words in a sentence, and thus indirectly of relationships holding among the constructions of which these words are the heads. In order to discuss these relationships, several terms need to be defined:

3.4.4.1 Cap-command

The syntactic relationship between a lexical head and the heads of its sister constituents is referred to in lexicase as cap-command (from Latin caput, capitis 'head'): a word cap-commands the lexical heads of its dependent sisters, or, using X-bar terms, an X° cap-commands the heads of its Comps. Equivalently, X is the regent of its Comps.
Figure 3.29 [dependency tree for the NP that boy on the bus there: boy is the lexical head, with the dependent sisters that and the exocentric PP on the bus there; within the PP, bus heads the bus there, with the dependents the and there]

Thus in the tree in Figure 3.29, (i) bus cap-commands the and there, since bus has two dependent sisters, the and there, and the respective heads of these two constructions are the words the and there; equivalently, bus is the regent of the and there; (ii) on cap-commands bus (on is the regent of bus), since on has a single dependent sister (the phrasal co-head of the exocentric construction on the bus there), namely, the bus there, and the lexical head of the bus there is bus; and (iii) boy cap-commands that, on, and bus, since boy has two dependent sister constituents (indicated by slanting lines), that and on the bus there. The lexical head of the construction [that] (shown by a vertical line) is the word that. However [[on] [the bus there]] is an exocentric construction (shown by a horizontal line) which has two heads (shown by vertical lines), [on] and [the bus there]. The lexical head of [on] is on, and the lexical head of [the bus there] (vertical line) is bus. Thus the lexical heads of the dependent sisters of boy, the words in the immediate syntactic domain of boy, are that, on, and bus. The notion 'cap-command' plays a crucial role in defining the domain of lexical subcategorization, which in lexicase is coextensive with the domain of grammatical relatedness. To determine which constituents are relevant in subcategorization, lexicase appeals to the sisterhead constraint: contextual features are marked on the lexical heads of constructions, and refer only to
lexical heads of sister constructions.22 That is, a word is subcategorized only by the words which it cap-commands. Thus in Figure 3.30, X is subcategorized by and grammatically related to only Y and Z, not W-bar. For example, a verb may be subcategorized by the heads of the noun phrases which are its sisters, but not by the other constituents which are inside the NPs. The immediate domain of the verb saw in the example of Figure 3.26 is composed of only the three nouns bear, Dad, and Rufus, and only the lexical features of these three words are accessible in determining whether saw is compatible with its attributes. Determiners such as my, on the other hand, are not in the immediate domain of the verb; rather, they are in the immediate domains of their respective head nouns. Conversely, a noun may not be subcategorized by any constituent outside the NP, so that saw is not 'visible' to the contextual features of bear for example. In the case of exocentric constructions such as the coordinate NP my Dad and Rufus, the head words of all the co-heads are available to subcategorize the regent of the construction, since they are all cap-commanded by the higher head item. Similarly, in the Noun Phrase in Figure 3.29, the lexical head of the construction is the noun boy. Following the sisterhead constraint, the contextual features marked on boy can refer only to features of the words it cap-commands, in this case that and the heads of the exocentric PP, on and bus, but not to the or there. The features of both the preposition and the head of its sister phrase, or of a coordinating conjunction and the lexical heads of the conjuncts, fall within the domain of subcategorization of the cap-commanding lexical item and jointly subcategorize it. Their features taken together are said to form a 'virtual matrix', that is, a matrix which is not the lexical specification of any single lexical item, but which is rather a composite of the (non-contextual) features of all of the lexical heads of the construction.23
Figure 3.30 [schematic tree: a lexical head X with two dependent sister Comps, headed by Y and Z]
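Given such a structure, cap-command is straightforward to compute. The Python fragment below is a hypothetical sketch only, assuming each word lists its dependent sister constituents by their lexical heads (an exocentric constituent, like a PP, contributes more than one head); the 'command' relation defined in the next subsection is then just the transitive closure.

    TREE = {  # word -> dependent sister constituents, each given by its head(s)
        "boy": [["that"], ["on", "bus"]],   # the PP co-heads are on and bus
        "on":  [["bus"]],
        "bus": [["the"], ["there"]],
    }

    def cap_commands(word):
        # A word cap-commands the lexical heads of its dependent sisters.
        return {head for const in TREE.get(word, []) for head in const}

    def commands(word):
        # Command: cap-command, or cap-command of something that commands.
        result = set(cap_commands(word))
        for w in cap_commands(word):
            result |= commands(w)
        return result

    cap_commands("boy")   # {'that', 'on', 'bus'} -- boy's immediate domain
    commands("boy")       # adds 'the' and 'there', reached via bus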
3.4.4.2 Command

In order to account for grammatically conditioned coreference relations between overt and missing categories, lexicase needs to refer to indirect domains of words in addition to the immediate domains defined by cap-command. This is done in terms of the concept 'command'. A word X commands a word Y if either (a) X cap-commands Y, or (b) X cap-commands Z and Z commands Y. Thus for example in Figure 3.29, boy commands there because boy cap-commands bus and bus cap-commands there; however that
does not command there because that has no dependent sisters at all, and so does not cap-command anything.

3.4.5 Minimal representations

As discussed in section 1.3.2.2, the interaction of the tree-drawing conventions, the one-bar limitation, and the sisterhead constraint makes it possible to eliminate both phrasal and major category labels without any loss of information. The matrix of an individual lexical item contains information about its syntactic category, making a category node label redundant. With the one-bar constraint, the label of the phrasal construction can be determined with reference to the lexical category of the head of the construction which is indicated by the vertical line above it. Therefore all syntactic information can be represented in a tree without node labels. It is in fact technically possible to streamline such representations even further, replacing the vertical lines marking heads by the head word itself; for example, see Figure 3.31, where a slanting line represents the relation between a head and its modifier and a horizontal line represents the relation between the co-heads of an exocentric construction. Then a lexical head is any word which has horizontal or down-sloping lines attached to it. This is a notational variant of the more usual tree notation used in lexicase for the last seventeen years, and for the sake of readability and avoidance of alienation, I will stay with the more familiar-looking trees.
Figure 3.31 [a streamlined representation in which the head words themselves replace the vertical head lines]
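Part of what makes the streamlined notation workable is that phrasal labels are recoverable from head categories. The following one-line Python sketch of that redundancy, under the one-bar constraint, is an informal illustration only.

    def phrase_label(head_entry):
        # The label of a constituent is just the one-bar projection of its head.
        return head_entry["cat"] + "'"

    phrase_label({"form": "boy", "cat": "N"})   # "N'" -- that is, an NP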
3.4.6 Intensional semantic representation

From the dependency point of view, a syntactic representation is a string of words related to one another in terms of pairwise dependency relations. Every word but one is dependent on some other word in the sentence, and the meaning of each dependent serves to delimit the possible range of reference of its regent. In addition, relational features such as case relations add additional information about the relation obtaining between regents and their nominal dependents. Thus semantic information is readily extracted from the syntactic representation, because the representation links together those
words which are semantically as well as syntactically related. Such a dependency tree then constitutes not only a syntactic representation of a sentence but at the same time an intensional semantic one: the meaning of the sentence is any conceivable set of circumstances which is not ruled out by the meanings of the individual words and the attributional and case relationships obtaining among them. Any imaginable situation which is not incompatible with this intensional representation is one of the readings of the sentence. Thus the (incomplete) structure in Figure 3.32 is grammatical though the situation it refers to is bizarre. The syntactic representation is simultaneously a semantic representation stating that the rock is interpreted as an animate entity which sees the rhubarb.
Figure 3.32 [dependency tree for the rock saw the rhubarb: saw [+V], subcategorized for an Agent and a Patient, carries the selectional feature [+AGT] ⊃ [+anmt]; its dependents are rock [+N, +cncr, -anmt, +AGT] and rhubarb [+N, +cncr, +PAT]]
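The way such a representation yields its intensional reading can be sketched procedurally. The following Python fragment is a toy illustration only, with invented feature names, showing how a selectional feature like [+AGT] ⊃ [+anmt] on saw adds an interpretation to whatever noun bears the Agent relation, even a [-anmt] one like rock.

    def interpret(verb_selection, dependent):
        # Selectional features add an interpretation to the dependent's
        # reading; they do not filter out 'bizarre' but grammatical sentences.
        reading = dict(dependent)
        for case_relation, implied in verb_selection.items():
            if dependent.get(case_relation):
                reading.update(implied)
        return reading

    saw = {"AGT": {"anmt": True}}   # [+AGT] implies [+anmt]
    rock = {"form": "rock", "AGT": True, "anmt": False, "cncr": True}
    interpret(saw, rock)            # rock read as an animate entity that sees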
NOTES

1. Computationalists will recognize such rules as daemons.
2. Cf. Hudson's 'priority-to-the-instance' principle, Hudson 1984: 16.
3. The tree is not part of the rule, but is added to show the effect of the rule in graphic form.
4. Interestingly, Hudson now seems to assume the necessity of arbitrary diacritic features to account for conjugation and declension classes, since he uses their existence to justify positing other such features, 'epenthesizing-word' and 'devoicing-word' (Hudson 1984: 60).
5. The conclusion that go and went must be listed separately is consistent with the lexical leaf constraint, which rules out treating tense as a separate syntactic formative.
6. Hudson's word grammar tries to have it both ways: 'This network treats went and go as distinct word-forms, neither of which in any sense contains the other or is derived from the other; but since went is an instance of go, it inherits all the latter's properties other than its phonological shape.' (Hudson 1984: 62) Lexicase of course allows no such 'is an instance of' relationship, since it is a kind of rule feature; every distinct word is on its own, and no variant of rule feature notation is available to link them together.
7. Which is of course itself a semantically shifted form, deriving from a usage referring
to domesticated quadrupeds in general, from ME catel < ONF '(personal) property' < ML capitale 'wealth'. Even then, cattlemen were capitalists.
8. Actually the class [-prnn] includes proper nouns too, but they will have already been marked as generally not allowing Dets by an earlier RR.
9. More precisely, a derivation rule does not 'change' anything in the source word's matrix; rather, it creates a new word which differs from the source word in one or more features, but does not affect the source word itself in any way. When a lexicase grammar speaks of a DR 'changing' a feature, it is meant in this looser sense of a difference and possible contradiction between the feature matrices of the source and target lexical entries.
10. This could alternatively be a case of the situation illustrated in Figure 3.33 (DR-7. dress [+N] >—> undress [+V]; DR-A. dress [+N] >—> dress [+V]):

    DR-7. [+N, αFi] >—> [+V, +rvrs, αFi]
    DR-8. [+V, +rvrs, αFi] >—> [un + X, αFi]

Figure 3.33
11. An example of this sort in the history of Tagalog is given in Starosta, Pawley, and Reid MS (1981: 99).
12. Of course, nonce itself is a nice example of this; it came into existence as the result of an etymologically incorrect syllabification of then once.
13. This situation has perplexed linguists who think they are supposed to be contributing to the grammar of a whole language. Thus for example Salkoff (Salkoff 1983: 291) is quite apologetic about the 'errors' he expects to turn up in the grammaticality judgements he presents in his paper, but such differences in judgment are exactly what we should expect given the lexicase assumption that every speaker has his own unique lexicon, and that a psychologically real grammar can only be extracted from a particular individual's lexicon. Since Salkoff's subject matter is the non-productivity of lexical derivation, it would be astonishing if individual speakers did NOT differ in their judgements of the well-formedness of his examples.
14. This applies to all variation phenomena. Variation is performance, not competence. Variation in the speech of an individual is the result of externally motivated exercise of options allowed by the speaker's internalized system, and finding 'variation' in a 'speech community' is just an artifact of the misguided
attempt to take an arbitrarily chosen group of individuals as the focus of the study of an internalized linguistic system (see section 1.2.3).
15. We might refer to such underlying forms as Tinker Bell representations: they exist only so long as the speakers believe in them.
16. The parallel with the Japanese lexicon is striking: in Japanese, the Chinese language serves the resource function fulfilled for English by Greek and Latin, and English borrowings into Japanese are increasingly coming to occupy a position similar to the status of French loan vocabulary in English.
17. Examples from Nitalu Sroka.
18. Of course they fall out not because the theory controls the phenomenon, but because the grammar was constructed so that they would fall out that way. That is what scientific modeling is all about.
19. This point differentiates it from Hudson's version of dependency grammar, which is much more powerful in allowing a word to have any number of 'heads' [i.e. regents] (Hudson 1984: 85).
20. If they were single lexical items, they would by definition be co-lexical heads of the construction, for example conjunctions in a phrase like Bob and Carol and Ted and Alice, and the construction would not be endocentric.
21. An alternative analysis due to Emonds (Emonds 1972: 547-55) would treat some prepositions as 'intransitive', thereby making PP an endocentric construction. In fact, this is the usual dependency grammar approach as well (Anderson 1971), and it remains a possibility within the lexicase framework. Facts about English and Dutch preposition stranding tend to support it, but facts about German case government tend to favor the exocentric solution.
22. This is equivalent to the X-bar principle that X° is subcategorized by its Comp. However, Chomskyan grammar has to introduce various evasions to allow for the fact (which the practitioners from Chomsky all the way down vigorously deny) that verbs are subcategorized by their subjects, whereas lexicase is able to maintain it strictly because it has no VP to force us into such self-delusion.
23. In the lexicase parsing algorithm discussed in Starosta and Nomura 1986, the effect of a virtual matrix has been achieved by copying the features of the phrasal head (the lexical head of the phrasal co-head, for example bus in on the bus) into the matrix of the lexical head (see on in on the bus in Figure 3.30). The matrix of the preposition on then becomes in effect the virtual matrix of the exocentric construction, representing the grammatically significant features for the whole PP.
4. Roles and relations
4.1 DIVISION OF LABOR

In traditional grammar, the term 'case' refers to an inflectional category, for example 'Nominative', 'Accusative', which is marked on nouns to indicate particular semantic-syntactic relations which they bear to other words in the sentence, especially to regent verbs or prepositions. In generative grammar, 'case' in the context of Fillmorean 'case grammar' refers to the syntactic-semantic relation itself, for example 'Agent', 'Patient', etc., and in the Government and Binding and derivative frameworks, to an abstract category (for example 'Nominative', 'Objective') assigned to nouns to account for the distribution of empty categories and 'theta-relations'. In lexicase grammar, the term 'case' is a cover term for all of these phenomena and more. A lexicase grammar distinguishes among 'case relations' (similar to Fillmorean 'cases' or GB 'thematic relations'), 'case forms' (similar to case in traditional grammar, 'Case' in GB theory, or 'final terms' in relational grammar), 'case markers' (incorporating the case inflections of traditional grammar and other classes of overt grammatical markers), and the macroroles 'actor' and 'undergoer' (comparable to the initial terms 1 and 2 of relational grammar). As suggested by the name 'lexicase', these categories play a keystone role in lexicase grammatical descriptions. By allocating case-like phenomena among these various subsystems, the grammar is able to eliminate redundancy, limit the case role inventory to a small fixed universal set, and capture important generalizations, especially ones concerned with Patient centrality and semantic scope, ergativity and case marking, and verbal derivation. The remainder of this chapter and the following chapter will be devoted to a discussion of the properties of case categories and their modes of interaction. As an introduction, Table 4.1 provides an overview of the division of labor among these categories.
Table 4.1 Case categories

Category: Case Relations — syntactic-semantic relations between a noun and its regent
Needed to capture generalizations about: case marking, ergativity, scope of inner CRs and inner complements; word order, discourse cohesion

Category: Macroroles — Actor (actr): AGT (transitive verbs) or PAT in intransitives; the most salient participant in an action or state, for example the animate intentional participant
Needed to capture generalizations about: imperatives; clitics in Austronesian languages; agreement in some Australian languages and in Achenese; case marking and argument identification in accusative languages

Category: Undergoer (ndrg) — the entity affected by the action
Needed to capture generalizations about: Aktionsart, 'unaccusative'-type phenomena

Category: Case forms — sets of case markers that realize case relations
Needed to capture generalizations about: case marking: ergative vs accusative languages; noun inflectional morphology, agreement, markedness, word order typology

Category: Case markers — grammatical devices which signal the presence of CRs
Needed to capture generalizations about: distinguishing function (case marking) from category for various grammatical devices, such as 'postpositions' and 'complementizers'

Category: Localistic features — spatially oriented semantic features of case markers, especially case inflection, Ps, Nrs, and local nouns
Needed to capture generalizations about: case marking; contextual subcategorization of Vs, Nrs, and Ps; expressing orientations of objects, actions, and states in concrete and abstract space

4.1.1 Semantic/situational roles and the case grammar tradition

Case relations (CRs) are syntactic-semantic relations obtaining between (non-predicate) nouns and their regents, which can be verbs, prepositions, or other nouns. CRs include the kinds of intuitive categories characterized in traditional and modern generative grammar by terms such as 'Agent',
'Patient', etc. Their function is mediating between overt grammatical configurations (word order, case inflection, etc.) and semantic role concepts and scope phenomena, and facilitating discourse cohesion in languages such as Dyirbal. Case grammar in the Fillmorean tradition is based on a not necessarily linguistic intuition about actions or processes or states, participants in those actions, processes, and states, and the roles played by the obligatorily or optionally present participants in these actions, processes and states. These roles are the 'case relations' or 'deep cases' in classical case grammars and many computational applications, and the 'thematic relations' in the Chomskyan tradition, which is based on Gruber's work (Gruber 1965). In practice, such roles are established without any necessary connection with language at all: one simply envisions a kind of silent movie of an action, for example a person loading hay onto a wagon with a pitchfork, identifies the elements necessary for such an action to take place (someone to do the loading, something to be loaded, someplace to load onto or into, and something to load with) and assigns a 'case relation' (or 'thematic relation') to each one. Then the case relations in any sentence referring to that situation must correspond to the previously established silent movie roles. Because this determination is independent of the particular way in which the action is
described linguistically, case relations in this approach are constant across sentences and across languages. This situation-oriented procedure is reflected in the use of paraphrase to identify case relations in both the Fillmorean case grammar and Gruberian thematic relation traditions: two sentences are paraphrases if they have the same truth values, which is a more precise way of saying that they characterize the same external situations. If two sentences S1 and S2 describe the same situation, and if x1 is a participant in that situation, then all NPs in S1 and S2 which refer to x1 have the same case relation. The motto of this kind of analysis could be 'Once a case relation, always a case relation'. It is assumed that case relations remain constant across paraphrases, and that if we can identify a case relation in one sentence by some test or criterion, then the identification holds good for all paraphrases of that sentence. A typical example of this is passive constructions, for example (1) and (1a). If (1) and (1a) refer to the same situation and if the referent of John is a participant in that situation, then John has a single case role in all sentences describing that situation. If we decide that John is the agent of the poodle-kicking, then the noun John (and any (non-reflexive) proforms referring to John) will be the Agent of any sentence describing that same poodle-kicking. One often noted consequence of this situationally oriented style of analysis is that no one agrees on how many case relations there should be, because case grammarians cannot agree on exactly how many different kinds of situational roles there are, and on which ones count as being instances of the same case relation.

(1) John kicked the poodle into the puddle.
(1a) The poodle was kicked into the puddle by John.

In effect, what this practice amounts to is constructing a typology of situations which is independent of language, so that case relations are necessarily universal: Fillmorean 'case relations' and Chomskyan 'thematic relations' are invariant across languages as well as across constructions; John will be the Agent of any construction in any language which describes that same event, and as mentioned above, we do not even need any linguistic data to establish this kind of 'case relation': a silent movie would work just as well; we watch the movie and decide who the agent is, then decide what linguistic means has been used to designate that participant. The assumption that all paraphrases must have the same underlying representation results in certain paradoxes, losses of generalization, and compensatory increases in theoretical power, just as it did in the Standard Theory. The instances I will be concerned with in the following section are the ones most responsible for the alternative approach to case relations subsequently taken by lexicase. One difficulty with the Fillmorean paraphrase-based case analysis was pointed out by Rodney Huddleston. It is concerned with pairs of symmetrical sentences such as (2)-(4).
The post o f f i c e is to the r i g h t of the bank. 0
L
118
ROLES AND RELATIONS (2a) The bank is to the left of the post office. 0
L (3) Abdul is similar to Ivan. 0
E
(3a) Ivan is similar to Abdul. E
0
(4) Fritz fought with Pierre. A
(C - Comitative)
C
(4a) Pierre fought with Fritz. A
C
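The procedure just illustrated can be stated more explicitly. In the sketch below (Python, purely expository), the role table and its labels are invented stand-ins for the 'silent movie' assignment; the point is only that the computation consults the situation, never the sentence:

    # The paraphrase doctrine as a procedure: roles are fixed by inspecting
    # the situation (the 'silent movie'), so every sentence describing that
    # situation must reuse the same labels. The table below is invented.
    SILENT_MOVIE_ROLES = {"post office": "O", "bank": "L"}

    def case_relation(np, situation=SILENT_MOVIE_ROLES):
        """Assign a case relation from the situation alone; the sentence
        chosen to describe it, (2) or (2a), plays no part."""
        return situation[np]

    # Both (2) and (2a) now receive identical labels, and nothing in the
    # procedure tells us whether this table or its mirror image
    # ({'post office': 'L', 'bank': 'O'}) is the right one.
    print(case_relation("bank"), case_relation("post office"))  # L O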
The second class of problems concerns examples such as (5) and (5a), often referred to in the case grammar literature as 'spray paint' sentences. These pairs of sentences have the same truth values, so must have the same underlying structure by Fillmorean case grammar criteria. But which member of the pair is more basic and which one is derived? Is bank the Locus and post office the Object, or vice versa? Is it the stick which is the 'Object' of the action of hitting or is it the fence? Since such determinations are typically made on the basis of prelinguistic intuitions, there is no way to bring linguistic evidence to bear in deciding the issue. Similarly in (6) and (6a), paint is 'Object' (or 'Theme' or 'Neutral') in both sentences, though the examples are quite different in surface structure. However, there is no non-arbitrary way to decide whether the underlying representation should have the case relation structure of (6) or (6a), and either choice will require positing entities such as Goal objects and Object instruments (6a), constructs for which there is no other semantic or grammatical justification. More disturbing is that we are forced to treat (7), for example, as different in structure from (6a), although in all other respects the two are grammatically identical, simply because there is no corresponding sentence (7a).

(5)  John hit his stick against the fence.
     A        O                     L
(5a) John hit the fence with his stick.
     A        L              O
(6)  John sprayed white paint on the hen coop.
     A            O                  G
(6a) John sprayed the hen coop with white paint.
     A            G                 O
(7)  Hugo killed the rats with fire.
     A           O            I
(7a) *Hugo killed fire on/against the rats.

The lexicase approach to case relations differs from the usual Fillmorean case relation and Chomskyan thematic relation analyses in that its definitions are stated not in terms of external situations directly, but rather in terms of sentence-specific PERSPECTIVES of those situations (cf. Grace's CONCEPTUAL
EVENTS; Grace 1987). Thus if two different sentences refer to the same situation but portray it from different viewpoints, they may contain quite different arrays of case relations. As I will try to show in the following pages, the redefinition of case roles in terms of perspective rather than situation results in a dramatic increase in the language-specific and cross-typological explanatory power of the theory, as reflected in the large number of generalizations that can be stated assuming this sort of definition but not assuming a Fillmorean situational one.

4.1.2 Localistic case markers

Because Fillmorean case grammar is primarily concerned with situations and 'deep' matters, surface manifestations of case relations have generally been benignly neglected. As a result, it does not seem to have been noticed by linguists working in this framework that there is a systematic redundancy in case representations of the sort found in Fillmore's 1970 model (Fillmore 1970), as shown in Figure 4.1.

[Figure 4.1: tree diagrams in which the prepositions from and to are attached under Prep nodes, sister to NP (sea), as arbitrary formatives signaling case relations]

The redundancy results from the fact that prepositions are regarded in such systems as arbitrary formatives which function to signal the presence of case relations such as S (Source) and G (Goal). However, they are not just 'formatives'; they are words with their own lexical entries and semantic representations. If we do a componential analysis of their meanings by supplying enough semantic features so that (i) every preposition differs from every other (non-synonymous) preposition in at least one feature, and (ii) all prepositions which share a component of meaning also share a semantic feature, we will discover that the lexical representations of the prepositions themselves carry some of the same information as the Fillmorean case labels (see Figure 4.2).

[Figure 4.2: the same trees with featural lexical entries for the prepositions: from = [+Prep, +drcn, +sorc], to = [+Prep, +drcn, +goal]]

In fact, this redundancy is present even when the case relations are realized by case inflections instead of prepositions, since case inflections carry exactly the same kinds of localistic semantic information as prepositions do (see sections 5.2.1 and 5.3). In order to eliminate this redundancy, lexicase has attempted to determine for each language how much information is carried by the case-marking mechanisms directly, and to let the case relations themselves carry only that information which cannot be recovered from the lexical representations of the case markers. This 'division of labor' has resulted in a significant reduction in the case relation inventory and a major increase in universal linguistic generalizations.
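The redundancy can be seen in miniature if the featural entries of Figure 4.2 are written out as a small lexicon. The feature names below come from the figure; the dictionary encoding and the lookup function are expository conveniences, not lexicase machinery:

    from typing import Optional

    # Lexical entries for 'from' and 'to', using the features of Figure 4.2:
    # both are directional ([+drcn]); they differ in [+sorc] vs. [+goal].
    PREPOSITIONS = {
        "from": {"Prep": True, "drcn": True, "sorc": True},
        "to":   {"Prep": True, "drcn": True, "goal": True},
    }

    # Fillmorean case labels paired with the marker feature that already
    # encodes each of them; this pairing is exactly the redundancy at issue.
    CASE_LABELS = {"S": "sorc", "G": "goal"}  # S(ource), G(oal)

    def case_label(prep: str) -> Optional[str]:
        """Recover the Fillmorean case label from the preposition's own
        lexical features; the label adds no information of its own."""
        features = PREPOSITIONS[prep]
        for label, feature in CASE_LABELS.items():
            if features.get(feature):
                return label
        return None

    print(case_label("from"), case_label("to"))  # S G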
4.2 CASE RELATIONS

In a lexicase grammar, the syntactic environments of verbs are stated in terms of case relations and case forms rather than syntactic nodes (NP, PP, etc.) or logic-type arguments. There is a major distinction between case relations and the arguments of logical functions: the relation between an argument and its predicate has no meaningful content. At best you distinguish between the first and second arguments of a predicate, say, but that is purely arbitrary, and could be reversed (for all tokens) without altering anything. There is no domain of logic which treats 'first argument' as a significant term, as far as I know. For case grammar, on the other hand, the situation is quite different. The arguments of a verb, referred to as 'actants' in case grammar, have semantically significant roles to play in the argument structure, and these roles are drawn from a very limited universal set. Case relations are necessary because a grammar must account for everything a speaker knows about the sentences of his language which are constant for all situations in which the sentences are used. For example, we know not only that the verb devour always takes two arguments, but also that one of the arguments is the devourer and the other is the food. This pattern can be generalized to a broad range of examples, all involving a predicate and two arguments, one perceived as affected and the other as the external affector. In each instance, one of these roles, the affector, is signaled by one kind of overt grammatical configuration, 'subject', and the other, the affectee, by a different configuration, 'object'. The role relation obtaining between a verb and each of its nominal modifiers, or 'actants', is a case relation.

4.2.1 Characterization

The lexicase analysis of case relations is an outgrowth of Fillmorean case grammar. The original class handout which can be seen as the first concrete manifestation of lexicase (Starosta 1970) was an attempt to formalize a Fillmorean verb subcategorization in terms of case relation features instead of the usual node labels, and the early lexicase dissertations assumed definitions of case relations quite similar to those used in Fillmore's papers between 1968 and about 1974. Subsequent versions of lexicase have placed increasing emphasis on grammatical criteria and grammatical generalizations, and as a consequence the inventory of case relations has shrunk and the semantic content of individual CRs has become very general.

4.2.1.1 Situation versus perception

Lexicase distinguishes between case relations, which are tied to a particular 'conceptual event', the perspective of a situation expressed by a particular clause structure, and situational roles, which are independent of language. There may be several ways of looking at the same action, ways which are cognitively different and which are accordingly encoded and decoded differently, but which are not distinct in terms of truth values. Language encodes this perspective, not external facts (cf. Grace 1987), and thus logical systems defined in terms of truth values are inappropriate for conveying linguistic meaning. When I say that I hit the stick against the fence, I am talking about an action (hit1, analogous to fling) in which an object (the stick) is moved through space to a goal (the fence), but when I say that I hit the fence with the stick, I am talking about an action (hit2, like damage) in which an object (the fence) is affected by means of an instrument (the stick). These sentences thus mean something different, and the difference in syntactic structure reflects these semantic differences. The fact that the two sentences have identical truth values is irrelevant to the linguistic analysis.

Each verb encodes a particular perspective. It is a perceptual template which the speaker and hearer impose on situations. There is a lot of freedom in deciding which template is compatible with which situation, and it would be hard to make neat generalizations, especially given the absence of a good independent means of representing external reality. However, that is pragmatics. The linguistically interesting question is the relation between grammar and perspective, and there are many good generalizations to be made here, with the aid of the lexicase definitions of case relations. It follows from the definition of the triune sign (see 2.3.1.1) that if two verbs have different perspectives, regardless of whether or not they are homonymous, they are different verbs. Thus hit in hit the fence with is different from hit in hit the stick against. Each needs a separate lexical entry, since it is not
possible to predict which verb will allow which perspectives. Although this conclusion is somewhat suspicious for English in that it results in a large number of homophonous lexical entries, this is just an accidental fact about English. In closely related languages such as Dutch and German, the fact that a root is appearing in a different perspective is often marked by a different derivational prefix. It is also supported by the fact that there are other words which may be synonymous or nearly synonymous with a word in one of its perspectives but not in its others. In this situation, it seems generally the case that verbs which share the same perspective also share the same syntactic properties, indicating that we will get farther in capturing generalizations about grammar-meaning correlations if we (i) treat verbs with different perspectives as different lexical entries, and (ii) state the correlations with respect to perspective/perception rather than with objective external situations. As a result, instead of a vast number of small verb classes with Gerrymander-like semantic ranges and syntactic distributions, there will be a fairly small number of verb classes, each with a neatly circumscribed distribution, meaning, and structure-meaning mapping pattern, and some analogical derivation rules relating roots which appear in more than one class. The fact that two semantically similar roots have different distributions is then a simple consequence of a typical property of lexical derivation: nonproductivity. Some roots simply have not been derived into the same classes as other perhaps semantically very similar roots. They could be someday, and for some speakers they already have been, which accounts for many individual differences in grammaticality judgements such as those noted for example by Salkoff (1983).

4.2.1.2 Grammatical criteria

Within the lexicase framework, the conclusion drawn from the problems mentioned at the beginning of this chapter is that logical criteria are not appropriate for justifying a grammatical analysis. Instead, morphological and syntactic criteria are given priority in establishing an inventory of case relations and analyzing syntactic structures. This can be illustrated by giving lexicase analyses (2')-(…).

[Figure 4.9: a fragment of the verb classification tree stated in implicational case frame features, with branches labeled [-lctn] and [+lctn] and case frames including ∃[+AGT] and ∃[+LOC]]
However, this finally became untenable, and the implicational feature notation he adopted (Khamisi 1985: 340-42; cf. Blake 1983: 167) has now become generally established: a Patient is implied (∃[+PAT]), not required (+[+PAT]), in the case frame of every verb, and missing PATs are reconstructed by rules of interpretation (Pagotto 1985a). The basic generalization has thus become a statement about properties of the lexicon rather than properties of clauses as such: a Patient is obligatory in the CASE FRAME of every predicate, rather than in the external environment of every predicate. In spite of this revision, however, the central and pivotal role of the Patient case in many morphological, syntactic, and semantic generalizations remains unaffected.

4.2.2.2 Gaps

If case frames are now implicational rather than grammatical, then if a sentence lacks an actant, it is incomplete rather than ungrammatical, and it will be acceptable as long as the missing actants are recoverable from context. A case frame may go unsatisfied without causing ungrammaticality ('functional deviance'; see Brame 1978: 30) under several kinds of circumstance, including:

(i) infinitives and 'tough-movement' constructions, in which the missing actant can be uniquely identified with another NP bearing a particular case relation and located in a structurally definable slot with respect to the missing element (Pagotto 1985a, Pagotto 1985b), as in (37)-(38).

(37) Mary tried to [Δ cheat on the exam].
     PAT  +[-fint]    -fint

(38) My exams are tough to [cheat on Δ].
                  +ptnl
(ii) 'wh-movement' and topicalization constructions, in which the missing actant can be identified with a special operator in clause exposure position, for example (39)-(41).

(39) Who did Mary [find Δ to mind the baby]?
(40) That's the prescriptive grammarian [I warned you about Δ].
(41) The three-body problem Einstein never solved Δ.

(iii) 'PRO-drop' constructions, in which the inflection on the verb and/or contextual information allows the missing argument to be uniquely identified, as follows:

Δi Quierei a Rosita.   (Spanish)
'Hei loves Rosy.'

(iv) 'optional actant' languages such as Japanese and Korean, in which (in contrast to languages such as English) any actant can be omitted as long as it is recoverable from the linguistic or extralinguistic context; and

(v) fragments, which typically occur in very casual conversation or as answers to questions, as follows:

Hey, there! Δ Got a match?

Gaps of types (i)-(iii) are uniquely recoverable from clause-internal information alone,5 and so should be accounted for in a grammar of competence. Mechanisms for accomplishing this are presented in Pagotto 1985a and Pagotto 1985b. Types (iv) and (v), however, require reference to clause-external and frequently extra-linguistic information, and so will be relegated to performance.
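The contrast between the older +[+PAT] notation and the implicational ∃[+PAT] notation, together with the competence/performance split just drawn, can be sketched as follows. The verb entries and the set-based encoding of case frames are invented for illustration:

    # Implicational case frames: '∃[+CR]' says a case relation is implied
    # by the verb and may be reconstructed if absent, where '+[+CR]' would
    # demand its overt presence. The entries here are illustrative only.
    CASE_FRAMES = {
        "devour": {"AGT", "PAT"},
        "cheat":  {"PAT"},      # intransitive, as in 'cheat on the exam'
    }

    def gaps(verb, overt_actants):
        """Implied case relations missing from a clause. On the lexicase
        view these make the clause incomplete ('functionally deviant'),
        not ungrammatical: types (i)-(iii) are filled by rules of
        interpretation in the grammar, types (iv)-(v) only in performance."""
        return CASE_FRAMES[verb] - set(overt_actants)

    # The infinitive of (37), 'Mary tried to [Δ cheat on the exam]':
    print(gaps("cheat", []))               # {'PAT'}, recovered from 'Mary'
    print(gaps("devour", ["AGT", "PAT"]))  # set(), frame fully satisfied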
4.2.2.3 Optionality

Is there a difference between missing constituents in languages such as English, Japanese, and Swahili? What is the grammatical difference, if any, between sentences such as (42), (43), and (44)? It seems to me that the examples represent three different kinds of phenomena. (42) is a pro-drop construction; definite subjects and objects are freely omissible in context for all verbs, and are recoverable (at least to the extent of recovering pronouns) from morphological information on the verb. (43) represents the 'optional actant' situation: any adnominal actant in Japanese is omissible as long as it is recoverable from context or from pragmatic information regarding the use of honorifics (cf. Dunbar 1973: 7; Lee 1986). This differs from (42) because it requires reference to clause-external and frequently extralinguistic information.
(42) a-  me-  wa-  ona          (Swahili)
     3sg past 3pl  see
     '(S)he saw them.'

(43) kono hito ga   tabemasita   (Japanese)
     this person Nom ate
     'This person ate.'

(44) He ate.

English 'optionality' differs from optionality in Japanese and Swahili in several respects.

(i) English has a less permissive attitude toward fragments than Japanese does. Subjects in Japanese, especially first and second person subjects, are very commonly omitted. In English, however, except in very casual speech and in imperatives, all actants which are required by the case frame but missing from the sentence (such as the subjects of infinitives) must be reconstructable from sentence-internal information.

(ii) Optionality in English is to some extent lexically conditioned. (44) can be used in contexts where there is no linguistic or situational context available to establish the referent of the missing object; instead, the fact that what he ate was food rather than something else is part of the lexical semantics of eat. Certain other verbs, on the other hand, such as devour and resemble, never lose their objects under any contextual circumstances. This can be explained if we say that English transitive verbs never drop their objects, and that ate in (44) is a derived intransitive. The Japanese verb taberu 'eat' does not seem to have a similar derived intransitive counterpart. According to most of my consultants, the object in an analogous out-of-context situation CANNOT be omitted: some object must be overtly expressed, for example (45).

(45) kono hito ga   gohan o   tabemasita
     this person Nom food Acc ate
     'This person ate (food).'
I conclude from these facts that there is no such thing as grammatical optionality, and that lexicase notation does not need to provide for that situation by allowing a fourth value for contextual features, 'optional', in addition to +, -, and ∃. A verb is either transitive or it is intransitive, and
there are only two kinds of inner actants in English, obligatory (as with the object of devour) and excluded (as with the object of reside), with apparent counterexamples such as (44) analyzed as zero-derived intransitive verbs. The contextual recovery of objects of transitive verbs is a matter of performance, with Japanese being more tolerant and resourceful in this area, and the lexicalization of indefinite objects by zero derivation of intransitives is a matter of grammar, with English being far more flexible in such derivation. This analysis allows for an explanation of the following facts:

(i) the verb drink has a 'default' object, 'alcoholic beverage', when used alone (Hudson 1984: 143): intransitive drink2 is derived from transitive drink1 by intransitivization, and drink3 'drink alcoholic beverages' is subsequently derived from the intransitive drink2 by semantic shift;

(ii) drink3 has different selectional restrictions from drink2; it can take the adverb heavily; and

(iii) drink3 can serve as the input to further derivation rules to produce drinking (problem), (heavy) drinker, and perhaps (mixed) drinks, (strong) drink, etc.

This situation is distinguishable from the Japanese taberu case. John Haig has suggested a test for telling whether taberu without an object is a distinct lexical item, in effect a derived intransitive verb. If tabemasita in Taroo ga tabemasita 'Taro ate' were an intransitive verb, it should allow an accusative object in the corresponding derived causative verb tabesaseru, but there is general agreement that the eater cannot be marked as accusative with tabesase-.6

(46) Taroo ga Hanako ni gohan o tabesaseta
     +Nom     +Dtv      +Acc
     +AGT     +COR      +PAT
     'Taro made Hanako eat a meal.'

(46a) Taroo ga Hanako ni tabesaseta
      +Nom     +Dtv
      +PAT     +COR
      'Taro made Hanako eat a meal.'

(46b) *Taroo ga Hanako o tabesaseta
      +Nom      +Acc
      +AGT      +PAT
      ('Taro fed Hanako.')
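The derivational account of (44) and of drink given above, where transitive verbs never drop their objects and apparent object drop is zero derivation of an intransitive that may then undergo semantic shift, can be sketched as operations on lexical entries. The field names and dictionary encoding are invented, not lexicase notation:

    # drink1: transitive, with an understood class of objects.
    drink1 = {"form": "drink", "trns": True,
              "frame": {"AGT", "PAT"}, "object_sense": "liquid"}

    def intransitivize(entry):
        """Zero derivation: the output is intransitive, with a Patient-only
        case frame; the former object is folded into the verb's semantics."""
        new = dict(entry)
        new["trns"] = False
        new["frame"] = {"PAT"}
        return new

    def semantic_shift(entry, sense):
        """A further derivation narrowing the understood object."""
        new = dict(entry)
        new["object_sense"] = sense
        return new

    drink2 = intransitivize(drink1)                        # 'He drank.'
    drink3 = semantic_shift(drink2, "alcoholic beverage")  # 'He drinks.'
    # Japanese taberu, by Haig's causative test, lacks the drink2 step:
    # there is no derived intransitive entry for later rules to apply to.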
4.3 MACROROLES

Lexicase has been getting along since 1971 with just two case-like categories, case forms and case relations. In recent years, however, facts about the languages of Austronesia and the work of relational grammarians have conspired to convince me that these are not enough, and that a third kind of category is required, with each category figuring in a different set of linguistic generalizations. Following Foley and Van Valin, I will refer to this new case-related category as MACROROLES, and to the two members of the set as ACTOR (actr, [+actr]) and UNDERGOER respectively. The category of macroroles in general and of Actor in particular, like those of Agent and Nominative, is established to account primarily for morphosyntactic rather than situational generalizations.

Unfortunately, lexicase usage is somewhat different from Foley and Van Valin's. Again, as with Patient, the terminological alternatives were to choose a word whose content is similar but not identical to the lexicase concept, or to make up a completely new one. Once more I have chosen the former course because (i) I want to give credit to my predecessors, (ii) because I think the term is appropriate to the concept, and (iii) because I think it is easier for readers to modify a definition of an existing and familiar term than to assimilate a new and unfamiliar one. In the following sections I will define what lexicase means by these terms and then point out how they differ from the terminology of several other current frameworks.

4.3.1 Actor

4.3.1.1 Lexicase characterization

A fairly good semantic definition of Actor in the lexicase sense is the definition proposed by Richard Benton for Kapampangan: 'the entity to which the action of the verb is attributed', where 'action' is interpreted broadly to include 'actions, happenings, and conditions in general' (Benton 1971: 167, quoted in Schachter 1976a: 498). 'Actor' in lexicase can be formally characterized in terms of a disjunction of case relations: it is the Agent of a transitive clause or the Patient of an intransitive one.7 In order to account for what relational grammarians refer to as 'inversion' constructions, it may be necessary to add a third subtype as well: the inner Correspondent of impersonal psychological verbs such as German kalt in (47).

(47) Mir   ist kalt.     (German)
     +actr
     +COR
     +Dtv
     'I feel cold.'

While Actors are typically animate, this property does not have the criterial status in lexicase that it enjoys in Fillmorean case grammar, Anderson's
localistic case grammar, and Government and Binding 'theta theory'. For example, the subject in (48) is grammatically an Actor in a lexicase analysis, though it is not erg (comparable to Fillmore's Agent) in Anderson's 1971 analysis (Anderson 1971: 95; but cf. Anderson 1977: 45); and (49) and (50) have Actor subjects in spite of the fact that they do not express volition.

(48) A statue occupies the plinth.
     [+actr]
(49) Marvin had his car towed away.
     [+actr]
(50) The general received a citation posthumously.
     [+actr]

It appears that Actor, like Patient, is present in every clause. This idea goes back to Panini, as Harold Somers points out in commenting on the lexicase Patient Centrality constraint:

This approach is reminiscent of Panini, where too there was a primary obligatory case or karaka (the karta; cf. '. . . there can be no sentence in Sanskrit without one' (Ananthanarayana 1970: 19)) which figured in the descriptions of the other karaka (for example karma: 'that which is desired by the karta' (ibid., p. 1)). An interesting difference however is that for Panini the obligatory karaka was the Agent. [Somers forthcoming, Chapter 5]

The role Actor (actr) is not the same thing as lexicase Agent. For example, (51) and (52) illustrate where the categories may fail to coincide. The analysis of the subject of all intransitives as simultaneously Actor and Patient follows from the assumptions that both are obligatory categories, and from the subject choice hierarchy. It applies equally to ergative constructions, since in a lexicase Patient centrality analysis the representation of intransitive constructions is identical for both case-marking typologies.

(51) The dog attacked the cat.
     +AGT             +PAT
     +Nom
     +actr

(52) John ran in front of the truck.
     +PAT
     +Nom
     +actr
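Since the characterization is a disjunction over transitivity, it can be written as a one-line function; the clause encoding is an expository convenience, and the inversion subtype of (47) is left as a noted third branch:

    def actor(transitive: bool, relations: dict) -> str:
        """Return the actant bearing [+actr]: the AGT of a transitive
        clause, otherwise the PAT. Impersonal psychological verbs like
        German kalt in (47) would require an inner-COR branch as well."""
        return relations["AGT"] if transitive else relations["PAT"]

    # (51) The dog attacked the cat: transitive, so Actor = AGT.
    print(actor(True, {"AGT": "the dog", "PAT": "the cat"}))
    # (52) John ran in front of the truck: intransitive, so Actor = PAT.
    print(actor(False, {"PAT": "John"}))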
4.3.1.2 Alternatives

The Actor of a clause in the lexicase sense corresponds fairly closely to what has been termed the 'logical subject' in Chomskyan grammar. It should however be pointed out that this term has nothing to do with logic; according to Halliday, 'Logical Subject meant "doer of the action". It was called "logical" in the sense that this term had had from the seventeenth century, that of "having to do with relations between things", as opposed to "grammatical" relations which were relations between symbols' (Halliday 1985: 34). The main point where the two categories fail to coincide is in passives: for Chomsky, the agent of a passive is still the logical subject, but for lexicase it is MNS 'Means'. The fact that it is no longer a 'logical subject' accounts for the otherwise paradoxical observations regarding the scope of purpose clauses and adverbs such as intentionally described in Jackendoff 1972: 82-4 and illustrated in (53)-(54). In lexicase, however, the situational semantic definition is secondary, and grammatical criteria such as control of infinitival subjects and reflexives require that the subject of the passive, not the object of by, be treated as Actor.

(53) The doctor was deliberately irradiated by his own machine.
(54) Mme. Clouseau was repeatedly attacked without warning by Kato to prepare herself for combat in the Metro.

'Actor' in lexicase overlaps to some extent with the colloquial terms 'actor' and 'agent' and with the categories 'initial 1' in relational grammar and 'Agent' in Fillmorean case grammar or Chomskyan thematic relations. The usual Fillmorean definition of Agent however involves volition or animacy, for example the 'typically animate perceived instigator of the action' (Fillmore 1968: 24), and this seems to be an implicit assumption of relational grammar as well, based on RG practice and to some extent on precept (cf. Perlmutter 1983: 12-13). Consequently the concepts diverge for non-volitional or inanimate entities, which could not be actors or agents or initial 1s in Fillmorean, Chomskyan, or RG analyses. Again, 'agents' of passives would not be Actors in a lexicase analysis but would be actors or their equivalent in the other frameworks and in colloquial terminology. Thus Fillmorean practice and GB thematic relation analyses agree with Schachter's semantics-based definition of actor (the 'Schachter actor'; Schachter 1976a: 498) in assigning the same label to the subject of a transitive verb and the by phrase of the corresponding passive.

Foley and Van Valin provisionally define their 'actor' as 'the argument of a predicate which expresses the participant which performs, effects, instigates, or controls the situation denoted by the predicate', and the undergoer as 'the argument which does not perform, initiate, or control any situation but rather is affected by it in some way' (Foley and Van Valin 1984: 29). They characterize their categories as having syntactic as well as semantic implications (op. cit., p. 31), and not surprisingly there is a higher degree of correlation between the Role and Reference actor and undergoer and lexicase Agent and Patient respectively than there is with lexicase Actor and Undergoer, as shown by examples (55)-(64), taken from Foley and Van Valin 1984: 30. Foley and Van Valin's macroroles are shown in italics, and lexicase case relations and macroroles are added beneath. From these examples, it might appear that Foley and Van Valin had shaken off the old Fillmorean criteria of volition and animacy, but this is not so, as shown by (65)-(67). These examples are analyzed in Role and Reference Grammar as having undergoer subjects rather than Actors because they do not 'involve actions effected by willful, intending participants', but rather 'denote states or changes of states which the participant experiences or undergoes . . . Here again syntactic subject cannot be equated with Actor, since there are intransitive predicates which have undergoer subjects' (Foley and Van Valin 1984: 29). Thus for these examples at least we are back to the early Fillmorean criterion of volition. As in the case-type analyses mentioned above, additional differences show up in the area of passives (see (57a), (59a), (63a), and (64a)), and so the Role and Reference analysis will be confronted with the same Jackendovian paradoxes that were mentioned earlier.
(55) Colin killed the taipan.
     actor
     AGT
     actr
(56) The rock shattered the mirror.
(57) The lawyer received a telegram.
(58) The dog sensed the earthquake.
(59) The sun emits radiation.
(60) Phil threw the ball to the umpire.
                undergoer
                PAT
                ndrg
(61) The avalanche crushed the cottage.
(62) The arrow hit the target.
(63) The mugger robbed Fred of $50.00.
(64) The announcer presented Mary with an award.

(65) The janitor suddenly became ill.
     undergoer
     PAT
     actr
(66) The door opened.
(67) Fritz was very unhappy.

(57a) A telegram was received by the lawyer.
      undergoer                  actor
      PAT                        MNS
      actr
(59a) Radiation is emitted by the sun.
      undergoer               actor
      PAT                     MNS
      actr
(63a) Fred was robbed of $50.00 by the mugger.
      undergoer                    actor
      PAT                          MNS
      actr
(64a) Mary was presented with the award by the announcer.
      undergoer                            actor
      PAT                                  MNS
      actr

4.3.1.3 Actor and Nominative

In all the examples above it appears that lexicase 'Actor' is redundant, and could be replaced by 'Nominative', that is, Subject. In fact, by the lexicase analysis the two categories do correspond completely in accusative languages such as English.8 However, a comparison with an ergative language reveals that this is not true across languages. In an ergative language, it is the Patient, not the Actor, which is encoded by the maximally unmarked case form, Nominative. In intransitive clauses, including antipassives, the categories will still coincide, but in transitive clauses, the Actor will be the grammatical
Agent, in accordance with the universal definition, while the Nominative will mark the Patient. This can be shown with examples (68)-(70a) from Tagalog (Schachter 1976a: 496 ff., ergative analyses by Starosta).

(68) Nagtiis ang babae ng kahirapan
     endured     woman    hardship
     -trns   Nom       Gen
     APS     PAT       MNS
             actr
     'The woman endured some hardship.'

(68a) Tiniis ng babae ang kahirapan
      +trns  Gen      Nom
             AGT      PAT
             actr
      'A/the woman endured the hardship.'

(69) Tumanggap ang estudyante ng liham
     received      student       letter
     -trns     Nom            Gen
     APS       PAT            MNS
               actr
     'The student received a letter.'

(69a) Tinanggap ng estudyante ang liham
      +trns     Gen            Nom
                AGT            PAT
                actr
      'A/the student received the letter.'

(70) Lumapit ang ulap sa araw
     approached  cloud    sun
     -trns   Nom       Lcv
     APS     PAT       LOC
             actr
     'The cloud approached the sun.'

(70a) Linapitan ng ulap ang araw
      +trns     Gen     Nom
                AGT     PAT
                actr
      'A/the cloud approached the sun.'

4.3.1.4 Grammatical functions of Actor

Positing a separate Actor category is supported to the extent that such a category makes it possible to state more and better generalizations. Some of the generalizations which make crucial reference to this category are discussed below.

IMPERATIVE CONSTRUCTIONS: The participant which is ordered to perform an action in an imperative construction is an Actor, and the actant which may be omissible in imperatives such as (71)-(73) is the Actor. Examples (74)-(76) from Tagalog demonstrate that this generalization applies to ergative as well as accusative languages, and that the crucial factor again is the Actor.

(71) (You) go away!
     Nom
     PAT
     actr

(72) (You) wash the dishes!
     Nom        Acc
     AGT        PAT
     actr

(73) (You) be examined by a doctor immediately!
     Nom               Acc
     PAT               MNS
     actr

(74) Umalis ka dian
     -trns  Nom
            PAT
            actr
     'Go away!'

(75) Hugasin mo ang pinggan
     +trns   Gen Nom
             AGT PAT
             actr
     'Wash the dishes!'

(76) Maghugas ka ng pinggan
     -trns    Nom Gen
     APS      PAT MNS
              actr
     'Wash some dishes!'
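The pattern of (68)-(76), in which the Actor is computed identically in both typologies while the Nominative diverges in transitive clauses, can be gathered into one sketch. The typology labels and clause encoding are illustrative, and actor() repeats the function sketched earlier:

    def actor(transitive, relations):
        """Actor: the AGT of a transitive clause, else the PAT."""
        return relations["AGT"] if transitive else relations["PAT"]

    def nominative(typology, transitive, relations):
        """The actant realized in the maximally unmarked case form: the
        Actor in an accusative language, but the PAT in the transitive
        clause of an ergative language, as in Tagalog (68a)."""
        if typology == "ergative" and transitive:
            return relations["PAT"]
        return actor(transitive, relations)

    tagalog_68a = {"AGT": "babae", "PAT": "kahirapan"}
    print(actor(True, tagalog_68a))                   # babae: the Actor
    print(nominative("ergative", True, tagalog_68a))  # kahirapan: ang-marked
    # Imperatives (71)-(76) omit the Actor in both typologies, whether or
    # not it is the Nominative: one generalization instead of two.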
REFLEXIVIZATION: The element which usually controls reflexivization is the Actor rather than the subject;9 see for example (77)-(78). This holds true in passives as well, assuming the lexicase definition of Actor: an English passive is intransitive, so the subject of a passive is an Actor. Thus (79b) is better than (79a) because the subject, not the by phrase or the 'underlying Agent', is the Actor.

(77) The woman hurt herself.
     Nom   +trns    Acc
     AGT            PAT
     actr

(78) They think about themselves.
     Nom  -trns     Acc
     PAT            COR
     actr

(79) *Itself was bitten by the cat.
     'Object'           'Agent'

(79a) *Itself was bitten by the cat.
      Nom                Acc
      PAT                MNS
      actr
(79b) ?The cat was bitten by itself.
      Nom                  Acc
      PAT                  MNS
      actr

In English and in accusative languages in general, Actor and subject (Nominative) coincide. However, we can see that the crucial category is Actor rather than Nominative by comparing (80)-(80a), examples from an ergative language, Tagalog (examples from Schachter 1976a: 503-4, analyses by Starosta).

(80) Nag-iisip   sila sa kanila-ng sarili
     think about they    them  lig self
     -trns       Nom     Lcv
     APS         PAT     COR
                 actr
     'They think about themselves.'
(80a) Iniisip  nila ang kanila-ng sarili
      consider they     them  lig self
      +trns    Gen      Nom
               AGT      PAT
               actr
      'They consider themselves.'

WORD ORDER: The prevailing word order in Philippine languages such as Pangasinan must be stated in terms of Actor (Schachter 1976a: 507) rather than case form. Case form is morphologically marked on determiners and pronouns, but the word order is Verb-Actor-other constituents, regardless of whether the Actor is in the Nominative or a non-nominative case form:10

IRR-1. [+prdc] → [-[-actr][+actr]]

and although word order in languages such as English is traditionally characterized in terms of 'Subject', that is, Nominative, it could also be formally characterized in terms of the Actor category because of the correspondence of these categories in an accusative language.
154
ROLES AND RELATIONS
that while the case marking may operate completely in terms of the lexicase ergative subject choice hierarchy, the clitic pronouns may operate in terms of an accusative system (cf. Blake and Dixon 1979: 6). That is, the actant referred to by the clitic may not be the Patient or the subject (both terms used in the lexicase sense) but rather the Actor. The earlier lexicase system which maintains only two inventories of case elements, case forms and case relations, could not account for this phenomenon neatly, and in fact it was the clitic pronouns in Tsou (Starosta 1985d) which finally forced lexicase formally to recognize the existence of an additional inventory of macroroles. MORPHOLOGICAL DIFFERENTIATION: The criterion of maximal morpho-
logical differentiation (Schachter 1976a: 509-10) picks out the lexicase Actor in Tagalog. Schachter points out that in Tagalog, verbs in general have three aspects, but that 'actor-topic' verbs are inflectable for recent-perfective in addition to the other three aspects. He notes that only these 'actor-topic' verbs can agree with plural 'topics' (subjects), and that certain derived forms such as sociative verbs also only occur with 'actor topics'. According to the lexicase analysis, all such verbs are derived antipassives, verbs which have become intransitive and thereby acquired Actor subjects. As intransitives, they deemphasize the idea of one participant producing an effect on another, and emphasize the idea of an action being performed by an Actor without much regard for what it impinges on. From this point of view, the 'plural' is not really agreement, but simply 'multiple action', and the sociative too refers to an action engaged in by more than one Patient. NOM AND ACTOR: There are various other generalizations that can be stated for English in terms of either Nominative or Actor. For example, the term of nominalizations which can appear only as a possessive determiner (81)—(82) or preposed NP attribute (82a)11 is the Actor/Nom. (81)
the eltephant's destruction of the garden (Nom)
+N
COR
(actr) (PAT) (81a) the garden's destruction by the elephant (Nom)
+N
MNS
(actr) (PAT) (82) [John's winning the prize] amazed everyone. (Nom) (actr) (AGT)
+N
PAT
ROLES AND RELATIONS (82a) [John (Norn)
155
winning the prize] would be the last straw. +N
PAT
(actr) (AGT) 4.3.2 Undergoer The logical complement to the Actor is the Undergoer, or in Halliday's terms the Goal (or Patient, but not in the lexicase sense). He defines the term Goal as 'one to which the process is extended' (Halliday 1985: 103), which seems to work well enough for the cases observed so far. One difficult problem for lexicase has been the set of problems which relational grammarians treat under the heading of 'unaccusative'. Briefly, they divide intransitives into those which have an initial 1 argument and those which have an initial 2 on the basis of various grammatical properties, for example cooccurrence with 'have'-type or 'be'-type auxiliary verbs in European languages: Is are something like Fillmorean Agents and 2s are something like Fillmorean Objects (cf. Perlmutter 1983: 12-13), though Carol Rosen denies any semantic correlates with these grammatical relations (Rosen 1984: 38-9). However, it is not possible for lexicase to match Is with AGT's and 2s with PATs and set up two distinct classes of intransitive verbs because by Patient Centrality, intransitive verbs do not have Agents. The current hypothesis is that all intransitive verbs do have a PAT actant and no AGT, but that intransitives are divided into two different AKTIONSARTEN (cf. Helbig and Buscha 1972: 69-74) which interpret their PATs as either Actor or Undergoer (see further discussion in section 6.3.3.6). Thus in contrast to Rosen, we are treating this as a semantic subcategorization problem. 4.4 VERBAL DERIVATION The use of lexical derivation rules as a non-transformational alternative means of capturing the relationships among sentences is based on proposals made in Chomsky's Remarks on Nominalization' (Chomsky 1970) and JackendofFs subsequent development of the lexicalist hypothesis (Jackendoff 1972; Jackendoff 1975), though the use of this mechanism for relating active and passive sentences seems to have been a lexicase-internal innovation (cf. Starosta 1971b). Since most derivational relations among sentences involve the addition, subtraction, or reinterpretation of the case relations specified in a verb's lexical representation, it is only natural that much of the work in a theory which calls itself lexi-case should have been devoted to the characterization of such relationships. Given the size limitations of the present volume, it will not be possible to give more than a sampling of the literature on this topic that has accumulated since 1970, especially in the area of causativization. For more extensive information, the reader is referred to Starosta 1971b, Starosta
156
ROLES AND RELATIONS
1977; Chen 1985; Khamisi 1985; Sayankena 1985; Pagotto 1987, and the references cited in these works. 4.4.1 Verb classes The verb classes which can be defined in terms of cooccurrence with one or more inner case relations are universally characterized by the tree diagram given as Figure 4.10. The 'case frame5 for a given verb stem will be characterized as one or more of the implicational features shown in this tree. For example, the case frame for a simple transitive verb such as see would be simply p [ + A G T ] , D[+PAT]]. From the point of view of this classification, many of the sentence relatedness phenomena of languages such as English can be fruitfully viewed as the result of a single verbal root appearing in more than one stem form, with the stems occurring in distinct grammatical or semantic classes. In the remainder of this section, I will present examples of this phenomenon drawn from English. Examples are arranged according to the types of verbal derivation involved in relating the pairs of stems. For instance, section 4.4.2 includes pairs of sentences containing related stems such that one stem is regarded as being derived from the other by the addition of a particular case relation. For each pair of derivationally related examples (a and b) in each subsection, I will present two other pairs of examples in which the exemplified derivational process does not apply, so that a verb root appears in either only the source class (c) or only the target class (/), but not both (*