Grammar in Mind and Brain
Cognitive Linguistics Research 2
Editors: Rene Dirven, Ronald W. Langacker
Mouton de Gruyter Berlin · New York
Grammar in Mind and Brain Explorations in Cognitive Syntax
Paul D. Deane
1992 Mouton de Gruyter Berlin · New York
Mouton de Gruyter (formerly Mouton, The Hague) is a Division of Walter de Gruyter & Co., Berlin
© Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability
Library of Congress Cataloging-in-Publication
Data
Deane, Paul Douglas. Grammar in mind and brain : explorations in cognitive syntax / Paul Deane. p. cm. — (Cognitive linguistics research ; 2) Includes bibliographical references. ISBN 3-11-013183-8 : (est. ; acid-free paper) 1. Cognitive grammar. I. Title. II. Series. P165.D4 1992 415 —dc20 92-32127 CIP
Die Deutsche Bibliothek
— Cataloging-in-Publication
Data
Deane, Paul D.: Grammar in mind and brain : explorations in cognitive syntax / Paul D. Deane. — Berlin ; New York : Mouton de Gruyter, 1992 (Cognitive linguistics research ; 2) ISBN 3-11-013183-8 NE: G T
© Copyright 1992 by Walter de Gruyter & Co., D-1000 Berlin 30. All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Printed in Germany
Printing: Gerike GmbH, Berlin
Binding: Dieter Mikolai, Berlin
Acknowledgements
This book owes much to George Lakoff, Mark Johnson, and John Anderson, whose ideas helped me to conceive how to analyze the conceptual foundations of grammar. Special thanks to Jane Marmaduke, Lee Ann Kirkpatrick, Don Jones, and Dan White, who were always willing to listen as I worked out ideas; thanks also to Laura Janda, whose chance comment at the 1989 ICLA conference inspired Chapter 6. I would like to express my gratitude to Rene Dirven, Dick Geiger, Dirk Geeraerts, Geoff Nathan, Margaret Winter, and Jean Casagrande for their encouragement and support. I am particularly grateful to Ronald Langacker and Richard Hudson who reviewed the manuscript thoroughly and forced me to clarify key ideas, e.g., the exact nature of the Spatialization of Form Hypothesis and the precise relation between discourse function and entrenchment. Any remaining errors or confusions are of course my own. This book was partially supported by NEH Summer Stipend #FT-3262689, which enabled me to finish revising the manuscript in the summer of 1990 prior to submitting it to Mouton de Gruyter. A final, special thanks to my wife Debbie. While I was writing this book, my brainchild, she was bearing our second son and caring for him in infancy. I appreciate her patience and love both for me and for our children and all her efforts for our physical and spiritual welfare.
Contents
1. Island constraints as evidence for a cognitive theory of grammar
1.1. The fundamental issues
1.2. Syntactic autonomy: empirical issues
1.2.1. Extraction from NP: attribution effects
1.2.2. Extraction from NP in light verb constructions
1.2.3. Exceptional extraction from coordinate VPs
1.3. Alternatives to autonomy: functional and cognitive accounts
1.3.1. Functional correlates of extraction
1.3.2. Cognitive accounts of extraction
1.4. Attention and extraction
1.4.1. Psycholinguistic background
1.4.2. Topic and focus as attentional states
1.4.3. An attentional account of exceptional extraction
1.5. An alternative to strict modularity
1.5.1. Grammar as metaphor?
1.5.2. C-command and syntactically channeled spreading activation

2. An integrated cognitive theory
2.1. Cognitive architecture
2.2. Knowledge representation
2.2.1. Temporal lists, spatial images, propositions, and kinesthetic representations
2.2.2. The perceptual grounding of propositional representations
2.3. Storage and retrieval from declarative memory
2.4. Productions, relevance, and the matching process
2.4.1. Control of cognition: general considerations
2.4.2. Processing cost and the concept of relevance
2.5. Categorization and polysemy: a case study

3. The Spatialization of Form Hypothesis
3.1. The cognitive grounding of syntactic knowledge
3.2. Grammatical projections of the link schema
3.2.1. Types of linkage
3.2.2. Cooccurrence
3.2.3. Predication
3.2.4. Sense dependence
3.2.5. Referential dependence
3.3. Linkage, immediate constituency, and grammatical relations
3.3.1. A case study: restrictive versus nonrestrictive relative clauses
3.3.2. Taxonomy of grammatical relations
3.3.3. Mediated grammatical relations
3.3.4. The necessity of governors
3.4. Hierarchical structure
3.4.1. Headship and X-bar theory
3.4.2. Layers inside the phrase
3.5. Constituency and accessibility: the concepts of c-command and government

4. Applications of the theory to English syntax
4.1. Declarative memory for syntax
4.2. Interactions among schemata
4.2.1. Interaction between general and specific schemata: non-prototypical heads
4.2.2. 'Feature passing' during interaction of discrete schemata
4.2.3. Tough movement and raising to subject: complex cases of schema interaction
4.3. The syntactic function of productions
4.3.1. Productions which apply in basic clause structure
4.3.2. Effects of productions: that-trace effects
4.4. More prototype effects: 'believe'- and 'want'-class verbs

5. Attention and grammar
5.1. Topic and focus potential
5.1.1. Topic potential: salience through spreading activation
5.1.2. Intrinsic information focus: entrenchment and the orientation reflex
5.2. Entrenchment hierarchies
5.2.1. Properties correlating with entrenchment
5.2.2. Evidence of entrenchment
5.3. The Silverstein Hierarchy as an entrenchment hierarchy
5.3.1. Properties of the Silverstein Hierarchy correlating with entrenchment
5.3.2. Evidence that the Silverstein Hierarchy is an entrenchment hierarchy: naturalness as topic
5.4. Further evidence: viewpoint and reflexivization
5.4.1. Kuno's analysis of reflexives as an empathy phenomenon
5.4.2. Core reflexivization
5.4.3. Peripheral reflexivization
5.4.4. On the nonexistence of subject reflexives in English
5.5. Further evidence: ease of acquisition
5.6. Further evidence: directionality of metaphoric transfer
5.7. Other entrenchment hierarchies
5.8. Island constraints again
5.8.1. Grammatical mechanisms
5.8.2. Factors controlling automatic focus assignment
5.8.3. Syntactic patterns in extraction

6. Neurological implications of the theory
6.1. Background
6.1.1. Neural nets and tracts
6.1.2. Functional anatomy of the brain
6.2. Neurological implications: the Parietal Hypothesis
6.2.1. Basic predictions
6.2.2. Basic evidence for the Parietal Hypothesis
6.2.3. The inferior parietal lobe as the seat of the body schema
6.2.4. The inferior parietal lobe as somatosensory integrator
6.2.5. The inferior parietal lobe as the seat of the object schema
6.2.6. Hemispheric specializations of the inferior parietal lobe
6.3. Aphasia and the Parietal Hypothesis
6.3.1. Underlying functional organization
6.3.2. Global aphasia and parietal agrammatism
6.3.3. Broca's aphasia and classical agrammatism
6.3.4. Wernicke's aphasia
6.4. Further implications

Notes
Bibliography
Index
Chapter one
Island constraints as evidence for a cognitive theory of grammar
1.1. The fundamental issues

What is the relation of grammar to mind and the brain? This is arguably the fundamental question of modern linguistics. Humans are the only known creatures to evince sophisticated linguistic abilities, and so it is natural to wonder what connection there may be between grammatical competence and other uniquely human abilities: i.e., mathematics and logic, metaphor and music, not to mention more mundane and less uniquely human capacities like memory, categorization and attention. Issues about the nature of mind necessarily raise questions about the structure of the brain, and so we are led to inquire not only about the relation of grammar to mind but about the physical embodiment of grammatical competence in the brain.

These are large questions, whose answers must involve a variety of disciplines: linguistics, philosophy, cognitive psychology, artificial intelligence research, neurology and, in general, cognitive science. And yet there is a widespread conviction that language is the key to the locked room, that if we could fathom the intricacies of language we would have a Rosetta Stone for the study of human cognition. There may be widespread agreement as to the importance of the question: there is not even a whisper of consensus as to the shape of the answer.

Despite the welter of claims, however, two essentially opposite views stand out.

The first view stresses continuity between language and other mental capacities. Language is consistently placed in the context of its social and communicative functions. Linguistic structures, processes and categories are viewed as instantiations of the categories, processes and structures which comprise human intelligence. Emphasis is placed upon what Fodor (1983) terms horizontal faculties such as memory and attention. Language acquisition is looked upon largely as a learning process; it is assumed that differences between linguistic and nonlinguistic processes are a matter of degree.

The second view stresses discontinuity between core linguistic abilities (i.e., grammar) and other, broader domains. Grammar is isolated and examined as an axiomatic formal system. Linguistic structures, processes and categories are viewed as specialized aspects of what Fodor terms a vertical faculty—a faculty concerned with a specific type of knowledge. Language
acquisition is viewed largely as the unfolding of an innately specified bioprogram. The latter position is broadly termed formalism and more specifically, Chomskyan rationalism, a view which has held center stage for three decades and has provided the impetus for detailed, insightful analyses of linguistic structure. The former position is associated with such terms as functionalism or cognitive linguistics, views founded on a critique of the formalist position. The two views differ on many issues, among the most important of which are the questions of special vs. general nativism and the autonomy and modularity of grammar. It is generally agreed, for example, that the properties of grammar involve innate aspects of human cognition. The early, extraordinarily rapid, and highly structured acquisition of language leaves no other conclusion. But it is far from clear whether the innate principles underlying language acquisition are specific to language or constitute general principles of cognitive structure which apply to a variety of different domains. Formalists prefer the former position, functionalists and cognitivists the latter (cf. Lightfoot 1982, 1984; O'Grady 1983,1986; Putnam 1980). Similarly, there has been continuing controversy over the question of the autonomy of grammar (what Harnish and Farmer 1984 term the external modularity thesis, as opposed to the question of internal modularity, whether the principles of grammar can themselves be divided into autonomous subsystems). Formalists have attempted to defend the thesis that syntactic competence is distinct in principle from the kinds of knowledge which underlie encyclopedic knowledge of the world, and that it can be analyzed independently of considerations of general cognitive structure. Functionalists and cognitivists have argued that syntax cannot be autonomous, that it forms part of a single fabric with other human abilities, as an aspect of general intelligence. The problem with large questions of this sort is that they are so broad as to leave considerable room for interpretation and debate, raising the prospect of interminable discussion without any progress towards resolution. The formalist position has an advantage in such ruminations, if only because its answers seem to shift the burden of proof onto the functionalist/cognitive position. If grammar is autonomous, if it is based on innate abilities specific to language, then any attempt to relate grammatical theory to broader cognitive capacities is doomed to failure. The only way to falsify the formalist position, therefore, is to produce a working counterexample—a worked out, detailed account of how grammatical knowledge fits into the larger picture, that is, how it is grounded in general cognitive principles and rooted in specific aspects of brain function.
This work is an attempt to discern what such a working counterexample might be like. Its strategy is as follows: The first goal will be to establish the relevance of cognitive concerns even for core syntactic phenomena. The present chapter will focus on that goal. First, evidence will be presented which militates against a strictly autonomist view of island constraints—a phenomenon which lies at the very core of syntax. Instead, a view will be advanced in which such phenomena depend critically on attentional states and other general cognitive variables. Second, an alternative to strict modularity will be advanced, based on what Lakoff (1987: 283) terms the Spatialization of Form Hypothesis, advancing the thesis that the capacity to process syntactic structure is based upon cognitive structures and processes which apply in the first instance to physical objects. The second goal will be to present an integrated cognitive theory within which it will be possible to elaborate upon the insights gained in Chapter One. Chapter Two will focus on this goal, sketching a general theory which incorporates: a) the insights regarding image schemas, conceptual metaphor, and natural categorization embodied in such works as Lakoff and Johnson (1980), Lakoff (1987,1990) and Johnson (1987). b) the insights into memory and recall processes embodied in John Anderson's (1983) theory of cognitive processing. c) the basic insights of Sperber and Wilson's (1986) theory of relevance. The third goal will be to elaborate a cognitively grounded account of syntactic structure. Chapter Three will fulfill this goal by elaborating a detailed syntactic theory based upon the Spatialization of Form Hypothesis. According to this hypothesis, syntactic structures are grounded in ordinary spatial thought via conceptual metaphor. That is, constituency is considered an abstract metaphorical projection of the basic PART-WHOLE schema which applies to ordinary physical objects. Similarly, grammatical relations are considered metaphoric projections of the LINK schema which captures the relation between parts of an integrated whole. It will be argued at length that basic grammatical properties emerge as metaphoric inferences out of basic inferential patterns applicable to ordinary physical objects. A fourth goal will be to explore the consequences of the Spatialization of Form Hypothesis for English syntax. Chapter Four will examine crucial grammatical patterns in English, arguing that their properties can be accounted for naturally from the theory sketched in Chapter Three. Several aspects of core grammar will be discussed, including such grammatical elements and constructions as complementizers and INFL, Raising, Equi, and
Tough Movement structures, that-trace effects, and Raising to Object structures. The fifth goal will be to examine the relation between grammatical structure and the management of attention. Chapter Five will examine a variety of linguistic phenomena in which attentional considerations are directly relevant. The chapter will argue that syntactic island constraints can be derived through an interaction of the Spatialization of Form Hypothesis with a general theory of attention. As a final goal, Chapter Six will apply the theory to the neurology of grammar. Interesting consequences ensue. The Spatialization of Form Hypothesis entails a close connection between bodily experience, spatial thought and grammar, an association which appears to be borne out in the fundamental organization of the brain. The inferior parietal lobe of the brain plays a crucial role in linguistic processing, and is also the seat of the body schema and a crucial integrator of spatial information. The chapter therefore advances the Parietal Hypothesis, which claims that syntactic competence is normally seated in the left inferior parietal lobe. When this hypothesis is combined with the theory elaborated in Chapters Three and Four, interesting predictions emerge relating specific patterns of syntactic deficit to the location of brain damage, particularly in agrammatic Broca's aphasia. It is impossible in any work of reasonable length to address all the issues relevant to a hypothesis whose implications and scope are as sweeping as in the present volume. There will no doubt be many places where serious objections and counterarguments will be raised. Even so, there is much to be gained even from the first blurred photographs of an alien landscape: we may locate landing sites for future explorers, or learn at the least what obstacles and barriers will confront them.
1.2. Syntactic autonomy: empirical considerations

The thesis of syntactic autonomy has often been treated as an unnegotiable given, part of the definition of the field rather than as a hypothesis to be accepted or rejected in accord with its empirical success. Yet it is, certainly, a strong hypothesis, ruling out a variety of interactions that we might otherwise expect to find. Syntactic autonomy is, of course, a special kind of hypothesis, serving to define a framework for analysis. Confronted with an apparent exception to syntactic autonomy, the analyst has many options, including finding an analysis which eliminates the apparent counterexample or redefining the phenomenon so that the exceptional material is no longer analyzed as a matter of syntax per se.
For these reasons, it can be difficult to prove that autonomy has been violated. To do so, several constraints must be satisfied:

(i) the phenomenon in question must be so central to syntax that it cannot be relegated to another domain (semantics, pragmatics, performance, etc.);
(ii) interactions with extrasyntactic variables must be clearly present;
(iii) the interaction must be intrinsic to the domain, and not submit to an account in which (for instance) syntactic overgeneration is subject to a pragmatic filter.

Island constraints are usually cited as among the best evidence for syntactic autonomy, and hence for a modular, or at least formal syntactic theory. This has been the dominant interpretation from Ross (1967) on (cf. Bach-Horn 1976; Cattell 1976; Chomsky 1973, 1977a, 1977b, 1980, 1981, 1986). According to these accounts, there are structural limits on the operations of syntactic rules. The most important hypothesis claims that syntactic movement rules can only relate subjacent elements—i.e., elements not separated by more than one bounding node (e.g., clause or NP boundary). The effect is to place depth limits on syntactic rules, accounting for the ungrammaticality of sentences like the following:

(1) a. *Who did you know the man that saw? [extraction from relative clause]
b. *Who did you talk to Bill and? [extraction from a coordinate structure]
Island constraints are a quintessentially syntactic phenomenon, yet close examination reveals significant correlations and interactions between extraction processes and a variety of semantic, cognitive, and functional variables (Deane 1988a, 1991; Erteschik-Schir-Lappin 1979; Kluender 1990; Kuno 1987; Takami 1989). These studies present a prima facie challenge to the thesis of syntactic autonomy.
1.2.1. Extraction from NP: attribution effects

One major class of examples involves exceptional extraction from NP. Typical examples include the following:

(2) a. Who did John take a picture of?
b. Who do you have plans for?
c. Which shops do you like the furniture in?
Sentences like (2) involve extraction from an adjunct of NP—a violation of Ross' Complex NP Constraint, of Subjacency, and of a variety of other strictly
syntactic accounts of island phenomena. The problem is that these sentences involve extraction from an adjunct of NP—a pattern which usually results in ungrammaticality, as in (3):

(3) *Who did John buy a picture that was for?
It has long been noted that the NPs which allow exceptional extraction have special semantic properties (Bolinger 1972; Cattell 1979; Kuno 1987). Essentially, the head noun describes an attribute, characteristic, or part of the extracted NP. (4) is Bolinger's example:

(4) a. Which store do you own the furniture in?
b. *Which garage do you own the car in?
In (4a), where extraction is possible, there is a clear sense in which furniture, as a kind of permanent fixture, helps to characterize the building in which it has been placed. In (4b), where extraction is unacceptable, no similar relationship holds. (5) is Cattell's example. In this case, extraction works only if the head noun denotes a part:

(5) a. Which car do you like the gears in?
b. *Which car do you like the girls in?
Finally, Kuno (1987) describes the relation as involving attribution. Specifically he claims that extraction from NP is possible whenever the head noun denotes an attribute of the extracted NP. Thus, extraction is possible in (6), where the head names an attribute, but not in (7), where the relationship is reversed:

(6) a. I have forgotten the name of that person.
b. Who have you forgotten the name of?

(7) a. I know people with the names Sue, Jeff and George.
b. *Which names do you know people with?
Deane (1991) supports this interpretation at length, arguing that exceptional extraction is licensed when the head noun denotes an attribute and the extracted NP denotes what Langacker (1987, 1: 147-150) terms the cognitive domain against which the attribute is defined. There are a variety of NPs which allow extraction of an adjunct, as (8) illustrates:
(8) a. Which newspapers do we maintain strict editorial control over?
b. Which apartments do we have security keys to?
c. Which reserve divisions do you know the secret locations of?
d. Which girls did you notice shapely legs on?
e. Which wines did you praise the excellent flavor of?
f. Which books did you enjoy the varied contents of?
g. Which judgements were there significant variations in?
h. What did you give me a choice about?
i. Who were you astonished at the incredible treachery of?
j. Which products did you praise the high quality of?
k. Which subjects did you discuss the controversial status of?
Deane (1991) argues that there is an underlying unity among these examples which can be revealed by inverting the attribute and domain nouns, in which case the resulting NPs employ one of two specialized prepositions:

(9) a. newspapers with strict editorial control
b. apartments with security keys
c. reserve divisions with secret locations
d. girls with shapely legs
e. wines with excellent flavors
f. books with varied contents
g. judgements with significant variations
h. participation with no choice

(10) a. someone of incredible treachery
b. products of high quality
c. subjects of controversial status

Essentially, the preposition with is used in (9) to identify attributes of the type traditionally termed possessions, whereas the preposition of is employed in (10) to identify attributes of the type traditionally termed qualities. Deane (1991) terms these uses possessive with and predicative of. Consider (11) and (12):

(11) a. *How much editorial control do you publish a newspaper with?
b. *What sort of security keys do you have an apartment with?
c. *Which locations do you have divisions with?
d. *What kinds of legs do you like girls with?
e. *What flavor do you like a wine with?
f. *What kind of contents do you like a book with?
g. *How much variation do you accept judgements with?
h. *How much choice do you allow participation with?
(12) a. *How much treachery did he commit acts of?
b. *How high a quality will you buy sugarcane of?
c. *How high a status do you read books of?

As (11) and (12) illustrate, possessions and qualities cannot be extracted, unlike their counterparts in (8). The data reviewed thus far suggest a correlation between extraction and semantic variables. They do not in and of themselves present counterevidence to the thesis of syntactic autonomy. Rather, these patterns are problematic because every attempt to reconcile them with the thesis of syntactic autonomy has failed, as can be gleaned from the literature reviews in Kuno (1987), Takami (1989) and Deane (1991). No syntactic generalization seems to account for the data. Perhaps the most obvious hypothesis would claim that extraction is possible from complements but not from modifiers. In such an account, possessive with and predicative of would be analyzed as modifiers, not as complements (as many of the PPs in (8) could reasonably be analyzed). It could then be argued that this structural difference is critical: that modifiers of NP are always islands, whereas PP complements of NP marginally allow extraction. There are two problems with this suggestion. The first problem is that it is inaccurate, except as a statistical generalization. Sentences like (13), for instance, both contain the preposition in used as a modifier, yet only (13a) is acceptable:

(13) a. Which store did you buy [the furniture in]?
b. *Which crate did you buy [the furniture in]?

Even predicative with allows extraction at least marginally in cases like the following:¹
(14) Arnold Schwarzenegger has the kind of muscles that your average red-blooded American male just dreams of heroes with.
Extraction does seem more frequent with complements than with modifiers, but it is not possible to predict the distribution of exceptional extraction from the complement/modifier distinction. Remaining proposals share one key characteristic: they place the relevant adjunct outside its apparent matrix NP when extraction takes place, thereby salvaging the generalization that extraction is impossible from adjuncts of NP. The most frequent suggestion assumes that there is a reanalysis rule which moves the PP out of its matrix NP (Chomsky 1977b; Köster 1978).
Another proposal has suggested that the PP is never an adjunct of NP (Grosu 1978). Deane (1991) argues that these proposals share two major flaws: first, they cannot predict which NPs are subject to the special rules they postulate without allowing the rules to be semantically triggered; second, they cannot account for the full range of exceptional extraction. There are sentences involving what Deane (1988a) terms deep extraction, with elements being extracted across two, three, or even four levels of embedding within NP. The following examples from Deane (1988a, 1991) are typical. (15) illustrates extraction across two levels of embedding within NP, (16), across three levels, and (17), across four and five levels.

(15) a. Which NPs are there unusual possibilities for extraction from?
b. ?This is one newspaper that the editor exercises strict control over the publication of.
c. Which laws do you advocate an end to the enforcement of?
d. Which issues are you prepared to discuss the full range of opinions about?
e. Which games have you discovered workable strategies for victory in?

(16) a. Which committee did he have aspirations for appointment to the chairmanship of?
b. Which pages does the editor exercise strict control over the height of the lettering on?
c. At the annual convention, there were several games that the experts proposed alterations to the rules for victory in.

(17) (In a context and sarcastic tone of voice which imply that Grice's maxim of manner is being violated on purpose:)
a. My dear sir, this is the only committee that I have seen fit to extend recognition to your aspirations for appointment to the chairmanship of.
b. Very well, O Genius, if success at strange pastimes like playing variable rule games is how you wish to establish your credentials, then tell me: which variable rule games have YOU devised strategies for the exploitation of alterations to the rules for victory in?

These sentences cannot be assimilated to Chomsky's, Köster's, or Grosu's proposals without requiring rules so powerful as to render the whole account vacuous. There would be little point in postulating island constraints if the grammar contained readjustment or association mechanisms which allowed the grammar to eliminate two, three or even more levels of embedding within NP.
1.2.2. Extraction from NP in light verb constructions

Another class of exceptions arises with what is now generally termed the light verb construction. The light verb construction is built by combining three elements: (i) a so-called light verb like make or have; (ii) an abstract noun like claim, or hope; (iii) a phrasal modifier of the noun which supplies most of the actual content of the sentence. The following are typical examples of the construction:

(18) a. John made the claim that he was happy.
b. Mary has hopes that she will win the championship.
c. They have a chance to tell us about their plans.
d. They have opinions about politics.
e. They cast votes for their favorite candidate.

The light verb construction is set apart semantically by the fact that it usually can be paraphrased by similar sentences with a verb plus complement structure:

(19) a. John claimed that he was happy.
b. Mary hopes that she will win the championship.
c. They are enabled to tell us about their plans.
d. They voted for their favorite candidates.

As early as Ross (1967) it was noted that the light verb construction allows extraction, a pattern which is ruled out if we substitute verbs with more specific semantic content:

(20) a. How much money are you making the claim that the company squandered?
b. *How much money are you discussing the claim that the company squandered?

It was thus natural for Ross (1967) to analyze sentences like (18) and (20) as involving a reanalysis rule in which V—NP sequences like make a claim, have hopes, etc., are restructured as complex verbs, giving the sentences in (20) exactly the same structure as their counterparts in (19). Similar effects have been achieved in recent versions of Government-Binding theory through the assumption that light verbs assign no theta-roles of their own, but instead
assign roles that are implicit in their object nouns (cf. Grimshaw—Mester 1988). Syntactic analyses of this sort presume that the light verb construction is discontinuous from ordinary verb—object structures. This is a questionable assumption. For example, Deane (1991) cites sets of sentences like the following, where there is a gradient of acceptability:

(21) a. Which posts did you get an appointment to?
b. Which posts did you seek an appointment to?
c. Which posts did you refuse an appointment to?
d. ?Which posts did you appreciate an appointment to?
e. ?Which posts did you discuss an appointment to?
f. *Which posts did you describe an appointment to?
g. *Which posts did you study an appointment to?

(22) a. Which subject do you have opinions about?
b. Which subject were we expressing opinions about?
c. ?Which subject were we examining opinions about?
d. ??Which subject were we describing opinions about?
e. *Which subjects were we overhearing opinions about?

(23) a. Who did you cast votes for the impeachment of?
b. (?)Who did you find votes for the impeachment of?
c. ?Who did you buy votes for the impeachment of?
d. ??Who did you criticize votes for the impeachment of?
e. *Who did you describe votes for the impeachment of?

Similar results can be noted even for examples like (24) with a complement clause:

(24) a. Who did you make the claim that I was acquainted with?
b. Who did you advance the claim that I was acquainted with?
c. ?Who did you reject the claim that I was acquainted with?
d. ??Who did you discuss the claim that I was acquainted with?
e. *Who did you write down the claim that I was acquainted with?
The gradience of the phenomenon is itself cause for doubt whether a purely syntactic analysis can be upheld. If light verbs differ in their grammatical properties as much as syntactic analyses claim, it is not obvious how one would account for gradients like those given above. Equally crucially, there is evidence for the relevance of such semantic variables as framing.
Consider the fully acceptable sentences in (21) through (24). In each case, there is a clear redundancy built into the phrase. Claims are the sorts of things that one makes or advances; votes are the sorts of thing that one casts; opinions are the sorts of things that one has or discusses; appointments to posts are the sorts of things one gets, seeks, or even refuses. This is consistent, of course, with the fact that light verbs make minimal, highly abstract semantic contributions. But note what happens with the borderline cases. They appear to be understood in terms of the same semantic frame as the fully acceptable cases. For example, to discuss appointment to a post is the same as to discuss getting appointed to a post. In the same way, to examine an opinion is to discuss it, and to buy a vote is to buy the way it is cast. The least acceptable sentences are those which are not easily construed in terms of the relevant frame. The connections between studying and appointment, voting and describing, claiming and writing are less than automatic to say the least.

Other properties of the light verb construction support the above interpretation. Various authors (Deane 1988a, 1991; Takami 1989) have noted that the acceptability of extraction is often improved considerably if the extracted phrase contains or refers back to a lexical noun. Contrasts like the following are typical:

(25) a. ?*What did we discuss the claim that anthropologists despise?
b. What view of human nature did we discuss the claim that anthropologists despise?

If semantic frames can make it easier to parse an extraction structure, then there is a ready explanation for the acceptability of (25b): its structure guarantees that the relevant frames will be cued before the extraction structure has to be processed. It has also been noted (Ross 1967) that the abstract noun in a light verb construction is implicitly controlled by the subject. That is, sentences like (21a), (22a), (23a) and (24a) are readily paraphrased with a possessive determiner referring back to the subject:

(26) a. Which posts did you get your appointment to?
b. Which subject do you have your own opinion about?
c. Who did you cast your vote for the impeachment of?
d. Who did you make your claim that I was acquainted with?
The ability of speakers to infer this relationship implies that speakers already know that appointments are things that one gets for oneself, that one has one's own opinion, that one casts one's own vote, and so forth. In other
words, there is every reason to believe that light verb constructions are construed directly in terms of background knowledge, or semantic frames, evoked by the object noun. It is possible in fact to demonstrate rigorously exactly what role semantic frames play in sentences like those given above. Wheeler (1988) sets forth a variety of tests which can be applied to determine whether information is or is not a matter of semantic framing. We shall apply these tests to the semantic frame of making claims about subjects to illustrate just how intimately the possibility of extraction can interact with the content of the relevant semantic frame.

Wheeler proposes that collocational restrictions may be used as a first test, since information which belongs together conceptually is likely to be used together linguistically. Let us therefore begin by examining sentences like (24a) and (24b) to see what collocational properties they exhibit. We shall focus on the relation between the verb and its object. Verbs like make are quite general in meaning, so collocational tests would reveal little for that verb. Collocational patterns for the verb advance in the abstract sense of (24b) are rather more revealing. In this sense, advance collocates with abstract nouns, especially speech-act nominalizations:

(27) They advanced—
a. a claim
b. a suggestion
c. a proposal
d. an idea

In short, advance collocates with speech-act nouns and with other abstract nouns (like idea) which may readily be construed in speech-act terms. It therefore falls into the same collocational class with such verbs as retract and defend:

(28) a. They retracted a claim, suggestion, proposal, idea.
b. They defended a claim, suggestion, proposal, idea.

(29) a. Which spies has he retracted the claim that I am acquainted with?
b. Which spies has he defended the claim that I am acquainted with?

As (29) illustrates, while these are not light verbs per se they appear to allow extraction about as easily as the verb advance—i.e., not as well as with a light verb, but within the pale of acceptability. Certain other verbs, such as reject and discuss, also accept speech-act nominalizations as objects, but yield less acceptable results under extraction.
They have other properties of interest. To begin with, their object nouns are not construed as under subject control. That is, the object nouns in sentences like (30) are ordinarily construed as referring to speech acts not of the subject but of some other party:

(30) a. I rejected the claim, suggestion, proposal, etc.
b. I discussed the claim, suggestion, proposal, etc.

The loss of implicit subject control renders sentences like (30) less like the light verb construction, and also reduces their acceptability under extraction. And yet, at the same time, there is some reason to believe that these sentences are still being construed in terms of a speech-act frame. The subject may not denote the speaker, but it arguably denotes an addressee to whom the claim, suggestion, etc. has been directed. However, they describe the reaction of the addressee to the speech act. In terms of Austin's theory of speech acts (Austin 1975), the verbs describe perlocutionary acts, not the speech act itself. In other words, the sentences may in fact be construed in terms of a speech-act frame, but the information which must be supplied is arguably peripheral to the frame. Finally, consider a verb like write. The verb itself describes a physical activity, one which may readily lend itself to the performance of such speech acts as making claims or advancing suggestions, but which need not involve the performance of a speech act at all. It is, in Austin's terms, a mere locutionary act. It may reasonably be argued that the clause as a whole is not construed in terms of a speech-act frame, even though its object is a speech-act noun. It is thus of great interest that sentences like (24e) are essentially unacceptable.

Wheeler's second test is context repair or construal, which occurs when an expression shifts from its expected interpretation in order to assimilate to information from a frame. For example, an expression like under the tree is normally interpreted in ways compatible with human interaction with trees—and not strictly literally. Such construal is arguably present in many of the examples discussed above. It may also be observed in the use of general abstract nouns like idea or thought. Sentences like (31) are entirely acceptable. In this context—after verbs like advance or defend—it is possible to construe nouns like thought and idea as expressing speech acts; that is, (31) has essentially similar import to (32):
(31) a. I advanced the thought that he should resign.
b. I defended the idea that he should resign.

(32) a. I advanced the suggestion that he should resign.
b. I defended the proposal that he should resign.

In other words, there is a metonymic pattern in which propositional nouns are construed as speech-act nouns; the two senses are connected by their common propositional content. This pattern is only possible after verbs like advance and defend, whose abstract senses arguably evoke a speech-act frame. A light verb like make is unlikely to evoke any kind of frame by itself, in which case the metonymic interpretation is absent:

(33) a. I made the suggestion that he should resign.
b. *I made the thought that he should resign.

(34) a. I made the proposal that he should resign.
b. *I made the idea that he should resign.

Such patterns provide evidence that abstract senses of verbs like advance evoke the same frame as the nouns which usually function as their objects, and hence support the claim that the light verbs which can be used to paraphrase them are construed in the same terms. On the other hand, verbs like write appear to evoke rather different metonymic patterns:

(35) a. I wrote down the proposal that he should resign.
b. I wrote down the idea that he should resign.

In (35a) and (35b), both proposal and idea are construed as expressing sentences, that is, locutionary forms. Both are similar in import to (36):

(36) I wrote down the words "he should resign".
The difference in construal thus supports the thesis that the phrasal verb write down fails to evoke a speech-act frame. Wheeler's third test is based on the fact that frames represent stereotypic information, and so ought to count as given any time a word is used which evokes that frame. This test originates in Roger Schank's work on scripts (Schank—Abelson 1977) where he observed the possibility of sequences like the following:
(37) He went into a restaurant but left without paying the bill.
The noun bill is definite on first mention, entailing that it is given in a restaurant context. We thus infer that it expresses information contained within the restaurant frame. When we examine a sentence like (38) in this light it is apparent that a verb like advance evokes a speech-act frame:

(38) John advanced an idea, but his suggestion was ignored.
The noun suggestion is definite when it has not been explicitly introduced, which makes sense if the verb advance has already evoked the relevant frame. Similar observations apply to a sentence like (39): (39)
(39) Although a claim may be widely publicized, the retraction seldom attracts much publicity.
Here the noun, claim, enables the nominalization of retract to be definite, despite its lack of previous mention. Wheeler's fourth test is also due to Schank. For example, there are sequences like (40): (40)
(40) When the waitress came, he had no money.
To understand this sentence, it is crucial to recognize that having money is necessary to pay the bill, so that the sentence describes an impediment to the normal sequence of events. Recognizing such an impediment entails having a background frame which specifies what the normal sequence should be. So far we have presented evidence which implies that there is a speech-act frame, specifically a frame for making claims, which includes the following information:

(41) A claimant advances his claim to an audience. The audience then considers the claim, and may either accept or reject it. If the audience rejects the claim, the claimant may either defend it or retract it.

This describes a natural sequence, and so it is possible to recognize impediments to it—what one would expect if this information constitutes a background frame. For example, the bizarre quality of (42) derives from its
failure to adhere to the normal script for the presentation of claims to an audience.

(42) John claimed that he was God, but no one was even willing to discuss it with him. When a man finally stopped to listen, he neither accepted nor denied John's claim but walked off shaking his head. Finally somebody stopped to tell him he must be crazy, but John just smiled and thanked him for his support.

Wheeler's final test concerns metaphorical patterning. When a metaphor becomes conventional, it draws upon background information about the vehicle of the metaphor—i.e., on information contained in a semantic frame evoked by the literal meaning of the metaphorical expression. Moreover, a conventional metaphor by definition expresses knowledge about the subject of the metaphor which is generally shared and hence expresses background, framing information about the subject. Thus metaphorical language provides another source of evidence about framing. Much of the vocabulary associated with making claims is based on the conceptual metaphor A CLAIM IS AN OBJECT WHICH ONE PERSON OFFERS TO ANOTHER, as the following expressions illustrate:

(43) a. He put forth a new claim.
b. He held out a new claim for our consideration.
c. He's pushing this weird claim on everyone he meets.

In each case, the metaphor represents the speaker as placing the claim in a position where the audience may but need not choose to take it. Various metaphors then describe the audience's response:

(44) a. He's latched onto this new theory of yours.
b. You don't have to shove my theories back in my face.

The structure of the metaphor is essentially that postulated for the frame: an initial proffering, with the audience having the option to reject or accept the claim. As we have seen, there are many different kinds of evidence for the relevance of semantic framing to extraction. The sentences in (24) which freely allow extraction are precisely those which recapitulate information from the background frame, with acceptability decreasing as the sentence evokes information less central to the frame. The implication, of course, is that light verbs differ not in kind but in degree from other verbs. Because of their
minimal semantic content, they are maximally subject to construal in terms of other frames, with the apparent facilitating effect this has upon extraction.
1.2.3. Exceptional extraction from coordinate VPs

Another area where semantic factors appear to have an effect on extraction involves coordinate VP structures. In general, coordinate structures are subject to J.R. Ross' Across-the-Board condition on extraction, which requires parallel extraction from all conjuncts. Normally, that is, we observe patterns of acceptability like the following:

(45) a. What did John eat and Bill refuse?
b. *What did John eat and Bill refuse some bologna?
c. *What did John eat some bologna and Bill refuse?

With certain coordinate VP structures like (46) and (47), however, a different pattern emerges (Ross 1967; Goldsmith 1985; Lakoff 1986):

(46) a. What did Harry go to the store and buy?
b. How much can you drink, and still stay sober?

(47) a. That's the stuff the guys in the Caucasus drink and live to be 100.
b. That's the kind of firecracker that you can set off and scare the neighbors.

These sentences involve extraction from only one conjunct: either the first conjunct (in 46b and 47) or the second (in 46a). Such patterns are not supposed to occur in coordinate structures. Two observations are critical: (i) except for their behavior with regard to extraction, these sentences appear to be normal coordinate structures, so they constitute genuine exceptions to the coordinate structure constraint; (ii) semantic variables seem to license these exceptional patterns of extraction. Lakoff notes three patterns, or scenarios: A-scenarios, or natural sequences such as (46a); B-scenarios, or violations of expectation, such as (46b); and C-scenarios, or cause-result sequences, such as (47). In short, we appear to have a semantically licensed exception to the Coordinate Structure Constraint.

Lakoff (1986) presents a variety of arguments for both of the points given above. He points out that sentences of the type under consideration display properties only observed in coordinate structures: multiple conjuncts, Across-the-Board extraction, and comma intonation. Among the examples cited are:
(48) a. What did he go to the store, buy, load in his car, drive home, and unload?
b. How many courses can you take for credit, still remain sane, and get all A's in?
c. Sam is not the sort of guy you can just sit there, listen to, and not want to punch in the nose.
d. This is the kind of brandy that you can sip after dinner, watch TV for a while, sip some more of, work a bit, finish off, go to bed, and still feel fine in the morning.
e. I went to the store, bought, came home, wrapped up, and put under the Christmas tree one of the nicest little laser death-ray kits I've ever seen.

It is difficult to see how such sentences could be analyzed as anything other than coordinate in structure. Lakoff (1986) rebuts one counteranalysis, which suggests that these are really parasitic gap structures. There is, however, a lack of parallelism between the two structures:

(49) a. Sam is not the kind of guy you can just sit there, listen to, and not want to punch in the nose.
b. *Sam is not the kind of guy you can just sit there while listening to without wanting to punch in the nose.

(50) a. How many courses can you take for credit, still remain sane, and not get bad grades in?
b. *How many courses can you take for credit while still remaining sane without getting bad grades in?
There is, moreover, at least some evidence to suggest that such an analysis would run into serious difficulties. Consider the following sentence: (51)
Which items does John admit that he went to the store but deny that he bought?
This sentence appears to be acceptable or marginally acceptable for some speakers. (One factor which seems to aid acceptability is an intonation pattern which highlights the contrasting verbs admit and deny.) Yet it does not seem plausible to reanalyze such a sentence as a serial construction syntactically, for the sequence in question is buried inside a coordinate structure which explicitly contrasts the two main verbs. The evidence reviewed thus far suggests that we are dealing with genuine exceptions to the Coordinate Structure Constraint. It is, moreover, a set of exceptions with clear semantic motivation. Deane (1991) refines Lakoffs classification, correlating specific semantic functions with specific patterns of extraction. Certain conjuncts appear to function as background or explanation within a larger, narrative sequence. These are the VPs which need not undergo across-the-board extraction. According to Deane (1991), there are six types of coordinate VP which need not undergo across-the-board extraction: First, there are preparatory action conjuncts, i.e., VPs denoting actions undertaken not for their own sake but as part of an established routine for performing some other action. Preparatory action conjuncts precede the main action conjunct, as illustrated below: (52) a. b. c. d. e.
What did he go to the store and buy? Who did he pick up the phone and call? Who did he grab a pen and start writing to? Who did he open his arms wide and hug? What did he sit down and start typing?
These correspond to the majority of Lakoffs A-scenarios. Second, there are scene-setter conjuncts, VPs which provide background information about the scene in which the main action(s) take place. Examples include: (53) a. Sam is not the sort of guy you can just sit there and listen to. b. Who did you stand in the parlor and tell jokes about? c. Which party did we wear Halloween costumes and get drunk at?
Third, there are internal cause conjuncts, which describe an internal state which causes the agent to perform the main action. These include both mental and physical states: (54) a. Which problem did he get bored and give up on? b. Who did he go berserk and start shooting at? c. What did he lose his balance and fall on top of? d. Which part of your plan did he get confused and forget? Fourth, there are incidental event conjuncts, which describe events which are incidental to the main narrative line. Most often, these are sandwiched between main event conjuncts, but they can occur finally also: (55) a. This is the sort of brandy that you can sip after dinner, watch TV for a while, sip some more of, work a bit, finish o f f , go to bed, and still feel fine in the morning. b. This is the kind of job that you can work on all morning, take a lunch break, and finish off by 2 p.m. c. What did you talk about all night, take a shower, and then have to lecture on at your 8 a.m. class? (56)
This one of those unforgettable meals that you eat in front of the TV and watch Monday night football.
Fifth, there are violation-of-expectation conjuncts, which describe an event which departs from the normal and expected sequence. These correspond to LakofPs B-scenarios: (57) a. b. c. d.
How much can you drink and still stay sober? How many courses can you take for credit and still remain sane? Sam is not the sort of guy you can listen to and stay calm. How small a meal can you eat and feel satisfied?
Finally, there are result conjuncts, which describe consequences of the main action. These correspond to Lakoffs C-scenarios: (58) a. b. c. d.
That's the stuff the guys in the Caucasus drink and live to be 100. What did you set off and scare the neighbors? What kind of herbs can you eat and not get cancer? This is the kind of machine gun you can shoot off and kill a thousand men a minute.
In fact, semantic differences among the types appear to result in different patterns of extraction. Certain types normally function as islands with respect to extraction, including preparatory actions, scene-setters, and incidental events: (59) a. b. c. d. e.
??Which store did he go to and buy groceries? *Which phone did he pick up and call his mom? *What did he grab and write his Congressman? *What did he pick up and call me? *What did he open wide and hug me?
(60)
*There is no place I have sat in and listened to Sam.
(61)
*How long a break can you work on a job all morning, take, and still finish off the job by 2 p.m.?
Internal cause conjuncts, violation-of-expectation conjuncts, and result conjuncts are not islands—extraction from them is often possible: (62)
Which park did the Hollywood Hunter go berserk in and let loose with his AK-47?
(63)
Who can you take twelve tranquilizers and still stay angry at?
(64)
How many enemy troops can you take this machine gun and kill in one minute flat?
We observe, in other words, a variety of semantically licensed exceptions to syntactically-defined island constraints. Such exceptions seem incompatible with the thesis of syntactic autonomy.
1.3. Alternatives to autonomy: functional and cognitive accounts Thus far we have exclusively considered sentence types which seem to constitute semantically licensed exceptions to the normal syntactic patterns. Their existence establishes a prima facie case against the autonomy of syntax. A close examination of ordinary extraction patterns yields further arguments in favor of an account which pays heed to functional, semantic/pragmatic, and cognitive variables. The syntactic process of extraction logically involves three aspects: (i) the extracted phrase; (ii) the extraction site (i.e the gap and/or the matrix phrase
containing the gap); (iii) what we shall term the bridging structure—the syntactic configuration which intervenes between the extracted phrase and the extraction site. The literature indicates that there may be special cognitive and functional properties associated with each of these aspects. Kuno (1976, 1987) and Erteschik-Schir and Lappin (1979) indicate that the extracted phrase must be a potential topic, or at least be potentially dominant (meaning that the speaker intends the hearer's attention to focus on it). Takami (1989) argues that the extraction site constitutes new, or "more important" information than the rest of the clause. Kluender (1990) argues that the bridging structures should contain a minimum of semantic barriers, phrases whose meaning blocks the capacity to attend to more deeply embedded structures. Deane (1991) argues for an analysis which attempts to integrate Erteschik-Schir and Lappin's, Kuno's and Takami's theories, arguing that the extracted phrase and the extraction site command attention simultaneously when extraction can proceed—and that potential topic and focus status are the natural means by which this can occur.
1.3.1. Functional correlates of extraction

Among the major functional theories of extraction is that proposed by Susumu Kuno, which employs the concept of sentential topic (or theme, cf. Firbas 1964). The theory argues that extracted phrases are topical, or at least potential topics:

Topichood condition for extraction: Only those constituents in a sentence that qualify as the topic of the sentence can undergo extraction processes (i.e., Wh-Q Movement, Wh-Relative Movement, Topicalization, and It-Clefting). (Kuno 1987: 23)

Kuno tests for (potential) topic status by prefixing the participial phrase speaking of X. Kuno argues on these grounds that topicality correlates with extraction in a variety of structures. Consider the following contrasts:

(65) a. *This is the child who John married a girl who dislikes.
     b. This is the child who there is nobody who is willing to accept.
(66) a. *The person who I will go to see Mary if I can't see is Jane.
     b. The person who I would kill myself if I couldn't marry is Jane.

In each case, there is also a contrast in potential topic status. That is, we observe parallel contrasts like the following:
(67) a. ?Speaking of the child, John married a girl who dislikes her.
     b. Speaking of the child, there is nobody who is willing to accept her.
(68) a. ?Speaking of Jane, I will go to see Mary if I can't see her.
     b. Speaking of Jane, I would kill myself if I couldn't marry her.
The (a) cases are not only somewhat less acceptable than the (b) cases, but also read more as an attempt to change the subject than as a comment on the putative topic. In short there appears to be a correlation between the availability of extraction and the ease with which the extracted phrase can be treated as topic. Another major functional theory is that of Erteschik-Schir and Lappin (1979) who argue for a direct correlation between extraction patterns and potential dominance (in the psychological sense). They define dominance as follows (1979: 43): a constituent c of a sentence S is dominant in S if and only if the speaker intends to direct the attention of his hearers to the intension of c, by uttering S. The major test for dominance is the so-called Lie Test, which is based on the assumption that one can only effectively deny the truth of matters on which one's audience is capable of focusing its attention. By this test, the clause that Orcutt is a spy is dominant in (69) but not in (70): (69)
Bill said: John believes that Orcutt is a spy. a. which is a lie—he doesn't. b. which is a lie—he isn't.
(70)
Bill said: John carefully considered the possibility that Orcutt is a spy. a. which is a lie—he didn't (consider it). b. *which is a lie—he isn't (a spy.)
The correlations between extractability and (potential) dominance of the extracted phrase appear quite systematic. For example, the dominance of a picture noun complement and its extractability both depend on the absence of a specified subject, as (71) and (72) illustrate. Likewise, the island status of relative clauses correlates with the nondominance of phrases within a relative clause, as (73) illustrates: (71)
You saw a picture of the Prime Minister yesterday. a. Do you remember it? b. Do you remember him?
(72)
You saw Mary's picture of the Prime Minister yesterday. a. Do you remember it? b. *Do you remember him?
(73)
Bill said: I saw the man who was reading the Times yesterday and invited him home for dinner. *which is not true, the Times didn't appear yesterday.
Similar observations apply to coordinate structures and sentential subjects (which fail the Lie Test and are islands under extraction) and extraction from that-clauses (which fail the Lie Test and are islands only with manner-of-speaking verbs like lisp): (74)
Bill said: The nurse polished her trombone and the plumber computed my tax. a. *It's a lie—she didn't. b. *It's a lie—he didn't. c. It's a lie—they didn't.
(75)
Bill said: That Sheila knew all along is likely. a. which is a lie—it isn't. b. *which is a lie—she didn't.
(76)
Bill said: you said that he had committed a crime. a. which is a lie—you didn't. b. which is a lie—he hadn't.
(77)
Bill said: Jane lisped that she had committed a crime. a. which is a lie—she didn't. b. * which is a lie—she hadn't.
Both Kuno's and Erteschik-Schir and Lappin's theories link extraction to the functional status of the extracted phrase. Another theory, that presented in Takami (1989), links it instead to the status of the extraction site. Takami's theory directly focuses on preposition stranding, although he suggests it may extend to extraction from NP as well. Essentially, Takami claims that extraction sites represent new or "more important" information (like the similar concepts of focus in Givon (1976) and information focus in Huck and Na (1990)). (78) and (79) illustrate the contrast with which Takami is concerned.
(78) a. John gave the book to a young girl.
     b. The gang opened the safe with a drill.
     c. John was still a small boy in 1950.
(79) a. Which girl did John give the book to?
     b. What did the gang open the safe with?
     c. *Which year was John still a small boy in?

Takami characterizes the PPs to a young girl and with a drill in (78) as communicating more important information than the rest of their respective VPs. On the other hand, the PP in 1950 is background information, and does not provide the most important information within the VP. Takami proposes that the ability to function as the focus of question or negation is a useful test of the relative importance of information, pointing out a direct correlation between these tests and the extraction patterns displayed above:

(80) a. Did John give the book to a young girl? (no, to a grownup)
     b. Did the gang open the safe with a drill? (no, with dynamite)
     c. Was John still a small boy in 1950? (*no, in 1940)
(81) a. John didn't give the book to a young girl. (... but to a grownup)
     b. The gang didn't open the safe with a drill. (... but with dynamite)
     c. John was not yet a grownup in 1950. (... *but in 1960)

He therefore postulates the following theory:

(82) An NP can only be extracted out of a PP which may be interpreted as being more important (newer) than the rest of the sentence.

He provides the following definition of more important information:

(83) An element in a sentence represents new (more important) information if the speaker assumes that the hearer cannot predict or could not have predicted that the element will or would occur in a particular position within the sentence.
Takami modifies this position slightly after examining cases like the following:

(84) a. *What did John eat salad without?
     b. *What do you eat everything except for?
     c. *Which parent's wishes did you get married against?
     d. *What did John climb up the mountain in spite of?
These cases, he points out, are counterexamples to Kuno's and Erteschik-Schir and Lappin's theories, for the extracted phrase is both a potential topic by Kuno's tests and dominant by the Lie Test. It is also the focus of question and negation: (85)
Bill said: John got married against his father's wishes. Which is a lie: it was against his mother's wishes.
(86)
Speaking of his father's wishes, John got married against them.
(87) a. Did John get married against his father's wishes? (no, against his mother's) b. John didn't get married against his father's wishes but against his mother's. Thus both Kuno's and Erteschik-Schir and Lappin's theories falsely predict the possibility of extraction, as does Takami's. However, he claims that with one modification his approach yields correct predictions. He hypothesizes that: (88) An NP can only be extracted out of a PP in which the NP may itself be interpreted as being more important than the rest of the sentence. He argues that in these sentences there is an implicit negation in the preposition which makes it provide more important information than its object. That is, he claims that the problem with extraction in (84c) is essentially the same as that seen in (89): (89) a. John went to Hawaii without his wife. b. *Who did John go to Hawaii without?
1.3.2. Cognitive accounts of extraction

Two recent works take a cognitive approach to extraction, arriving at distinct but arguably complementary conclusions: Kluender (1990) and Deane (1991). Unlike the authors reviewed above, Kluender proposes that difficulties in extraction result from semantic properties of the bridging structure which intervenes between the extracted phrase and the extraction site. His approach begins by postulating the concept of a semantic barrier. Kluender's theory may be characterized as follows (1990: 188):

(a) Open-class, low-frequency, referentially specific constituents are the best candidates for extraction but simultaneously difficult to extract over (semantic barriers).
(b) Conversely, closed-class, high-frequency, referentially nonspecific constituents are relatively easy to extract over.
(c) Severity of violation in extraction processes can be at least partially equated with the number of semantic barriers crossed.
He relates these patterns both to Kuno's observations about the topical properties of extracted phrases and to Keenan's (1974) Functional Principle. Low-frequency, open-class, referentially specific elements are salient, making it easy for them to qualify as topics; on the other hand, if the same elements occur between the extracted phrase and its extraction site, they violate the principle that functionally dependent elements should not refer independently of their arguments. The most interesting aspect of Kluender's analysis is its presentation of a large mass of data which suggests that extraction structures can be graded in acceptability according to the number of semantic barriers to be crossed. For example, there is a scale of acceptability from the fully acceptable (but exceptional) extractions at the top of the following list of examples to the obvious violations of the island constraints at the bottom, as Kluender illustrates with sentences like (90). Kluender is able to subsume a variety of effects under the concept of semantic barrier. The minimal contrast between (90a) and (90b) involves substitution of a referentially specific subject (we) for the indefinite pronoun someone. This introduces the first semantic barrier. A second semantic barrier is added simultaneously through the shift from an infinitive phrase (which is nonspecific in reference) to a finite clause with a specific time reference.
(90) a. This is a paper that we really need to find someone to intimidate with.
     b. This is a paper that we really need to find someone we can intimidate with.
     c. This is a paper that we really need to find someone that we can intimidate with.
     d. This is a paper that we really need to find someone who we can intimidate with.
     e. This is the paper that we really need to find the linguist who we intimidated with.
     f. This is the paper that we really need to razz the linguist who we intimidated with.
     g. This is the paper that the audience really need to razz the linguist who we intimidated with.
     h. This is the paper which the audience really need to razz the linguist who we intimidated with.
     i. Which paper do the audience really need to razz the linguist who we intimidated with?
     j. What do the audience really need to razz the linguist who we intimidated with?

The contrast between (90b) and (90c) involves the insertion of the complementizer that, a closed-class element which nonetheless characterizes the embedded clause as declarative and thus renders it marginally more specific in reference. Substituting who for that in (90d) adds a further element of referential specificity while substituting a less frequent word. (90e) adds several semantic barriers at once: first, the relative clause is now part of a definite NP (one barrier) with an open-class head noun (a second barrier) and it is simple past tense rather than employing a modal (an increase in referential specificity). In addition, the matrix NP a paper is made definite and hence referentially more specific. The multiple barriers seem to correspond to a relatively large drop in acceptability. In (90f) the frequent verb find is replaced by the far less frequent verb razz; then in (90g) the closed-class subject pronoun we is replaced by the open-class subject the audience. Then in (90h) the nonreferential complementizer that is replaced by the more referential and less frequent relative pronoun, which. In (90i) the shift from a relative clause structure to an interrogative structure makes the extracted phrase less referential, hence a poorer candidate for extraction; substituting the closed-class element what makes it an even poorer candidate. As a result, (90j) is a fairly standard island violation.
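Kluender's gradience lends itself to a rough computational restatement. The sketch below is purely illustrative and is not taken from Kluender (1990); the feature labels, the example feature values, and the idea of simply counting properties are assumptions introduced here for exposition. It treats each word of the bridging structure as a potential semantic barrier to the extent that it is open-class, low-frequency, or referentially specific, and takes the total as a crude index of violation severity.

# Illustrative sketch of Kluender-style barrier counting.
# The feature inventory and the scoring rule are expository assumptions,
# not taken from Kluender (1990).

from dataclasses import dataclass

@dataclass
class Word:
    form: str
    open_class: bool      # nouns and verbs vs. pronouns and complementizers
    low_frequency: bool   # razz vs. find, audience vs. we
    referential: bool     # definite/specific vs. nonspecific

def barrier_score(word: Word) -> int:
    """Count how many salience-inducing properties a bridging word has."""
    return sum([word.open_class, word.low_frequency, word.referential])

def severity(bridging_structure: list) -> int:
    """Rough index of violation severity: total barriers crossed."""
    return sum(barrier_score(w) for w in bridging_structure)

# (90b) '... someone we can intimidate with' vs. (90e) '... the linguist who we intimidated with'
b = [Word("someone", True, False, False), Word("we", False, False, True),
     Word("can", False, False, False)]
e = [Word("the", False, False, True), Word("linguist", True, True, True),
     Word("who", False, False, True), Word("we", False, False, True),
     Word("intimidated", True, False, True)]
print(severity(b), severity(e))   # the (90e) bridge crosses more barriers

On such a count the bridge in (90e) crosses more barriers than the bridge in (90b), mirroring the drop in acceptability noted above; nothing hinges on the particular numbers, only on the monotonic relation between barriers crossed and degradation.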
While Kluender focuses on specific properties which predict the acceptability of extraction, Deane (1991) seeks to infer underlying cognitive principles from them. Reviewing Kuno (1976, 1987), Erteschik-Schir and Lappin (1979) and Takami (1989), the article observes that each of these functional theories makes crucial reference to the concept of attention. To begin with, Kuno claims that the extracted phrase is topical (perhaps more accurately, a strong potential topic). But capacity to attract attention is a logical prerequisite of topic status, since it would be difficult to talk about something to which the audience could not be expected to attend. Erteschik-Schir and Lappin's theory, of course, appeals directly to attention in its definition of dominance. Likewise, Takami's theory proposes that the extraction site describes new, or more important information than the rest of the sentence, which is why it can function as the focus of question and negation. But new or more important information is precisely the kind of information which is likely to attract attention automatically. In a sense, it is intrinsically focal. Deane (1991) then proposes an analysis which may be outlined as follows:

(i) Extraction is an intrinsically difficult processing task since the extracted phrase and its extraction site are discontinuous but must be processed together. In other words, extraction can be analyzed as a task which places a heavy load on the attentional resources available in short-term memory.
(ii) Intrinsically difficult tasks may be impossible to perform except under conditions which optimize performance. If the problem with extraction is an attentional overload, then optimal performance should occur when the extracted phrase and the extraction site attract attention automatically. It should also help if the rest of the sentence attracts too little attention to function as a distractor.

This theory readily subsumes Kuno's, Erteschik-Schir and Lappin's and Takami's approaches. And it meshes naturally with Kluender's account as well.
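The two points above can be restated as a deliberately crude resource model. The sketch below is only an expository toy: the cost formula, the numerical salience values, and the fixed "budget" are assumptions of mine, not part of Deane (1991). It simply says that a non-salient filler, a non-salient gap site, and salient bridging material all consume attentional resources, and that extraction degrades once the total exceeds what short-term memory can supply.

# Toy restatement of (i)-(ii): extraction succeeds when attentional demand
# stays within a fixed short-term memory budget. All numbers are invented
# for illustration; salience values are assumed to lie between 0 and 1.

def attentional_cost(filler_salience, site_salience, bridge_salience):
    # Keeping a non-salient filler active, locating a non-salient gap site,
    # and suppressing salient bridging material all consume resources.
    return (1 - filler_salience) + (1 - site_salience) + bridge_salience

def extraction_acceptable(filler_salience, site_salience, bridge_salience,
                          budget=1.0):
    return attentional_cost(filler_salience, site_salience, bridge_salience) <= budget

# A topical filler extracted from a focal site across a light, given bridge:
print(extraction_acceptable(0.9, 0.9, 0.1))   # True
# The same filler extracted from a backgrounded site across a salient bridge:
print(extraction_acceptable(0.9, 0.3, 0.8))   # False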
Kuno's theory (and also Erteschik-Schir and Lappin's) may be reinterpreted as claiming that the extracted phrase is readily extractable because it is naturally salient and hence a potential topic. The failure of Kuno's theory to account for sentences like (84) can be explained by pointing out that it describes only those conditions which apply to the extracted phrase; for extraction to occur, the bridging structure and the extraction site must also display appropriate properties. Takami's theory may be interpreted as claiming that the extraction site provides new or more important information which naturally attracts the
focus of attention. As Deane (1991) points out, it does not explain sentences like (91): (91) a. He sang with sorrow. b. He was acquitted out of cowardice. c. He danced from joy. As (92) and (93) illustrate, these sentences pass all of Takami's tests: (92) a. Did he sing with sorrow? (... no, with joy.) b. Was he acquitted out of cowardice? (... no, out of wisdom.) c. Did he dance from joy? (... no, from sorrow.) (93) a. He didn't sing with sorrow. (... but with joy.) b. He wasn't acquitted out of cowardice. (... but out of wisdom.) c. He didn't dance from joy. (... but from sorrow.) And yet they resist extraction: (94) a. *What emotion did he sing with? b. *What attitude was he acquitted out of? c. ??What emotion did he dance from? These sentences present no difficulty, however, if we recognize that Takami's theory only characterizes properties of the extraction site. The extracted phrase must also display the appropriate properties, and as Deane (1991) points out, it does not. That is, the object of the preposition is not a potential topic: (95) a. * Speaking of sorrow, he sang with it. b. * Speaking of cowardice, he was acquitted out of it. c. *Speaking of joy, he danced from it. Finally, consider the variables which Kluender isolates: referential specificity, open- vs. closed-class status, and frequency of occurrence. These are precisely the kinds of variables which affect salience. That is, referential,
open-class, low-frequency items are intrinsically salient whereas nonreferential, closed-class, high-frequency items (i.e., grammatical function words) are the very opposite. In other words, we may account for Kluender's data by asserting that extraction is best when the extracted phrase is highly salient (or topical) but the bridging structure is not salient at all.

Note, however, that this approach yields slightly different effects than Kluender's actual thesis. On Kluender's account, a phrase counts as a semantic barrier regardless of its functional status, as long as its referential properties, frequency, and open- vs. closed-class status are not altered. According to the present theory, however, anything that reduces the salience of the bridging structure will facilitate extraction. In particular, Deane (1991) and Takami (1989) argue that extraction is facilitated when the bridging structure is presupposed or given. For example, Takami points out that sentences like (89) do not work well out of context:

(96) a. ?*What did John eat salad without today?
     b. ?*What sort of weather did the ship leave port in spite of?

However, in a context where the bridging structure is unambiguously given and/or presupposed, the extraction becomes acceptable (Takami 1989: 324, 325):

Imagine a situation in which everyone knows that John always eats salad with two of the following: French dressing, Thousand Island dressing, and Italian dressing. Given a situation like this, the sentence What did John eat salad without t today? will turn out fine.

Suppose that a teacher and her pupils are reading a story about a ship. In the story, the ship leaves port in spite of a storm, the people are all worried, but the ship doesn't come back even after it gets dark . . . After reading the story, the teacher could ask the pupils the following questions to check the pupils' understanding of the story: ... What sort of weather did the ship leave port [in spite of t]?

Similar observations may be made about sentences like (97a), noted in Kuno (1987). As Kuno notes, in a context like (97b) in which the bridging structure is given, the sentence becomes acceptable. In short, the present theory, unlike Kluender's, makes it possible to account for the effect of contextual factors on extraction.
(97) a. *Who did they destroy more pictures of?
     b. Speaker A: Right after Chairman Mao died, they started taking pictures of Central Committee Members off the wall.
        Speaker B: Who did they destroy more pictures of, Chairman Mao or Jiang Qing?

At this point it is important to note a crucial criticism of Takami's theory and of a natural interpretation of Kuno's theory: their assumption that extraction relates to actual functional status. It is doubtful that extractability is a mere reflex of discourse function. Extraction is simply not as context-sensitive as topic and focus assignment. The very same sentence may have very different topics and foci in different discourse contexts without a concomitant shift in extraction patterns. There are some places where extraction is sensitive to context, but these appear limited to manipulations of the status of the bridging structure. For the most part, extraction patterns are stable from one context to the next.2

We can account for this apparent discrepancy by considering the tests that Kuno and Takami employ in their theories. These tests arguably test not for actual topic or focus status but for topic potential and default or natural information focus (approximately the same as what Erteschik-Schir and Lappin term potential dominance). For example, Kuno tests for topical status by prefixing the formula Speaking of X to a sentence and replacing the original occurrence of X by a pronoun. If the result is acceptable, then X should be construable as the topic of the original sentence. It would be inappropriate to conclude that it is always the topic in all contexts. Similarly, Takami tests whether a phrase counts as "new" information by constructing a contrastive or interrogative sentence based on the original. From John ate an apple, for instance, Takami would construct John did not eat an apple, but an orange. Such tests indicate that the NP an apple is naturally construed as the information focus, not that it is the information focus in all contexts.

In short, there is very strong evidence that the extracted NP is a potential topic, regardless of its actual discourse function (after all, while wh-items may be potential topics, they typically function in discourse as foci). There is equally strong evidence that the extraction site is what we shall term a natural information focus—that is, a phrase which in the default case is likely to be construed as the focus of the sentence. The possibility of extraction seems to depend, in other words, not upon discourse function per se but upon an underlying capacity to attract attention. (For a programmatic statement of another view, based on a pragmatic interpretation of island constraints in terms of domains of illocutionary force and assertability, see Van Valin 1986).
An attentional theory might well subsume an account like Chomsky (1986), in which maximal projections are syntactic barriers unless lexically marked as arguments. Lexically marked arguments arguably form the only syntactic configuration automatically to meet the dual requirements of the attentional theory sketched above. Arguments arguably constitute potential topics on semantic grounds; simultaneously, lexically marked elements are automatically in the post-head position reserved for natural information foci. In other words, if the attentional theory is correct, this syntactic configuration may properly be characterized as an important special case derived by isolating the effects of certain syntactic variables which happen to exert a strong influence on the distribution of attention.
1.4. Attention and extraction

1.4.1. Psycholinguistic background

At this point it will be useful to relate the issues we have been discussing to psycholinguistic accounts of attention. The requisite concepts include activation and salience, spreading activation, focus, and entrenchment (cf. Anderson 1983: 182-189; Langacker 1987: 59-60). Activation is a scalar measure of the amount of attention a concept receives. It can vary from an inactive level where the concept is not accessible for processing to a highly active or salient level. As a concept increases in activation, it is easier to use and can be accessed more quickly. Salient concepts are thus immediately available for processing at little or no processing cost. There are two basic routes by which a concept may be activated: spreading activation and focusing. These are fundamentally different processes.

Spreading activation is the process by which one concept facilitates the recall of another, associated concept (cf. Collins-Quillian 1969 for the seminal article and Anderson 1983: 86-125 for a literature review). It involves the following assumptions: (i) that human long-term memory can reasonably be viewed as a network of concepts, with each concept occupying a specific and stable position relative to other concepts; (ii) that the activation level of each concept is at least partially—and automatically—a function of the activation of its neighbors in the net. For example, the concept ROOF would be linked to the concept BUILDING; as a result, activation would spread from ROOF to BUILDING, facilitating recall of the latter when the former was already active. Certain basic effects follow, depending on whether activation converges from several sources or diverges from a single source. The following diagram may be used to explain these effects:
(98) [Diagram: a network in which concepts A, B, C, D, and E are each linked to a central concept X]
Consider first what would happen if A, B, C, D and E were all active. Since each is linked to X, and since each will increase X's activation level by some increment, we can predict that X will be highly active. In what follows, this pattern shall be termed convergent activation. Convergent activation will occur whenever a concept is central to an active conceptual network. On the other hand, suppose that X were salient and A through E were inactive. Spreading activation theory treats activation as a limited resource. That is, there is an automatic competition which goes into effect when activation can spread in more than one direction. Even though X may be highly salient, the activation it can spread will be divided among its neighbors, which means that it cannot significantly increase the activation level of any one concept. In what follows, this pattern shall be termed divergent activation.

Cognitive focus represents a very different route to salience. It is essentially a selective control mechanism. That is, any concept may be selected for cognitive focus and while it is in focus it is automatically salient. Focus is, in other words, the center of attention, the cognitive correlate of visual focus and other orienting behaviors. Focus is, however, a very limited resource. It defines the limits of short-term memory: as is well known, no more than five to seven items may be retained in short-term memory at a time. Thus there is an upper limit to the assignment of focus: under conditions of maximum concentration, no more than five to seven items may be in focus; for automatic processing, it is presumably much lower—perhaps not above one item at a time. Focus of attention tends to be attracted to the new, the different, and the unfamiliar since these are most likely to require extensive processing.

Entrenchment is the term used by Langacker (1987, 1: 59), corresponding to the category of concept strength in Anderson (1983: 182). This is yet another variable which affects attention; it measures the familiarity of a concept, that is, the frequency with which it has been used successfully. The more entrenched a concept becomes, the easier it is to recall. Anderson treats entrenchment (strength) as a multiplier which interacts with spreading activation: a highly entrenched concept will become salient rapidly with a small input of spreading activation, whereas a poorly entrenched concept might only achieve salience through being placed in focus.
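The contrast between convergent and divergent activation, and the multiplying role of entrenchment, can be made concrete with a toy simulation. The sketch below is merely illustrative: the update rule, the decay constant, and the uniform entrenchment values are assumptions introduced here, not Anderson's (1983) actual equations. Activation flowing into a node sums over its active neighbors (convergence) and is weighted by the receiving node's entrenchment, while activation flowing out of a node is divided among its neighbors (divergence).

# Toy model of spreading activation over a concept network.
# The update rule and parameter values are illustrative assumptions,
# not the equations of Anderson (1983).

network = {                       # undirected links, as in diagram (98)
    "X": ["A", "B", "C", "D", "E"],
    "A": ["X"], "B": ["X"], "C": ["X"], "D": ["X"], "E": ["X"],
}
entrenchment = {node: 1.0 for node in network}   # familiarity multiplier

def spread(activation, steps=1, decay=0.5):
    """One or more rounds of activation spread through the network."""
    for _ in range(steps):
        incoming = {node: 0.0 for node in network}
        for node, level in activation.items():
            neighbours = network[node]
            share = (level * decay) / len(neighbours)   # divergence: output is split
            for n in neighbours:
                incoming[n] += share                     # convergence: inputs add up
        activation = {n: activation.get(n, 0.0) + entrenchment[n] * incoming[n]
                      for n in network}
    return activation

# Convergent case: A through E are all active, so X ends up highly active.
print(spread({"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0, "E": 1.0, "X": 0.0}))
# Divergent case: only X is salient, and its output is split five ways.
print(spread({"A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0, "E": 0.0, "X": 1.0}))

Run on the network in (98), the first call shows X gaining a large increment when A through E are all active, while the second shows a salient X unable to raise any single neighbor far above its resting level; raising a node's entrenchment value would let the same small input push it further toward salience.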
1.4.2. Topic and focus as attentional states

In a theory of the type we have just sketched, few concepts are automatically salient. There are, however, two configurations whose chances of achieving salience are distinctly above average. First, consider concepts which are conceptually central and highly entrenched. Because they are entrenched, they activate readily; because they are conceptually central—i.e., linked to a variety of other concepts—they benefit from convergent activation. Such concepts are very likely to achieve salience through spreading activation. On the other hand, consider prototypically "new" information, that is, information that is neither entrenched nor integrated into current knowledge. Such knowledge can only be processed by focus assignment; thus, if the information it contains is important enough to process, it will be placed in focus and will therefore achieve salience.

There are strong reasons to connect these possibilities to the functional concepts of topic and focus. A topic is conceptually central and relatively entrenched more or less by definition: it recurs throughout the discourse, with other information relating to it. Conversely, there is every reason to analyze linguistic foci as also being cognitive foci: they are elements of the sentence singled out for special attention, often because the information they provide is new. Thus, consider (99):

(99) [Diagram: a discourse in which a single topic recurs across several successive sentences, each sentence adding a different comment]
(99) represents a typical pattern in discourse: the topic is mentioned in several sentences and rapidly becomes given. However, each sentence contains a different comment, typically providing new information. Given this structure, the topic benefits from convergent activation, and is therefore likely to remain active no matter where focus is placed. Each comment is far more isolated. Divergent activation minimizes its chances of achieving salience through spreading activation, so that it must be placed in focus to be processed efficiently.
These considerations suggest that there is an association between functional and cognitive status. A topical concept may be analyzed as a concept whose salience is due to spreading activation. A focal concept may be analyzed as a concept whose salience is due not to spreading activation but to the assignment of cognitive focus. Finally, a concept's potential to function as topic or focus arguably reflects underlying cognitive properties. Strong potential topics should be entrenched and conceptually central; natural information foci should represent highly relevant information which would not ordinarily achieve salience through spreading activation. The considerations we have adduced clarify the link between extraction and functional status. The present theory allows syntactic processing (such as the processing of extraction) to interact with the processing of discourse function without either making direct reference to the other. The relation can be mediated by attention—one of the prototypical general faculties of mind. As a result, constraints on extraction may very well reduce to attentional constraints on parsing. In theories of parsing, the extracted phrase is generally placed in a stack where it remains until it can be integrated with the rest of the sentence. Very often it cannot be integrated with the rest of the sentence until late in the parse. This fact suggests that extraction will proceed most efficiently if the extracted phrase possesses cognitive properties which minimize the cost of keeping it active in memory. If the theory presented above is correct, these are precisely the properties which characterize potential topics. Conversely, the extraction site could occur just about anywhere in the sentence and so might not be identified with assurance until the end of the matrix clause. At that point it would be very inefficient to have to search the entire tree for an extraction site. Extraction structures could be parsed most efficiently if the extraction site were located in a sentence position which "stood out" from the rest of the sentence because it attracted attention automatically. But this is precisely the property we have attributed to natural information foci, for we have defined them as constituents which automatically attract the focus of attention because of the importance of the information they represent. The present view entails an account in which island constraints are not a consequence of syntactic structure per se, except to the extent that syntactic configurations channel the distribution of attention (cf. section 1.5). Let us examine some of the consequences. Standard syntactic theories treat long-range extraction as the result of COMP-to-COMP movement. On the present theory, there need be no intermediate steps in the syntax. Instead, it is possible to analyze long-range extraction as involving recursive definition of the natural information focus. For example, consider a sentence like the following:
(100) Who did you say that John thinks that Susan will marry?
One may reasonably argue that the subordinate clause [that John thinks that Susan will marry t] is the most informative part of its matrix clause. Similar considerations then apply within this clause to its subordinate clause [that Susan will marry t]. In this particular case, the distribution of information parallels the syntactic structure; in an exceptional sentence, such as with a manner-of-speaking verb, it does not. Similar considerations apply to standard island configurations, such as sentential subjects (whose syntactic position is also a default topic position) and to relative clauses (which represent presupposed information inside a semantic barrier).

1.4.3. An attentional account of exceptional extraction

The evidence we have surveyed provides what appear to be strong arguments against an autonomous syntactic account of the island constraints. We are therefore nearly ready to examine the second fundamental issue—syntactic modularity. Before we do so, however, it is important to examine how the present theory applies to the exceptional patterns of extraction discussed in section 1.2. If the present theory is correct, exceptional extraction patterns should arise when special factors alter the normal distribution of attention.

The effects of attribution. The attribution effects discussed in section 1.2.1 display the expected patterns. That is, the object of a PP within NP is a potential topic only when it can be extracted. Thus, consider the following contrasts:

(101) a. Speaking of the newspaper, the editor exercises strict editorial control over it.
      b. *Speaking of editorial control, the editor wants a newspaper with it.
(102) a. Speaking of the apartment, do you have the security keys to it?
      b. *Speaking of your security keys, which is the apartment with them?
(103) a. Speaking of our reserve divisions, do you know their secret locations?
      b. *Speaking of secret locations, do you have reserve divisions with them?
(104) a. Speaking of popular opinion, I am impressed with its unprecedented range.
      b. *Speaking of an unprecedented range, I want to discuss an opinion of it (*its opinion).
(105) a. Speaking of his actions, I was shocked at their incredible treachery.
      b. *Speaking of treachery, he committed an act of it (*its act).
(106) a. Speaking of our sugar cane, let us maintain its high quality.
      b. *Speaking of high quality, you should buy sugar cane of it (*its sugar cane).

In each case, the (a) sentence corresponds to earlier sentences with exceptional extraction; the (b) sentence, to the inverted patterns where extraction was ruled out. Why should attribution affect potential topic status? The present theory offers a simple explanation: (i) Potential topics are salient by virtue of spreading activation. (ii) Activation spreads between related concepts. (iii) Attributes and their cognitive domains are related concepts by definition. Thus the asymmetry between the (a) and (b) sentences may be inferred to represent an asymmetry in activation spread. Deane (1991) offers the following principle:

(107) Attribute/Domain Salience Principle
      A. If an attribute of a domain is salient, the domain is also salient.
      B. If a domain is salient, its attributes are not thereby rendered salient.
(107) arguably follows from the properties of spreading activation. To begin with, an attribute is by definition an attribute of but a single domain, so that activation which spreads from an attribute to its domain will not be diluted by competition with other, similar concepts. On the other hand, concepts possess multiple attributes, so that activation spread from a concept to its attributes is necessarily distributed among them. If (107) is correct, it motivates exceptional extraction from NP as follows: the matrix NP is a natural information focus, and is therefore salient. However, it denotes an attribute of the concept named by the embedded NP. Activation therefore spreads to the embedded NP, making it salient, and thereby qualifying it for extraction. (107) predicts that there is a general asymmetry between attributes and their domains, which should show up in other phenomena, and not just extraction. As a matter of fact, such asymmetries have been noted in at least
two purely pragmatic phenomena: metonymy and anaphoric peninsula phenomena, as discussed in Deane (1988a, 1991). Metonymy occurs when one concept stands for another, associated concept (cf. Borkin 1972; Norrick 1981; Nunberg 1978, 1980). Deane (1988b) offers a theory of metonymy in which metonymy depends on spreading activation. Following up this idea, Deane (1991) argues that the theory of reference entails the following principle: (108)
An entity qualifies as a potential referent of an NP only if it is salient at the time reference is being determined.
(108) entails the possibility of metonymy whenever one concept renders another concept salient by spreading activation. The present theory is therefore confirmed by the existence of attribution effects on metonymic reference. Consider the following examples from Deane (1991):

(109) a. The ten million dollar inheritance just walked in.
      b. *The heiress is being held at Prudential-Bache Securities.
(110) a. The biggest engine just won the race and is having its tires changed.
      b. *The car is two feet wide and three feet long. [meaning the engine]
(111) a. The blue jerseys are winning.
      b. *The team needs to be soaked in bleach. [meaning the team's uniforms]
(112) a. Your table looks delicious today.
      b. *Dinner is made of wood. [meaning the table]

These examples display the predicted patterns: metonymy proceeds readily from attribute to domain, but not readily in the other direction. Similar asymmetries have been observed with so-called anaphoric peninsula effects (cf. Postal 1969 [1988]; Corum 1973; Sproat-Ward 1987). Normally, words are anaphoric islands, meaning that a pronoun cannot refer back to lexically specified antecedents which are not also independent words. Exceptions to this generalization are termed anaphoric peninsulas. Sproat and Ward argue that anaphoric peninsulas are governed by attentional considerations. For example, they argue that in sentences like (113) the pronoun is able to refer back to the inferred antecedent because it is conceptually salient despite being syntactically unexpressed:
(113) a. When Little Johnny threw up, was there any pencil eraser in it?
      b. Tom dreams a lot, but seldom remembers them.

That is, the phrasal verb throw up saliently evokes the concept 'vomit' in (113a), just as the verb dream saliently evokes the corresponding nominal concept. If Sproat and Ward's analysis is correct, we would expect to observe anaphoric peninsulas motivated by attribution. And, as Deane (1991) observes, such effects are present:

(114) a. I saw headlights coming straight at me, but I was able to get out of its [=the car's] way.
      b. *As a car came at me, I noticed they [=its headlights] were bright.

In (114) and in a variety of other examples, attribute nouns facilitate inferred anaphoric reference. Thus if we examine sentences like the following without supporting context, the effects of attribution are quite clear:

(115) a. ?Did you know that large inheritances don't always make them [=the inheritors] rich?
      b. *The heiress wanted to give it [=the inheritance] away.
(116) a. The engine block is cracked, so you will never drive it [=the car] again.
      b. *When I drove the car, I didn't realize it [=the engine] was a two-cylinder with an aluminum block.
(117) a. We're going to have to soak the jerseys in Clorox. They [=the players] always get them so dirty.
      b. *The team came in from the big game. They [=their jerseys] would have to be soaked in Clorox.

The present account is strongly confirmed by such data. If attribution effects are a consequence of the distribution of attention, then we would expect to find parallel asymmetries in other attentional phenomena.

Semantic frames and the light verb construction. The theory also appears to account for the effects of semantic frames and their connection with the light verb construction. Light verbs are referentially general and high in frequency—properties which, as Kluender (1990) points out, make them very easy to extract over. They are also so abstract as to evoke no framing information of their own. This is where semantic framing comes in, for a verb
which recapitulates information from a semantic frame is by definition given. Consider sentences like (118):

(118) a. He made the suggestion that we join the Artist's Guild.
      b. Which guild did he make the suggestion that we join?

The verb make provides information independently present in our knowledge about suggestions. It is given. And this is exactly what we need for extraction to proceed: the bridging structure (in this case, the light verb) should be as inconspicuous as possible. We do not need a special syntactic analysis of light verbs: their special properties under extraction follow directly from the theory. After all, light verbs display nearly ideal properties for a bridging structure. High-frequency, from a small class of verbs which often function as closed-class items, referentially nonspecific, containing almost no information, and in fact presenting information which is predictable from the rest of the sentence: such a verb is by definition inconspicuous and thus unlikely to compete with the extracted phrase and the extraction site for attention. Such properties are less clearly present in a sentence like (119):

(119) a. He retracted the suggestion that we join the Artist's Guild.
      b. ?Which guild did he retract the suggestion that we join?

Retraction is certainly one thing one can do with a suggestion, but such uses are, as observed above in section 1.2.2, less central to the frame, less predictable than the actual illocutionary act. The verb is also (precisely because it is more informative) less frequent and more referentially specific than a true light verb. The decrease in acceptability of extraction is therefore in line with the theory. Finally, extraction is least acceptable with sentences like (120), where the verb is even less predictable:

(120) a. He photocopied the suggestion that we join the Artist's Guild.
      b. *Which guild did he photocopy the suggestion that we join?

The verb photocopy is even less capable of receding into the background than the verb retract: it is referentially more specific, and occurs even less frequently, facts which follow directly from the very specific framing information that it evokes.

Exceptions to across-the-board extraction. In order to deal with exceptional extraction from coordinate structures it is necessary first to account for the
existence of Across-the-Board extraction. Why does extraction from coordinate structures have to occur in parallel from each conjunct? Deane (1991) argues that the explanation has to do with the essential parallelism of coordinate structures. Coordinate phrases are supposed to be parallel syntactically. They also appear to be parallel conceptually. One reflex of this parallelism is a strong presumption that conjuncts are parallel in importance. Linguistic evidence for this point can be found in stress assignment, where conjuncts receive parallel emphasis: (121)
The nurse polished her TROMBONE and the plumber computed my TAX.
(122)
The NURSE polished her trombone and the PLUMBER computed my tax.
If coordinate structures are supposed to be parallel in emphasis, interesting consequences follow. On the present theory, the extraction site must be a natural information focus. If so, extraction from one conjunct but not the other breaks parallelism: one conjunct, but not the other, is treated as having special importance. Deane (1991) argues that Across-the-Board extraction can be explained on this basis. The idea is simple: Across-the-Board extraction is the only way extraction can occur from an ordinary coordinate structure, because it is the only way in which each conjunct can be treated as being parallel in emphasis. In fact, stress assignment provides confirming evidence: when extraction does occur, both conjuncts receive equal, and parallel stress on the phrases functioning as extraction sites: (123)
Who did John KISS and Mary SPANK?
Further confirmation may be seen with exceptional extraction, for when extraction does not proceed across the board, it is the conjuncts undergoing extraction which receive main stress:

(124) a. Who did you pick up a paddle and SPANK?
      b. Who did you stand in the parlor and tell JOKES about?
      c. Which problem did he get bored and give UP on?
      d. What did you TALK ABOUT all night, take a shower, and then have to LECTURE ON at your 8 a.m. class?
      e. How much can you DRINK and still stay sober?
      f. That's the stuff the guys in the CAUCASUS drink and live to be 100.
Tests for dominance also confirm the pattern: many conjunct types cannot be dominant. To be precise, those which resist extraction fail the Lie Test, i.e., preparatory events, scene-setters, and incidental events: (125)
Bill said: Here's the cheese I went to the store and bought. *Which is a lie; he didn't go to the store. Which is a lie; he didn't buy any.
(126)
Bill said: Sue's the one I stood in the parlor and told jokes about. *Which is a lie; he didn't stand in the parlor. Which is a lie; he didn't tell jokes about her.
(127)
Bill said: This is the job that I worked on all morning, took a lunch break, and finished off by 2 p.m. *Which is a lie; he didn't take a lunch break. Which is a lie; he didn't work on it all morning. Which is a lie; he didn't finish it off by 2 p.m.
On the other hand, those which allow extraction optionally (internal cause, deviation from expectation, and result conjuncts) pass the Lie Test: (128)
Bill said: This is the problem I got bored and gave up on. Which is a lie—he didn't get bored (he just has low self-esteem). Which is a lie—he didn't give up on the problem.
(129)
Bill said: Seven courses is the most I have taken and still remained sane enough to carry on a coherent conversation. Which is a lie—he didn't take seven courses. Which is a lie—he didn't remain that sane.
(130)
Bill said: This is the machine gun you can shoot off and kill a thousand men a minute. Which is a lie—you can't shoot it off (it's a fake). Which is a lie—it won't kill a thousand men a minute.
This is essentially what we would expect if extraction is attentionally based. Where a conjunct cannot be the natural information focus, extraction fails; where it can be the information focus but need not be, extraction is optional; where conjuncts are parallel and hence may reasonably be analyzed as parallel information foci, extraction applies across the board.
1.5. An alternative to strict modularity

Thus far the argument has focused on evidence against the strict autonomy of syntax. We have argued at length that island constraints can only be explained by appealing to a general theory of attention. The resulting theory raises serious difficulties for an autonomous view of syntax, for if as central a phenomenon as island constraints cannot be handled in an autonomous syntactic theory, there are serious reasons to question the entire approach. Similar points apply to the thesis of syntactic modularity. Thus far, the evidence that has been adduced does not refute modularity per se, though if not rebutted it arguably may force the conclusion that island constraints are not part of the syntactic module. However, a closer look at the data suggests the possibility of another approach, an alternative to the modularity thesis as it is usually understood.

The theory of extraction presented above leaves one very important point unaddressed: syntactic localism. Even if long-range extraction can occur for reasons that transcend syntactic topography, the fact remains that syntactic processes usually operate within limited syntactic domains. Why should this be? If we reject a strictly modular view of syntactic structure, an explanation readily presents itself. Let us assume for the moment that syntactic structures are nothing more or less than conceptual structures, and subject to the same constraints. If this is true, then syntactic structures should be subject to normal attentional processes, including activation spread. An interesting parallelism emerges. Activation spread is local by definition: each concept is capable only of stimulating its immediate neighbors. The speed and efficiency of cognitive processing depends entirely on activation level; thus, to the extent that activation level depends on spreading activation, we can anticipate that cognitive processes will apply within local domains defined by spreading activation. If syntactic structures are conceptual structures, the same inference applies: syntactic operations should normally apply within local syntactic domains defined by spreading activation.

Let us consider the implications. If constituency structures are conceptual structures, what kind of concepts are they? Presumably part-whole structures of some kind. If this is true, we would expect constituency structures to display patterns of activation spread characteristic of part-whole relationships. The proposed theory is certainly not modular in the Chomskyan sense, for syntactic structures would not exist within a specifically grammatical faculty of mind. They would be part of whatever mode(s) of thought underlie the processing of part-whole relationships. Some sort of modular structure might exist, but it would not be specific to language.
1.5.1. Grammar as metaphor?

The hypothesis we have just discussed has in fact been advanced elsewhere. One version is advocated by Lakoff (1987: 283), who terms it the Spatialization of Form Hypothesis. According to this hypothesis, grammatical structure is understood as a metaphoric extension of basic spatial schemas. Specifically:

(i) Constituency relationships are understood as part-whole relationships.
(ii) Grammatical relations like subject and object are understood in terms of the linkage relationships which unite the parts of an object into an integrated whole.
We can in fact argue for a further extension. People typically perceive physical objects as having not only a part-whole and a linkage structure but also as being oriented in terms of center and periphery. A prototypical object possesses a central, or core part and several peripheral parts. For example, the torso is the core part of the human body, whereas the head, arms and legs are peripheral. The hand, similarly, has a core part to which are attached the peripheral parts (fingers and thumb). It is at least plausible to suggest that phrases have a similar center-periphery orientation: that the head of the phrase is the core part, with adjuncts attached around the periphery. The value of the hypothesis outlined above depends of course on its predictive value. At this point, it is important to note that the general approach has clear predictive potential. If syntactic structure is in some sense derivative from spatial form, then we should be able to predict the properties of grammar from independently motivated properties of spatial cognition. In particular, the behavior of syntactic structure under spreading activation ought to be predictable from the behavior of the corresponding spatial schemas. This question can easily be addressed, for we have already explored the behavior of part-whole relations in our examination of attribution effects. At this point, however, it is important to clarify certain fundamental questions about the Spatialization of Form Hypothesis. There are at least two ways in which it can be interpreted. In one interpretation, it can be read as asserting that grammatical processing involves an explicit spatial metaphor. According to this view, spatial concepts are activated whenever grammatical structure is processed. For example, it would be asserted that representations of physical part-whole relations are activated as an analogical basis for the apprehension of constituency. Ronald Langacker, in a review of Lakoff (1987), has pointed out that there are serious difficulties with this version of the Spatialization of Form
Hypothesis. For example, there is much evidence that complex motor sequences have a hierarchical structure—a kind of constituency. Would we wish therefore to assume that action sequences must be processed in spatial terms? There is, in other words, a large gap between demonstrating that a given pattern can be understood in spatial terms and establishing that it is actually processed as if it were a spatial concept. In addition, many of the properties we would like to predict, such as spreading activation, are founded upon low-level neural activities, making it difficult to see why such processes would necessarily require any explicit reference to spatial structure per se.

The present work is founded on a very different interpretation of the Spatialization of Form Hypothesis. According to this view, there is an implicit metaphoric mapping involved in syntactic cognition. As we shall see in Chapter Six, there is strong evidence that certain parts of the brain are dedicated to the processing of abstract spatial schemata. However, brain regions by themselves are simply neural processing mechanisms, and are spatial only because they are integrated parts of a system which processes spatial information. The very same neural structures, connected differently, might serve equally well to process abstract concepts. It will be argued that human linguistic abilities depend upon the processing of linguistic information by brain structures whose primary function is the processing of spatial structure. Notice that the present view implies that grammatical knowledge need have no direct connection to spatial knowledge. At the same time, though, it postulates a direct isomorphism between spatial and grammatical knowledge. The two types of knowledge are processed by the same specialized neural mechanisms, and so will share a common representational format. Such a common format entails an implicit analogy between grammatical and spatial knowledge, creating the possibility of explicit metaphors which exploit the implicit parallels between the two types of knowledge. Moreover, the presence of common innate processing mechanisms would entail that we can infer properties of grammatical cognition from corresponding properties of spatial cognition. On this view, therefore, we should expect parallels between grammar and spatial form only at an abstract level likely to reflect innate conceptual structure. Such basic schemata as OBJECT, PART, CENTER, or LINK are likely enough to be innately represented in the neural architecture of the brain. By contrast, we should not expect detailed metaphoric mappings elaborating grammatical structure in terms of specific spatial concepts. Such metaphors would be explicit metaphors, not the implicit type postulated in the present theory.

It should be noted that the present theory is modular in one sense: it postulates an innate mental module localized in a specific brain region which
processes information in terms of such concepts as part-whole structure and linkage. Such a theory is not modular in Fodor's sense, in that it is not domain specific: there is, in particular, no mental module innately specified for the processing of grammatical structure alone.
1.5.2. C-command and syntactically channeled spreading activation

Let us now examine some of the possibilities inherent in such a line of analysis. How, for instance, would activation spread operate within a constituency structure? To address this issue we shall consider the following questions: (i) how does activation spread within a part-whole structure? (ii) how does activation spread within a linkage structure? (iii) how does activation spread within a center-periphery structure? Answers to these questions should yield precise patterns of activation spread. If the Spatialization of Form Hypothesis is correct, these patterns should define the normal limits of activation within a syntactic structure.

We have already partially addressed the question of activation spread in a part-whole structure, for part-whole relations are one of the attribute-domain relationships discussed in previous sections. We have observed that part nouns facilitate exceptional extraction, function readily as metonyms standing for the whole, and function as anaphoric peninsulas. In each case, the generalization is that expressed by the attribute/domain salience principle (107). Deane (1988a, 1991) proposes to address the question by constructing charts which present a simplified model of patterns of activation spread. These simplify the description of activation by postulating only three levels—salient, active, and inactive. There are two basic patterns:

(131)
Activation of attribute        Minimal activation of domain
salient                        salient
active                         active
inactive                       inactive

(132)

Activation of domain           Minimal activation of attribute
salient                        active
active                         inactive
inactive                       inactive
These are, in effect, a more detailed version of the Attribute-Domain Salience Principle. (131) claims that a conceptual domain cannot be less active than its attributes, guaranteeing that point A of the Attribute/Domain Salience Principle will apply. In the opposite direction, activation cannot spread so strongly, but since the two concepts are linked, some activation spread will occur. (132) claims that attributes are normally less salient than their conceptual domain, but are activated by normal spreading activation when the domain is salient. It conforms with the observation that semantic priming effects—that is, the facilitating effects of spreading activation—tend to be strictly local. That is, a primed concept is not usually active enough to prime a third concept (cf. De Groot 1983).

Since the part-whole relationship is a type of attribution, the present theory predicts that activation spread will conform to (131) and (132). Activation should spread strongly from part to whole (that is, upwards in a tree diagram)—but only weakly from whole to part (that is, downwards in a tree diagram). The result is a pattern which bears a striking similarity to c-command as defined in Reinhart (1983:23). Consider the following diagram:

(133)            S
               /   \
            NP1     VP
                   /   \
                  V     NP2
Suppose that NP2 is salient, perhaps because it is a pronoun whose antecedent must be recovered. Using charts (131) and (132), we may calculate the following activation values:

VP—salient by activation spread from NP2 (chart 131)
V—active by activation spread from VP (chart 132)
S—salient by activation spread from VP (chart 131)
NP1—active by activation spread from S (chart 132)

In other words, NP1 is active whenever NP2 is salient. Now consider the converse situation. Suppose that NP1 is salient. The following activation values can be calculated:
S—salient by activation spread from NP1 (chart 131)
VP—active by activation spread from S (chart 132)
V—inactive; no activation spread from VP (chart 132)
NP2—inactive; no activation spread from VP (chart 132)

In other words, there is an asymmetry in spreading activation which yields results exactly parallel to c-command. Activation spreads indefinitely upward in a constituent structure but only one level downward; thus, subjects are active (and hence accessible for processing) whenever the objects are salient, whereas objects are inactive and hence inaccessible when subjects are salient.

These considerations can be refined by considering next the semantic relationship of linkage. Linkage will be discussed in detail in Chapter Two; for now, it is enough to observe that we can infer its properties under spreading activation much as we did for part-whole relationships. That is, we may observe the behavior of physical linkage; by hypothesis, grammatical linkage should display the same pattern. The critical observation to note is that linkage does not automatically yield exceptional extraction, metonymy, or anaphoric peninsula effects. The head, for example, is linked to the neck, yet sentences like the following simply do not work:

(134) a. *Whose neck should this be a head on?
      b. *Whose head was this the neck to?

Metonymic patterns work no better:

(135) a. *He has a stiff head [meaning his neck].
      b. *He has a swelled neck [meaning his head].

Nor do we get anaphoric peninsula effects when sentences like the following occur without prior context:

(136) a. *He has a long neck, so it [=his head] is high in the air.
      b. *His head is very big, so it [=his neck] has a heavy load to bear.

In short, there is no evidence of the powerful activation pattern summarized in chart (131). On the other hand, linkage—that is, the association between contiguous and connected objects—is certainly a genuine semantic relationship. As a result we do find occasional metonymies based on linkage, such as the metonymy which serves as the diachronic source for such terms as cheek (originally meaning 'jaw') or breast (originally meaning 'chest'). It is thus reasonable to assume that linkage involves the weak spreading activation pattern expressed in chart (132).
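The calculations just given lend themselves to a simple simulation. The following Python sketch is offered purely as an illustration: the numerical three-level scale, the node names, and the fixpoint loop are expository simplifications introduced here, not part of the formal apparatus of charts (131) and (132). It propagates activation over the constituent structure in (133) and reproduces the asymmetry just described.

    SALIENT, ACTIVE, INACTIVE = 2, 1, 0

    # Constituent structure (133): S dominates NP1 and VP; VP dominates V and NP2.
    children = {"S": ["NP1", "VP"], "VP": ["V", "NP2"]}
    parent = {c: p for p, kids in children.items() for c in kids}

    def spread(salient_node):
        """Propagate activation to a fixpoint, following charts (131) and (132)."""
        nodes = set(children) | set(parent)
        level = {n: INACTIVE for n in nodes}
        level[salient_node] = SALIENT
        changed = True
        while changed:
            changed = False
            for n in nodes:
                new = level[n]
                for c in children.get(n, []):      # chart (131): a whole is no less
                    new = max(new, level[c])       # active than any of its parts
                if n in parent:                    # chart (132): whole-to-part spread
                    new = max(new, level[parent[n]] - 1)   # is one level weaker
                if new != level[n]:
                    level[n], changed = new, True
        return level

    print(spread("NP2"))   # NP1 comes out ACTIVE whenever NP2 is SALIENT
    print(spread("NP1"))   # but NP2 (and V) remain INACTIVE when NP1 is SALIENT

The asymmetry between the two runs is, in effect, the c-command pattern derived above.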
Finally, let us examine the status of core parts. Core parts have a distinct pattern of behavior under spreading activation. They appear to be salient whenever the whole is salient, and in fact seldom are linguistically distinguished from the whole, as if whole objects and their core part are mentally "fused" into a single compound concept (cf. Deane 1988b). In particular, note their behavior in metonymy. As the following examples illustrate, it is normal to use the term for the whole to stand metonymically for the core part:

(137) a. We lifted the body off the ground.
      b. After the monster ate the extremities, it put the body in a cave.
      c. His body's pretty enough even though the rest of him is downright ugly.

(138) a. They cut off the head.
      b. After the face and jaws were eaten, the head rolled off into a corner.
      c. His head was normal enough, but when he turned around his face gave you nightmares.

(139) a. He picked up the fork.
      b. After removing the handle, he put the fork in the dishwasher.
      c. The fork itself looks normal, but it is attached to a fancy handle.

Deane (1991) argues from these data that the following chart applies to activation spread from whole to core part:

(140)
Activation of whole            Minimal activation of core part
salient                        salient
active                         active
inactive                       inactive
That is, the core part is no less active than the whole object. The hypothesis expressed in (140) may be supported by another line of reasoning which observes that core parts are likely to benefit from convergent activation. A core part is the part to which other parts are linked. Convergent activation spread from each of the peripheral parts should guarantee that the core part is the most active element in the configuration next to the whole object itself.

Let us now observe how the hypotheses proposed above work out when applied to typical syntactic structures. Before we do so, it is worthwhile to point out conventions which will be followed throughout this work. First, constituency relations are represented using double lines; second, the core
parts, or heads, are boxed off with double lines. Thus consider the following diagram:

(141) [Tree diagram: S' dominates COMP and S; S dominates NP1, INFL, and VP; VP dominates V and PP; PP dominates P and NP2. The heads (COMP, INFL, V, and P) are boxed.]
While (141) represents the hypothesis that COMP is the head of S' and INFL of S, it need not concern us here whether that hypothesis is correct. For now we need merely observe the patterns of spreading activation which hold within such a structure. The proposed theory yields the following activation patterns:

(i) If COMP or S' is salient, then: S' is salient; S and INFL are active.
(ii) If NP1 is salient, then: S, INFL, S' and COMP are salient; VP and V are active.
(iii) If S or INFL is salient, then: S, S' and COMP are salient; NP1, VP and V are active.
(iv) If VP or V is salient, then: INFL, S, S' and COMP are salient; NP1 is active; PP and P are active.
(v) If P or PP is salient, then: V, VP, INFL, S, COMP, S' are salient; NP1 and NP2 are active.
(vi) If NP2 is salient, then: P, PP, V, VP, INFL, S, COMP, S' are salient; NP1 is active.

The pattern that results is very interesting, on two counts. Not only does it continue to show the c-command effects discussed previously, but it also places severe limits on access to internal constituents. The limits are actually
more severe than subjacency: from COMP position, for example, the lowest accessible constituent is INFL. The general prediction is that activation spreads no deeper than the head of a sister, and no further laterally than the head of the daughter of a dominating node. This result does not predict that deeper constituents may not be accessed, only that they cannot be activated by syntactically channeled activation spread. Other possibilities, such as extraction of a wh-phrase, would require the intervention of other factors such as those discussed in preceding sections.

At this point, it remains to be seen whether the approach we have just discussed will prove adequate when it is elaborated and applied to the details of syntactic analysis. However, we have elucidated at least one way in which the Spatialization of Form Hypothesis possesses potential for interesting and detailed predictions. In short, the considerations we have adduced certainly justify exploring the hypothesis in further detail.
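The same kind of simulation used for (133) can be extended with the head rule of chart (140), treating a head as no less active than the phrase it heads. The sketch below is again only an expository approximation (the numeric scale and the fixpoint loop are assumptions introduced here); run over the structure in (141), it reproduces the activation patterns listed in (i)-(vi), including the prediction that activation reaches no deeper than the head of a sister.

    SALIENT, ACTIVE, INACTIVE = 2, 1, 0

    # Structure (141): S' -> COMP S; S -> NP1 INFL VP; VP -> V PP; PP -> P NP2.
    children = {
        "S'": ["COMP", "S"],
        "S":  ["NP1", "INFL", "VP"],
        "VP": ["V", "PP"],
        "PP": ["P", "NP2"],
    }
    heads = {"S'": "COMP", "S": "INFL", "VP": "V", "PP": "P"}
    parent = {c: p for p, kids in children.items() for c in kids}

    def spread(salient_node):
        """Propagate activation using charts (131), (132), and the head rule (140)."""
        nodes = set(children) | set(parent)
        level = {n: INACTIVE for n in nodes}
        level[salient_node] = SALIENT
        changed = True
        while changed:
            changed = False
            for n in nodes:
                new = level[n]
                for c in children.get(n, []):       # chart (131): upward spread is strong
                    new = max(new, level[c])
                if n in parent:
                    p = parent[n]
                    if heads[p] == n:               # chart (140): a head is no less active
                        new = max(new, level[p])    # than the phrase it heads
                    else:                           # chart (132): other daughters receive
                        new = max(new, level[p] - 1)  # one level less
                if new != level[n]:
                    level[n], changed = new, True
        return level

    print(spread("NP2"))    # pattern (vi): all dominating nodes and their heads salient, NP1 active
    print(spread("COMP"))   # pattern (i): only S and INFL active; everything below remains inactive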
Chapter two
An integrated cognitive theory
The considerations adduced thus far suffice to establish that it is both plausible and appropriate to seek a cognitive account of syntax based upon the Spatialization of Form Hypothesis. We are far, of course, from fully motivating such a theory, since that goal requires grounding: a cognitive syntax presupposes a general cognitive theory articulated in enough detail to permit its application to specific cognitive domains like syntax. Such a theory will by its very nature embody a broad range of controversial or only partially motivated hypotheses, and will no doubt require modification as research proceeds. However, it will serve the invaluable function of providing a framework within which more specific issues can be formulated and explored. Once such a theory has been articulated, it will be possible to explore specific syntactic applications. The theory will be motivated to the extent that it is capable of providing insightful and otherwise unavailable explanations for seemingly arbitrary phenomena. Later chapters will therefore examine a variety of syntactic phenomena, returning to island constraints by the end of Chapter Five.

We may now address the first goal: articulating the necessary analytical framework. Such a theory can be formulated by integrating several recent proposals. Anderson (1983) provides a useful model of the overall architecture of cognition. George Lakoff and Mark Johnson provide an account of the grounding of abstract knowledge in bodily experience. Finally, Sperber and Wilson (1986) present a theory of relevance whose leading ideas can be adapted as part of a general model of salience and attention. Once the overall cognitive model is in place, the remaining chapters will apply it to syntactic analysis, and (more specifically) to the analysis of island constraints.
2.1. Cognitive architecture

The following diagram summarizes John Anderson's theory of cognition. Declarative memory contains information about the world. Production memory contains productions: condition-action pairs which specify alterations to the content of working memory. Working memory contains perceptual inputs, outputs from productions, and facts retrieved from declarative memory (Anderson 1983:19):
[Figure: a diagram of Anderson's architecture, connecting declarative memory, working memory, production memory, and the outside world by the processes of storage, retrieval, match, execution, application, encoding, and performances.]
Figure 1. Reprinted by permission of the publishers from The Architecture of Cognition by John Anderson, Cambridge, Mass.: Harvard University Press. Copyright (c) 1983 by the President and Fellows of Harvard University.
As Anderson (1983: 20) points out:
By itself, this general framework does not constitute a theory. A predictive theory must specify the following matters:
1. The representational properties of the knowledge structures that reside in working memory and their functional consequences.
2. The nature of the storage process.
3. The nature of the retrieval process.
4. The nature of production application, which breaks down to: a) the mechanism of pattern matching; b) the process that deposits the results of production actions in working memory; c) the learning mechanisms by which production application affects production memory.
Anderson therefore sets forth an array of specific hypotheses about each of these issues. The theory adopted below will use Anderson's theory as a point of departure, altering it where appropriate in order to incorporate insights from other theories. It is important, however, to note that the present account employs Anderson's theory more for its lucid presentation of key concepts from cognitive psychology than for its more specific and controversial claims. Thus the present work should not be viewed as an endorsement of Anderson's theory vis-a-vis any specific competitor.4
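The division of labor among the three memories can be made concrete with a toy production system. The sketch below is emphatically not Anderson's ACT* formalism: the representation of facts as tuples, the two sample productions, and the control loop are simplifying assumptions adopted here solely to illustrate the match-and-execute cycle.

    # Working memory holds currently active facts; declarative memory holds
    # long-term facts; production memory holds condition-action pairs.
    declarative_memory = {("bird", "canary"), ("color", "canary", "yellow")}
    working_memory = {("goal", "describe", "canary")}

    def retrieve(cue):
        """Move any long-term facts mentioning the cue into working memory."""
        return {fact for fact in declarative_memory if cue in fact}

    def production_retrieve(wm):
        # Condition: there is a goal to describe something. Action: retrieval.
        for fact in list(wm):
            if fact[0] == "goal" and fact[1] == "describe":
                return retrieve(fact[2])
        return set()

    def production_report(wm):
        # Condition: a color fact is in working memory. Action: output to the world.
        for fact in list(wm):
            if fact[0] == "color":
                print("performance:", fact[1], "is", fact[2])
        return set()

    productions = [production_retrieve, production_report]

    # Match-execute loop: every production whose condition matches working memory
    # fires, and its results are deposited back into working memory.
    for cycle in range(2):
        additions = set()
        for production in productions:
            additions |= production(working_memory)
        working_memory |= additions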
2.2. Knowledge representation

2.2.1. Temporal lists, spatial images, propositions, and kinesthetic representations

Knowledge representation is an issue parallel in many ways to the use of special data structures in computer programming languages. In a computer, it is often possible for the same information to be represented in several ways: as the value of one or more numerical variables; as a list; as entries in an array, etc. Special representational modes are necessary because different kinds of information have different salient properties and commonly require different operations: e.g., multiplication of a number, searching a list, or matrix multiplication of the elements of a matrix.

Anderson assumes that human cognition employs multiple representational modes, including temporal strings, spatial images, and abstract propositions. He also allows for the existence of a fourth, kinesthetic mode of representation. These types, he claims, differ along several parameters: a) the class of information directly represented; b) the information which may be extracted quickly with minimum effort; c) the conditions under which two representations match, or count as identical; d) the range of cognitive operations which may be applied to them.

Temporal strings are the first type he postulates. They represent information about relative order, but not absolute duration, thus taking the form of ordered (potentially hierarchically structured) lists. For example, memory of events is claimed to take the form of temporal strings. Since that form of knowledge representation does not encode absolute duration, it is simple for the theory to account for the variable pace of subjective time, depending on whether an event is subdivided into a variety of objectively short episodes, or into a few monotonous, internally undivided time periods.

Anderson claims that certain properties follow from the fact that temporal strings take the form of ordered lists. Specifically, he assumes that the simplest properties to extract from a list are relative order (e.g., does A come before D?), the identity of the first or last items in the string, and immediate consequence (what comes next?). If two temporal strings are matched, the matching process begins at the first item in each string, and compares the strings item by item from beginning to end. Operations on temporal strings typically involve the addition or deletion of items from the sequence. Thus, if the list [A B C D] were being processed, it would be easy to determine that A was first, and D last, that A immediately preceded B, or that C came after A. It would be more difficult to determine that C was two items after A, or to go through the list backwards. Moreover, it would be easier to determine that the list [Z B C D] was distinct from [A B C D] than it would to determine the
distinctness of [A B C Z] and [A B C D]. In fact, in terms of the similarity metric for temporal strings, [A B C D] would actually resemble [A B C Z] more strongly than [Z B C D], since the initial sequence is the same.

Spatial images constitute the second mode of representation. These are images of the sort studied by R. N. Shepard (cf. Shepard—Cooper 1982). The information they encode is configural in nature: shapes, angles, and relative (but not absolute) distance. Two images match to the extent that they contain the same shapes in the same angular relations. The most interesting fact about spatial images, of course, is the variety of mental operations to which they are subject. As Shepard's research illustrates, people are able to enlarge them, shrink them, rotate them, move them across the mental "field of vision", even superimpose them (cf. Gibson 1979's discussion of perceptual invariants). People thus have a surprising capacity to construct complex mental images, an ability exploited in the following passage from the autobiography of the British author C.S. Lewis:

You are looking across what may be called, in a certain sense, the plain of Down, and seeing beyond it the Mourne Mountains. It was K.—that is, Cousin Quartus' second daughter, the Valkyrie—who first expounded to me what this plain of Down is really like. Here is the recipe for imagining it. Take a number of medium-sized potatoes and lay them down (one layer of them only) in a flat-bottomed tin basin. Now shake loose earth over them till the potatoes themselves but not the shape of them, is hidden; and of course the crevices between them will now be depressions of earth. Now magnify the whole thing till those crevices are large enough to conceal each its stream and its huddle of trees. And then, for coloring, change your brown earth into the checkered pattern of fields, always small fields (a couple of acres each), with all their normal variety of crop, grass, and plow. You have now got a picture of the "plain" of Down, which is a plain only in the sense that if you were a very large giant you would regard it as level but very ill to walk on—like cobbles. (Lewis 1956:155-156)

Anderson's third representational type is propositional in nature; unlike the other two formats, it is extremely abstract. Propositions represent semantic information—i.e., information crucial to inference—and match to the extent that they support identical inferences. Propositions have internal predicate-argument structure, which Anderson represents via labelled networks; he assumes that propositional operations are highly sensitive to the structure and connections thus defined.

Anderson only mentions the possibility of kinesthetic representations, but it seems highly probable that they exist. Just as spatial perception is
organized around configural concepts like angle and shape, so also we analyze the movement of objects and living things in terms of forces, causes, actions and motivations. Talmy (1985) presents a useful picture of how such representations might function. According to Talmy, there are what he terms force-dynamic conceptualizations. The basic elements in such conceptualizations are:
a) Two participants—an agonist, who exerts force, and an antagonist, an entity which exerts a counterforce of some sort;
b) Intrinsic force tendencies of the agonist and the antagonist, which involve natural tendencies either to move or to remain at rest;
c) the balance of strengths between agonist and antagonist;
d) the result of the conflict, e.g., either motion or rest on the part of the agonist.
In Talmy's analysis, these basic elements underlie not only the conceptualization of motion but also apply to the conceptualization of psychological motives, which are, of course, intrinsically connected to the initiation of purposeful bodily action. Since kinesthetic representations are by definition concerned primarily with the movement of the body and its interaction with objects in the outside world, Talmy's analysis provides a natural format for the formulation of a kinesthetic representational type.
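Talmy's basic elements can be given a schematic computational rendering. The sketch below is a deliberate simplification introduced here (the two-valued tendencies and the numeric strengths are not Talmy's own formalism): the result follows the agonist's intrinsic tendency when the agonist is stronger, and is reversed when the antagonist is stronger.

    from dataclasses import dataclass

    @dataclass
    class Participant:
        name: str
        tendency: str    # intrinsic force tendency: "motion" or "rest"
        strength: float

    def resolve(agonist: Participant, antagonist: Participant) -> str:
        """Result of the force-dynamic conflict, from the agonist's point of view."""
        if agonist.strength > antagonist.strength:
            return agonist.tendency                                   # tendency prevails
        return "rest" if agonist.tendency == "motion" else "motion"  # tendency overridden

    # A body tending toward rest, pushed by a stronger external force, ends up moving.
    ball = Participant("ball", tendency="rest", strength=1.0)
    wind = Participant("wind", tendency="motion", strength=2.0)
    print(resolve(ball, wind))   # -> "motion"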
2.2.2. The perceptual grounding of propositional representations

Anderson's account of knowledge representation seems to have a major weak point: its analysis of propositional representation. In Anderson's theory, propositional representations have no intrinsic connection to any other form of representation; they exist sui generis. And this is unlikely, for there is clearly reason to believe that many predicates are perceptually grounded. Some may be grounded in temporal representations (DURING), others in image representations (UP, TURN, ROUND), and yet others in kinesthetic representations (CAUSE, LET, DO). But the existence of some kind of grounding appears highly likely.

This issue has been of central concern in cognitive approaches to linguistic theory, where a number of proposals have been advanced. We shall adopt the theory presented in Lakoff and Johnson (1980), Johnson (1987) and Lakoff (1987). According to this theory, propositional knowledge is grounded either directly or indirectly in temporal, imagistic, or kinesthetic representations. Basic, concrete predicates represent what Johnson (1987: 19) terms embodied schemata: representations of recurrent, structured
patterns which emerge from bodily experience. More abstract concepts are grounded via conceptual metaphors, mappings which allow well-understood concepts to serve as models for the understanding of more difficult cognitive domains.

Embodied schemata (image schemas). Embodied schemata differ crucially from classical propositions, for their properties derive in part from a perceptual substrate. However, they also behave like propositional representations, preserving inferential information and displaying predicate-argument structure. Johnson (1987) defines embodied schemata as "recurrent patterns of bodily experience"; as such, they are much like the sensorimotor schemas postulated in the theories of Jean Piaget (cf. Boden 1980). For example, Johnson points out that we have numerous experiences of CONTAINMENT: putting food in our mouths, being in a room, etc. These experiences display a recurrent structure in terms of which we may then comprehend new experiences. Moreover, they are useful abstractions since they provide access to a variety of useful inferences and appropriate action sequences. For example, Johnson discusses several inferences raised by the CONTAINMENT schema:
(i) containment usually protects the contained object from external forces;
(ii) actions within a container are constrained by its boundaries;
(iii) consequently, the location of a contained object is fixed by the location of its container;
(iv) the container strongly affects one's ability to observe the contained object, depending on one's ability to see into the container;
(v) and finally, containment is transitive: if A is in B, and B is in C, then A is in C.

Embodied schemata should be distinguished both from propositional knowledge, as it is usually understood, and from specific perceptual representations. Thus, we may note that embodied schemata resemble propositions in being abstractions. Certain similarities follow. Propositional representations have predicate-argument structure, which enables them to provide specific descriptions by the combination of semantically generic elements. For the same reason, Johnson analyzes embodied schemata as having a gestalt structure in which certain aspects of a schema are maximally schematic (and hence correspond to arguments) while others are relatively concrete and specific (and hence correspond to predicates).
This similarity is extremely important. Anderson assumes that propositional representations are stored and recalled in terms of the network structure created by predicate-argument relations in semantic memory, and given the literature on semantic priming effects (cf. Anderson 1983: Ch. 3; Marslen-Wilson 1975; Schvaneveldt—Meyer—Becker 1976; Swinney—Hakes 1976) it seems probable that we need to make similar assumptions about embodied schemata. In other words, the gestalt structure of a schema is its most salient aspect and thus is of the highest significance in determining whether two concepts are either similar or associated.

Nonetheless, while propositional representations denote only via abstract truth conditions, embodied schemata are perceptually grounded, differing from actual percepts only by virtue of their schematicity and hence their integration with general knowledge of the world. Thus they undergo mental operations analogous to actual physical operations. For example, Johnson (1987: 24-27) cites research which indicates that image schemata (embodied schemata grounded in spatial experience) are capable of being rotated, moved, enlarged or shrunk in the "mind's eye". According to Anderson (1980) these operations are "abstract analogs" of physical motion, or to be precise, of the perceptual experiences which accompany physical motion. In short, embodied schemata are essentially dynamic, not passive molds into which experiences are poured, but patterns which emerge from the very process of thought, much as the shape of a fountain emerges from the flow of water which constitutes it.

It is important at this point to note one way in which the present work may differ from Johnson's or Lakoff's conception of image schemata. Johnson's discussion suggests an almost empiricist view of image schemata, in which their properties are based entirely on early experience. Since, as Chapter Six will discuss, there appears to be evidence that specific brain regions are responsible for the perception of image schematic structure, it may be better to conceive of basic image schemata as having an innate basis.

Johnson (1987: 126) lists a variety of basic embodied schemata (Figure 2). For our purposes, we will focus on only three of these, since they will be central to the grammatical analysis we develop: namely, the link, part-whole and center-periphery schemata.

The link schema. Lakoff (1987: 274) describes the LINK schema in the following terms:

Bodily experience: our first link is the umbilical cord. Throughout infancy and early childhood, we hold onto our parents and other things, either to secure our location or theirs. To secure the location of two things
relative to one another, we use such things as string, rope, or other means of connection.
Structural elements: two entities, A and B, and LINK connecting them.
Basic logic: If A is linked to B, then A is constrained by, and dependent upon, B.
Symmetry: If A is linked to B, then B is linked to A.

The last of these points is questionable: despite its initial plausibility, it seems doubtful that the link schema is basically symmetric.

CONTAINER, BLOCKAGE, ENABLEMENT, LINK, NEAR-FAR, MERGING, MATCHING, CONTACT, OBJECT, COMPULSION, RESTRAINT REMOVAL, ATTRACTION, MASS-COUNT, CENTER-PERIPHERY, CYCLE, SCALE, PART-WHOLE, FULL-EMPTY, SPLITTING, ITERATION, SUPERIMPOSITION, PROCESS, SURFACE, COLLECTION, PATH, BALANCE, COUNTERFORCE

Figure 2. Basic embodied schemas

To illustrate this point, consider the relation between the human body and clothing, jewelry, and other attachable items. It makes sense to speak of such items being attached, or linked, to the body: but this view suggests that the concept of linkage is not inherently symmetrical. Notice, that is, that moving the body entails a parallel movement of the clothing attached to it. On the other hand, moving an item of clothing need imply nothing about the movement of the body even when the clothing is being worn. It is true, of course, that many items are symmetrically linked (e.g., a man and a dog connected by a leash) but this can easily be analyzed as mutual linkage (two items linked to each other). If we make this assumption, the link schema may be analyzed as a kinesthetic, force-dynamic schema involving:
a) two elements, one of which, the dependent element, will be linked to the other, autonomous element;
b) some configuration (prototypically, physical contact) which relates the two elements;
c) the actual link, a causal factor which endows the dependent element with configural inertia: that is, an inherent tendency to maintain its configural relation to the independent element.
For example, an item of clothing will move with the body, thus maintaining existing body/clothing configurations.
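The asymmetry of configural inertia can be stated in a few lines of code. The sketch below is only an illustration; the coordinate representation, the class, and the method names are assumptions introduced here rather than part of the analysis itself.

    class Element:
        """A physical element that may have other elements linked to it as dependents."""
        def __init__(self, name, position):
            self.name = name
            self.position = position    # a simple (x, y) location
            self.dependents = []        # elements whose location depends on this one

        def link(self, dependent):
            """Attach a dependent element to this autonomous element."""
            self.dependents.append(dependent)

        def move(self, dx, dy):
            """Moving an element carries its dependents along: configural inertia."""
            x, y = self.position
            self.position = (x + dx, y + dy)
            for d in self.dependents:
                d.move(dx, dy)

    body = Element("body", (0, 0))
    coat = Element("coat", (0, 1))
    body.link(coat)

    body.move(5, 0)   # the coat moves with the body ...
    coat.move(0, 2)   # ... but moving the coat leaves the body where it was
    print(body.position, coat.position)   # (5, 0) (5, 3)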
Finally, we should also note the rigidity of the link (i.e., the extent to which the configuration may shift) and the strength of the link (i.e., its capacity to overcome outside forces). A weak and flexible link, such as that between a dress and the body, will allow a variety of interactions in which the configuration may shift or even be destroyed. A stronger but still reasonably flexible link, like that between the foot and leg, will allow considerable freedom of movement but within limits. A strong, rigid link places the most constraints on the dependent element. Now, it is true that linkage is transitive, as Lakoff observes, but an indirect link tends to lose strength and rigidity. To be precise, an indirect link is like a chain: it is no stronger than its weakest link. Indirect links are also like chains in that their flexibility tends to multiply as more links are added.

The part-whole schema. Lakoff (1987: 273) characterizes the part-whole schema as follows:

Bodily experience: We are whole beings with parts we can manipulate. Our entire lives are spent with an awareness of both our wholeness and our parts. We experience our bodies as WHOLES with PARTS. In order to get around in the world, we have to be aware of the PART-WHOLE structure of other objects. In fact, we have evolved so that our basic-level perception can distinguish the fundamental PART-WHOLE structure that we need in order to function in our physical environment.
Structural elements: A WHOLE, PARTS, and a CONFIGURATION.
Basic logic: The schema is asymmetric: If A is a part of B, then B is not a part of A. It is irreflexive: A is not a part of A. Moreover, it cannot be the case that the whole exists, while no PARTS of it exist. However, all the PARTS can exist, but still not constitute a WHOLE. If the parts exist in the CONFIGURATION, then and only then does the WHOLE exist. It follows that, if the parts are destroyed, then the WHOLE is destroyed. If the whole is located at a place P, then the parts are located at P. A typical, but not necessary property: the PARTS are contiguous to one another.

While this characterization may be accurate for the most part, it leaves many details vague: What is a configuration? When do the parts of a whole count as being "in" a configuration? Correct answers to these questions should predict the properties that Lakoff lists.

Let us note that the part-whole schema is conceptually complex, with spatial and kinesthetic subschemas. In spatial, typically visual terms a whole object corresponds to a stable (typically moving or at least moveable) figure
which stands out against some ground. In kinesthetic terms a whole object is defined by the mutual linkage of its parts, which causes it to move as a unit. As a spatial figure, a whole object constitutes a pattern of spatial relations among its parts: in a triangle, for instance, it is possible to specify the angle and distance by which each point is separated from the others. This is a minimum requirement, of course, since many items stand in specific spatial relations without counting as a configuration (much less as parts of a whole).

In order for a figure to stand out as a perceptual whole, its parts should display a property I will term perceptual adjacency. Perceptual adjacency includes but is not the same as contiguity. Perceptual adjacency is more flexible: if no salient percept intervenes between two concepts, then they are perceptually adjacent even if separated by an intervening gap. Thus, consider the following diagram:

(142)
X     X     X     X
The four X's in (142) are not contiguous, since they are separated by spaces. However, each is perceptually adjacent to the neighboring X, since nothing salient intervenes between them. Configurations appear to require that each element in the configuration be perceptually adjacent to at least one other element. Thus, while the X's in (143a) form a configuration, the X's in (143b) do not (unless the Y's are taken as a continuous background, in which case they are no longer salient):

(143) a. X X X Y Y Y
      b. X Y Y X Y Y X

This property explains why parts are often contiguous: contiguity is the limiting case of perceptual adjacency. To this, we need to add a second perceptual property: continuity. If a collection of elements forms a configuration, it should be possible to begin at any element in the configuration, proceed step by step among perceptually adjacent elements, and arrive at any other element in the configuration. If two elements are not mutually accessible in this fashion, they do not form a configuration. Thus, the X's in (144a) form a configuration whereas the X's in (144b) do not (though of course they do in combination with the Y's):

(144) a. X X X
         X X X
         X X X

      b. X X X
         YYYYYYYY
         X X X
         YYYYYYYY
         X X X

These properties are the sort with which Gestalt psychologists were concerned. However, they do not suffice, by themselves, to create a part-whole structure. Consider a situation in which three birds fly across the sky. At any moment, their positions in space form a triangle; however, if the birds were to fly in three different directions, it would be difficult to think of them
as forming parts of some larger whole. On the other hand, if they were all to fly in the same direction, then they would form a stable configuration which could reasonably be conceptualized as a whole (i.e., a formation). These considerations suggest a final perceptual property: temporal stability. Parts are not perceived as a whole unless they maintain the same configuration over time. While some changes are possible, they are generally variations in shape or relative distance, and not changes in the pattern of perceptual adjacency among the parts. For example, a flight of birds or a line of fenceposts will easily be perceived as wholes, but more evanescent configurations will simply fail to qualify.

It is true, of course, that a flight of birds or a line of fenceposts fail to constitute prototypical wholes. They are perceptual wholes, since they form stable spatial configurations. But they are not whole objects, since they do not satisfy the kinesthetic requirements for a whole object. These requirements include: (i) that each part of the whole be mutually linked to at least one other part; (ii) that a whole object consist in a continuous network of mutually linked parts. With ordinary physical objects the kinesthetic and spatial subschemas coincide: mutual linkage entails perceptual adjacency, and continuity of linkage entails perceptual continuity. Continuous mutual linkage thus endows the whole object with temporal stability and configural inertia: each part moves with the other parts to which it is attached, and so the entire network forms a stable configuration.

Given this characterization, it is easy to see why the part-whole relation should have the logical properties that Lakoff lists. The whole is equated with a collection of elements displaying perceptual adjacency and continuity; a part, with one element in the collection. Of necessity, then, the part-whole schema will be asymmetric and irreflexive. Since the whole consists of the parts in configuration, its existence depends on theirs. Moreover, a prototypical part will be contiguous with and attached to the rest of the whole, as these properties are associated with mutual linkage. Somewhat more controversially,5 let us note that as presently defined, the PART-WHOLE schema is transitive: a part of a part is also a part. The reason is
simple: a whole consists of a network of mutually linked parts. If a part itself has parts, then it is itself a network of mutually linked parts, and the effect is simply to expand the network. For example, it does not matter whether we analyze a whole as a network of three parts (each of which is subdivided into three more parts) or as a network consisting of the nine subdivisions directly. The two networks are equivalent.

The center-periphery schema. Lakoff (1987: 274) describes the CENTER-PERIPHERY schema as follows:

Bodily experience: We experience our bodies as having centers (the trunk and internal organs) and peripheries (fingers, toes, hair). Similarly, trees and other plants have a central trunk and peripheral branches and leaves. The centers are viewed as more important than the peripheries in two ways: injuries to the central parts are more serious (i.e., not mendable and often life threatening) than injuries to the peripheral parts. Similarly, the center defines the identity of the individual in ways that the peripheral parts often do not. A tree that loses its leaves is the same tree. A person whose hair is cut off or who loses a finger is the same person. Thus, the periphery is viewed as depending on the center, but not conversely: bad circulation may affect the health of your hair, but losing your hair doesn't affect your circulatory system.
Structural elements: an ENTITY, a CENTER, and a PERIPHERY.
Basic logic: The periphery depends on the center, but not vice versa.

Strictly speaking, Lakoff's CENTER-PERIPHERY schema is a schema for core and peripheral parts, and thus is a special case of a more general CENTER-PERIPHERY schema. Johnson (1987: 124) describes the general center-periphery schema as follows:

Our world radiates out from our bodies as perceptual centers from which we see, hear, touch, taste and smell our world. Our perceptual space defines a domain of macroscopic objects that reside at varying distances from us . . . At a certain distance from this perceptual center our world "fades off" into a perceptual horizon which no longer presents us with discrete objects. We may move in one direction toward the horizon, thus opening up new perceptual territory, but this only establishes new horizons presently beyond our grasp.
This more general schema has two particularly salient experiential properties:
a) the center is by definition the point with the least average distance from all other points;
b) centrality is experientially tied to intensity of interaction; that is, the closer something comes to me the more I can do to it—and the more it can do to me.
Both of these properties are true of the more specific schema, which shall henceforth be referred to as the CORE PART schema. In this schema centrality is a function of distance from other parts—but only if distance is calculated in terms of mutual links among perceptually adjacent parts.

For example, consider two rather different bodies: the body of a human, and that of a four-foot-long octopus with eight-foot tentacles. In terms of absolute distance, the center would lie somewhere among the bases of the tentacles. However, if we calculate distance in terms of the linkages which establish the tentacles and body as parts of an integrated whole, a different result emerges. The tentacles move freely relative to each other, so they lack configural inertia and are not directly attached to one another. The only direct relations of mutual linkage are those which hold between each tentacle and the main body. Thus, if we calculate distance in these terms, we find that the main body directly links up with each tentacle (a distance of one step), whereas the tentacles are two steps removed from each other. Thus, the main body is perceived as the core part, and the tentacles as peripheral. In other words, a part is central to the extent that it is centrally placed in the network of mutual linkages which define a collection of parts as an integrated whole.

As Lakoff noted in the passage quoted above, centrality appears directly connected to intensity of interaction and to the relative importance and vitality of body parts, indeed to the very identity of the whole entity. Much of this is directly tied to the basic nature of physical interaction, which requires physical proximity if not actual contact before any interaction can take place. Much of the rest follows indirectly from the biological advantage this state of affairs gives to having one's vital organs centrally located where they can be protected from direct interaction with the (potentially hostile) external environment. But the importance of central parts also would seem to follow from their purely cognitive role in establishing the integrity of a collection of otherwise unconnected parts.

A collection of entities will be perceived as a whole only if it displays continuity. Thus, if we remove a part from the whole, there are two possibilities: either (i) the remaining parts will display continuity, and so continue to be perceived as a whole, or (ii) they will no longer display continuity, in which case the whole ceases to exist as a single cognitive unit. Observe further that the first scenario is what one would expect for peripheral parts, since they are by definition at the edge of the chains of mutual linkage and perceptual adjacency that establish continuity, whereas the second result is
what one would expect for core parts, whose removal will necessarily destroy some of the chains which establish continuity. For example, removal of the head from the body does not disrupt the relations between any of the remaining parts. Removal of the torso, by contrast, eliminates not only its own relation to the head, arms and legs, but also their relations to one another.

The object schema. As our discussion indicates, the link, part-whole and center-periphery schemata form an integrated system essential to the conceptualization of arrangements in physical space. Jointly, they constitute a higher level schema characterizing objects as integrated wholes. In consequence, the inferential properties of the three schemata mesh to yield new inferential patterns.

Consider first the relation between linkage and parthood. If two entities are mutually linked, they form a stable configuration which moves as a unit: in other words, they form a whole. This is an axiom of the system:

If A is linked to B and B is linked to A, then there is a whole, C, of which A and B are both part.

Moreover, linkage is transitive. So if an entity is linked to one part of the whole, it is indirectly linked to every part, and will maintain a predictable (if sometimes flexible) configuration with respect to each member of the whole. But this is tantamount to saying that the object is linked not just to one part but to the whole. For example, if a ring is attached to a finger, it will form a configuration not only with the finger but with the entire body. Notice that the strength and rigidity of the two links are equivalent. For example, a ring may be removed easily from the finger, but while it is being worn its position with respect to the finger is quite rigid in most dimensions. It necessarily follows that the ring may easily be removed from the body, but that its position with respect to the body (e.g., on the finger) is quite rigid. Thus, we may derive the following rule of inference:

If A is linked to B and B is part of C, then A is linked to C.

By the same reasoning we can also establish the converse principle:

If B is linked to A and B is part of C, then C is linked to A.

We may also argue in the opposite direction. Consider a typical physical object, such as the human body. By the very nature of physical attachment, linking an object to the body generally involves linking it directly to at least
one part of the body. Conversely, attaching the body to an object generally involves attaching a specific part of the body to the object.6 An example would be grasping the door-handle of a moving car. This act successfully links body to car, but only through the linkage of hand to car. We may therefore add the following rules of inference:

If A is linked to B and B has parts, then A is linked to at least one part of B.

If B is linked to A and B has parts, then at least one part of B is linked to A.

Further consequences emerge when we consider the interaction between linkage and the CORE PART schema. The considerations presented so far suffice to characterize the conceptual nature of whole objects, but they do little to explain why objects are hierarchically articulated into smaller and smaller parts. A major factor, clearly, is a tendency to divide objects into parts at particularly weak and flexible linkages. But there seems to be another factor. It appears that the articulation of an object into subparts may be defined by the presence of local cores. For example, the coherence of the head as a separate entity from the rest of the body depends on the fact that it consists of a central part with other peripheral parts attached to it. The same goes for the hand and for other body parts. Even the arm has relatively central parts (on either side of the elbow) with peripheral areas by the shoulder and at the hand. The presence of local centers arguably supports the division of the whole into discrete subparts.

On the other hand, certain objects cannot sensibly be articulated into parts. A circle is a good example: no arc from the circle has any better claim to being a distinct part of the circle than any other arc. Thus a circle does not articulate naturally into parts. Why? Because no point on the circle is central; all have the same average distance from one another. Lacking a core part to give subparts coherence, there may be no cognitive justification for subdividing the whole in any particular way.

Why should this be so? It is arguably a consequence of the relation between linkage and centrality. If an object consists in a tightly integrated network of parts, as in Figure 3, only one center is possible (though it may be diffuse). If we trace a path between any two parts, there are always alternate routes: we cannot divide the whole by severing any one connection. On the other hand, if we construct "bottlenecks"—places in the network where the indirect linkages funnel through a few specific parts—then we get networks like Figure 4.
[Figure 3 shows a lattice of X's in which each part is directly linked to several neighbors, so that any two parts are connected by multiple alternate routes.]

Figure 3. An integrated linkage pattern
[Figure 4 shows several small clusters of X's, each organized around a local center, joined to one another only through single bottleneck parts.]
Figure 4. A linkage pattern with local centers

Two consequences follow: (i) local centers emerge; (ii) it becomes possible to break the object into parts by breaking a single set of links or by removing a single part. These are sides of the same coin: parts with local centers, which can be separated from the whole by breaking a single link, come very close to being whole objects themselves, and divide naturally from other parts at the same level of organization, to which they are attached by only a few links. It is natural that, where such parts exist, they should be singled out.

Conceptual metaphor. Lakoff and Johnson (1980) claim that abstract human thought is grounded in concrete experience via metaphor. In other words, they claim that metaphor is a basic mode of human cognition which enables us to make sense of relatively abstract, difficult, or amorphous experiences by relating them to highly salient and richly structured aspects of our concrete experience of the world.

Lakoff and Johnson's theory is thus an interactionist rather than an Aristotelian theory of metaphor (cf. Ortony 1979). In an Aristotelian theory,
metaphor consists in the recognition of objective similarities between the source and target of the metaphor. Like other interactionists, Lakoff and Johnson claim that metaphor requires no objective similarities but consists rather in the construal of the target domain in terms which assimilate it to the source. Another way of expressing the difference is that metaphor is claimed to involve, not the recognition of objective similarities but the construal of subjective similarities.

To be more precise, Lakoff and Johnson claim that a conceptual metaphor involves two basic elements: (i) establishment of a systematic correspondence between the source and target domains; (ii) exploitation of the mapping by transferring inferential patterns from the source to the target domain. For example, Lakoff and Johnson analyze English discourse about theories as involving the metaphor THEORIES ARE BUILDINGS. This involves at least the following mappings:

—Theories are buildings.
—The propositions which enter into a theory are the building blocks from which the building is constructed.
—For one proposition to provide evidence for the truth of another proposition is for one building block to support another.

This parallelism is then exploited via reasoning processes like the following: If we wish to demolish a theory then we should undermine its foundations. If part of a theory is shaky then we can shore it up with additional supporting arguments. In each case, the vocabulary provided by the source domain suggests appropriate inferences, thus structuring cognition in the target domain.

Perhaps the most crucial criticism of this view is expressed in Keller's (1988) review of Johnson (1987). She points out (777):

. . . the claim that understanding is constructed on the basis of irreducible analogies relies paradoxically on the claim that such analogies are only possible as a consequence of inherent similarities in a given source and a given target domain. The paradox lies in the fact that an analogy is not irreducible if it relies on a priori structural similarities. What might inherent similarities be, if not yet prefigurative (dare I say literal?) structural representations?

For instance, in the metaphor THEORIES ARE BUILDINGS, it is crucial that propositions be construed as (abstract) parts of the theory, and that they
be placed into correspondence with (concrete) parts of buildings. But for this to be natural, abstract and concrete parts should have something in common that would justify their alignment.

Langacker (1987, 1: 166-182) resolves such difficulties by analyzing the necessary similarities as similarities in cognitive processing—not as similarities in objective properties. For example, he develops a detailed account of abstract motion to account for the parallelism between sentences like (145) and (146):

(145) a. The men went to Paris.
      b. The men fell straight down.

(146) a. The road goes to Paris.
      b. The cliff falls straight down.

The crucial idea is this: the experience of motion involves a sequence of mental states in which the experiencer's attention focuses first at one location, and then at another adjacent location. Experiencing physical motion may thus be viewed as a sequence of mental states of the form shown below:
E > L1      E > L2      . . .      E > Ln+1
  t1          t2                      tn+1

Figure 5. E represents the viewer's focus of attention, L the location on which attention is focused, and t the relative time.

In the case of physical motion, as in (145), this sequence of cognitive states corresponds directly to a sequence of physical states. But the same cognitive structure may occur without objective physical designata; the structure itself is, after all, not itself a representation of anything in particular but an experiential sequence which may be apprehended in a variety of percepts. It is possible, for instance, to examine a stable spatial configuration by directing one's attention along a sequence of adjacent locations. This is the processing mode which underlies sentences like (146), with attention flowing in the same sequence as in the corresponding physical motion. Since the cognitive pattern is the same, both concepts are being organized in terms of the same image schema.
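Langacker's point can be put schematically: what the physical and the abstract reading share is the scanning sequence itself, not any objective motion. The following sketch, with invented location labels, is meant only to make that identity of processing concrete.

    # A scanning sequence pairs the focus of attention E with the location attended
    # at each successive moment, as in Figure 5; times are just list positions.
    def scanning_sequence(attended_locations):
        return [("E", loc, t) for t, loc in enumerate(attended_locations, start=1)]

    route = ["L1", "L2", "L3"]          # invented location labels

    # (145a) "The men went to Paris": attention tracks the men through successive locations.
    physical_motion = scanning_sequence(route)

    # (146a) "The road goes to Paris": attention scans the motionless road along the same path.
    abstract_motion = scanning_sequence(route)

    # Identical scanning sequences: both experiences instantiate the same image schema.
    print(physical_motion == abstract_motion)   # True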
Consider, moreover, the analyses presented above for the link, part-whole, and core part schemata. These analyses rely crucially on the following concepts:
(i) linkage (and hence configural inertia);
(ii) perceptual adjacency;
(iii) continuity; and
(iv) distance.
It is possible to show that each of these corresponds to a particular experiential pattern (without regard for issues of representation or of referential semantics). The relevant schemata thus apply equally well either to abstract or to concrete concepts, and hence provide the necessary substrate of cognitive similarities to bring concepts from very different domains into alignment, and thus to map one knowledge domain into another.

Let us first consider the cognitive structures underlying the link schema. It presupposes the concepts of location and movement in space, which have the following salient properties:
(i) Any object can be assigned only one location at a time.
(ii) There are specific cognitive operations for moving from one location to another (e.g., "up one, over two"). Given any location, these operations yield another location as output.
(iii) Given (i) and (ii), it is possible to define a (spatial) configuration as a set of elements whose locations do not matter provided that it is possible to move from one element to another by applying specific cognitive operations.

The link schema is thus a kind of dependency in which the location of the dependent element is determined by the location of the independent element. This basic structure is not particularly spatial in character. In fact, it is precisely the structure which underlies the mathematical concepts value, function and dependent vs. independent variable. The link schema should therefore be appropriate for concepts with this experiential structure, as may be seen in sentences like Lung cancer is connected to cigarette smoking.

Three basic properties remain to be discussed: perceptual adjacency, continuity, and distance. These fit naturally into Langacker's analysis of abstract motion. According to Langacker, the essence of abstract motion is the scanning sequence (the pattern shown above), in which attention "moves" systematically from one concept to the next in the course of processing. Also central to Langacker's analysis is the concept of immanence, which may be defined as follows: a concept is immanent to a scanning sequence if attention
focuses on it during processing of the sequence. One sequence may be defined as immanent to another if all its component concepts are immanent in both sequences in the same order. Perceptual adjacency may then be defined as follows: if concepts X and Y are immanent to scanning sequence S, then X and Y are perceptually adjacent if and only if there is a scanning sequence immanent to S whose only immanent concepts are X and Y. Continuity between X and Y may be defined as the existence of a scanning sequence beginning at X and ending at Y. Distance between X and Y may be defined as the smallest number of simple (two-element) scanning sequences immanent to any scanning sequence beginning at X and ending at Y.

Given these definitions, a whole may be defined as a collection of mutually continuous elements connected by scanning chains whose immanent and perceptually adjacent elements are mutually linked. A part is any concept immanent in such a set of scanning chains. Finally, a core part is a part minimally distant from other parts, while a part is peripheral to the extent that its distance from the other parts is greater than average. These abstract definitions thus provide the subjective similarities necessary to Lakoff and Johnson's theory of conceptual metaphor.

Combined, Lakoff and Johnson's theories of embodied schemata and metaphor ground "propositional" representations in perception, while preserving their essentially inferential character. Declarative memory thus takes the form of a complex network of embodied schemata and conceptual metaphors which combine to characterize a person's conceptual world.

Cognitive similarity. As the above discussion indicates, the critical factor in metaphor is the perception of cognitive similarity: in fact, the theory of metaphor sketched above depends upon a number of assumptions about cognitive similarity judgements. If metaphor is a mapping between cognitively distinct domains, if it is established by correspondence of image schematic structure, then certain points follow. Not only should image schemas be cognitively basic, but they should also play a huge role in judgements of cognitive similarity, for otherwise their contribution would be overwhelmed by the obvious differences between the source and target domain.

The possibility of conceptual metaphor thus points directly toward an account of cognitive similarity which distinguishes two factors: First, there should be structural relations, which establish systematic mappings between concepts, and which should be sufficient, by themselves, to force a judgement of semantic similarity, no matter how different two concepts may be otherwise. Second, there should be background properties and relations, which should play an important role in judgements of cognitive similarity
within a conceptual domain, but whose contributions should ordinarily be outweighed by structural relations. Such an account is implicit in much recent work: in Gentner's work on analogy (Gentner 1983), in the cognitive topology hypothesis of Lakoff (1989, 1990), which claims that metaphors preserve image schematic structure, and in Palmer's (1977) work on perception of spatial figures. Let us consider, therefore, how such an account of cognitive similarity would work. To judge cognitive similarity is to make a comparison between two concepts. Now, any comparison starts somewhere, with two items: the concept to be compared and the concept which provides a standard for comparison. In fact, comparisons usually seem to be anchored at some part or aspect which serves to align the two concepts for the purposes of comparison. For example, it is natural to compare two shapes by attempting to superimpose one or another part; this alignment then anchors the way in which other parts are compared. Let us term these starting points the anchor and the target. Once the anchor has been aligned to the target, the problem is to bring the rest of the comparison into line. This is where structural relations come in. Suppose that the anchor bears certain structural relations: it is part of one thing, is linked to another, etc. To the extent that the target matches these relations, it will count as cognitively similar regardless of its other properties. The goal is to create a perfect set of correspondences in which the same number of elements enter into the same structural network of relationships. On this account, the difference between conceptual metaphor and ordinary similarity between concepts is that metaphor relies solely on structural relations to achieve a judgement of cognitive similarity. Ordinary similarity generally involves similarity not just in structural relations but in background relations and properties. A theory may resemble a building by virtue of abstract part-whole and support relationships, but not in any other way. By contrast, two buildings may resemble each other in abstract structural terms, but they also share a variety of other properties (such as keeping out the wind and rain) which are specific to the domain of physical objects. Cognitive similarity is a gradient phenomenon, and a variety of factors affect it. At least the following are relevant: a) Structural vs. background relations. Structural relations appear to have a distinctly greater impact on cognitive similarity. b) Structural closeness to anchor or target. Concepts close to anchor or target seem to have a greater impact on similarity than more distant concepts. Thus the concepts conceptually adjacent to anchor or target have the greatest effect on similarity.
c) Activation level. Salient concepts appear to have a greater effect on similarity judgements than less active ones. Since the anchor and target may reasonably be assumed to be in focus during a comparison, they have the largest effect, but other, structurally close concepts will also be relevant since they will be retrieved by spreading activation.

Now, similarity judgements should differ markedly when there are differences as to which conceptual relations are structural and which are background relations. This variable defines a framework for distinguishing the different sorts of knowledge representation postulated by Anderson, and even a mechanism for generating special-purpose representations for specialized skills. Each representational type may be defined by treating a different set of conceptual relations as structural. For example, Anderson notes that temporal strings have the following properties: a) They generally preserve information about relative order of events, but not information about absolute temporal duration. b) They are front-anchored: two lists that begin with the same elements are judged far more similar than two lists that end with the same elements. c) In processing a temporal string, it is easy to determine what item immediately follows an element in the string. It is also easy to make judgements of relative order. These properties follow automatically if we assume that temporal strings involve a single structural relation: succession. Succession relations code relative order; thus, if succession is the only structural relation for temporal lists, it will play the single largest role in similarity judgements. Conversely, if absolute duration is a background relation, it will have no more significance than any other of a vast array of incidental concepts whose major effect on the similarity of temporal strings is to determine whether the items in corresponding strings are identical or distinct. If succession is the only relevant structural relation for temporal strings, it should follow that temporal strings are front-anchored, for only concepts succeeding the anchor or target would then be relevant to similarity judgements. Finally, succession relations should be easy to judge, since their status as structural relations would require ease of processing. Similar considerations apply to Anderson's spatial image representations. These can be treated by assuming that spatial images involve two types of structural information: orientation and angle (and hence relative but not absolute distance). If these are the structural relations, judgements which
depend on them (e.g., judgements of distance, direction, relative position, or overlap) should be relatively easy and should affect judgements of similarity very strongly. Anchors would then be either points or entities conceived as points with their internal structure ignored. This is because the structural relations of orientation and angle by definition relate an entity to other entities external to it; internal spatial relations should therefore be relegated to background status once the entity involving them is chosen as anchor. Of course, each concept in the network of spatial relations will, if complex, constitute a configuration in its own right, thus creating the possibility for hierarchically structured sets of images (cf. Anderson 1983: 58-61).

Anderson notes that propositional representation has several crucial properties: a) It is relatively abstract, encoding "relational categorizations of experience" but not concrete details; b) It displays structural constraints (selection restrictions) leading to such phenomena as the filling-in of default values for unspecified (but semantically necessary) slots in propositional structure. c) It is far easier to judge connectivity—whether two concepts are associated—than it is to determine exactly what the relation is. These properties can be explained by assuming that embodied schemata are the relevant structural relations for this type of knowledge representation, and thus the single most important factor in judgements of propositional similarity. In particular, note that embodied schemata are relational categories with what Johnson (1987: 41) terms gestalt structure—the structured relations between "argument slots" typical of natural language semantics. Moreover, if embodied schemas constitute the structural information, then other information will constitute the background. Now, embodied schemas per se are highly abstract cognitive structures; more detailed information (e.g. perceptual information) needs to be provided to flesh them out into specific concepts. Thus, the information provided by embodied schemas will be highly abstract, including connectivity judgements but not including specific judgements involving concrete, perceptually situated concepts. This hypothesis is equivalent to the claim of Lakoff (1989, 1990) that conceptual metaphor preserves cognitive topology—i.e., the network of embodied (or image-) schematic relations inherent to a concept. Treating embodied schemata as the structural relations relevant to propositional thinking guarantees that two concepts with identical schematic structures will seem highly similar regardless of the conceptual domain to which they apply.
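These claims about structural versus background relations can be given a rough computational gloss. The following Python sketch is purely illustrative and rests on assumptions not found in the text (the decay constant, the position-by-position alignment of the two strings, and the scoring scheme itself are invented for exposition); it is meant only to show how treating succession as the sole structural relation, together with anchoring the comparison at the first element, yields the front-anchoring effect just described for temporal strings.

    # A minimal sketch, assuming succession is the only structural relation for
    # temporal strings and that comparison is anchored at the first element.
    def string_similarity(target, candidate, decay=0.5):
        """Walk succession links from the anchor (position 0); matches close to
        the anchor contribute more to similarity than distant matches."""
        score = 0.0
        for position, (a, b) in enumerate(zip(target, candidate)):
            if a == b:                      # same concept at this step of the scan
                score += decay ** position  # closeness to the anchor boosts impact
        return score

    # Strings sharing their initial elements come out more similar...
    print(string_similarity("abcx", "abcy"))   # matches at positions 0, 1, 2 -> 1.75
    # ...than strings sharing their final elements (front-anchoring).
    print(string_similarity("xbcd", "ybcd"))   # matches at positions 1, 2, 3 -> 0.875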
Now, there is no inherent reason why temporal strings, spatial images, kinesthetic representations and propositional representations should be the only types of knowledge representation (as Anderson postulates). Indeed, the approach sketched above allows for a variety of special representational types, since choosing a different set of structural relations will yield a different form of knowledge representation. This is useful because it seems clear that people with specialized skills and abilities possess the ability to instantly detect the important features of a situation and to reason accordingly. For example, as a Chess player the author is instantly able to detect a variety of patterns inherent to the game: lines of attack, protective relations among pieces, potential forks and exposed attacks, etc. For a Chess player, it is these relations which determine the similarity of two positions—and it is these relations which guide decisionmaking during a game. That is, skill in Chess arises when the player learns to perceive certain relations as structural relations because they are usually relevant to making a correct decision. Thus, it seems probable that each form of knowledge representation exists because its structural relations are crucial to a specific skill or ability: temporal strings, for example, are important in learning about action sequences, since the same actions in a different order will usually yield different results. Angles and orientation are crucial to spatial reasoning, while embodied schemata are central to a variety of bodily interactions with the world.
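Before turning to storage and retrieval, the scanning-based definitions given earlier in this section (perceptual adjacency, continuity, distance, and core versus peripheral parts) can likewise be made concrete. The sketch below is only an approximation under stated assumptions: perceptual adjacency is modeled as an undirected edge, distance as shortest-path length, and the sample configuration for a body is invented purely for illustration.

    from collections import deque

    # A whole as a set of mutually continuous parts; perceptual adjacency is an
    # undirected edge, and distance is the length of the shortest chain of simple
    # (two-element) scanning sequences.  The configuration below is invented.
    adjacency = {
        "torso": {"head", "left arm", "right arm", "left leg", "right leg"},
        "head": {"torso"},
        "left arm": {"torso", "left hand"},
        "right arm": {"torso", "right hand"},
        "left leg": {"torso"},
        "right leg": {"torso"},
        "left hand": {"left arm"},
        "right hand": {"right arm"},
    }

    def distance(graph, start, goal):
        """Smallest number of adjacency steps between two parts (breadth-first search)."""
        frontier, seen = deque([(start, 0)]), {start}
        while frontier:
            node, steps = frontier.popleft()
            if node == goal:
                return steps
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append((neighbor, steps + 1))
        return float("inf")   # no scanning chain: the parts are not continuous

    def mean_distance(graph, part):
        others = [p for p in graph if p != part]
        return sum(distance(graph, part, other) for other in others) / len(others)

    # The core part is the part minimally distant from the others; parts whose mean
    # distance exceeds the average are peripheral.
    print(min(adjacency, key=lambda part: mean_distance(adjacency, part)))   # -> torso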
2.3. Storage and retrieval from declarative memory

In Anderson's theory knowledge is represented in essentially the same form regardless of its status in memory. Thus, having established in general terms how knowledge is to be represented, the next step is to consider how information is stored and retrieved. At the beginning of the chapter a diagram was reproduced which contained the labels working memory and declarative memory. However, Anderson claims that the boundary between working and declarative memory is fluid, depending on a concept's status, and not on its metaphorical "transfer" from one mental box to another. Two variables are relevant: a) what Langacker (1987, 1: 59-60) terms entrenchment, i.e., the extent to which a concept has become familiar, and hence is processed as a cognitive unit (cf. Anderson 1983: 182-189); b) activation, i.e., the extent to which a concept has been placed in the center of attention. Essentially, a concept belongs to declarative memory only to the extent that it has become entrenched, and is available in working memory only to the extent that it has been activated. The various concepts we have just mentioned—i.e., entrenchment and activation—are, of course, precisely those invoked in Chapter One to account for constraints on extraction.
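Read computationally, the claim is that membership in declarative and working memory is a matter of degree along these two dimensions rather than of transfer between stores. The toy sketch below merely encodes that reading; the numeric scales and thresholds are arbitrary placeholders, not values proposed by Anderson or by the present account.

    from dataclasses import dataclass

    @dataclass
    class Concept:
        name: str
        entrenchment: float   # how familiar the concept is, i.e. how unit-like (0..1)
        activation: float     # how close it currently is to the center of attention (0..1)

        def in_declarative_memory(self, threshold=0.5):
            # a concept belongs to declarative memory to the extent it is entrenched
            return self.entrenchment >= threshold

        def in_working_memory(self, threshold=0.5):
            # and is available in working memory to the extent it is activated
            return self.activation >= threshold

    concept = Concept("DOG", entrenchment=0.9, activation=0.2)
    print(concept.in_declarative_memory(), concept.in_working_memory())   # True False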
In the theory we have sketched, there are a variety of possible states, but only four polar possibilities: concepts that are neither entrenched nor active, concepts that are active but not entrenched, concepts that are entrenched but not active, and concepts that are simultaneously entrenched and active. The first case involves the traces of an unfamiliar concept in declarative memory. In Anderson's theory, a concept gains in strength—i.e., it is further entrenched—every time it is successfully used. Ease of activation corresponds to degree of entrenchment; thus, such traces are difficult to activate and are relatively unimportant during the processing of well-established cognitive structures. Of course, they are central to the learning process. The second case involves what Anderson terms temporary representations in working memory: temporary in the sense that once they cease to be active they are difficult, if not practically impossible, to reactivate. Anderson assumes that a concept decays—i.e., loses activation—quite rapidly unless it is reinforced; thus, temporary representations can be held in memory only if some means is found to reinforce them continually. One route, of course,is for the concept to enter working memory as a perceptual input. Percepts have a high level of activation that they maintain as long as the relevant stimulus continues to impinge on consciousness. Another route is for a concept to be stimulated by spreading activation, in which case temporary concepts remain in memory as long as their activation source is sufficiently active. The most powerful route, as discussed in Chapter One, is the use of focus (cf. Grosz 1981). Essentially, we may picture focus as a limited store of surplus activation which may be allocated to keep a concept salient which would not otherwise maintain such prominence. The relation between temporary representations and the assignment of focus has been studied extensively under the heading short term memory. As is well known, the capacity of short term memory is strictly limited: no more than five to seven unrelated items may be retained in short term memory at any given time. In Anderson's theory this translates into limitations on the number of items which may be placed in focus: no more than five to seven. And this, of course, is the maximum possible with total concentration; conditions of lower concentration will sustain fewer items in focus. For example, language processing is highly automatized and requires little concentration: thus it seems highly probable that under normal conditions very little surplus activation is allocated to the process of sentence construal. The remaining cases both involve fully entrenched concepts; as such, they are ipso facto part of declarative memory. Thus, the only question is whether they are sufficiently active to be readily available for cognitive processing, and hence to be part of working memory. Anderson recognizes one basic route by which an inactive but entrenched concept can be recalled to active
status in working memory: spreading activation, which was discussed at length in Chapter One. In Anderson's formulation, activation spread is governed by a complex formula whose variables include the activation levels of the associated concepts, their entrenchment, and the entrenchment of the relation which connects them. Anderson's equation is sufficiently complex that it will be easier to characterize it in terms of its essential effects. For present purposes it is far more important to calculate qualitatively correct results than to have an unnecessarily precise model whose predictions are more difficult to calculate. Without going into detail, the formula gives rise to three crucial effects: divergent activation (what Anderson terms the fan effect), convergent activation, and effects arising from the interaction of spreading activation with entrenchment. We have already discussed divergent and convergent activation at length in Chapter One, where we adopted the tables numbered (131), (132) and (140) to calculate activation spread in attribute/domain and whole/core part relations. These tables will be used hereafter whenever activation is calculated. They fail to address two major concerns, however: the exact conditions under which convergent activation takes place and the role of entrenchment in activation spread.

Convergent activation has to do with the effects of multiple cues, effects which are presumably larger the more cues are present. However, there is presumably some threshold below which convergent activation has no significant effects. It seems reasonable to assume that a minimum of three converging activation sources must be present, if only because so very many concepts are likely to receive concurrent stimulation from two or fewer activation sources. This assumption may be represented more precisely as follows:

(147) a. Ordinary spreading activation behaves like activation spread from domain to attribute, so that (131) in Chapter One is the appropriate table to calculate activation spread between two perceptually adjacent concepts.
      b. However, if a concept is perceptually adjacent to three or more active or salient concepts, the following chart is in effect:

         If the concept would otherwise be:    Then the concept is at least:
         inactive                              active
         active                                salient
         salient                               salient
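A minimal sketch of the convergence rule in (147) follows. The three-source threshold and the promotion chart come directly from the text; the representation of activation states as an ordered scale and of perceptual adjacency as a simple list of neighboring states are illustrative assumptions introduced here.

    # Activation states ordered from least to most active.
    LEVELS = ["inactive", "active", "salient"]

    def promote(state):
        """One step up the chart in (147b): inactive -> active, active/salient -> salient."""
        return LEVELS[min(LEVELS.index(state) + 1, len(LEVELS) - 1)]

    def converge(state, neighbor_states):
        """If three or more perceptually adjacent concepts are active or salient,
        the concept is promoted to at least the next level; otherwise ordinary
        spreading activation (table (131)) applies and the state is left as given."""
        sources = sum(1 for s in neighbor_states if s in ("active", "salient"))
        return promote(state) if sources >= 3 else state

    print(converge("inactive", ["active", "salient", "active"]))    # -> active
    print(converge("active",   ["active", "salient", "active"]))   # -> salient
    print(converge("inactive", ["active", "inactive", "active"]))  # -> inactive (below threshold)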
Let us now focus on the last major effect involving spreading activation: the interaction of entrenchment with spreading activation. The following diagram may be used to illustrate the basic concept:
(148) [diagram: two perceptually adjacent concepts A and B, with A salient]

A and B are perceptually adjacent concepts, and A is salient. Ordinarily, this will guarantee that B is active by spreading activation. Consider, however, what will happen if B is either highly entrenched or hardly entrenched at all. If B is highly entrenched, the activation which spreads to it will be multiplied, making it salient rather than active. Conversely, if B is hardly entrenched at all, it will resist the activation spreading to it, and remain inactive even though it is adjacent to a salient concept. (Of course, it will become active by convergent activation if there are enough active nodes adjacent to it to eliminate its resistance.) Entrenchment is also relevant to focus: a highly entrenched concept will become active after drawing only a small amount of surplus activation, leaving more available for other purposes. Entrenchment is thus a major uncontrolled variable, constituting a topic of such significance that the greater part of Chapter Five will be devoted to it. It plays a major role not only in recall through spreading activation but also in determining the processing costs of production application.
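The interaction in (148) can be caricatured as a multiplier on incoming activation. The sketch below is only a qualitative illustration: representing entrenchment as a scalar weight and the particular cut-off values are assumptions introduced here, not part of Anderson's formula.

    def spread_to_neighbor(source_level, entrenchment):
        """Qualitative rendering of (148): a salient source ordinarily makes an
        adjacent concept active, but high entrenchment amplifies the incoming
        activation while very low entrenchment resists it."""
        incoming = {"salient": 1.0, "active": 0.5, "inactive": 0.0}[source_level]
        received = incoming * entrenchment          # entrenchment as a multiplier
        if received >= 0.9:
            return "salient"
        if received >= 0.3:
            return "active"
        return "inactive"

    print(spread_to_neighbor("salient", entrenchment=1.0))   # highly entrenched B: salient
    print(spread_to_neighbor("salient", entrenchment=0.5))   # ordinarily entrenched B: active
    print(spread_to_neighbor("salient", entrenchment=0.1))   # barely entrenched B: inactive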
2.4. Productions, relevance, and the matching process

At this point we may focus on production memory, the last major element in Anderson's account of cognitive architecture. Production memory resembles declarative memory in being a permanent form of memory; it differs by containing productions: condition-action pairs which effect most cognitive processes, either singly or in combination. A production contains, first, a condition: that is, a schematic description of the concept(s) to which the production applies. Second, it contains an action, an alteration to the content of working memory. For example, a human baby might have simple productions like the following: If it's a face, then smile. At a more sophisticated level, we might find productions like: If people have frowning faces, they are angry.
Even more sophisticated might be: If someone is looking at me, pay attention to what he or she is saying. These examples also illustrate the variety of changes in working memory that can be induced by a production. In the first production, the condition is an external pattern, and the action is the initiation of a motor routine. Working memory is changed, but only that part which represents the current activity of the organism itself. In the second production, inferring anger from a frown adds a concept to working memory, thus elaborating the organism's representation of external reality. Finally, paying attention to someone who maintains eye contact is a different sort of production, for it makes no direct changes in the contents of working memory. Instead, it manages the distribution of attention. In other words, productions are a very general mechanism. A variety of patterns in working memory can serve as conditions for a production, and the action could be a motor routine, an inference (the elaboration of an internal representation), or an alteration in the distribution of attention. The utility of productions lies not in any claims they make about the contents of the mind but in the idea that all productions are processed according to the same constraints.

As noted in section 2.1., the problem of production application breaks down logically into three parts: matching, the application of the action, and learning. Since linguistic knowledge is highly overlearned, we will not discuss learning but simply note that productions become entrenched to the extent that they apply successfully and that entrenchment has facilitating effects on production application to be discussed below. The issue of pattern matching itself can be broken down into two issues: similarity and control. Similarity is fairly straightforward: a production's condition should resemble a configuration in working memory if its action is to be triggered. Productions should therefore match according to the processes of pattern matching and cognitive similarity discussed above. Thus, only the control of production application requires detailed discussion at this point.
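The production examples above can be rendered as a minimal condition-action loop. Everything in this sketch is an expository simplification rather than Anderson's actual ACT* machinery: working memory is reduced to a set of symbolic facts, conditions to subset tests, and control to a single serial pass.

    # Working memory as a set of simple symbolic facts (an expository simplification).
    working_memory = {"face present", "face is frowning", "someone is looking at me"}

    # Productions as (condition, action) pairs: the condition is a set of facts that
    # must all be present; the action yields new facts, a motor command, or an
    # instruction about the distribution of attention.
    productions = [
        ({"face present"},                     lambda: {"MOTOR: smile"}),
        ({"face present", "face is frowning"}, lambda: {"the person is angry"}),
        ({"someone is looking at me"},         lambda: {"ATTEND: to the speaker"}),
    ]

    def cycle(memory, rules):
        """One recognize-act pass: every production whose condition is contained in
        working memory contributes the output of its action."""
        additions = set()
        for condition, action in rules:
            if condition <= memory:
                additions |= action()
        return memory | additions

    print(sorted(cycle(working_memory, productions)))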
2.4.1. Control of cognition: general considerations

Cognitive similarity is essential to the application of productions. Anderson points out, however, that it cannot be the only factor. As processing proceeds, multiple productions may match the contents of working memory
simultaneously, each specifying contradictory changes. Thus, it is crucial that pattern matching proceed in such a fashion that certain productions have priority over others. Anderson assumes several properties are essential (1983: Ch. 4): First, degree of match: the greater the cognitive similarity between the conditions of a production and a concept in working memory, the more quickly and easily the production applies; thus a completely matching production should apply before one whose condition only matches partially. Second, production strength: the more entrenched a production is, the more easily and rapidly it applies; thus all other things being equal, the more entrenched production will apply first. Third, data refractoriness: an element in working memory can only match one production at a time; thus productions which could apply to the same portion of working memory compete with one another, and only one may apply at a time. Fourth, specificity: if two productions can both apply, the one whose condition is more specific applies first. Fifth, goal dominance: when a production mentions a goal, it can only apply when that goal is currently active. Anderson achieves this effect by treating goals as a kind of focus; thus if a production mentions a goal in its condition, concepts will qualify for goal status in part by virtue of the very fact that they are in focus, and so will not match if they are not in focus because in that case they are not actually goals.

Anderson's pattern matcher. Anderson uses a pattern matcher which displays each of the above as an emergent property. It is based on the concept of a data flow network. The essential idea is that each production forms the top node in a conceptual network in which are arranged the subpatterns tested for by its condition. If a subpattern is relevant for two productions, it is dominated by both. For example, in a data flow network designed to recognize the capital letters P, B and R, the simple network in Figure 6 might be operant. Pattern recognition is accomplished by spreading activation and mutual inhibition of competing nodes. Suppose, for example, that the input were the lower-case letter b. This contains the subpatterns labeled (a) and (b). Activation spreads from (b) up to (e) where it is split among the competing concepts B, P and R. It also spreads from (a) to B. This converging activation makes B the most active concept, at which point its inhibitory effects on P and R will drive their activation levels toward zero, reflecting a categorization of b as an example of B. Of course B will not be as active as it would be for a complete match, since there is no input from (c).
[diagram: subpattern nodes such as an upright line (b) and a diagonal tail (d) feed upward to the letter nodes P, B and R]
Figure 6. A data flow network for B, P & R

This pattern matching procedure automatically displays the major properties Anderson needs: complete matches are preferred over partial matches because complete matches contain more active subpatterns, rendering the complete match more active. Entrenched productions become salient at lower levels of activation, just as any entrenched concept would. Data refractoriness follows from the competition between subpatterns and productions at the same level in the network. Productions with specific conditions are preferred over less specific productions for the same reason that complete matches are preferred: they have more activation sources and so have the advantages accruing to convergent activation.

Incidentally, the definition of cognitive similarity advanced earlier appears equivalent to a particular type of data flow network, one in which structural relations contribute directly to recognition of similarity but background relations do not. Figure 7 illustrates the concept. Nodes A, B, C, D and E represent configurations and subconfigurations. In particular, A is the target. R(A,B) and R(A,C) are structural relations which connect perceptually adjacent concepts to the target. R(B,D) and R(C,E) are structural relations removed from the target by a slight conceptual distance. A', B', C', D' and E' are independent characterizations of the concepts linked by structural relations; the links beneath them represent various background relations. As examination of the connections in this diagram reveals, the closest structural relations contribute doubly to recognition of the concept, since they connect both to the configuration itself and to the subconfigurations B and C. More distant structural relations have a less direct effect, while background relations only contribute weakly through recognition of the matrix concepts.
[diagram: the target configuration A is linked by structural relations R(A,B) and R(A,C) to subconfigurations B and C, and by R(B,D) and R(C,E) to D and E; the primed nodes A', B', C', D' and E' carry the background relations]
Figure 7. A data flow network for configurations
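The data flow matcher of Figure 6 can be approximated in a few lines of code. The sketch below is a deliberately crude stand-in for Anderson's pattern matcher, and the particular subpattern inventory is an assumption (the text identifies (b) as an upright line and (d) as a diagonal tail; assigning (a) and (c) to lower and upper loops is a guess made for illustration). Converging activation picks the best-matching letter, and a simple winner-take-all step stands in for mutual inhibition.

    # Assumed subpattern inventory: (a) lower loop, (b) upright line, (c) upper loop,
    # (d) diagonal tail.  Each capital letter is the set of subpatterns it contains.
    letters = {
        "P": {"b", "c"},
        "B": {"a", "b", "c"},
        "R": {"b", "c", "d"},
    }

    def recognize(input_subpatterns):
        """Each detected subpattern passes activation up to every letter containing
        it; the most active letter then suppresses its competitors (a crude
        winner-take-all stand-in for mutual inhibition)."""
        activation = {letter: len(parts & input_subpatterns)
                      for letter, parts in letters.items()}
        winner = max(activation, key=activation.get)
        return winner, activation

    # A lower-case b supplies subpatterns (a) and (b) but not (c):
    print(recognize({"a", "b"}))   # B wins, though less strongly than a complete match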
2.4.2. Processing cost and the concept of relevance
In Anderson's account of productions, processing cost is measured by the assignment of focus and by speed of application. However, it is not otherwise a major factor in Anderson's theory. There is a good case to be made, however, that the concept of processing cost is integral to an adequate account of cognition. Perhaps the most intriguing such case is that made in Sperber and Wilson (1986), who argue that most of pragmatics can be derived from the assumption that people strive to maximize relevance. They define relevance in terms of two factors: contextual effects and processing cost. Contextual effects consist in information that one may derive by combining an assumption with a context (a set of background assumptions). These effects include the strengthening or refutation of information already present in the context, and the drawing of inferences which can only be derived by combining the assumption with the context. Thus, the contextual effects of an assumption are a measure of the increase in reliable information that it yields. Processing cost is the amount of mental effort required to derive a given contextual effect. This may include the cost of accessing specific pieces of information in memory and the cost of drawing specific inferences from already available information. Sperber and Wilson then define relevance as follows (1986: 125):
Relevance
    Extent Condition 1: an assumption is relevant in a context to the extent that its contextual effects in this context are large.
    Extent Condition 2: an assumption is relevant in a context to the extent that the effort required to process it in this context is small.

The idea is that people seek to maximize relevance, i.e., to extract as much information as they can at as low a processing cost as possible. While Sperber and Wilson provide only crude machinery with which to calculate the effects of relevance, their analysis provides tantalizing insights. And there is already an intriguing similarity between Anderson's account of pattern matching and Sperber and Wilson's use of relevance. In Anderson's theory, productions apply more quickly if their conditions are salient and/or entrenched: these are precisely the kinds of concepts that Sperber and Wilson would claim to involve low processing costs. Anderson's theory also claims that productions apply more quickly if their conditions are more specific—which translates to greater informativeness—one type of contextual effect. But in Anderson's theory, production application is controlled solely by the nature of the production's condition—its action has no effect whatsoever. That is, the first production to apply is the one whose condition becomes most active as a result of the matching process. This is entirely incompatible with Sperber and Wilson's basic insight, which is that cognitive processing can be affected by its consequences—that cognitive processing is regulated so as to balance the effort involved in an operation against its probable benefits to the efficiency of future processing.

Consider, for example, the Chess move P-K4. At the beginning of a game, this is one of the most relevant moves to consider making. Why? Because of its consequences for the moves of other pieces. Once the king's pawn has been moved forward two spaces, one's own queen and one's kingside bishop are free to move. In addition, the pawn blocks the king's file to enemy pieces and makes it impossible for enemy pieces to occupy two squares near the center of the board without risk of capture. In terms of the psychology of a Chess player, we might say that the P-K4 production is relevant because it changes working memory in such a way as to match the conditions for a variety of other productions.

There is an interesting similarity between relevance as Sperber and Wilson define it and certain properties of connectionist models of cognitive processing. Relevance involves an optimization: trading processing cost against contextual effects. Connectionist models begin with a rapid spread of excitation among the nodes in a network, which then steady down into an optimal steady state balancing the effects of excitatory and inhibitory links
between nodes (cf. Smolensky 1986). This similarity reveals how to incorporate contextual effects into Anderson's theory. Let us begin by assuming that productions apply gradually rather than all or none. Thus, when a production begins to match a configuration in working memory, its condition is at a low level of activation, and so its action comes into effect only tentatively, that is at a low level of activation. Nonetheless, it is present in working memory. Let us assume, further, that the process of matching causes an increase in activation for the configuration that is matched, not just for the production which matches it. This is actually implicit in the account of cognitive similarity presented previously, since both the target and anchor are placed in focus during pattern matching.

Consider what this implies. Suppose that a particular production will enable several other productions to apply. When it begins to match, its action begins to form in working memory at a low level of activation. The resulting configuration may be low in activation, but it is available for matching. If other productions do match it, they will rapidly drive the configuration's activation levels up, expending surplus activation until it constitutes a readily accessible part of working memory. For example, the eleven new moves made possible by 1. P-K4 are likely to drive it to salience far sooner than the three new moves created by 1. P-N4. It is no surprise, therefore, that experienced players typically fail even to consider 1. P-N4.

Relevance, then, is the balance between two kinds of processes. First, there are processes that have a cost in surplus activation, such as the assignment of focus. Second, there are processes that lower processing costs by raising activation levels, such as spreading activation, relevance and entrenchment.

Note that there is an important connection between relevance, entrenchment and categorization. To begin with, a category may be defined quite simply as a concept which has become fully entrenched, since the major reason for considering that a concept is a category is its familiarity and ready recognition. Furthermore, entrenchment is a function of frequency of successful use: the more often a concept is used to guide thought and action successfully, the more entrenched it becomes, making it easier to recognize and use it in the future. Finally, frequency of successful use is itself dependent on the richness of the information associated with a concept: each piece of information provides a separate reason for accessing the concept, thus increasing the opportunities for its successful use. Since Sperber and Wilson define relevance in terms of the balance between processing cost and informational richness, entrenchment has the effect of increasing the net relevance of highly informative concepts. In other words, concepts become entrenched because they provide easy access to a bundle of useful
information, either related concepts or productions (inferences and cognitive routines). Rosch (1978: 28-29) expresses the point as follows:

    . . . what one wishes to gain from one's categories is a great deal of information about the environment while conserving finite resources as much as possible . . . On the one hand, it would appear to the organism's advantage to have as many properties as possible predictable from knowing any one property . . . On the other hand, one purpose of categorization is to reduce the infinite differences among stimuli to behaviorally and cognitively useable proportions.
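The trade-off that this section borrows from Sperber and Wilson can be caricatured in a few lines. The sketch below is an illustrative gloss, not their formal proposal: contextual effects are counted as the conclusions an assumption licenses only in combination with the context, processing cost is supplied as a bare number, and dividing the one by the other is merely one simple way of letting relevance rise with effects and fall with effort.

    def contextual_effects(assumption, context, inference_rules):
        """Count conclusions derivable only by combining the assumption with the
        context (a crude stand-in for Sperber and Wilson's contextual effects)."""
        with_assumption = {conclusion for premises, conclusion in inference_rules
                           if premises <= context | {assumption}}
        without = {conclusion for premises, conclusion in inference_rules
                   if premises <= context}
        return len(with_assumption - without)

    def relevance(assumption, context, inference_rules, processing_cost):
        """More contextual effects and less effort mean greater relevance."""
        return contextual_effects(assumption, context, inference_rules) / processing_cost

    rules = [({"it is raining", "I am outside"}, "I will get wet"),
             ({"it is raining"}, "the streets are wet")]
    print(relevance("it is raining", {"I am outside"}, rules, processing_cost=1.0))   # 2.0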
2.5. Categorization and polysemy: a case study

The concept of categorization relates in turn directly to the concept of polysemy (multiple but related meanings of a single word, cf. Deane 1988b), which the theory presented in this chapter was originally developed to explain. It will thus be helpful to examine how the theory applies to polysemy (a phenomenon which absolutely demands a purely cognitive account) before we go on to examine how it can be applied to problems of syntax.

Polysemy presents a number of challenges to traditional semantic theories. First, there is the problem of sense selection. Given that a word can mean two different things, how do we know when it will mean what? Second, there is the problem of semantic relatedness. Polysemy is distinct from homonymy because the senses of a polysemous word are related to one another. What kinds of relations are these? How can we recognize their presence? Third, there is the problem of category identity. It is often hard to tell whether the senses of a polysemous word should be viewed as distinct (but related) categories, or whether they should be viewed as contextual variants of a single underlying category.

In Deane (1988b) the present theory is applied to offer hypotheses about each of these problems. The following analysis is offered: first, sense selection is argued to be a function of relevance. Communication is a matter, in Sperber and Wilson's terms, of making one's intended meaning manifest to one's interlocutors. In psychological terms, this is largely a matter of making sure that the words one uses cause the intended meaning to become salient in the minds of one's audience. And in the present theory salience is a direct function of relevance. Second, semantic relatedness is argued to be a direct consequence of knowledge representation: the senses of a word are related by cognitive relations between concepts, i.e., by image schematic connections and by judgements of cognitive similarity. Finally, category identity is argued to depend on relevance also, in a very interesting way. As noted above, there
is a natural connection between categorization and relevance in which categories represent bundles of information about the world. While there is a motivation to distinguish many different categories, there is also a processing cost involved in learning and in distinguishing new categories. Thus there is a natural connection between categorization and relevance: organisms may be expected to entrench (or make categories of) only those distinctions that happen to be relevant to them. Deane (1988b) states the following principle: (149)
Two items belong to the same category to the extent (a) that they share relevant information and (b) that contrasting information is irrelevant.
This principle is essentially the old emic principle with a wrinkle added: its effects are subordinated to the calculation of relevance, implying that categorial identity is subject to contextual and information-flow considerations. Here is how relevance affects categorization. Suppose that two mental states differ, but they differ minimally, in that they have the same processing cost and the same productions may apply. There is no reason to distinguish two such states of mind: they obviously share relevant information (for otherwise the same productions could not apply to them) and the differences between them are not relevant by definition (for no productions hinge on the differences, and hence no contextual effects ensue). In other words, the two mental states are functionally equivalent, at least in the short term. Deane (1988b) claims that category identity depends on how closely two concepts approximate functional equivalence of this sort.

Let us see how a theory of this sort works out in practice by examining the polysemy of the English word body as analyzed in Deane (1988b). Body has a number of senses. These include, inter alia, the senses listed in (150). Certain observations may be made about these senses. Senses (150a) and (150b) behave grammatically as a single unit, since they name a whole and its core part. Sense (150h) seems categorially identical to sense (150a) when the animals in question have bodies anatomically similar to human beings. When the animal is sufficiently different, category distinctness sets in. The remaining senses behave grammatically as distinct lexical items. Why should this state of affairs be the case? Deane (1988b) argues that these properties are consequences of the conceptual configuration given in Figure 8. This configuration, it is claimed, constitutes the most salient background knowledge associated with the concept BODY—to be precise, its structural relations. The possibility of other senses, and their relation to the basic 'human body' sense, is a function of the factors which determine relevance.

(150) a. The main sense, in which body names the body of a human being.
      b. A closely related sense in which body names the core part of the body, the torso.
      c. A metonymic sense in which body names a person, as in what's a body to do.
      d. A more abstract sense in which body names a concrete physical object, as in physics (a moving body continues in a straight line).
      e. Metaphorical senses based on the idea that a body is a figure moving as a unit through space (a body of men approached me).
      f. A set of metaphoric senses based on (b), such as the body of the evidence supports me.
      g. Metaphoric senses based on the idea of thickness or tangibility, as in this stew has body.
      h. Something of a category squish in which body is applied to living things other than humans, beginning with expressions like the body of a monkey and ending with expressions like the body of a sea squirt.

[diagram: the conceptual configuration for BODY, with links labeled 'instance of', 'whole/part', and 'control']
Figure 8. Paul D. Deane, 'Polysemy and Cognition', Lingua 75, p. 353. Copyright (c) 1988, Elsevier Science Publishers. Reprinted with permission.

The arrows in the diagram represent the direction of stronger activation spread given attribute/domain asymmetries and the special status of core parts. For example, when we examine sense (150b) we find that spreading activation works strongly in either direction: BODY makes TORSO salient and vice versa. This has extremely interesting consequences. Since each concept makes the other salient, each will recall all the concepts associated with the other at the same activation levels, which means in turn
that exactly the same productions will be able to apply no matter which of the two concepts is placed in focus. In short, the concepts of the body and torso are functionally equivalent in the terms described above, and so they ought to behave as a single category despite the obvious differences between them. In other words, a complete mutual association between two concepts causes them to fuse into a single categorial complex. In this case, they constitute a single category even though, as Nunberg (1980) puts it, "they are distinct entities in anyone's ontology". Deane (1988b) argues that this is the situation which underlies the polysemy which forms the basis of Nunberg's (1980) arguments—polysemy like that of newspaper where the same term is used to refer to an abstract publication and a publishing organization without any indication that the two concepts are distinct in any linguistically relevant way.

In the previous sections two arguments were presented. The first argument claimed that the part-whole relation is an attribute/domain relation, and thus that activation levels in a part spread without significant diminution to the whole. The second argument claimed that core parts benefit from convergent activation, so that their activation levels rise and fall with those of the entity as a whole. Together, these arguments produce a situation in which whole and core part necessarily covary in activation and so constitute a single category. This should be true not just of physical core parts like the torso, but of any concept which instantiates the CORE PART image schema. This result dovetails neatly with an earlier section in which it was argued that core parts literally determine the whole's identity as a single discrete entity. It appears that core parts simply have so close a relation to the whole that there are few if any reasons carefully to distinguish the two concepts. The natural response to an attempt to distinguish them is likely to be that noted by Cruse (1986: 171). Asking what one would term the part of a fork excluding the handle, he most frequently received the response But that IS the fork. Such responses seem to be natural for core parts:

(151) a. The head without the jaws, face and ears is still the head.
      b. The arm without the hand is still the arm.
      c. The hand without the fingers and thumbs is still the hand.

Sentences like (151) seem to be received as true descriptive statements.

A second case of this sort is the signification relation which holds between form and meaning. Form must instantly recall meaning and meaning form if language is to be used fluently. It follows that any morpheme, for example, will consist of a signifier, or form, whose activation levels covary with the signified, or meaning. (149) then yields an entirely appropriate result: that form and meaning are fused into a single category and normally can no more
be separated or distinguished from one another than a whole object is likely to be distinguished or separated from its core part. Let us now examine other senses of the word body, beginning with sense (150h), which appears to be categorially identical with the basic, human sense of the word, provided that the animal body in question is not too unlike the human body. This conclusion can be supported by the possibility of identity-of-sense anaphora; that is, one can say things like: (152)
My body is larger than a deer's.
(149) also explains this result, as follows: The diagram presented above represents the structural relations saliently evoked by the word body. Other knowledge about the parts and functions of the human body are also evoked, of course, but at a much lower level of activation. Two points follow: First, that the term body is likely to be applied any time this configuration is matched—i.e., any time a physical object embodies an agent and is articulated into central vs. peripheral parts. Since these properties are salient, their processing cost is low. Moreover, there are obviously a variety of productions which specifically address interaction with the bodies of living creatures; one simply does not treat a living creature the same way one does an inanimate object. These productions entail that the concept BODY is going to have many contextual effects even if the body in question is not a human body. In other words, the concept BODY is highly relevant wherever its structural relations are matched, for it has many contextual effects at a low processing cost. Now, the background relations are what specify the difference between a human body and other kinds of bodies. As these are not salient, and since they would not (for the most part) evoke productions as important (i.e. as highly entrenched) as those associated with the basic animate/inanimate distinction, they are less relevant. They do play a role in categorization, though, by modifying the basic level of relevance. Bodies highly similar to the human body will share almost all relevant information, and so will be categorized strictly with the human body. As this information varies, more and more of the information not shared between the human body and the animal body will become relevant; thus, the two concepts will be more and more likely to be classed as separate categories. At the bottom of the scale we encounter bodies like those of the sea-squirt which differ somewhat even in structural relations, since the body of a sea squirt lacks any differentiation between core and peripheral parts. This guarantees an even greater degree of categorial distinctness. Other senses are categorially distinct. The sense of body in expressions like whats a body to do involves a metonymy from the body to its embodied
agent. The failure of identity-of-sense anaphora attests to the categorial distinctness of the sense; thus, one cannot say: (153)
*Whatever a body does, his is likely to get hurt.
The current theory is able to explain this fact as follows: The concept BODY saliently evokes the concept of its embodied agent. However, the inverse does not hold. That is to say that the body is an attribute of its embodied agent; the agent is not an attribute of its body. It follows that the AGENT and BODY concepts do not evoke the same network or frame of associated concepts. And when they do evoke the same concepts, there will often be a significant difference in activation level. To be precise, if the concept BODY is salient, then (by the patterns discussed in the last chapter) it follows that the AGENT, FIGURE, MASS and CORE PART schemata will all be salient. On the other hand, if the concept AGENT is salient, spreading activation will move from domain to attribute when it crosses from the concept AGENT to the concept BODY. This reduces the activity level of the concepts associated with BODY, which means that background concepts associated with them will not even be evoked. These differences in activity level and presence in working memory correspond directly to differences in relevance: differences in activity level correspond to differences in processing cost, while differences in associated concepts correspond to differences in contextual effects. In other words, the relevant information shared by the two concepts overlaps but is not identical, so the two categories are distinct. The remaining, metaphoric senses are categorially distinct for similar reasons: they share only a part of the relevant information with the basic sense. Sense (150d), the 'moving body sense, shares two of the four structural relations: it is an instance of the FIGURE and MASS schemas. This connects it with the basic sense in two ways, as indicated in Figure 9.
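The asymmetry just described can be caricatured in a few lines. The sketch is only illustrative: the particular association lists (including the invented concept INTENTION), the attenuation of domain-to-attribute spread, and the cutoff used to decide what a salient concept strongly evokes are assumptions introduced here, not claims of Deane (1988b).

    # Association strengths: attribute -> domain spread is strong, domain -> attribute
    # spread is attenuated.  The concept inventory here is invented for illustration.
    links = {
        "BODY":  [("AGENT", 1.0), ("FIGURE", 1.0), ("MASS", 1.0), ("CORE PART", 1.0)],
        "AGENT": [("BODY", 0.4), ("INTENTION", 1.0)],
    }

    def strongly_evoked(concept, cutoff=0.5):
        """The associated concepts that a salient concept recalls strongly enough
        to matter for relevance (activation above the cutoff)."""
        return {target for target, strength in links.get(concept, []) if strength >= cutoff}

    # The two concepts recall different frames of associated concepts, so by the
    # relevance-based criterion in (149) they behave as related but distinct categories.
    print(strongly_evoked("BODY") == strongly_evoked("AGENT"))   # False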
[Figure 9, diagram: the basic sense of body linked to the 'moving body' sense (150d) by the FIGURE & MASS image schemas]