Language and Memory
Trends in Linguistics. Studies and Monographs 173
Editors
Walter Bisang (main editor for this volume)
Hans Henrich Hock Werner Winter
Mouton de Gruyter Berlin · New York
Language and Memory Aspects of Knowledge Representation
edited by
Hanna Pishwa
Mouton de Gruyter Berlin · New York
Mouton de Gruyter (formerly Mouton, The Hague) is a Division of Walter de Gruyter GmbH & Co. KG, Berlin.
Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.
Library of Congress Cataloging-in-Publication Data
Language and memory : aspects of knowledge representation / edited by Hanna Pishwa. p. cm. — (Trends in linguistics. Studies and monographs ; 173) Includes bibliographical references and index. ISBN-13: 978-3-11-018977-3 (hardcover : alk. paper) ISBN-10: 3-11-018977-1 (hardcover : alk. paper) 1. Psycholinguistics. 2. Memory. I. Pishwa, Hanna. II. Series. P37.5.M46L36 2006 401'.9-dc22 2006005192
ISBN-13: 978-3-11-018977-3
ISBN-10: 3-11-018977-1
ISSN 1861-4302

Bibliographic information published by Die Deutsche Bibliothek
Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet at .
© Copyright 2006 by Walter de Gruyter GmbH & Co. KG, D-10785 Berlin All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording or any information storage and retrieval system, without permission in writing from the publisher. Cover design: Christopher Schneider, Berlin. Printed in Germany.
Contents

Chapter 1  Memory and language: Introduction (Hanna Pishwa)

Part 1. Linguistic structure and memory
Chapter 2  A constructivist epistemology for the language sciences (Ngoni Chipere)
Chapter 3  'What's a nice NP like you doing in a place like this?' The grammar and use of English inversions (Gert Webelhuth, Susanne Wendlandt, Niko Gerbl, and Martin Walkow)
Chapter 4  Co-occurrence and constructions (Paul Deane)
Chapter 5  On the reduction of complexity: Some thoughts on the interaction between conceptual metaphor, conceptual metonymy, and human memory (Rainer Schulze)
Chapter 6  Making ends meet (Uwe Multhaup)

Part 2. Select linguistic notions and memory
Chapter 7  Evaluation and cognition: Inscribing, evoking and provoking opinion (Monika A. Bednarek)
Chapter 8  Causality and subjectivity: The causal connectives of Modern Greek (Eliza Kitis)  223
Chapter 9  Expression of goals in communication: The case of multifunctional 'try' (Hanna Pishwa)  269
Chapter 10 What psycholinguistic negation research tells us about the nature of the working memory representations utilized in language comprehension (Barbara Kaup)  313

Part 3. Discoursal units and memory
Chapter 11 It utterly boggles the mind: Knowledge, common ground and coherence (Wolfram Bublitz)  359
Chapter 12 Remembering another's experience: Epistemological stance and evaluation in narrative retelling (Ilana Mushin)  387
Chapter 13 Conversation memory: Intentions, politeness, and the social context (Thomas Holtgräves)  409
Chapter 14 Nonliteral language, persuasion, and memory (Roger J. Kreuz and Aaron Ashley)  425

Index  445
Chapter 1
Memory and language: Introduction
Hanna Pishwa *
1. Aspects of memory in linguistics

The motivation for putting together this volume originates in my study on the acquisition of English by twelve German students attending primary and secondary school in Berlin (for details, see Pishwa 1998). The goal of the study was to examine cognitive economy in the growth of the interlanguage. The findings demonstrated the brain's ability to adapt itself to the complexity of data: accuracy of processing decreases with increased complexity and vice versa (cf. Pishwa 2002). This became evident in the predominantly holistic processing of linguistic data in the initial stage, where a learner faces a huge number of new phenomena; holistic processing required less effort than analytic processing would have done. Only some areas, such as the lexicon and a few simpler grammatical rules, were processed analytically. Holistic processing was characterized by attention paid only to salient phenomena, namely the particular marking of perceptually prominent events and of the goals of the protagonist; the latter are, as demonstrated in this volume, the most salient parts of cognitive schemas. Further exposure to English enabled the students to adopt the more effortful analytic processing mode, which implied abandoning the distinctive encoding of salient perceptual and memory-based stimuli: action schemata were now verbalized at lower levels of memory structures, so that neither abstract high-level goals nor perceptually salient events were verbalized any longer. The more accurate processing mode no longer constituted a burden for the learners, since their linguistic skills had improved; it contributed to the acquisition of more subtle linguistic features.
This development suggests that the hierarchical memory structure is noticeably reflected in primitive learner language, while its influence on native speakers' language remains more covert: there, the superordinate goals extracted by the brain in communication are stored in memory but are rarely verbalized. My conclusion was that such goals steer our way of thinking and influence communication implicitly, and that they are therefore of particular interest from both a cognitive and a linguistic point of view.
Therefore, the assumption that the properties of language (the medium) are adapted, at least to a certain degree, to those of the container (the brain) and the content (knowledge stored in the brain) is not unrealistic, since memory is one of the primary sources of information employed in communication. The fact that linguistic and conceptual structure are stored in memory strengthens the assumption of feature sharing. Studies on iconicity have revealed a high correspondence between these two levels in many respects (Haiman 1980; Simone 1995; Fischer and Nänny 2001). Surely, we will not find an exact copy of the cognitive representation in language, because language is poorer than our knowledge. This is exactly the reason why a comparison of these two levels should be a fruitful enterprise. Although cognitive linguistics has largely neglected the influence of knowledge structures on language, it has provided ample evidence for similarities between perceptual aspects and numerous linguistic structures, that is, for our creative and manipulative ability to view states and things from various perspectives and verbalize them accordingly. In addition to perceptual aspects, the metaphorical properties of language have been a popular object of comparison between language and mental structures; metaphor is an instance par excellence of one of our cognitive processing modes, namely the search for similarities, without which the whole cognitive system would break down. This is particularly true of patterns, which our mind extracts subconsciously in order to organize knowledge representations for easy retrieval; the function of knowledge chunking is not space saving, which is unnecessary because the brain has a huge storage capacity. An example of pattern matching at an abstract level is the recognition of the similar structure in "Romeo and Juliet" and "West Side Story."
Even categorization is an established phenomenon in linguistics: it serves as a link between perception and memory, reducing the quantity of information on the basis of similarity and contiguity - just like metaphor and metonymy - bundling it, and making it available for easy retrieval. Language has been found to reflect categorization at the conceptual level in hierarchical structures, observable in how the single levels are symbolized: basic-level items carry the least linguistic material because of their frequent use, while subordinate categories, which contain richer information, are supplied with more linguistic substance. Studies have also shown that most linguistic structures are organized in categories around a prototype, the core of a category, similar to mental categories. A further cognitive aspect considered in cognitive linguistics is mental spaces, which are shown to be involved with knowledge representations. However, the attention in this approach has been focused on creative
aspects rather than on memory structures. Recent studies on discourse have started searching for cognitive explanations as well (Sanders, Schilperoord, and Spooren 2001; van Dijk 2002; Virtanen 2004). The goal of this volume is to complement cognitive linguistics by taking aspects of memory into consideration as linguistic explanation, which in turn may contribute to the development of more accurate cognitive approaches. The term 'memory' is used here to refer to the content and properties of memory as well as its processes. Since these aspects imply a large range of issues, the topics of this chapter are restricted mainly to those touched upon in the contributions; a short description of the structure of memory and the type of representations stored there is provided. While most contributions are concerned with the relation between language and (activated) stored knowledge, some, particularly those by psychologists, also address the influence of the use of certain linguistic structures on the encoding (storage) of information in memory, as well as latencies in recall initiated by particular linguistic structures. The findings in this volume should lead to a deeper understanding of language and its role in communication, excluding, however, issues of relativity. A desirable side-effect is that pragmatic labels, such as implicature, which classify phenomena without further elaboration, can be replaced by more informative cognitive explanations. As already pointed out, this volume is meant to expand the field of cognitive linguistics. A major concern of the volume is to show in what ways linguistic structures are related to representations stored in various parts of memory with regard to their content and composition. Closely connected to this are the principles and processes responsible for the organization of information.
These aspects in turn are determined by the overall structure and properties of memory, for instance, the global economy brought about by the flexibility of the brain, which utilizes varying degrees of generality and specificity, similarity and contiguity, in order to cope with the huge amount of information. Although the contributions are interdisciplinary, originating from linguists and psychologists with different focuses, the starting point is always linguistic structure. The contributions are allocated to three different sections. Despite the different approaches that the disciplines take to the topics, the chapters complement each other in a coherent way. The structure of the volume follows linguistic aspects in the order of their relative fixedness, so that the first part of the volume concentrates on the basic structure of language, concerning issues such as rules vs. structures, that is, the storage and application of linguistic - in most chapters syntactic - structures. The contributions in the second part,
in contrast, view single linguistic items, most of which lack a fixed meaning, for instance, discourse markers and evaluative devices, and the relation of their functions to aspects of memory. The third part presents various aspects of discourse with regard to memory. Since the structure of the volume is organized according to linguistic principles, aspects of memory have to be presented here. Therefore, this chapter provides basic ideas about memory and introduces the issues to be pursued in the chapters, which are:

- the type of memory, the information stored there, and the cognitive effort involved in recall
- the organization of information in the various memories according to its generality and specificity, abstractness and concreteness, in structured representations; contiguity and similarity
- the principles of chunking information: hierarchy, prototypicality
- the economy and flexibility of memory
In order to be able to discover relations between language and memory, we have to know how memory is structured, what is stored there and how, and which of its properties could possibly be found in language and its use. Therefore, the introduction provides a brief overview of memory types, their structure, and their properties. Many features of memory have to be omitted here, for instance, neurological aspects, some of which are dealt with by Multhaup and Schulze (this volume). I would like to note that there are numerous approaches to the structure and functions of memory; here I will be addressing the most common models that can account for the linguistic findings.
2. Structure of memory

Memory cannot be understood without knowledge of the type of information stored there and its functions, which are considered to be the basis for the distinction of memory systems or their components. The very earliest modern approaches to multiple memory systems (Foster and Jelicic 1999) date from the beginning of the 19th century and were initiated by the philosopher Maine de Biran, Francois Joseph Gall, and von Feinaigle (Schachter and Tulving 1994; Tulving 1983). More recent models started with Atkinson and Shiffrin's (1968) division between short- and long-term memory; later, working and long-term memory were separated by Shiffrin
and Schneider and Klatzky. This was followed by Paivio's (1971) distinction of verbal from nonverbal memory, which introduced images into memory research. At the same time, Tulving (1972, 1983) presented his division of long-term memory into episodic and semantic memory. In the eighties, further distinctions followed: procedural and declarative memory, already recognized by Biran, and implicit and explicit memory (Ashcraft 1989; Baddeley 1999; Engelkamp and Zimmer 1994; Neath and Surprenant 2003). These distinctions will be discussed in the context of the respective memory systems: short-term memory and long-term memory. The starting point is the structure of memory, followed by the types of representations that information is stored in. It should also be mentioned that psychologists are divided into two camps with regard to the architecture of memory: those who divide it structurally, focusing on the systems memory contains, and those who conceptualize it according to its function, i.e., its processes (Foster and Jelicic 1999). For the linguistic considerations in this volume, the former seems to be the more fruitful approach. Information intake through perception, which is the first stage of information processing and takes place in the so-called sensory memory, is limited by restricted attention. The task of perception is to start the processing and categorization of the inputs of visual and auditory data in the respective iconic (visual) and echoic (auditory) memory (Neath and Surprenant 2003: 21-42). Attention functions as a "bottleneck" delimiting the amount of data to be processed further and therefore determines what will be encoded in memory. It may be based on a primitive instinct of self-preservation with a kind of alarm function to detect changes in the environment ("conscious involuntary direction of attention"). Attention also enables us to select certain information on the basis of previous unconscious analyses ("unconscious selective attention").
Attention is already firmly established in linguistics in terms of gestalt psychology (Koffka 2001; Langacker 1987; Talmy 2000; Ungerer and Schmid 1996). Numerous linguistic phenomena, such as clause structure and the choice of tense and aspect categories, have been traced back to the figure-ground constellation (Wallace 1982). This approach is already an indication of the flexibility of memory and language: the salient figure cannot be fixed in any way due to its context dependence, which simply means that one and the same entity is perceived as differently salient in various contexts. The figure is first analyzed in a holistic manner and then by means of a feature analysis, a sequence also followed by the German learners of English (see above). This sequence is taken for granted not only by gestalt psychologists but also by other cognitive scientists. It is not certain whether the gestalt principles only influence
the selection of information or also the form of knowledge representations. But even unattended information may be processed and encoded (Eysenck 1984: 56; Glass and Holyoak 1986: 36). It is assumed to influence consciously processed information (Dixon 1981). Attention is not required by automatized routines stored in procedural memory (Cowan 1995). The subsequent step in the information processing chain is short-term or working memory, which is considered by some psychologists, for example Baddeley (1999: 18), to be entirely separate from long-term memory, although he concedes "a complex set of interacting subsystems". Working memory is also viewed as an activated part of long-term memory without any borderline between the two. This system functions as a temporary storage, which can maintain about 5-7 items simultaneously for a few seconds, corresponding to clause length. According to Ashcraft (1989: 53), "short-term memory is the memory buffer or register that holds current and recently attended information." There are numerous, slightly varying models of this subcomponent, which can be further subdivided (Neath and Surprenant 2003: 68). While short-term memory is of primary interest for psycholinguists and psychologists, whose object of investigation is processing itself, the main focus of the present volume is the activation and storage of representations in long-term memory, that is, the source and goal of knowledge mediated in communication. However, some of the contributions by psychologists in this volume address processing in working memory by examining its speed; the measurement of latencies tells us about the effort involved, which in turn frequently indicates the type of memory the information is connected to (see below). As already pointed out, long-term memory can be divided into procedural and declarative memory.
Procedural memory is assumed to contain the 'how' and declarative memory the 'what', a division which might look like a computer metaphor with a processor and data, although memory shares few similarities with the computer. Procedural memory contains instructions for automatized abilities, such as walking, driving, and speaking; priming is a procedural skill as well (Tulving 1983: 110; Roediger, Weldon, and Bradford 1989: 15). Due to automatization, this is the fastest memory and, hence, the least costly, but by the same token inflexible and rigid. Therefore, language skills rely only partly on procedural abilities. Although syntactic rules were until recently considered procedural competence par excellence, this view has lost ground in favor of the assumption that they are stored and used as constructions with amalgamated elements (see Section 1). The contributions in this volume demonstrate that syntactic structures and rules form a continuum ranging from
rigid procedural knowledge through constructions to structures without fixed functions. Declarative memory, which interacts with procedural memory, is composed of semantic and episodic memory. Semantic memory serves as a storage for world knowledge and the lexicon, while episodic memory contains personally experienced events (see Bublitz, Multhaup, Schulze), which are not stored in large networks as knowledge but are usually remembered as single items. Private experiences may become general world knowledge through a desemanticization process after repeated occurrences (see Multhaup), while non-repeated or not frequently enough repeated experiences remain in episodic memory, in particular if not shared with others in the same cultural environment. In his chapter (6), Multhaup describes how information stored in semantic memory may become automatized, i.e., procedural. This can best be observed in the acquisition of language rules, which we first have knowledge of and which after enough exposure may become automatized, i.e., procedural, through an intermediary, associative stage; this stage presupposes an exhaustive analysis of the structure. Another instance of this is grammaticalization, which involves the "semantic bleaching" of words and reduces their context-sensitivity. Information from episodic memory cannot be automatized. The three types of memory differ in the degree of effort involved in recall: automatized processes do not require any cognitive energy at all and can therefore be considered implicit; recall of semantic knowledge requires a certain degree of effort; and remembering one's own experiences is the most costly. The contributions in the volume show that this is of particular relevance in communication because of the limited capacity of working memory.
Although semantic and episodic memory are mentioned and described in several contributions in this volume, I would still like to present the most recent status of research on episodic memory because it has revealed new details of high relevance for the volume. Some of the differences between the two memories, as characterized by Tulving in the early days (1983: 35), are shown below:

Episodic memory              Semantic memory
Sensation                    Comprehension
Events; episodes             Facts; ideas, concepts
Personal belief              Social agreement
More important for affect    Less important for affect
Inferences limited           Rich inferences enabled
Context dependency high      Low contextual dependency
Access deliberate            Access automatic
Recall more effortful        Retrieval effortless
Recent research shows, however, that the distinction between semantic and episodic memory is not as neat as suggested above, although many of the properties discovered by Tulving are still assumed to be valid. I will address a recent, more elaborate view of episodic memory as proposed by Conway (2002). In this account, episodic memory may be split into two temporally differing systems, the short-term sensory-perceptual episodic memory (henceforth "episodic memory") and the longer-term autobiographical memory. The primary function of these two memories is considered to be the storage of everything related to personal goals and their achievement. Episodic memory is involved with the attainment, modification, and abandonment of goals; it cannot, however, cope with more complex goals, which are stored in autobiographical memory, where they are 'framed' with the attitudes and beliefs attached to them. Episodic memory maintains "highly detailed perceptual knowledge of recent experience" (Conway 2002: 53) for a short duration, sustaining the temporal order of events. When recalled, such memories are experienced recollectively, that is, with images and feelings. Pishwa (in print) shows that most functions of the English present perfect reflect exactly this memory system due to its relevance for results. Episodic memories can be transferred to autobiographical memory, where they supply specific information. It is worth mentioning that episodic events are stored in a different part of the brain than autobiographical memories (Conway 2002: 54). The function of autobiographical memory in this account is grounding the self in terms of goals, toward which everything is geared (see Pishwa); it stores information on the attainability of goals. This memory contains three kinds of knowledge: (1) lifetime periods, which cover the most extensive knowledge structures and contain knowledge of others, events, feelings, and evaluations concerning the whole period.
They are assumed to be stored as abstract mental models (see below); (2) general events, which contain an optimal amount of information and are comparable to basic-level items. Their retrieval requires the least effort of all autobiographical knowledge, though more than that of knowledge stored in semantic memory. General events can be used as cues to access lifetime periods or episodic memories; (3) episodic memories, which evoke recollective experiences, with differences in the vividness of different memories (Anderson and Conway 1997: 219). Conway (2002: 67) describes them as "small 'packets' of experience derived from conscious states that remain
intimately connected to consciousness." The information contained in autobiographical memory may, therefore, be both specific and general, for example someone's habits. When recalling this kind of information, we often find false starts, redundant information, and retrieval blockages (Anderson and Conway 1997: 219), which attests to the difficulty of retrieval. Episodic and autobiographical memory can thus be considered to have a less fixed organization than semantic memory, with the consequence that connections between the nodes are weaker; this supports the assumption that their recall is bound up with more effort than that of semantic information. Autobiographical information is also more error-prone than semantic knowledge and shows a tendency to be subjective. A further difference between these two kinds of information is that the former can be verbalized correctly only by the experiencer himself, while a person conveying others' experiences has to keep the source of the experience in mind and may have to rely on inferences (see Mushin, Pishwa). By contrast, world knowledge can be conveyed by everyone without paying attention to the source. This means that evidentiality and epistemicity are primarily concerned with autobiographical memory. Consideration of the distinction between the three declarative memories is of high relevance for linguists, a claim that many contributions in this volume supply evidence for. For instance, the economy involved in the use of metaphor (see Schulze, Kreuz and Ashley) becomes apparent against this background: rapid activation of semantic knowledge instead of slow retrieval of autobiographical information. It is also shown that linguistic structures denoting subjectivity relate to the latter memory type.
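Conway's three-level organization of autobiographical knowledge can be pictured as a nested data structure. The following sketch is purely illustrative; all class and field names are invented for this example and are not part of Conway's model itself:

```python
# Illustrative sketch of Conway's (2002) three levels of autobiographical
# knowledge; class and field names are invented for this example.
from dataclasses import dataclass, field
from typing import List


@dataclass
class EpisodicMemory:
    """Small 'packet' of recent experience: vivid, temporally ordered."""
    description: str
    vividness: float  # recollective quality varies from memory to memory


@dataclass
class GeneralEvent:
    """Basic-level unit; the cheapest autobiographical cue to retrieve."""
    label: str
    episodes: List[EpisodicMemory] = field(default_factory=list)


@dataclass
class LifetimePeriod:
    """Most extensive structure: knowledge of others, events, feelings,
    and evaluations concerning the whole period."""
    label: str
    events: List[GeneralEvent] = field(default_factory=list)


# A general event mediates between the coarser and finer levels:
school = LifetimePeriod("when I was at school", [
    GeneralEvent("walking to school", [
        EpisodicMemory("the morning it snowed in May", vividness=0.9),
    ]),
])
```

The point of the nesting is that a general event such as "walking to school" is the least costly cue: from it, retrieval can move up to the lifetime period or down to a vivid episodic memory.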
3. Knowledge representations

Information in memory is structured and chunked in order to be easy to recall, on the one hand. On the other hand, clustering brings about the flexibility required for communication. Principles of information chunking can be matched with processes such as association, specification, generalization, and the search for similarities. While associative processes serve to expand representations in the horizontal direction by connecting information that we perceive to belong together through our experience of the world, specification and generalization lead to the construction of hierarchies in order to allow the choice of the right degree of specificity on every occasion: levels at the top of the hierarchy are highly generalized; down the vertical axis, specificity increases. Similarity - or analogy - serves to
create larger representations by matching different clusters into one, such as concepts or metaphors. The processes mentioned give rise to different knowledge representations: the form of representation in which information is stored varies largely depending on the stimulus and the goal of activation. Since Markman (1999) offers the most extensive treatment of the various kinds of representations, the following account will mainly relate to this source. Among the components of mental representations we find spatial representations and features. Spatial representations do not only refer to points that fix locations in a space, but also to distances between points in a space and dimensions for directions in a space (Markman 1999: 28). In addition to concrete locality, represented by prepositions and cases in languages, distances may represent psychological similarity and perceived distance or preference. An example of this would be people's representation of concepts in categories, whereby the distance between points representing members in a mental space measures the relatedness of the members within a category. While spatial representations are concerned with distances, features show similarity and difference between objects. In this sense, featural representations belong to the relatively primitive representations in that the relations between the features remain unspecified. Features are not an artificial theoretical construct by linguists or psychologists, but have a neurological basis in visual systems (Markman 1999: 63). Those operating in visual systems always have fuzzy boundaries, with the advantage that if several neurons process similar, fuzzy features, the result is more accurate than when the features are processed by only one neuron. However, features are usually discrete and, hence, identifiable as separate entities, as in phonology. Features may be additive, which means that any feature can be added without considering the other features.
Or they may be substitutive, which means that one feature is not compatible with another, for instance red and green, which are not simultaneously applicable to one single item. Category representations, both prototypes and exemplars as well as networks (see below), can be analyzed in terms of features. However, prototypes are not based on features in the case of holistically formed exemplars. Information tends to be chunked in memory, as already pointed out. Networks are the most common type of such chunks, based on the sharing of similar features and relations. Although most linguists are familiar with this phenomenon, I would still like to discuss this topic, since linguists employing this framework have omitted a number of interesting details. In early semantic networks, nodes (concepts) were linked with each other, with both nodes and links being labeled. Inheritance from upper to lower
levels was assumed to exist for space saving in the brain. Distance between the nodes was also equated with difficulty of processing. Neither of these assumptions proved to be correct: the brain offers enough space for multiple storage, and in some cases a node at a longer distance may be easier, i.e., faster, to process than one at a closer distance. For example, dog is easier to verify as animal than as mammal, although animal is the highest node and further away from dog than mammal. Later experiments by Collins and Quillian made clear that differently weighted links cause differences in the degree of activation of nodes. Further work by Collins and Loftus showed connectionist effects in the networks in spreading activation, whereby the number of nodes connected to a node was found to be significant for the efficiency of processing (the "fan effect"): the more nodes are connected to a certain node, the longer the processing takes. This is probably the reason why the highest abstract nodes are rarely verbalized; they maintain links to most of the nodes of a schema. This effect has been found to be particularly strong in recall tasks. In sentence structure, for example, a higher fan effect was found for subjects and verbs than for objects. Priming effects were already recognized in the early models. The early models were used to investigate automatic text comprehension, which was not particularly successful because texts are not only based on existing schemas but also provide new information. Semantic networks are well suited for the examination of concepts. They contain members of a category as well as features of the members, which may be shared. But they can also be used to model person perception when complemented with connectionist models, which generate dispositional inferences automatically in explaining past behavior and predicting behavior in the future. In other words, they create expectations.
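The mechanics of weighted links, spreading activation, and the fan effect can be sketched in a few lines of code. The following toy model is a hypothetical illustration, not one of the models cited above: activation passed from a node to each neighbour is divided by the node's fan (its number of links), so heavily connected nodes activate each individual neighbour more weakly.

```python
# Toy spreading-activation network with weighted links and a simple
# "fan effect": a node's outgoing activation is split across its links.
from collections import defaultdict


class SemanticNetwork:
    def __init__(self):
        self.links = defaultdict(dict)  # node -> {neighbour: link weight}

    def connect(self, a, b, weight=1.0):
        self.links[a][b] = weight
        self.links[b][a] = weight

    def spread(self, source, decay=0.5, steps=2):
        """Spread activation outward from `source` for `steps` rounds.

        Each active node passes a decayed, weighted share of its
        activation to every neighbour; the share shrinks with the node's
        fan, so highly connected nodes spread activation more thinly.
        """
        activation = defaultdict(float)
        activation[source] = 1.0
        frontier = {source}
        for _ in range(steps):
            next_frontier = set()
            for node in frontier:
                fan = len(self.links[node])
                if fan == 0:
                    continue
                for neighbour, weight in self.links[node].items():
                    gain = activation[node] * decay * weight / fan
                    if gain > 1e-6:
                        activation[neighbour] += gain
                        next_frontier.add(neighbour)
            frontier = next_frontier
        return dict(activation)


net = SemanticNetwork()
net.connect("dog", "animal", weight=0.9)  # strong, frequently used link
net.connect("dog", "mammal", weight=0.4)  # weaker intermediate link
net.connect("animal", "bird", weight=0.5)
levels = net.spread("dog")
```

With a strongly weighted direct link from dog to animal and a weaker one to mammal, animal ends up more activated than mammal, mirroring the verification finding mentioned above: link weight, not taxonomic distance, drives the result in such a model.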
Later developments of networks have added procedural components to the static structures and can better explain our ability to connect various kinds of information, for instance, semantic knowledge with procedural skills. The newer network models show a higher complexity in that they contain relations between elements in a situation and allow combinations of simpler items. The assumption of links of different strengths between the nodes is helpful for understanding the growth of syntagmatic structures, such as grammatical constructions (see part 1). A spreading activation model enhanced by production rules may become highly complex and can be used to explain rule-governed and goal-directed processes (see Chipere). Structured representations may become powerful enough to include higher-order relations, such as causal relations and implications, and create coherence (see Bublitz). It is also worth mentioning that some psychologists
12
Hanna Pishwa
assume the existence of dynamic schemata, so-called 'action systems', containing procedural information (Mandler 1985: 44). These are cognitively economical because they run subconsciously without costing cognitive energy, but they also create negative effects, such as interference in second language learning, because they are difficult to switch off. The issue is thus what kind of information should be considered procedural.

Based on semantic networks are "parallel constraint satisfaction models", which are connectionist, modeled after neuroscience, and exhibit a higher degree of dynamicity. In these models, several constraints operate simultaneously: in making dispositional inferences about other people, for instance, people may draw on causal beliefs, inferences, and observed facts. The model implies that dispositional inferences are made automatically.

Mental images are structures similar to other mental representations; that is, they are not stored pictures. They differ from pictures in being open to transformations, which are, however, restricted in so far as some parts must remain stable. Such structures, combined with gestalt theory, have been the main concern of cognitive grammar: the manipulation of information through the choice of perspective. Markman (1999: 190) emphasizes the similarity in function between visual structures and mental representations of other types. Despite similarities in structuring, perceptual information differs from conceptual knowledge. The two complement each other: they are found together at all levels of abstraction but deliver different types of information. Concepts therefore consist not only of abstract knowledge about the properties and functions of concrete objects but also of their perceptual properties.

Mental models are theoretical constructs (Johnson-Laird 1983) and moderately accurate representations of situations we experience, simply "memories of things that happened in the world" (Garnham 1997: 151).
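The idea of a mental model as a manipulable structure, rather than a stored picture, can be made concrete with a toy sketch. All the names and data below are invented for illustration; the point is only that some parts of the representation (here, the viewing perspective) can be transformed while others (the spatial relation) must stay stable, as noted above.

```python
# A mental model as a small manipulable structure, not a picture.
model = {
    "objects": {"cup", "table"},
    "relations": {("cup", "on", "table")},  # stable part
    "perspective": "front",                 # transformable part
}

def rotate(model, new_view):
    """Change the perspective without touching the stable relations."""
    updated = dict(model)          # copy, so the original model survives
    updated["perspective"] = new_view
    return updated

side_view = rotate(model, "side")
# the spatial relation survives the transformation;
# only the perspective has changed
```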
They are not only stored in autobiographical long-term memory; they also participate in the processes taking place in working memory and episodic memory as representations of the real or an imaginary world, as well as in high-level thinking. Their manipulability makes them useful for explaining various communicative forms. Combined with world knowledge, they are used in the creation and comprehension of texts and are, hence, employed in language processing. Since they comprise numerous cognitive processes and representations, they lend themselves to various kinds of interpretation (see Kaup).

Scripts or schemas (also called 'frames') are stripped-down mental models in that they contain only the essential parts of event chains, with the details omitted. They are more complex than simple objects stored in categories since they imply the participation of objects, and usually of human beings as actors. Schemas, which are based on networks with spreading activation, are hierarchical structures with a goal and an outcome at every level of generality; at the highest level of the hierarchy we find the primary goal, which maintains connections to all parts of the whole structure (see Pishwa). The goal node is connected to most other, lower nodes and can be claimed to function like the prototype in categories in this respect; it differs, however, in that goals are usually abstracted by the brain without being verbalized, while the prototype is the most frequently used member of a category. Schemas grow internally through the integration of information and externally through elaboration, i.e., the addition of further schemas, which frequently causes reorganization. This already suggests that they are not exact copies of reality but novel organizations created by the brain. Schemas create expectations about events in the world; they are involved in all kinds of communication and even steer our lives.

The description of all these kinds of networks suggests that they are not passive stores but "dynamic structures that store and organize past experiences and guide subsequent perception and experience by catching up environmental regularities" (Mandler 1985: 36). They are abstract representations and contribute to cognitive economy in the same way as metaphor and metonymy (cf. Schulze, this volume). Their creation involves integration, the connection of single elements into a small schema, and elaboration, the combination of several schemata into large networks (Mandler 1985: 68). Networks are, of course, utilized in linguistics, particularly in semantics, for the description of the relatedness of items; they are not the central topic of any of the chapters.
Thus, the description of knowledge representations serves as evidence for the organizing ability of the brain, which is not restricted to these representations but can be observed, with similar principles at work, in language and its use.

4. Hierarchical structure of knowledge representations

There are many frameworks that suggest a hierarchical structure of memory and knowledge representations (Bolles 1988; Graesser and Clark 1985; Wyer and Srull 1989). These are of interest for understanding the storage of autobiographical memories because their content differs slightly from that of semantic memory. An attractive memory model is the dynamic memory model put forward by Schank (1982). In this structure,
the smallest units are scripts, which contain specific, idiosyncratic experiences, similar to episodic memories (see above). Scenes are more abstract than scripts and match event chains with shared goals. The level above these comprises MOPs (memory organization packages), which are composed of scenes with a common goal and are domain specific; "flying", for instance, contains various scenes. Schank provides a further level, TOPs (thematic organization points), with an even more abstract, analogical structure; an example would be finding the similarity between "Romeo and Juliet" and "West Side Story" mentioned above. TOPs are also involved with goals and plans applicable across various domains due to their high abstractness. The information in them is not split up as in semantic networks but represented in chunks. What is of interest for this volume is that MOPs and TOPs are structures abstracted by the brain itself, which searches for similarities, of which metaphors are a further example (see Schulze). Memory structures are always organized according to a goal in this model. Goals are marked by indexing, which indicates the main feature of the memory structure abstracted by the brain. All of these structures may be personal; they are always goal-directed. Clearly, the concept behind the model is generality vs. specificity of information. This concept was utilized in the development of CYRUS (Kolodner 1983), a computer model with a hierarchical structure. It is assumed that autobiographical knowledge is stored in such a hierarchical structure rather than in schemas, scripts, or categories (Anderson and Conway 1997). Anderson and Conway set out to test the combined Schank-Kolodner memory model on episodic and autobiographical memory. Their finding was that autobiographical memories are organized according to an abstract personal history, one level of which consists of lifetime periods (see above), called A-MOPs. Below this level, we find general events, E-MOPs.
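The generality hierarchy of the dynamic memory model (scripts, scenes, MOPs) and goal-based indexing in the spirit of CYRUS can be sketched as follows. This is a minimal illustration only: the class names, fields, and example data are simplified inventions, not the actual Schank or Kolodner implementations.

```python
from dataclasses import dataclass, field

@dataclass
class Script:                # specific, idiosyncratic experience
    name: str
    details: list

@dataclass
class Scene:                 # event chain with a shared goal
    goal: str
    scripts: list = field(default_factory=list)

@dataclass
class MOP:                   # domain-specific package of scenes
    domain: str
    goal: str
    scenes: list = field(default_factory=list)

class Memory:
    """Index MOPs by goal, much as CYRUS indexed events by salient features."""

    def __init__(self):
        self.by_goal = {}

    def store(self, mop):
        self.by_goal.setdefault(mop.goal, []).append(mop)

    def retrieve(self, goal):
        # retrieval starts from the goal, the highest node of the structure
        return self.by_goal.get(goal, [])

mem = Memory()
check_in = Scene("hand over ticket",
                 [Script("a particular flight", ["queue", "counter"])])
mem.store(MOP("flying", "reach destination", [check_in]))
found = mem.retrieve("reach destination")
```

The sketch mirrors the point made above: each level is organized around a goal, and the goal is what the index exposes for retrieval, while the specific scripts sit at the bottom of the hierarchy.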
While CYRUS put an emphasis on events, Anderson and Conway found that actors were the most important elements, followed by locations and temporal information. Barsalou conducted a similar test and found that such a hierarchy contains summarized but not specific events; participant cues were the fastest, followed by location, time, and activity cues. However, the temporal order of events was found to play a significant role in the organization of autobiographical memories, in particular at lower levels. A clear difference from the dynamic memory model is the latter's focus on contexts and actions, which were not relevant in the experiments concerning autobiographical memory. Anderson and Conway conclude that this knowledge is "accessed by a complex retrieval process modulated by central control processes. The retrieval process is cyclic driven by a mental model of current task demands" (1997: 242). The first step consists of the elaboration of the cue, which is followed by access to knowledge structures, starting at a general level. The model the authors create contains features of the dynamic model: knowledge structures, indexing, spreading activation.

These aspects, which are discussed below, can be traced in language and communication at least at three different levels, which form the basis for the structure of the volume. The first and the most debated is established linguistic structure, in particular syntax. The main issues here are: What processes are responsible for linguistic structure? In which memory is, for instance, syntax stored? How economical is language? Until recently, it was taken for granted that grammatical rules are stored in procedural memory. With the emergence of cognitive grammar and construction grammar, however, which have shown grammar to consist of entrenched cognitive structures, this view has been revised. Construction grammar started with the "uncomfortable" part of grammar, namely idiomatic expressions, which had been assumed to be learned by heart as part of the lexicon due to their lack of systematicity; the finding was that these are part of grammar. The grammatical theories just mentioned reject the assumption of a sharp boundary between the lexicon and syntax and assume instead a continuum between them. This is shown to be enabled by the flexibility and the economy of memory.

The second part presents a new perspective on some of those linguistic phenomena which are usually thrown into the huge bin of pragmatics with its highly varied content. These are structures without a firmly established form-function relation, such as expressions of evaluation or discourse markers.
A pragmatic explanation of the functions of such a structure does not add much to our knowledge of that structure: the structures are supposed to invoke implicatures, which are required for a correct interpretation in a particular context. The contributions in the second part of this volume demonstrate that it is possible to provide proper explanations even for the strange behavior of certain structures by considering the properties and functions of memory. In this part, the distinction between submemories according to the kind of content becomes critical.

The third part of the volume discusses larger units: textual and pragmatic aspects, such as coherence and interaction. Despite the different linguistic perspective, the findings are similar to those of the previous parts concerning the type of memory and language, the hierarchical structure of memory, and the tendency of the brain to recognize the abstract structure in various types of text.
5. The chapters

5.1. Linguistic structure and memory

In this part, the emphasis is on the relation between established linguistic, in particular syntactic, structure and submemories and their interaction. The topic shared by all contributors is clearly the form in which linguistic structures are stored: structure vs. rule, corresponding to syntagmatic vs. paradigmatic organization. This issue is, of course, connected to the type of submemory as well. Rule application requires the participation of procedural memory, while the employment of established structures involves the activation of memory chunks stored in semantic memory. The overall finding of the contributions is that syntax, for instance, cannot be judged to exhibit only one kind of processing; instead, its processing involves a whole continuum ranging from strict procedural skills to semantic knowledge. This is in line with work done by syntacticians in the last few years, which has revealed increasing agreement on the assumption that a large part of established language, for example, syntactic structures, is stored in declarative memory and is not generated by rule application. This insight has resulted in Construction Grammar (Croft 2001; Goldberg 1995; Kay and Fillmore 1999). A compromise adapted to the properties of the structures is the most economical solution: procedural knowledge is too rigid to cover all syntactic structures, and semantic knowledge cannot cope with the high regularity of some structures. While Chipere's chapter is concerned with the syntagmatic and paradigmatic aspects of language in general, Webelhuth et al. as well as Deane demonstrate by means of experiments the role of these two processes in syntactic structure and its acquisition. Schulze views figurative language from the perspective of syntagmatic and paradigmatic processes.
Finally, Multhaup provides a comprehensive illustration of language acquisition integrating these processes into his account. On the whole, the contributions provide a moderate view of syntax and linguistic structure that excludes extreme positions. They suggest that linguistic skills cannot be described in terms of one single memory type but should be considered in terms of network models or action systems containing both organized semantic and procedural knowledge. The case of figurative language shows a joint utilization of episodic and semantic memory.

Chipere is concerned with the role of memory for linguistic structure; the question presented is whether linguistic structure as well as its acquisition can be explained in terms of a dual system consisting of rules and associations. From the point of view of memory, rules are stored as such and applied on each relevant occasion, an assumption found in generativist theories of language, whereas structures that cannot be split up are stored as chunks and activated as associations when required. While the application of rules shows a high degree of creativity during acquisition and effortless application after automatization, the latter strategy taxes memory constantly, to a degree depending on the entrenchment of the structure. Numerous linguistic models assume these two strategies to be enough to explain linguistic behavior, claiming that systematic structures are rule-governed, while unsystematic, opaque structures are treated as associations. Chipere shows, however, that such a dual model is not powerful enough because of "the assumption of mutual incompatibility between rules and associations." To illustrate this, he provides a historical overview starting with the views held by the classical "linguists" Plato and Aristotle, representatives of rationalism and empiricism, respectively. He then shows the problems this division runs into when subjects' syntactic knowledge is tested. The results make clear that even rule-governed linguistic forms can be processed as associations, that is, they may be stored in declarative memory. A further challenge to the dual nature of language lies in the processing of structures by individuals: Chipere finds that one single construction may be processed as a rule or as an association by different individuals. This is true, in particular, of language learners: children may process one area of language holistically, another analytically. The appropriate framework for this problem is constructivism as presented by Piaget and de Saussure, owing to its ability to integrate both kinds of processing without being restricted to them, while still providing a proper frame for individual differences. In this framework, language is considered a network consisting of paradigmatic and syntagmatic links.
This is viewed as parallel to graph theory, which is applicable to other sciences as well.

The most important finding concerning the influence of memory on language structure made by Chipere is corroborated in the following chapter by Webelhuth, Wendlandt, Gerbl and Walkow, who examine the storage and processing of four different types of inversion in English. The main finding is that "the truth lies somewhere in the middle": the construction is neither entirely rule-governed nor entirely memory-based, despite the fact that inversion is assumed to be a rule-based phenomenon, a transformation in some theories, a template in others such as Role and Reference Grammar. The authors investigate the universal, language-particular, and construction-particular properties of inversions within a purely syntactic framework (Principles and Parameters), using a statistical procedure in terms of optimality theory. The result is that not even syntactic structures can be explained as functioning only on the basis of rules; instead, they depend on multiple factors, such as the presence of an identifying or existential subject and its definiteness, that is, representations in working memory, as well as the frequency of the co-occurrence of the elements, that is, syntagmatic chains stored in semantic memory. The investigation yields a continuum with varying degrees of definiteness as constraints for the four different structures examined, and basically confirms the results obtained in the psycholinguistic experiments conducted by Chipere. Since definiteness concerns the appraisal of the activation and existence of referents in the addressee's mind, we can claim that even the constraints on the use of an inversion involve various memory structures and submemories (see Bublitz).

In the same vein, the chapter by Deane investigates grammatical constructions in terms of lexical co-occurrences, applying statistical methods to corpus data with the aim of discovering how learners process and store constructions. An assumption of innate linguistic knowledge, that is, the involvement of procedural memory, is rejected on the grounds that "a significant portion of the correlation between co-occurrence and semantics is mediated by the existence of constructions". Hence, Deane opts for inductive associative learning of syntactic structures. This raises the question of what mediates the relationship between semantics and syntax on the one hand and raw occurrence data on the other. According to Deane, the answer lies in a construct consisting of the paradigmatic class and the syntagmatic structure, which emphasizes the "mental reality of linguistic structures." Constructions, which are the result of the interplay of paradigmatic and syntagmatic processes (see Chipere), can be treated like words because of their basic status in Construction Grammar (see also Kay and Fillmore 1999); all constructions are assigned a specific meaning.
An analysis of constructions then requires, according to the study, specifications of the grammatical patterns and the word classes of the elements, and the correlation between co-occurrence and semantic similarity. It is hypothesized that all that is required for successful acquisition of the ditransitive construction, for example, is the statistical ability (Zipf's first law) to induce the generalization that such structures with various sequences belong to a common category. This outcome shows in practice how semantic knowledge becomes procedural (see Multhaup).

The chapter by Schulze examines the processing of metaphor and metonymy, with an emphasis on the economy and flexibility of cognition. In doing so, he provides a detailed description of various kinds of information chunking in mental representations, which implies both an interaction of various submemories and an involvement of different processes.
Figurative language is shown to reduce cognitive complexity in that only the most salient parts of a structure are stored and activated, a view opposed to the common belief that figurative expressions, as stylistic devices, add complexity (see also Kreuz and Ashley). The difference between metaphorical and schematic structure, both containing abstract information, is elucidated: while the latter originates in the similarity of attributes and can be activated partially, metaphors are based on analogical similarity, that is, on similarity of relations, and could hence be taken to be instances of MOPs within Schank's memory model. The principle behind the processes of metaphorization and metonymization is shown to be cognitive economy as conceived in relevance theory: maximal benefit at minimal cost, achieved by employing analogical instead of analytical processing. While analogies are automatic and effortless, analytic processing is slow and effortful. Figurative language represents an optimization of resources due to its abstract and dense information packaging and is, hence, easy to process. The description of the cognitive apparatus underlying metaphor and metonymy in language is underpinned by a neurological model, which provides background for the way contiguity leads to semantic networks containing various kinds of relationships, to metonymy, and finally to analogy and metaphor. In these processes, semantic and episodic memory can be considered jointly involved in that metaphoric and metonymic thinking arise "from recurring patterns of embodied experience." The function of figurative language can therefore be considered to facilitate the understanding of certain concepts. The starting point of metaphor and metonymy is shown to lie in paradigmatic and syntagmatic processes, just as in grammar.
The origin and the function of figurative language show a tight relation between cognition and language: language is a tool for the management of attention and is therefore flexible and inaccurate, though accurate enough to allow efficient communication, for instance, through the development of tools like metaphor and metonymy.

Multhaup, who views information processing in language acquisition by comparing three types of theories (behaviorist, innatist, and constructivist), arrives at the same conclusion as the previous contributors, as the title of his chapter, "Making ends meet", already indicates. Right at the beginning, it is made clear that "declarative and procedural knowledge are not stored in separate compartments but in a way that makes (physical) forms have (cognitive) functions." Initial chunk learning is shown to take place in declarative memory in order to be utilized for communicative purposes in procedural memory. It is emphasized, however, that declarative knowledge does not turn into procedural knowledge merely through frequent activation but has to
undergo a cognitive analysis first. In the same vein, the author argues that episodic and semantic memory cannot be kept apart: certain episodic information is transferred into semantic memory through decontextualization processes (see Bublitz). It is argued that no extreme theory can be correct, because our cognitive processing is guided by pattern-finding abilities (see Schulze), which are accompanied by "intention-reading skills" in communication and presuppose an ability to create and modify existing knowledge structures. A broad discussion of various cognitive aspects relevant to language learning, covering the terms "declarative" and "procedural knowledge" as well as "explicit" and "implicit knowledge", is underpinned by findings from neuroscience. The chapter shows that the investigation of language acquisition is a particularly fruitful enterprise for discovering properties of language processing and structure in general as well as those of single standard languages, as memory takes on a multitasking role in language acquisition.
5.2. Select linguistic notions and memory

A significant finding provided by the contributions in this part is that the use of less well entrenched structures causes more thorough mental processing and, along with it, a higher degree of subjectivity and evaluative force than the use of structures with fixed functions; such structures also tend to acquire interpersonal functions. They likewise tend to carry a higher processing load than structures referring to clearly related memory structures, and their use hence leads to better memorability, in particular when evaluation is involved, than the use of structures with established functions. The structures examined in this part are mainly related to autobiographical memory. The mental processes involved in decoding their functions are those typical of creative cognition, characterized by inaccuracy and fuzziness (Finke, Ward, and Smith 1992); that is, they share features with processes taking place in problem solving. The findings are ascertained in entirely different studies: an investigation of evaluative language in newspapers (Bednarek), the use of causal connectives in Greek (Kitis), various forms of 'try' (Pishwa), and negation (Kaup). This part also demonstrates that different types of memory sources yield different functions, thus strengthening findings in memory research. It is structures related to episodic/autobiographical memory that tend to exhibit subjective functions. The analyses also suggest that structures that involve memory only to a certain degree (Bednarek, Kitis, Pishwa) are used primarily subjectively,
evaluatively, and intersubjectively, while those with an accurate reference to long-term memory, such as a causal connector in Greek (Kitis) reflecting stored causality, are used only ideationally. The investigation of negation (Kaup) shows in addition that negated information does not even leave traces in memory as such; it simply does not exist in memory. Its processing in working memory shows a preference for mental models, that is, holistic patterns, rather than for propositions.

Bednarek investigates the role of memory in the use of evaluative language, a highly complex phenomenon due to its heterogeneous symbolization and multiple origins. The arguments concerning memory are based on frame (schema) theory; the seat of frames is not specified because of the multifariousness of evaluation. Cognition is assumed to be involved in an evaluative act in two different ways: an evaluation may be the result of a cognitive operation in communication (see Kaup), or an evaluation may already be attached to a memory structure. While the latter seems to be rather constant, online evaluation varies to a high degree: it may be a spontaneous act of evaluation, it may symbolize expectedness, particularly failed expectations, or it may serve rhetorical-pragmatic purposes. The role of memory for (the verbalization of) evaluation can thus be considered to lie either in the direct activation of stored memory structures or in expectations activated by an experience stored in frames. The expression of an evaluation may be "inscribed", i.e., explicitly verbalized, or "evoked", i.e., inferred by the addressee without an explicit expression. In the latter case, evaluation is brought about by relating appropriate frames to the content conveyed by a message in a certain context, which implies the activation of knowledge representations along with additional processing; in pragmatics this would usually be called "implicature" or "invited inference".
Certain verbs such as 'admit' are considered an intermediate solution, being neither a verbalized evaluation nor entirely unexpressed. 'Provoked' refers to an evaluation of an intermediate degree, found, for instance, when an evaluative parameter other than positive/negative provokes exactly this parameter in an utterance. Evaluative expressions found in an English newspaper corpus are classified in terms of a framework of ten parameters, which are shown to interact strongly with each other. The investigation shows that tabloids contain more evaluative expressions as well as more clusters of parameters than do broadsheet papers, in which usually only one parameter is found in a single expression. With regard to the participation of memory, the findings could be interpreted as follows: the less accurate an expression is, the more knowledge representations are activated for the recognition of the evaluation and the higher the degree of subjectivity and the evaluative force; this finding is confirmed in the other chapters in this part. The effort involved in this kind of evaluation is high, while the activation of an evaluation stored in memory structures requires much less cognitive energy.

The paper by Kitis provides evidence for the involvement of two different kinds of memory in the functions of Greek causal connectives, which correlate with the source of the information combined by the connective. One of them, epeidi, is only used when the information in the subordinate clause is "a purely semantic or content conjunction"; it connects clauses that are stored together in everybody's mind and cannot be challenged. This suggests that the information chunk is retrieved from semantic memory. The function of the other two, yiati and dioti, is rather that of a subjective statement involving episodic (or, in terms of Conway's memory model, autobiographical) memory and online processing. These two connectives display the speaker's/writer's view of the relation between the clauses, adding causality or a reason between pieces of information which are not stored in this relation in memory (see Bednarek). yiati may also reflect shared information and thus behave like epeidi. However, its primary function is found at the discourse level, where it indicates relevance relations between adjacent and remote clauses. It can be moved around freely in the sentence because it is desemanticized (see Pishwa); when positioned at the end of a clause, yiati functions as a presuppositional marker. Kitis concludes that epeidi is the connective of the ideational level, while the other two can also be used on the textual and interactional levels of language, where they imply a high degree of subjectivity.
The cause of the different functions is found to lie in the connectives' etymology, which is still discernible in their meaning: epeidi reflects temporal anteriority, which is a presupposition for causality established in memory. The root of dioti (yiati being its low-variety counterpart) is similar to that of because, being derived from Ancient Greek 'by + cause'; the English because can likewise be used subjectively, in that the speaker provides her own opinion of the cause. It is quite obvious that languages need some desemanticized structures to cover functions not captured by others.

Pishwa's analysis of 'try' relates, on the one hand, to the hierarchy of knowledge representations in that 'try' is related to goals, the highest, abstract nodes of schemas, representing a generalized structure. On the other hand, it deals with personal goals (schematic goals also frequently serve as personal goals) and, hence, with social cognition, in that goals are social phenomena (Barone, Maddux, and Snyder 1997: 255). The analyses show that 'try' symbolizes a non-default goal; default goals are stored in semantic or autobiographical memory. This function lends 'try' a high degree of abstractness and desemanticization, allowing it to acquire multiple functions (see Kitis), which can be explained by considering knowledge about the world and people. From a cognitive point of view, the default function of 'try' implies that the intender is uncertain about the outcome of the goal and, hence, that the goal is not stored in the speaker's memory. The analysis demonstrates a tight relation between the intender's certainty about the attainment of the goal, the speaker's knowledge of the intender's goal, and the authenticity of goals. Subjectivity and evaluative force increase as the intender's certainty about the attainability of the goal decreases, when a speaker speculates about others' goals, and with past goals, an unnatural constellation. This means that the function of 'try' also depends on the person category of the clause subject and the tense of 'try'. The less certain the intender is about attaining her goal, the more abstract the meaning of 'try' and the higher the probability of added subjectivity. The results suggest, as with yiati in Greek, that it is the desemanticized nature of structures that allows multifunctionality. With regard to memory, this means that linguistic structures without a firm reference point in memory structures tend to be used subjectively; 'try', as a marker of a non-default goal, is an instance of this category. The analysis of 'try' illustrates a further remarkable feature of memory, namely that people are able to retrieve accurate information about the attainability of goals instantly, despite never having encountered these goals before. The question arises of whether the whole memory is scanned in search of information about attainability or whether some other kind of information is drawn on.
In fact, autobiographical memory is assumed to store knowledge of constraints on goals that are worth pursuing; 'worth pursuing' always refers to goals that are judged to be attainable. As mentioned above, social cognition assumes that we store rules or regularities about goals in particular schemas. This shows that an analysis of 'try' would be impossible without considering the structure of knowledge representations and social cognition, which provide proper explanations rather than the classifications offered by pragmatics. The chapter by Kaup examines negation, focusing on the form of the representation of the negated state in the addressee's working memory (proposition, referent, or image). This is an interesting issue because negated states do not refer to anything in the real world and, hence, are lacking in both the speaker's and the addressee's long-term memory. By contrast, the non-negated state is assumed to be stored in the addressee's mind, which the speaker attempts to change by negating it. In
order to find out about the representations and the processing taking place in working memory, Kaup tests the processing of negation within three approaches to language comprehension. The first (propositional) model views the linguistic input in working memory as propositional, which requires an additional level of propositional encapsulation for negation. Accordingly, negation implies a higher complexity with regard to mental processing and a lower availability of the negated item than of a non-negated one. In the second approach, the situation-model theory, meaning representations are not propositions but mental tokens representing the referents. This model has similarities with discourse-representation theory (DRT), in which negation applies to a subordinate discourse representation structure, making the referents that are introduced in the scope of the negation operator less available than those introduced in affirmative phrases. Conversely, referents not introduced in a negated clause are not influenced by the negation. In the third model (experiential view), the representation of elements is holistic, being the same as that utilized in non-linguistic cognitive processes (perception, action, imagery). The availability of the negated element does not depend on the linguistic form but on the content of the described state of affairs. According to this, a negated clause is more complex than an affirmative one because it is processed in two stages by the comprehender: (1) a simulation of the negated state of affairs followed by (2) a simulation of the actual state of affairs. To summarize: the three accounts differ with respect to assumptions concerning representational issues in language comprehension and the processing and representation of negation. 
Before describing the experiments, Kaup presents findings from previous studies, which indicate that negated clauses are more difficult to process than affirmative clauses, as shown by longer processing times and higher error rates. The first experiment conducted by Kaup verifies these findings concerning the degree of difficulty. Further experiments also confirm the prediction that the processing time of negated sentences is longer than that of affirmative sentences; the three models agree on this finding. An additional finding restricts this generalization, however, showing that the processing time depends on the felicity conditions of the negated sentence. When negation is tested with respect to the truth value of the clause, differences between the models arise concerning processing, whereas they agree with regard to the results: false affirmatives take longer than true affirmatives, false negatives take a shorter time to process than true negatives, and, on the whole, negatives take longer than affirmatives due to the creation of the additional representation
for the negated state. These findings support the analyses in the above chapters concerning the claim that if a structure relates to stored information in memory, its processing requires less mental effort than that of a structure with an opaque reference. Differences between the models arise in further tests. The experiential-simulations view is supported by the results of a test in which the delay between the affirmative and the negative is varied: a longer delay between a sentence and a recognition task decreases the response time. This favors the two-level processing assumed in the experiential view and disconfirms the assumption that comprehenders construct an explicit encoding of the negation in working memory. Next, the accessibility of negated information is tested by using verbs describing creating and destroying activities. While the type of activity does not matter to propositional theory and DRT, it does make a difference in the experiential-simulations view, which assumes a simulation of the actual state. This test also considers the definiteness of the object created/destroyed. After the presentation of a short narrative, which also contains a negation, two probe words are presented. In destruction activities, no difference is found in the processing of negated and non-negated activities. In contrast, a difference is noticeable in creation activities, both with an indefinite and a definite object; the latter is processed faster than the former. This finding, complemented by a further experiment employing the presence/absence of colors of an object in the test, is interpretable only within the experiential view. In the following experiments, so-called "equivalence effects" are tested in various ways. The results again lend support to the experiential view, showing that comprehenders first create only the negated state, by creating an image of it, if the delay between the sentence and the picture is short. 
After a longer delay between the clauses to be compared, the actual negative situation is processed faster than its affirmative counterpart. The final part of the chapter discusses the encoding of negated sentences in long-term memory, that is, their recall. Numerous studies are cited that demonstrate poor memory performance for negated sentences due to various types of errors, which appear depending on the linguistic material. The errors can be explained in terms of the three processing models. The overall results favor the view that we process whole patterns or images instead of creating syntagmatic chains (propositions) or focusing on some of their parts (DRT). Supporting evidence for the superiority of the experiential view as a language comprehension model is provided by studies investigating the availability of negated elements: the type of activity is
found to influence this availability, as predicted by the experiential view, while the other two approaches do not pay attention to semantic differences between the verbs. Further evidence for this is provided by the study on anaphor resolution in connection with double negation. Kaup states that further studies are necessary to corroborate the findings, because it remains unclear to what extent the results can be transferred to other phenomena or whether they are restricted to negation. However, the chapters in the first section show a tendency toward the patterning of linguistic elements in other linguistic areas as well.
5.3. Discoursal units and memory

This part views larger linguistic units, that is, discourse, from various angles: coherence in general, epistemic aspects, politeness, and persuasive language. The contributions support the assumption (Schank 1982) that memory is not a static storage but a processor and that the function of language is only that of a trigger to activate parts of the vast bulk of information stored in memory. This becomes evident in the context-dependent flexibility of linguistic structures and the interaction of the two declarative memory types (Bublitz). It is also revealed by the ways the brain extracts and stores an abstract structure in all types of communication, which may be the speaker's awareness of the source of knowledge in story telling (Mushin), an unpronounced speech act and politeness (Holtgräves), or its perlocutionary force (Kreuz and Ashley). The contributions also bear out the assumption that utterances conveying new, unexpected information are paid more attention to than those carrying familiar information: for instance, the processing of irony demands more time than that of metaphors, obviously because metaphors are rarely new, while irony is frequently novel and has to be processed anew in every context (Kreuz and Ashley). Irony also implies evaluation, which is assumed to contribute to more thorough processing; messages that are processed more thoroughly are also more persuasive. The contribution by Bublitz relates coherence to various types of declarative memory. The intersubjective aspect of coherence creation is emphasised in that comprehension, an essential element of coherence, is defined as a collaborative undertaking despite the single minds working on the meanings. This is enabled by the activation of linguistic and world knowledge shared by the participants. The function of linguistic items is to serve as "interpretation triggers" for the activation of knowledge stored in
memory in order to make sense of the whole. Accordingly, coherence, defined as a "collaborative and hermeneutic" phenomenon, does not exist in texts but is negotiated by the participants sharing these two kinds of knowledge; it is also found to be variable, approximate and scalar. Coherence is involved with common ground, which is explained as "those actually activated fragments of knowledge that are relevant to the ongoing process of understanding" instead of the usual "shared world and cultural knowledge". This demonstrates the contextual flexibility of knowledge: in order to understand the construction of coherence, we have to know what is meant by context, since utterances change their meaning from one context to another. Indeed, Bublitz defines context as volatile in that speakers "create current contexts for current utterances." This definition is followed by a presentation of the various types of memory involved (cf. Multhaup and Introduction), which are then discussed with regard to their contribution to coherence. Episodic memory, which contains private experiences and is simply labeled 'memory', can be made public without thereby being turned into common knowledge; episodic knowledge acquired by recipients is, according to the author, comparable to mental models, which is in line with the assumptions about the content of memory structures presented in the first part of this chapter. An example illuminates the big difference between semantic knowledge and events stored in episodic memory: the latter is loaded with subjectivity and interpersonal information and therefore promotes coherence, while semantic knowledge is impersonal, but can be manipulated in multifarious ways - by relating categories and concepts to each other, by arranging them causally or temporally, and by being used creatively in other ways - to provide background knowledge. 
Bublitz shows by means of an example, however, that frames stored in semantic memory may also be involved with emotions, evaluations, and attitudes shared by the whole community. The final perspective of the chapter focuses on the joint contribution of these two memories to coherence, concluding that both are equally effective with regard to coherence because episodic memory, when verbalized, goes public and becomes comparable to frames. Mushin emphasizes the dynamicity of memory by considering evaluative expressions found in retold narratives. Evaluation has a broader scope than generally assumed in that it involves the ways a story is encoded and reconstructed. This is because retelling is a constructive process in which memory extracts the relevant information from a story and recreates it in the retelling, keeping track of the source of information and the degree of epistemicity. The chapter starts with an investigation of the perspective
- epistemological stance - taken by the reteller with regard to the source of information. There are three stances that the narrator may take when retelling. It is possible to use a personal-experience epistemological stance, which means that the events are told as if the reteller had experienced them himself. The narrator may also use the reportive or the imaginative stance. While the reportive stance attributes the story to someone else, the imaginative stance treats it as fictional without referring to the source of knowledge, although the narrator keeps it in her mind. The material investigated is a "Mouse soup" story, which is retold twice. The narrators chose not the reportive but the imaginative epistemological stance, while the original story itself was not altered. In some passages, however, the narrators switched to the reportive mode; this was particularly the case when evaluations were verbalized. Mushin considers the preferred use of the imaginative stance "evidence for the constructive process of story telling" and explains it as the narrators' intention to present a tellable story. However, as the other chapters show, the constructive process is not restricted to narratives. The second part of the chapter examines expressions for inferencing, such as must, which appears only in the reportive stance. In the experiment this occurred, for instance, when the narrator in the "first generation retelling" was not sure about her information and marked this explicitly; when the story was retold by a "second generation" reteller, the epistemic stance was verbalized again. Mushin notes that inferences are by no means always explicitly marked. This can be explained by the fact that memory complements incoming information by adding schematic knowledge, so that the narrator is not even conscious of whether the information was verbalized or inferred by her memory. The chapter by Holtgräves is also concerned with pragmatic issues. 
It investigates the comprehension and storage of speech acts and politeness and shows their dependence on social parameters. The first part investigates speech acts using a recognition memory procedure and a cued-recall procedure. The recognition experiment shows that speech acts are stored in memory even if unpronounced. Recall was better when an utterance "performed the speech act named by the recall cue, relative to the control condition." The recall task also revealed individual differences depending on the participants' communicative style, with a tendency toward indirectness promoting recall for speech-act cues. The experiments show the importance of illocutionary force in both comprehension and representation; an illocutionary force is recognized by communicators even if it is not verbalized directly. The second part of the
paper investigates wording for politeness, which is interesting because it is well known that exact wording is only rarely remembered; the information is extracted and transformed into an abstract meaning. However, it is possible to remember the wording of information with high interactional value, that is, of politeness. The memorability of a message depends on the status and closeness of the communicators, which Holtgräves investigates by means of experiments. The main finding is that memory is best for polite forms when they contradict expectations: when a person in a high position is polite or a person with low status is impolite. Remembered information is of relevance for future conversations because its content influences the cognizer's evaluations of others. Therefore, the author concludes that the study of conversation memory is "an important and interesting endeavor." Kreuz and Ashley examine the persuasive force of non-literal language by testing its reading time and memorability and relating these to the long-term representation of the persuasive message. Previous findings, which are partly contradictory, indicate that non-literal language influences persuasiveness in various ways; for instance, metaphors are claimed to exert a positive influence on persuasiveness. The same applies to rhetorical and tag questions, which increase the effect of argument strength, depending, however, on the strength of the argument the question is attached to. Two different processing mechanisms are introduced and tested: operant conditioning and the cognitive response hypothesis. In the context of rhetorical questions, for instance, the first simply means that a language user is conditioned through the process of acculturation to answer questions. The findings support the view that rhetorical and tag questions do enhance persuasiveness. 
"Cognitive response" to these kinds of questions leads to differentiated outcomes concerning cognitive elaboration, depending on the strength of the argument. The chapter describes experiments examining these two processing modes, with added, more refined features, to find out about their influence on persuasiveness. Kreuz and Ashley set out to investigate the influence of non-literal language on long-term memory, which is dependent on the route leading to persuasiveness: while the central route refers to the quality of a message's arguments and may lead to permanent attitude change, peripheral processing relates to other factors, for instance the attractiveness or credibility of the communicator, and can result in temporary attitude change. The investigation addresses the valence of the test items because of the different processing of positive and negative information. The main experiment with forty participants was preceded by "norming studies" in order to elicit only ideal material consisting of idioms, which were replaced by metaphors, similes, hyperboles, understatements,
ironic statements, and rhetorical questions; the understanding of metaphors was also tested before the experiment. The findings were that positive statements in non-literal language required a longer reading time than negative statements, while the reverse was true of literal language (cf. Kaup). Furthermore, ironic statements were read more slowly than all other types and were remembered better than similes and understatements, and a simile was read more slowly than a hyperbole or an understatement. The reason for the long processing time of ironic statements is assumed to lie in the "asymmetry of affect" effect, according to which speakers are more familiar with a positive formulation about a negative state than the other way round (canonical vs. non-canonical irony). The authors attribute the additional processing time to the non-canonical cases; this shows that non-default constellations require more processing, which leads to better memorability and, in turn, to higher persuasiveness. The surprising result of the study was that metaphors and rhetorical questions were neither processed nor memorized differently from literal language, a finding that contradicts all previous research results except those provided by Schulze (this volume).

6. Conclusion

I hope that the chapters will convince the reader of the existence of firm and multiple relations between language and memory. The contributions in this volume demonstrate not only that the structure of language reflects properties of memory and its processes but also that the function of single linguistic structures can be predicted to a certain degree on the basis of their relation to a certain part of memory. At least as important is the degree of correspondence between linguistic structures and memory. These aspects guarantee the processability of language and ensure its suitability as an ideal communication device because both information and the language itself are stored in memory. 
If language did not share properties with its storage, communication would be more cumbersome, as two different systems would be involved. Luckily, this is not so, as the chapters in this volume show. The first part shows a slight shift in the view of the storage of fixed language structures in favor of associative patterns, indicating that grammar is not necessarily composed of rules, which are the very last stage of automatization, but that it shows a highly flexible structure beginning in the lexicon. The second section demonstrates convincingly that linguistic structures with a loose relation to a particular part of memory tend to be
used subjectively and intersubjectively. The last section illustrates the tendency of memory to impose structure onto all kinds of discourse and communication. These findings should be only the beginning of language-memory research, a fruitful field that deserves to be followed by further work.
Note

* I am grateful to Friedrich Braun and Rainer Schulze for comments on an earlier version of this chapter.
References

Anderson, Stephen, and Martin Conway 1997 Representations of autobiographical memories. In Cognitive Models of Memory, Martin Conway (ed.), 217-246. Hove: The Psychology Press.
Ashcraft, Mark H. 1989 Human Memory and Cognition. Glenview, IL: Scott, Foresman and Company.
Atkinson, R., and R. M. Shiffrin 1968 Human memory: A proposed system and its control processes. In The Psychology of Learning and Motivation: Advances in Research and Theory, Vol. 2, K. W. Spence, and J. T. Spence (eds.), 89-195. New York: Academic Press.
Baddeley, Alan D. 1999 Essentials of Human Memory. Hove: Psychology Press. 2002 The concept of episodic memory. In Episodic Memory: New Directions in Research, Alan Baddeley, John Aggleton, and Martin Conway (eds.), 1-10. Oxford: Oxford University Press.
Barone, David F., James E. Maddux, and C. R. Snyder 1997 Social Cognitive Psychology: History and Current Domains. New York and London: Plenum Press.
Bolles, E. G. 1988 Remembering and Forgetting: Inquiries in the Nature of Memory. New York: Walker and Company.
Bower, Gordon, and Mike Rinck 1994 Goals as generators of activation in narrative understanding. In Human Memory: A Multimodal Approach, Johannes Engelkamp, and Hubert D. Zimmer (eds.), 111-134. Seattle: Hogrefe/Huber.
Conway, Martin A. 2002 Sensory-perceptual episodic memory and its context: autobiographical memory. In Episodic Memory: New Directions in Research, Alan Baddeley, John Aggleton, and Martin Conway (eds.), 53-70. Oxford: Oxford University Press.
Cowan, Nelson 1995 Attention and Memory: An Integrated Framework. Oxford: Oxford University Press.
Croft, William 2001 Radical Construction Grammar: Syntactic Theory in Typological Perspective. New York: Oxford University Press.
Engelkamp, Johannes, and Hubert Zimmer 1994 Human Memory: A Multidimensional Model. Seattle: Hogrefe/Huber.
Finke, Ronald, Thomas Ward, and Steven Smith 1992 Creative Cognition: Theory, Research, and Applications. Cambridge: Cambridge University Press.
Fischer, Olga, and Max Nänny (eds.) 2001 The Motivated Sign: Iconicity in Language and Literature 2: A Selection of Papers Given at the Second International and Interdisciplinary Symposium on Iconicity in Language and Literature held in Amsterdam in 1999. Amsterdam: Benjamins.
Foster, Jonathan, and Marko Jelicic 1999 Memory structures, procedures, and processes. In Memory: Systems, Process, or Function?, Jonathan Foster, and Marko Jelicic (eds.), 1-10. New York: Oxford University Press.
Garnham, Alan 1997 Representing information in mental models. In Cognitive Models of Memory, Martin Conway (ed.), 149-172. Hove: The Psychology Press.
Goldberg, Adele 1995 Constructions: A Construction Grammar Approach to Argument Structure. Chicago: University of Chicago Press.
Graesser, Arthur, and Leslie Clark 1985 Structures and Procedures of Implicit Knowledge. Norwood, NJ: Ablex Publishing.
Haiman, John 1980 Iconicity and grammar: Isomorphism and motivation. Language 56: 515-560.
Kay, Paul, and Charles Fillmore 1999 Grammatical constructions and linguistic generalizations: The "What's X doing Y?" construction. Language 75: 1-33.
Koffka, Kurt
2001 Principles of Gestalt Psychology. London: Routledge.
Mandler, George 1985 Cognitive Psychology. Hillsdale, NJ: Lawrence Erlbaum.
Markman, Arthur 1999 Knowledge Representations. Mahwah, NJ: Lawrence Erlbaum.
Neath, Ian, and Aimee Surprenant 2003 Human Memory. Belmont, CA: Thomson and Wadsworth.
Pishwa, Hanna 1998 Kognitive Ökonomie im Zweitspracherwerb. Tübingen: Gunter Narr. 2002 Language learning and vantage theory. Language Sciences 24: 591-624. in print English tense and aspect: Source of information. In Information Distribution in English Grammar and Discourse, Erich Steiner, and See-Young Cho (eds.). Frankfurt: Lang.
Roediger, Henry, Mary Weldon, and Bradford Challis 1989 Explaining dissociations between implicit and explicit measures of retention: A processing account. In Varieties of Memory and Consciousness: Essays in Honor of Endel Tulving, Henry Roediger, and Fergus Craik (eds.), 3-42. Hillsdale, NJ: Lawrence Erlbaum.
Sanders, Ted, Joost Schilperoord, and Wilbert Spooren (eds.) 2001 Text Representation: Linguistic and Psycholinguistic Aspects. Amsterdam/Philadelphia: Benjamins.
Schacter, Daniel, and Endel Tulving 1994 What are the memory systems of 1994? In Memory Systems 1994, Daniel Schacter, and Endel Tulving (eds.), 1-38. Cambridge, MA: MIT Press.
Schank, Roger 1982 Dynamic Memory. Cambridge: Cambridge University Press.
Simone, Raffaele (ed.) 1995 Iconicity in Language. Amsterdam: Benjamins.
Talmy, Leonard 2000 Toward a Cognitive Semantics. Cambridge, MA: MIT Press.
Tulving, Endel 1983 Elements of Episodic Memory. Oxford: Clarendon Press.
Virtanen, Tuija (ed.) 2004 Approaches to Cognition through Text and Discourse. Berlin/New York: Mouton de Gruyter.
Van Dijk, Teun 2002 Political discourse, political cognition. In Politics as Text and Talk: Analytical Approaches to Political Discourse, Paul Chilton, and Christina Schäffner (eds.), 203-238. Amsterdam: Benjamins.
Wallace, Steven 1982 Figure and ground: The interrelationships of linguistic categories. 
In Tense and Aspect: Between Semantics and Pragmatics, Paul Hopper (ed.), 201-223. Amsterdam: Benjamins.
Wyer, Robert, and Thomas Srull 1989 Memory and Cognition in its Social Context. Hillsdale, NJ: Lawrence Erlbaum.
Part 1. Linguistic structure and memory
Chapter 2 A constructivist epistemology for the language sciences Ngoni Chipere
1. Introduction

Language is Janus-faced, bearing rule-like regularity on one face and usage-based idiosyncrasy on the other. For two millennia, however, students of language have sought to establish one or the other as the true face of language (including, of all people, Julius Caesar, who favoured regularity over idiosyncrasy). The analogists of classical Greece viewed language as an orderly phenomenon possessing an internal logic. Their arch-rivals, the anomalists, saw language as an irregular phenomenon shaped largely by social usage. This ancient schism cuts across the contemporary sciences of language: linguists disagree sharply on whether language is rule-based or usage-based, while psychologists differ on whether linguistic representations are based on rules or associations. Repeated failure to determine whether knowledge of language takes the form of rules or associations has now led to a growing perception that it encompasses both forms. Numerous experiments have shown that linguistic representations incorporate rules and associations (see papers in the volume edited by Nooteboom, Weerman and Wijnen 2002; see also Marcus 2001; Wray 2002; Townsend and Bever 2001; and Pinker 1999). Such studies have led to a growing recognition of the need to integrate rules with associations, and there is now a proliferation of 'hybrid' models of language processing. These new models seek to combine rules and associations by employing a 'dual route' approach involving two co-operating language processors, one based on rules and the other based on associations (see Chipere 2003 for a discussion of two such models). The rule-based processor is employed to deal with regular linguistic forms while the association-based processor is employed to handle irregular forms. There are problems with this formulation, however.
The problems arise primarily from the fact that, whereas rule-based and memory-based models were grounded in either rationalism or empiricism, the new dual route models lack a clear epistemological basis. The resulting epistemological vacuum has left the new hybrid models vulnerable to often covert influences from rationalism and empiricism. This article addresses two particularly problematic assumptions that emanate from the traditional epistemologies. Firstly, it is often assumed that these forms of representation are distinct and mutually incompatible. This assumption arises from the historical opposition between rationalists and empiricists, which has led to the polarisation of rules and associations. Thus we find that, although dual route models are predicated on the co-existence of rules and associations, they retain the assumption of mutual incompatibility between these forms of representation. For instance, dual route models assign different processing roles to rules and associations: rules are assigned to regular phenomena and associations to irregular phenomena. This separation of rules and associations is reified by locating each form of representation in its own special processor. This assumption of separateness, however, is violated by the fact that rules and associations interact in a cognitively productive manner. A mathematical analogy can be used to illustrate the point. Nooteboom, Weerman, and Wijnen (2002) note that it is possible to say that there is only one number, 1, and that all other numbers are derived from it by operations of addition, subtraction, multiplication and division. While this may be true, we routinely reckon with numbers higher than 1 as if they were primitives in their own right. In other words, we reckon with stored products of computational processes. The human ability to store and re-use computational products appears to be a natural and possibly indispensable aspect of mental functioning. 
One need only imagine how onerous, if not downright impossible, mental arithmetic would be if one had to write 1+1+1+1 instead of simply writing 4. The observation in the last paragraph can be taken further. Consider that once a formula has been derived for solving a mathematical problem, mathematicians will typically not bother to derive the formula each time they need to use it. For instance, the formula for calculating the area of a circle is routinely applied with no thought given to the process of its derivation - it is simply recalled from memory and applied. The process can be repeated endlessly: the expression for calculating the area of a circle may, in turn, be used to derive other expressions, such as the one for calculating the volume of a cylinder, and so on. It appears that the human brain naturally compresses processes into products and thus enables itself to engage in more complex processes. There is a productive interchange between computation and storage that cannot be accounted for if rules and associations are separated.
The second assumption that dual route models inherit from rationalism and empiricism is that speakers of a language share the same mental representations of the language. Rationalists assume that speakers of a language share the same grammatical rules while empiricists assume that speakers of a language share the same set of associations. In dual route models, these assumptions manifest in the dual language processors: a rule-based processor that can handle all the regular phenomena in a language and an association-based processor that can handle all the irregular phenomena. Yet again, this inherited assumption runs foul of the facts, for it appears that speakers vary considerably in linguistic representation. To illustrate the nature of this variation, let us return to the mathematical analogy one last time. If one wishes to be able to calculate the area of a circle, one can learn the formula by heart or one can recreate the deductive process by which the formula was discovered. To use a catchy phrase, one can either be a rote-learner or a rule-follower. Rote-learning is the easier option while rule-following is more demanding. However, rote-learning results in item-specific knowledge while rule-following guarantees the ability to generate valid solutions for a wide range of mathematical problems. Evidence will be presented to show that speakers differ precisely along these lines: some tend to rote-learning and others to rule-following. On account of this variation, native speakers differ in the range of sentences that they can understand in their native language. The rule-followers can decode a wider range of syntactic structures than the rote-learners.
This pattern of variation cannot be accounted for by dual-route models that assume that each speaker is fully equipped to deal with all linguistic regularities via a rule-based processor and all linguistic idiosyncrasies via an association-based processor. In summary, dual-route models are unable to account for a) the productive interaction of rules and associations and b) native speaker variation in syntactic competence. The failure arises from the fact that these models lack epistemological grounding and therefore unwittingly inherit invalid assumptions from rationalism and empiricism. The foregoing observations indicate the need to fill the epistemological vacuum of the post-rationalist/empiricist era.
This article will seek to show that rationalism and empiricism are inadequate bases for the study of language. It will then propose that constructivism provides the required epistemological undergirding. It will seek to show that constructivism provides principled accounts of a) the interaction between creativity and memory and b) individual variations in linguistic representation. This thesis is developed in five steps. Section 2 shows how rationalism and empiricism give rise to a) the assumption that mental representations of language are either rule-based or memory-based and b) the assumption that all native speakers of a language share a single mental system of linguistic representation. These assumptions are falsified in Sections 3 and 4, which show a) that rules and associations are aspects of a single system of mental representation and b) that mental representations of language vary across individual native speakers. Section 5 presents constructivism as an epistemological basis for handling both the interaction of rules and associations and variations across individuals in general mental representation. Finally, Section 6 brings constructivism to bear on these issues as they relate specifically to knowledge of language.
2. Rationalism and Empiricism

This section traces the origins of rationalism and empiricism to Plato and Aristotle. It seeks to show how these epistemologies have influenced the study of language over the past two thousand years. The historical accounts provided in the section are drawn mainly from Ellegärd (2003) and Hergenhan and Olson (2000).
2.1. Rationalism

Rationalism originated in the ancient Pythagorean cult of number. The Pythagoreans observed that the physical universe displays a high degree of mathematical regularity. For example, musical sound displays exact mathematical relationships, as do physical shapes, such as triangles and circles. The Pythagoreans sought to generalise such observations even to aesthetic concepts such as harmony and justice. Their mathematicization of everything led them to formulate the motto: Number Rules the Universe. They proposed that number is eternal and exists independently of the human mind. They went as far as to describe number as an active intelligence with causal powers over physical events. As Aristotle observed, "Pythagoras holds that number moves itself, and he takes number as an equivalent for intelligence." (Guthrie 1988: 310). Pythagorean ideas were modified in several ways by Plato, himself a member of the cult, to create the doctrine of innate ideas. Firstly, Plato proposed that ideas, not just numbers, are eternal, immutable and independent of the human mind. Secondly, whereas the Pythagoreans viewed numbers as existing within things, Plato posited a duality between the plane of ideas and the plane of physical objects. In terms of this duality, the objects that comprise the physical universe are imperfect copies of the ideas that exist on the ideal plane. (Plato sought to illustrate this duality with his allegory of the cave.) Thirdly, invoking the doctrine of reincarnation, Plato proposed that the soul exists on the ideal plane prior to birth but forgets the universal ideas of its native realm at birth. Experiences on the physical plane, however, can trigger a process of logical deduction within the faculty of Reason which results in recollection of the universal ideas. (Plato sought to illustrate this idea with his story of the uneducated young slave who was led through Socratic questioning to make abstract statements about geometry.) We turn now to the origin of empiricism.
2.2. Empiricism

Plato's pupil, Aristotle, is credited with establishing empiricism. It is important to point out that Aristotle himself cannot be described as an empiricist in the modern sense of the word. While many (though certainly not all) modern empiricists reject rules in favour of associations, Aristotle had no difficulty in positing a faculty of Reason. He has also been credited with formalising the logical syllogism. The reason that Aristotle is associated with empiricism is that he believed that knowledge is acquired from experience. He also developed a theory of memory that has become the basis of empiricist views on the nature of mental representation. These two aspects of his work are described in the following paragraphs. In contrast to Plato, Aristotle viewed experience as the source of knowledge. He argued that universal ideas are induced from particulars rather than the other way round. And whereas Plato was suspicious of
memory, Aristotle regarded it as important, and he developed a theory of memory that has endured over the centuries. According to Aristotle's theory of memory, ideas in memory are related to each other via four laws of association. The law of contiguity states that ideas relating to events that co-occur in space or time will evoke each other. The law of frequency states that the more frequent the co-occurrence, the stronger the association between the two ideas. The law of similarity states that similar ideas evoke each other. Finally, the law of contrast states that opposing ideas can also evoke each other. While Aristotle proposed four laws of memory, the empiricists who came after him often sought to reduce the number of associative laws to just two - contiguity and frequency. This reductive trend is most clearly pronounced in the Behaviourist doctrine of the 20th century and in some contemporary interpretations of Connectionism. As will be shown presently, focusing exclusively on the laws of contiguity and frequency has made it difficult for proponents of extreme empiricism to explain rule-governed aspects of mental representation.
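As a toy illustration of the laws of contiguity and frequency, association strength can be modelled as a simple co-occurrence count: ideas occurring in the same 'event' become linked, and repeated co-occurrence strengthens the link. The event data below are invented for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sketch of associative learning by contiguity (co-occurrence
# within an event) and frequency (repeated co-occurrence strengthens links).

events = [
    {"thunder", "lightning", "rain"},
    {"thunder", "lightning"},
    {"rain", "umbrella"},
]

associations = Counter()
for event in events:
    # Every pair of ideas present in the same event becomes associated.
    for a, b in combinations(sorted(event), 2):
        associations[(a, b)] += 1   # the count encodes frequency

# 'thunder' and 'lightning' co-occur twice, so their link is strongest.
print(associations[("lightning", "thunder")])
```

Note what the sketch lacks: nothing in the co-occurrence table expresses a rule, which is precisely the limitation the text attributes to extreme empiricism.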
2.3. The influence of rationalism and empiricism on the study of language

In line with his idealism, Plato thought that words had inherent meanings which made them 'true' in the sense of possessing an inner fitness that connects a name to its referent. Consistent with his empiricism, on the other hand, Aristotle thought that words acquire meaning from social usage. These opposing perceptions of language have marked the study of language over the past two thousand years. Beginning with the Classical Greek grammarians, there is a perceptible division of opinion along Platonic and Aristotelian lines. The classical Greek 'analogists' viewed language as possessing an inherent logical order while the 'anomalists' viewed it as being ordered by social custom. Much later, in the 17th century, the Port Royal grammarians, John Wilkins and Gottfried Leibniz each sought separately to reduce language and thought to a universal grammar. During the same period, John Wallis and Robert Robinson took a more naturalistic approach and wrote on the social aspects of English phonetics. In the 19th century, Neogrammarians such as Grimm and Bopp sought to describe language in terms of autonomous rule systems while Gilliéron adopted a socio-linguistic approach to language and sought to produce a linguistic atlas of France (see Ellegärd 2003).
The mid-twentieth century saw a major conflict between rationalism and empiricism in terms of the controversy between generativism and cognitivism on the one hand and structuralism and behaviourism on the other. The lines drawn during this period still define the contemporary study of language: in linguistic theory, we have formalism versus functionalism; in language acquisition, innateness versus learning; in psycholinguistics, rules versus associations; in language teaching, structural versus functional syllabi; and in literacy teaching, phonics versus whole language approaches. While the protagonists in these controversies do not always articulate their epistemological commitments, the battle-lines suggest more or less direct commitment to either rationalism or empiricism. In sum, rationalism regards knowledge as innately specified while empiricism regards it as something that is learned from experience. This difference of opinion regarding the origin of knowledge has given rise to different views about the form of knowledge. From the rationalist perspective, innate knowledge takes the form of axioms and rules of deduction. From the empiricist perspective, knowledge takes the form of associations that reflect the statistical structure of the environment. It is important to note, however, that although rationalism and empiricism are, in one sense, polar opposites, they make a common, fundamental assumption about the nature of knowledge. Both regard knowledge as a veridical representation of reality. They both subscribe to what has been called an instructional model of knowledge acquisition, whereby knowledge is transferred into the mind of the knower by means of genes or experience. This deterministic view of knowledge gives rise to the assumption that, excepting cases of abnormality, knowledge cannot vary across individuals.
The article will now present arguments against the assumptions that mental representations are a) either rule-based or memory-based and b) invariant across individual language users. Evidence will be presented to show a) that rules and associations interact in a manner that results in qualitative representational change over time and b) that individual language users vary in linguistic representation, with some tending towards rule-like systematicity and others towards memory-based item-specificity.
3. The dual nature of mental representations of language

This section presents evidence that mental representations of language display characteristics of both rules and memory for previously encountered linguistic forms. It begins by presenting some theoretical arguments to this effect before describing the experimental evidence.
3.1. Theoretical arguments

There is a common-sense argument for the role of linguistic memory in knowledge of language. Language users can recognise surface features of language such as voices, handwriting and fonts. This recognition would not occur in the absence of episodic memory for linguistic form. Another argument for the role of storage in knowledge of language can be derived from the following observation. Gross (1979) noted that, despite their best efforts, linguists have never succeeded in creating an exhaustive rule-based model of any natural language. A simple explanation for this failure is that a significant portion of natural language is irreducibly irregular and must be learned by rote. In addition, approaches to grammar such as Construction Grammar (Goldberg 1995) have revealed the existence of numerous pre-fabricated syntactic schemata that display varying degrees of productivity. In Cognitive Grammar, there is the concept of 'entrenchment', whereby complex linguistic units become unitised with repeated usage and subsequently require no constructive effort (Langacker 1987). Corpus Linguistics has also shown that language displays various frequency effects.
3.2. Experimental evidence

The experimental evidence pertaining to the form of linguistic representation comes from four periods of experimental investigation. Each period had a slightly different research agenda. In the 1950s, the concern was with frequency effects in knowledge of language, and the research received its impetus from aspects of Information Theory that support the empiricist approach. The second period, in the 1960s, was concerned with establishing the 'psychological reality of phrase structure rules' and it received its impetus from Generative Grammar - a decidedly rationalist approach.
The third period, in the 1970s, sought to establish the relationship between language processing and the human memory system. The impetus appears to have been largely derived from the effort to decide between models of human memory. One class of models depicted human memory performance as a reflection of a fixed architecture independent of experiential content while the other class of models described the memory system in terms of its experiential content. Thus the debate manifested underlying rationalist versus empiricist concerns. The fourth period of investigation, in the 1980s and 1990s, was concerned with describing sentence parsing mechanisms. The inspiration behind this work came from the attempt to decide between modular and interactive approaches to mental processing. Yet again, this debate was fundamentally about rationalism versus empiricism, given that proponents of cognitive modularity are rationalists and innatists while proponents of interactivity are associationists and empiricists. Thus, while each period of investigation had its own distinct concerns, the underlying debate was the same as it had been for the preceding two millennia - rationalism versus empiricism.
3.3. The first period

That language has a statistical structure has been known since ancient times. The 9th century Arabian code breaker, al-Kindi, found that each letter in a written language occurs with a signature frequency. For instance, to use the example of English, 'e' comprises about 13 per cent of English text, while 'z' comprises less than one per cent. Al-Kindi was able to use such knowledge to decipher secret messages that had been encoded by systematically substituting one letter for another (see Singh 2000). In the mid-twentieth century, Shannon and Weaver (1949) showed that orthographic and lexical sequences in natural language display statistical regularities that enable one to predict, at a given level of probability, successive linguistic units in a sequence. Numerous subsequent psycholinguistic experiments showed that linguistic representations encode degrees of predictability. In the study of speech production, Goldman-Eisler (1958) and Maclay and Osgood (1959) found that pauses in speech occur at points of high unpredictability. In reading research, it was found that skilled readers are better at making orthographic predictions than less skilled readers (Lefton, Spragins, and Byrnes 1973; LeBlanc, Muise, and Gerard 1971;
Muise, Leblanc, and Jeffrey 1971 and 1972; and Scheerer-Neumann et al. 1978). In experiments on short-term memory, Miller and Selfridge (1952) found that short-term recall is a function of the predictability of verbal stimuli (see also Deese and Kaufman 1957; Marks and Jack 1952; Miller and Selfridge 1950; Richardson and Voss 1960). With respect to language perception, Traul and Black (1965) found that the intelligibility of speech increases and the variability of errors decreases with increasing predictability. Miller, Bruner, and Postman (1954) found that letter recognition accuracy is greater for letters in familiar sequences compared to letters in unfamiliar sequences. Onishi (1962) found that eye-voice span increases with increasing predictability. Thus it was established that language has a statistical structure and that mental representations of language reflect this structure. However, the advent of Generative Grammar led many psycholinguists to abandon these empirical achievements and to seek, instead, to prove that syntactic representations are based on phrase structure rules. The reasons for this abandonment were purely theoretical and had to do with a wholesale epistemological shift from empiricism to rationalism.
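The statistical structure at issue in this period can be made concrete with a short sketch: letter frequencies of the kind al-Kindi exploited, and a simple transitional measure of the kind Shannon formalised. The sample text and variable names are invented for illustration.

```python
from collections import Counter

# Hypothetical sketch: letter frequencies and transitional predictability
# in a tiny invented sample of English-like text.

text = "the theory that the thing is there"
letters = [c for c in text if c.isalpha()]

freq = Counter(letters)
print(freq.most_common(3))   # 't', 'h' and 'e' dominate this sample

# Transitional predictability: how often is 't' followed by 'h'?
bigrams = Counter(zip(letters, letters[1:]))
p_h_given_t = bigrams[("t", "h")] / freq["t"]
print(round(p_h_given_t, 2))
```

On a real corpus the same two counts yield the signature frequencies and sequential predictabilities that the experiments of this period showed to be mirrored in mental representations of language.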
3.4. The second period

A number of studies in the 1960s showed that the presence of syntactic structure in a lexical sequence increases intelligibility (Miller 1962; Miller and Isard 1963) and memorability (Epstein 1961a; 1961b; Marks and Miller 1964; Anglin and Miller 1968; Graf and Torrey 1966). Johnson (1965) devised a metric called 'transitional error probability' which showed that sentences tended to be recalled in intact phrase units. Further evidence for the psychological reality of phrase structure rules came from Mehler, Bever, and Carey (1967), who discovered that eye-fixation patterns conform to syntactic structure. Fodor and Bever (1965) also observed that when a click is superimposed on a spoken sentence, there is a tendency for subjects to report that the click occurred near major syntactic boundaries even if the clicks actually occurred well within a phrase. In addition, the greatest number of correct responses about the location of a click were obtained when clicks were located at major phrase boundaries rather than within phrases (see also Bever, Kirk, and Lackner
1969; Bever, Lackner, and Kirk 1969; Garrett, Bever, and Fodor 1966; Holmes and Forster 1970). Thus evidence was obtained to show that mental representations of sentences display phrase structural organisation. It should be noted, however, that Levelt (1970) found that the phrase units in subjects' recalls did not always coincide with the phrase structures prescribed by linguistic theory. In addition, Fodor, Bever, and Garrett (1964), pioneers in the attempt to prove the psychological reality of phrase structure rules, concluded that evidence for phrase structure rules had not been obtained. The reason for this conclusion was that the evidence for phrase structural organisation does not imply the existence of phrase structure rules. Phrase structural organisation could arise, for instance, from the fact that lexical predictability is coincidentally higher within phrases than between them. Thus, phrase structural organisation could simply be a statistical artefact, as suggested earlier by other researchers (e.g. Goldman-Eisler 1958; Maclay and Osgood 1959; Rosenberg 1968). In fact, Bever (1970) proposed that sentence comprehension is based on prefabricated sentence schemata called canonical sentoids (see also Bever 1988; Townsend and Bever 2001). Thus, in contrast to the effort to establish statistical structure in language, the attempt to establish the psychological reality of phrase structure rules did not succeed. The 1960s did produce evidence for the use of rules, however. To the perplexity of those concerned, this evidence had equivocal implications for the rationalist and the empiricist approaches. The evidence comes from experimental investigations into the ability of language users to understand multiply self-embedded sentences. Chomsky (1957) had argued that self-embedded sentences can only be understood on the basis of phrase structure rules.
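Chomsky's argument turns on recursion: a self-embedding rule can generate structures of arbitrary depth, which no finite list of stored word-to-word associations can match. The toy generator below (invented vocabulary and function names, purely illustrative) shows the recursive pattern.

```python
# Illustrative sketch: a recursive rule of roughly the form
#   NP -> N | N "that" NP V
# generates multiply centre-embedded noun phrases to any depth.

NOUNS = ["the rat", "the cat", "the dog"]
VERBS = ["chased", "bit", "saw"]

def embedded(depth: int) -> str:
    # Base case: a plain noun phrase.
    if depth == 0:
        return NOUNS[0]
    # Recursive case: wrap the previous level inside a relative clause.
    inner = embedded(depth - 1)
    return f"{NOUNS[depth % 3]} that {inner} {VERBS[depth % 3]}"

# depth 2 yields 'the dog that the cat that the rat bit saw'
print(embedded(2) + " ran away")
```

Each extra level of depth requires matching a noun introduced arbitrarily far back with its verb, which is why a recursive rule, and not a bounded bigram table, is needed to produce or decode such sentences.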
Given that models of language based purely on associations cannot handle phrase structure rules, Chomsky argued that speakers' knowledge of language cannot be modelled in terms of associative memory. Some findings did indicate that some native speakers can understand multiply self-embedded sentences. However, such findings were highly problematic because they also showed that many other native speakers appear to be unable to understand these sentences (see Miller and Isard 1964; Stolz 1967; Freedle and Craun 1970; Powell and Peters 1973). These native speakers appeared to rely on their memory of similar sentences in order to decode the test materials (see Blumenthal 1966; Stolz 1967). A
study by Blaubergs and Braun (1974) showed that such speakers need special training in order to understand self-embedded sentences. These findings were seriously problematic for both the rationalist and empiricist approaches. Recall that the rationalist approach assumes that speakers' internal models of language are innate rule systems. It is difficult to square this assumption with the evident need for special training. For empiricism, the problem is that associative memory mechanisms based purely on contiguous relations and frequency information do not support recursive phrase structure rules of the sort required to comprehend self-embedded sentences. It is therefore difficult, from an empiricist perspective, to explain the fact that language users can learn to use such rules. In sum, the second period of investigation produced findings that could not be accounted for by either the rationalist or the empiricist approach. For those who assumed that either rationalism or empiricism must be correct, these results must have been truly perplexing. It may be for this reason that the ancient debate shifted to a new front - the nature of human memory and its relation to language.
3.5. The third period

During the third period of research into the nature of linguistic knowledge, researchers sought to relate the language processing mechanism to memory systems. The impetus for this research came from attempts to decide between the stage model of memory (Atkinson and Shiffrin 1968) and the levels of processing model (Craik and Lockhart 1972). The stage model observed a strict separation of short-term and long-term memory while the levels of processing model regarded short-term and long-term memory as aspects of a single memory whose retentive capabilities depend on the manner in which environmental stimuli are processed. The essential conflict here was between a view of memory as an innate structure and a view of memory as a structure that reflects the organism's experience. The linguistic interest in this debate came about as follows. A key argument of the rationalists is that knowledge of language is not based on stored linguistic forms. On the other hand, a key argument of the empiricist approach is that knowledge of language is based on stored linguistic forms. Proponents of either approach were therefore keenly interested to discover whether experience of language left traces in long-term memory. If it did, then this would suggest that knowledge of language comprises, to some extent
at least, knowledge of stored linguistic forms. On the other hand, if it could be shown that the process of sentence comprehension does not leave memory traces of sentences, then the empiricist argument would be invalidated. A number of experimental studies produced evidence that sentences are not stored verbatim in long-term memory (e.g. Sachs 1967; Jarvella 1971). This finding was consistent with the rationalist approach for the reason outlined above. However, a methodological flaw in these experiments was soon pointed out: they required subjects to recall sentences consciously. Subsequent experiments showed that, while subjects may not be able to consciously recall the surface form of previously presented sentences, they can nevertheless produce evidence of surface form retention when other means of investigation are used. For instance, it was shown that subjects process sentences faster if the wording is exactly the same between the first and second presentations than if the wording differs slightly between the first and second presentations (see Moeser 1974; Graesser and Mandler 1975; McDaniel 1981; Anderson 1974; Kintsch and Bates 1977; Keenan, MacWhinney, and Mayhew 1977; Bates, Masling, and Kintsch 1978; Stevenson 1988). This facilitation suggests that episodic memory traces are stored in long-term memory and can influence subsequent processing. In sum, the third period of investigation provided evidence for the long-term retention of the surface forms of language. This evidence meant that the idea that knowledge of language consists of stored forms could not be discounted. However, as with the other periods of investigation, there does not appear ever to have been an attempt to consolidate the research findings and come to a firm conclusion. Instead, the debate simply shifted to new ground.
3.6. The fourth period

The contentious issue during this period was whether sentence processing mechanisms are contained in a cognitive module that employs phrase structure-based parsing strategies or whether such mechanisms are based on general-purpose cognitive mechanisms which employ frequency information and associations. Some of this research focused on the phenomenon of syntactic priming, whereby exposure to a given syntactic structure facilitates the subsequent comprehension or production of that structure (see Bock and Loebell 1990; Branigan, Pickering, Liversedge, Stewart, and
Urbach 1995; Frazier 1995). Although syntactic priming was initially regarded as evidence for the operation of rules in syntactic comprehension, subsequent work points to the phenomenon as a long-term memory learning effect (see Cuetos, Mitchell, and Corley 1995; Bock et al. 1996; see also the detailed arguments in Chipere 2003).
3.7. Summary

In summary, it appears that there is evidence for a storage component in speakers' models of language as well as for the ability to use phrase structure rules. As argued earlier, this evidence is problematic for both rationalist and empiricist approaches, given that these approaches describe mental representations of language exclusively in terms of rule systems or associative memory. It is important to note that the co-existence of rules and associations as discovered by these investigations is not one in which there is a division of labour such that, as is assumed in dual route models, rules apply to regular linguistic forms and associations apply to irregular forms. Instead, it appears that regular linguistic forms can be processed associatively - a finding that violates the dual-route separation of rules and associations into separate cognitive modules. This finding is directly related to the topic of the next section, which is individual variations in linguistic representation: it appears that some individuals tend toward linguistic computation (rule-followers) and others toward linguistic memory (rote-learners).
4. Individual variations in knowledge of language

An extensive review of experimental evidence for individual variation in grammatical knowledge is provided in Chipere (2003). A brief synopsis of the review is provided here in two sub-sections. The first sub-section deals with evidence for individual differences in grammatical competence. The second sub-section then addresses the proposal that individuals vary only in the linguistically extrinsic performance factor of 'working memory' and not in grammatical competence as such.
4.1. Individual differences in knowledge of language

Bates, Bretherton, and Snyder (1988) provide a comprehensive review of the literature on individual differences in first language development. They note that researchers have made various dichotomies in the ways that children learn the phonological systems of their native language. For instance, researchers have categorised language learners into 'holistic' versus 'analytic' learners. The former are said to acquire language by rote-learning whole structures that they gradually analyse into their constituent units. The latter are said to learn smaller units that they can recombine in a rule-guided fashion into larger structures. Such a dichotomy mirrors observations made in this article concerning memory-based and rule-based forms of linguistic knowledge. Bates, Bretherton, and Snyder caution, however, that such dichotomies should not be taken to represent types of children as such but rather different possible ways in which a child can approach the problem of learning language. They argue that a child who is holistic in one area of language may be analytic in another. On the other hand, studies by Day (1969, 1979) and Lefever and Ehri (1974) do suggest that individuals may have stable life-long preferences in language learning style. A number of studies have also produced evidence of individual variation in the making of grammaticality and acceptability judgements (Hill 1961; Spencer 1973; Greenbaum and Quirk 1970). Some of these differences have been linked to variations in educational attainment or linguistic expertise (Mills and Hemsley 1974; Spencer 1973; Karanth, Kudva, and Viyajan 1996). A relationship between educational attainment and syntactic comprehension skill was also reported by Gleitman and Gleitman (1970); Geer, Gleitman, and Gleitman (1971); Baruzzi (1984); and Dabrowska (1997). Dabrowska's study is described below as an example of this type of work.
Dabrowska's purpose was to find out how well the sorts of structures often discussed in linguistics journals can be understood by native English speakers with different levels of education. She used tough movement sentences, for instance, 'Sandy will be easy to get the president to vote for'; complex noun phrase sentences, for instance, 'The manager knew that the fact that taking good care of herself was essential upset Alice' and parasitic gap sentences, for instance, 'It was King Luis who the general convinced that this slave might speak to'. Dabrowska found that university
52
Ngoni Chipere
lecturers understood such sentences better than undergraduates who, in turn, understood them better than porters and cleaners. Other studies have shown that individual variations in lexical knowledge can lead to individual differences in syntactic performance (Cupples and Holmes 1992; Pearlmutter and MacDonald 1995). Earlier studies by C. Chomsky (1969), Kramer, Koff, and Luria (1972), and Sanders (1971) linked variations in lexical knowledge to variations in syntactic performance. Maratsos (1976) and Biber (1983) found evidence of individual variations in knowledge of noun definitization rules based on variations in lexical knowledge. There is also evidence that native speakers vary in the ability to assign constituent structure to sentences (Huey 1968; Dearborn and Anderson 1937; Cromer 1970; Levin and Caplan 1970; Muncer and Bever 1984; Cupples and Holmes 1987; Dabrowska 1997), in the ability to cope with syntactic ambiguity (Lefever and Ehri 1976; Cupples and Holmes 1992; Pearlmutter and MacDonald 1995), in the ability to cope with decreases in syntactic predictability (Graesser, Hoffman, and Clark 1980), in brain wave responses to syntactic anomaly (Osterhout 1997), and in the ability to assign thematic roles on the basis of syntactic constraints (Bates et al. 1982; Kilborn and Cooreman 1987; Harrington 1987; Kail 1989; McDonald 1989; Kilborn and Ito 1989; Sasaki 1997). It has been proposed, however, that individual variations of the sort described above are not due to individual variations in grammatical knowledge but to individual differences in the performance factor of 'working memory'. This proposal is examined next.
4.2. Individual differences in working memory capacity
Chomsky (1965) made a distinction between grammatical competence and performance. In terms of that distinction, speakers have a perfect grammatical competence that may nevertheless fail to achieve full expression on account of limiting performance factors. A key performance factor is hypothesised to be 'working memory' capacity (see Fodor and Pylyshyn 1988; Just and Carpenter 1992; Gibson and Thomas 1999; Caplan and Waters 1999). Working memory is defined as the portion of the human memory system which is responsible for both storing information on a short-term basis and manipulating that information (Just and Carpenter 1992). It can be
likened to the random access memory (RAM) in a computer system. It has been argued, especially by Just and Carpenter (1992), that individuals vary in working memory capacity and that these variations give rise to individual differences in linguistic performance. However, this notion of working memory has been critiqued by Ericsson and Kintsch (1995). They argue that working memory encompasses long-term memory, and they show that working memory capacity is related to the level of expertise in a given domain. Chipere (2003) tested Just and Carpenter's (1992) model of working memory against Ericsson and Kintsch's (1995). The experiment involved manipulating working memory and grammatical knowledge to establish which of these factors was related to comprehension. An initial test of comprehension and recall of complex sentences produced low scores. Subjects were then given training to boost their working memory capacity. A subsequent test of comprehension and recall showed improvements in recall but not comprehension. A matched group of subjects was then trained in comprehension. A subsequent test of recall and comprehension showed improvements in both recall and comprehension. The conclusion from the study was that low pre-test performance in the first group was caused by insufficient grammatical knowledge as opposed to a linguistically extrinsic working memory capacity. The experiment also showed that some native speakers had no difficulty in understanding the complex sentences used in the study while other native speakers completely failed to understand them. The subjects who understood the sentences appeared to use something akin to grammatical rules in order to comprehend the sentences. An analysis of the manner in which subjects recalled the sentences suggested that subjects who failed to comprehend the test sentences were making use of a prefabricated syntactic schema which was inappropriate to the sentence type involved.
Thus it appeared that subjects displayed contrasting processing styles - one based on rules and one based on memory for previously encountered sentences. The proposal that individual variations in grammatical performance are solely due to individual variations in working memory capacity therefore appears to be invalid. Quite the converse: individual variations in working memory capacity appear to arise from individual variations in grammatical competence.
4.3. Summary
In summary, it appears that native speakers vary in grammatical competence. Some individuals display a rule-based processing style while others display a memory-based style. There is also evidence that the same individual can display one processing style in one area of language and a different processing style in another area. The examination in Section 1 showed that the traditional epistemologies of rationalism and empiricism are inadequate because they do not acknowledge the interaction between rules and associations and because they do not acknowledge individual differences. The next section argues that constructivism provides a plausible and principled explanation for these phenomena.
5. Constructivism
Constructivism integrates positive elements of rationalism and empiricism while avoiding the limitations of both. This enables it to accommodate rules and associations as well as individual differences. The following paragraphs show how this is achieved in terms of Donald Campbell's evolutionary epistemology and Jean Piaget's genetic epistemology. Both of these sources of constructivism ground cognition in biology.
5.1. Evolutionary epistemology
Campbell's (1974) evolutionary epistemology is predicated on the proposition that mechanisms found in the realm of biology govern not only the brain but also the nature and the development of knowledge within the organism. Campbell thus conceives of knowledge as a form of biological adaptation, whose value lies in the advantage that it confers upon the organism in the struggle to survive. He regards the development of knowledge in terms of biological evolution - knowledge is said to mutate; conceptual mutations that enhance survival are retained while those that do not are lost. To put this in other words: in biological evolution, the process of reproduction produces offspring that vary from their parents and from each other in terms of certain traits. If a new trait confers an advantage, then the individuals bearing that trait are likely to survive and pass that trait on to their offspring. Traits that confer a disadvantage are not perpetuated, as the individuals bearing them are unlikely to reproduce and pass them on. In effect, the environment favours certain traits over others and thus exercises the power to select. Countless iterations of this process give rise to highly adapted life forms. Campbell proposed that knowledge is generated via an analogous process. Whereas biological organisms reproduce physically, new concepts are generated when individuals randomly combine behavioural responses in order to deal with novel situations. Responses that lead to successful outcomes 'survive' in the sense that they will be evoked by similar situations in the future, while those that lead to unsuccessful outcomes will not. Furthermore, successful behaviours can be recombined in order to deal with novel circumstances in the future. Thus the individual builds up a repertoire of effective and increasingly complex behaviours through a process of trial and error. As with physical adaptations, the environment can be seen as favouring certain behaviours over others, thus exercising selective power. The repertoire of behaviours is stored in the memory of the organism and constitutes its knowledge. Organisms with complex brains, like humans, have the capacity to mentally simulate environmental conditions and to generate, in a purely mental fashion, appropriate responses to those conditions. This ability confers an evolutionary advantage by reducing the need to trial ineffective behaviours against a potentially deadly environment. Thus memory increasingly takes on the selective role of the environment by making it possible to determine mentally whether new behaviours are likely to pass or fail environmental tests. Thus the process of blind variation and selection continues on the cognitive plane.
5.2. Summary
Campbell presents a radically different view of cognition, which he depicts as an instantiation of the evolutionary algorithm of variation and selection in the domain of mental representations. In his view, memory is transformed into new knowledge that, in turn, becomes memory. As knowledge evolves, it captures more and more general regularities in the environment and gives rise to what may be called rule-governed behaviour. Thus we can
see that constructivism can accommodate rule-governed creativity and memory in a natural and effortless fashion. Furthermore, Campbell's account depicts individuals as sources of variation. The individual organism generates new behaviours (which ultimately become knowledge) through blind trial and error. Therefore it is entirely natural to expect individuals to vary from each other. Now, it might be argued that, given a common environment, individuals ought to converge on similar representations of it. However, this is not necessarily the case. It should be borne in mind that, from the perspective of evolutionary epistemology, knowledge is merely an adaptation that confers evolutionary advantages. In the case of physical evolution, a wide variety of physical adaptations can confer evolutionary advantages - scales, feathers and fur all differ from each other, yet each enhances the survival of one species or another. Applying this point of view to knowledge, we are drawn to consider that what matters is not truth-value but survival value. Constructivism holds that an infinite range of different models of the same environment can co-exist, so long as each of these models enables the knower to survive in that environment. Therefore, where there is no environmental pressure for individuals to converge in their representations of the environment, constructivism leads us to expect variation as the norm rather than the exception.
6. Genetic epistemology
Piaget's (1970) genetic epistemology takes a somewhat different but complementary focus to that taken by evolutionary epistemology. According to Piaget, the infant is born with an innate repertoire of operations ('cognitive structure') for transforming the environment. The elements of this set are innate behaviours such as crying, grasping, sucking and so on, and they are the precursors of knowledge. The infant is confronted with seemingly random and possibly catastrophic transformations of the environment. In order to counter the negative effects of such transformations, the infant responds with random behavioural selections from its cognitive structure. Initially, the infant operates on the environment in a random manner, crying, kicking, grasping and so on with no evident goal in mind. Some of these operations evoke desirable outcomes while others evoke negative outcomes. Gradually, the infant discovers systematic correspondences between
its operations and the consequent transformations of the environment. Crying, for instance, can cause food to become available. Shaking a rattle produces a novel auditory sensation; wriggling about can give rise to changes in location, and so on. The infant can now begin to transform its environment in a goal-directed manner. In time, the infant discovers that certain goals can be met by combining individual transformations. For instance, if an infant wishes to hear the sound of a rattle but the rattle is out of reach, then it must crawl to the rattle first, grasp it firmly and shake it. In addition to composing series of transformations, the infant gradually differentiates elements of its original cognitive structure in order to effect more specific transformations. Grasping, for instance, becomes differentiated into pulling, pushing, shaking and so on. According to Piaget, the entire gamut of human behaviour emerges gradually through differentiation and co-ordination of the original elements of the infant's cognitive structure. At some point, according to Piaget, the child begins to simulate transformations of the environment mentally. Behaviour thus becomes less random and more purposeful as thought eliminates the need for blind trial and error. This point heralds the beginning of symbolic thought, which, in Piaget's account, is grounded in physical action. It is interesting to note, in this respect, that verbs denoting mental processes are drawn metaphorically from the physical domain. We speak, for instance, of 'seeing a point', 'grasping ideas', 'searching memory', 'turning an idea over in the mind' and so on. From a Piagetian perspective, these are not mere figures of speech: they are echoes of the physical origins of thought. As it turns out, the physical basis of thought has been explored in a number of linguistic theories, notably in Cognitive Grammar. Langacker (1987) proposes that visual perception underlies cognition and language.
He suggests, for instance, that the linguistic categories of noun and verb are based on two forms of visual perception: summary scanning, in which objects are apprehended as wholes, and sequential scanning, in which objects are viewed as processes. Furthermore, the grammatical categories of subject and object correspond to the visual categories of figure and ground. Elements of the constructivist outlook thus appear independently in linguistic theory.
6.1. Summary
Although Piaget gives a somewhat different account from Campbell's, certain key principles are preserved in both: firstly, that new knowledge is built from old and, secondly, that the individual literally constructs knowledge in order to adapt to the environment. Both accounts accommodate creativity and memory as well as individual variations in mental representations. The next section shows how constructivism provides a basis for an approach to language that integrates rules and associations and caters for individual differences in linguistic knowledge. It turns out that the principles for such an approach were worked out long ago by none other than the 'father of modern linguistics' - Ferdinand de Saussure.
7. Saussure's theory of language

7.1. Sign systems
According to Saussure, humans do not have innate linguistic knowledge but rather an innate 'faculty of constructing languages'. He regarded language as one of many systems of representation created by humans, which he called 'sign systems'. These systems are models of reality and, as such, do not provide photographic renderings of reality. Rather, they represent selective aspects of reality that are relevant for the life purposes of language users. Saussure proposed that all sign systems can be studied under one science, which he called 'semiology' - the science of signs.
7.2. Paradigmatic and syntagmatic
Saussure also proposed that all sign systems have a common form of organisation. A good way to understand this organisation is in terms of Saussure's distinction between la langue and la parole. La langue is the abstract system of language existing in the brain, while la parole consists of the audible or written linguistic forms produced by members of a language community. Although la langue and la parole initially appear to be quite distinct, the relationship between them is rather subtle, as shown below.
According to Saussure, la langue is an associative network consisting of linguistic units connected by two types of link, commonly referred to as paradigmatic and syntagmatic. Paradigmatic links connect units that are similar with respect to some abstract feature; all nouns, for instance, are paradigmatically linked simply by virtue of sharing whatever properties nouns have in common. Syntagmatic links, on the other hand, connect linguistic units into sequences, so that adjacent letters in a word or adjacent words in a phrase are said to be linked syntagmatically. According to Saussure, all linguistic activity can be described in terms of paths across the associative network of language via the two kinds of link. For instance, the production of an utterance requires an individual to navigate the associative network paradigmatically, seeking and selecting linguistic units that satisfy certain search criteria. Selected units are then linked syntagmatically to generate a linguistic utterance that is then transmitted by the articulatory organs. The result of this process is a speech form, which belongs to the domain of la parole. The link between la langue and la parole is based on the fact that copies of transmitted speech forms are retained in memory, so that the individual holds in memory not only la langue but also a private copy of la parole. The retention of la parole in memory takes place as follows. When linguistic units are linked syntagmatically, the links do not disappear after the act of speech but are stored permanently in long-term memory. According to Saussure, a syntagmatic link becomes stronger over time if two units co-occur frequently - so much so that, in Saussure's theory, frequently co-occurring units become unitised, forming larger units. These larger units can enter into paradigmatic relationships with each other, giving rise to higher levels of complexity in the linguistic system.
For instance, a verb like 'running' is a unitisation of the free morpheme 'run' and the bound inflectional morpheme 'ing'. Each of these morphemes belongs to a paradigmatic series: 'run', for instance, might belong to a paradigmatic series containing morphemes like 'walk', while 'ing' might belong to a paradigmatic series containing other inflectional morphemes like 'ed'. However, once the two morphemes are combined during the act of speech to form 'running', they comprise a syntagmatic unit that can enter into paradigmatic series with other units at the same level of structural complexity, for instance 'walking', 'talking' etc. Thus the product of a creative process is stored in memory, whence it can serve as input into other creative processes at a higher level of complexity. La langue thus integrates its own product, la parole. The two are
not completely distinct: they interact productively in a process that elaborates the language network in spiralling levels of organisational complexity. As with the constructivists who came after him, Saussure regarded creativity and memory as intertwined. His simple descriptive apparatus enabled him to handle both rules and associations as follows. The set of paradigmatic links encodes the system of abstract categories that are part of the mental grammar. Syntagmatic links, on the other hand, encode sequential associations between linguistic units. Thus rules and associations are treated as aspects of a single system of representation. It may be noted, at this point, that Saussure's paradigmatic and syntagmatic links are based on Aristotle's theory of memory. Paradigmatic links correspond to the law of similarity while syntagmatic links correspond to the law of contiguity. The strengthening of links is based on the law of frequency. Saussure also makes considerable use of the law of contrast but this area will not be covered in this discussion.
7.3. Individual differences
Saussure did not discuss individual differences systematically. His theory was, however, applied by Roman Jakobson to explain them. According to Jakobson, individuals are, for a variety of reasons, biased towards the use of either paradigmatic or syntagmatic links. He suggested that some individuals prefer to carry out paradigmatic operations while others prefer syntagmatic operations. My view, however, is that individuals differ in the following way. The individuals referred to earlier as rote-learners tend to rely greatly on pre-established syntagmatic links and to comprehend novel linguistic forms by analogy to forms stored in memory. Rule-followers, on the other hand, tend to make greater use of the grammatical categories encoded in paradigmatic connections. Their knowledge of language is therefore more categorical, providing them with a greater creative capacity. Saussure's theory of language instantiates and in some ways goes beyond the constructivist epistemology described in the previous section. In this theory, rules and associations interact productively as parts of a single system of representation. This system takes the form of a network in which linguistic units are linked paradigmatically by similarity and syntagmatically by contiguity. Linguistic creativity involves selection of relevant units from paradigmatic series and combination of these units into syntagmatic sequences. The paradigmatic links encode abstract grammatical categories while the syntagmatic links provide memory storage for the formed sequences. These stored sequences comprise units which, in turn, can form paradigmatic series from which elements can be further selected and combined. Individuals can vary in their knowledge of this system depending on whether they rely on stored sequences or on abstract categories.
7.4. Graph theory
The idea that language is a network has been taken up by a number of linguists subsequent to Saussure, notably Hockett (1955) and Lamb (1966). Currently, the notion of language as a network is advocated by Halliday (1985), Langacker (1987), Goldberg (1995) and Hudson (2000). A key advantage of studying language as a network is that there already exists a well-developed branch of mathematics, called graph theory, which is devoted to the study of networks. A number of mathematicians have already applied graph theory successfully to linguistic analysis. For instance, Markov (1916) developed the seminal theory of Markov models from a study of Pushkin's novel Eugene Onegin. Polya (1954) and Hubey (1999) apply graph theory to the study of language families. Textual analysis has been another domain of application for graph theory (Auster 1980; Dailey 1959; Zarri 1976). In advance of linguists, a number of psychologists have applied graph theory to the study of the mental representation of language. Kiss (1968) used graph theory to study lexical networks, while Greeno (1980) used graph theory to describe the mental representation of structured knowledge. Finally, Sutton et al. (1994) used graph theory to describe the sequencing of elements in poems and narratives produced by mental patients. Barabási (2002) has provided a new impetus to graph theory by showing that networks are ubiquitous in nature and human institutions. He shows that networks possess general properties that manifest in phenomena as diverse as molecules and the world wide web. On the basis of Barabási's approach, Motter et al. (2002) carried out a study which showed that linguistic networks are like social networks in that they are densely interconnected. Whereas social networks are said to have 6 degrees of separation,
linguistic networks appear to have, on average, 3 degrees of separation. For instance, the words 'universe' and 'actor' are linked semantically as follows: universe → nature → character → actor. Motter et al.'s research was based on analysis of a thesaurus and it is not clear how the results can be generalised to actual mental lexicons in individual human brains. Nevertheless, this kind of research is significant because it shows that graph theory has the potential to link language with other fields of study and to make it possible to share results and techniques with more established sciences. In addition, graph theory makes it possible to use the same technical language in order to carry out both linguistic description and psychological modelling. Pioneering research along these lines has been carried out by Paul Meara and colleagues in modelling the mental lexicon (Wilks 2000; see also Ferrer and Sole 2001; Steyvers and Tenenbaum 2003).
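The notion of degrees of separation can be made concrete with a breadth-first search over a word graph. The network below is a toy one, built around the example path universe to nature to character to actor, with a couple of invented extra edges added for illustration; a study like Motter et al.'s would derive its edges from an actual thesaurus.

```python
# Degrees of separation in a toy semantic network, computed by breadth-first
# search over an undirected graph. The edge list is illustrative only.
from collections import deque

edges = [
    ("universe", "nature"), ("nature", "character"),
    ("character", "actor"), ("nature", "world"), ("actor", "player"),
]
graph = {}
for a, b in edges:  # build an undirected adjacency list
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def degrees_of_separation(start, goal):
    """Length of the shortest path between two words, or None if unlinked."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        word, dist = queue.popleft()
        if word == goal:
            return dist
        for neighbour in graph[word]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None

print(degrees_of_separation("universe", "actor"))  # 3
```

On this graph the shortest path from 'universe' to 'actor' has length 3, matching the example above; the same search run over a full thesaurus graph is what yields the average of roughly 3 degrees of separation.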
8. Conclusion
This article has argued that current attempts to integrate rules and associations lack an adequate epistemological foundation. As a result, the models that are developed are unable to cover all the relevant data on language processing. In particular, current models fail to explain a) the interaction of rules and associations and b) individual variations in linguistic representation. It has been proposed that constructivism accommodates both the interaction of rules and associations and individual variations. The case for constructivism as an epistemology for the language sciences was further strengthened by the existence of constructivist elements in linguistic theory, and particular reference was made to Saussure's theory of language. That theory describes knowledge of language as a network consisting of paradigmatic and syntagmatic links. The paradigmatic links encode grammatical categories while the syntagmatic links provide storage for linguistic sequences. It was suggested that some individuals rely on pre-established syntagmatic links and are therefore dependent on memory, while others have well-developed paradigmatic links and are therefore more creative. Finally, it was suggested that graph theory provides a ready-made mathematical formalism for modelling language as a network. Constructivism thus provides an epistemology that is highly congruent with existing theories of
language and is compatible with a mathematical analysis of language in the form of graph theory.
References
Anderson, J. R. 1974 Verbatim and propositional representation of sentences in immediate and long-term memory. Journal of Verbal Learning and Verbal Behaviour 13 (2): 149-162.
Anglin, J. M., and G. A. Miller 1968 The role of phrase structure in the recall of meaningful verbal material. Psychonomic Science 10 (10): 343-344.
Atkinson, R. C., and R. M. Shiffrin 1968 Human memory: A proposed system and its control processes. In The Psychology of Learning and Motivation 2, K. Spence, and J. Spence (eds.), 89-195. New York: Academic Press.
Auster, C. J. 1980 Balance theory and other extra-balance properties: An application to fairy tales. Psychological Reports 47 (1): 183-188.
Barabási, Albert-László 2002 Linked: The New Science of Networks. Cambridge, MA: Perseus Publishing.
Baruzzi, A. 1983 Effects of degree of education on the comprehension of syntactic structures in normal and aphasic populations. McGill Working Papers in Linguistics 2: 56-74.
Bates, E., I. Bretherton, and L. Snyder 1988 From First Words to Grammar. Cambridge: Cambridge University Press.
Bates, E., S. McNew, B. MacWhinney, A. Devescovi, and S. Smith 1982 Functional constraints on sentence processing: A cross-linguistic study. Cognition 11: 245-299.
Bates, E., M. Masling, and W. Kintsch 1978 Recognition memory for aspects of dialogue. Journal of Experimental Psychology: Human Learning and Memory 4 (3): 187-197.
Bates, E., W. Kintsch, and C. R. Fletcher 1980 On the role of pronominalization and ellipsis in text: Some memory experiments. Journal of Experimental Psychology: Human Learning and Memory 6: 676-691.
Bever, T. G. 1970 The cognitive basis for linguistic structures. In Cognition and the Development of Language, J. R. Hayes (ed.), 279-352. New York: Wiley.
1988 The psychological reality of grammar: A student's eye-view of cognitive science. In The Making of Cognitive Science: Essays in Honour of George A. Miller, W. Hirst (ed.), 112-142. Cambridge: Cambridge University Press.
Bever, T. G., R. Kirk, and J. Lackner 1969 An autonomic reflection of syntactic structure. Neuropsychologia 7 (1): 23-28.
Bever, T. G., J. R. Lackner, and R. Kirk 1969 The underlying structures of sentences are the primary units of immediate speech processing. Perception and Psychophysics 5 (4): 225-234.
Bever, T. G., and E. Saltz 1972 Phrases vs. meaning in the immediate recall of sentences. Psychonomic Science 9: 381-384.
Bever, T. G., T. M. Saltz, and D. T. Townsend 1998 The emperor's psycholinguistics. Journal of Psycholinguistic Research 27 (2): 261-283.
Biber, D. 1983 Differential competence in Somali: Evidence from the acquisition of noun definitisation. Journal of Psycholinguistic Research 12 (3): 275-295.
Blaubergs, M., and M. Braine 1974 Short-term memory limitations on decoding self-embedded sentences. Journal of Experimental Psychology 102 (4): 745-774.
Blumenthal, A. 1966 Observations with self-embedded sentences. Psychonomic Science 6 (10): 453-454.
Bock, K., G. Dell, Z. Griffin, F. Chang, and V. Ferreira 1996 Structural priming as implicit learning. Paper presented at the meeting of the Psychonomic Society, November 1996.
Bock, K., and H. Loebell 1990 Framing sentences. Cognition 35 (1): 1-39.
Branigan, H., B. Pickering, S. Liversedge, A. Stewart, and T. Urbach 1995 Syntactic priming. Journal of Psycholinguistic Research 24 (6): 489-506.
Campbell, Donald T. 1974 Evolutionary epistemology. In The Philosophy of Karl R. Popper, P. A. Schilpp (ed.), 412-463. LaSalle, IL: Open Court.
Caplan, D., and G. Waters 1999 Verbal working memory and sentence comprehension. Behavioral and Brain Sciences 22 (1): 77-94.
Chipere, N. 2003 Understanding Complex Sentences: Native Speaker Variation in Syntactic Competence. Basingstoke: Palgrave Macmillan.
Chomsky, C. 1972 Stages in language development and reading exposure. Harvard Educational Review 42 (1): 1-33.
1969 The Acquisition of Syntax in Children from 5 to 10. Cambridge, MA: MIT Press.
Chomsky, N. 1957 Syntactic Structures. The Hague: Mouton.
1965 Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
Costa-Pereira, D. J., and R. Maskill 1983 Structure and process in pupils' essays: A graphical analysis of the organisation of extended prose. British Journal of Educational Psychology 53 (1): 100-106.
Craik, F. I., and R. S. Lockhart 1972 Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal Behaviour 11: 671-684.
Cromer, W. 1970 The difference model: A new explanation for some reading difficulties. Journal of Educational Psychology 61 (6): 471-483.
Cuetos, F., D. C. Mitchell, and M. M. B. Corley 1996 Parsing in different languages. In Language Processing in Spanish, M. Carreiras, and J. E. Garcia-Albea (eds.), 145-187. Mahwah, NJ: Erlbaum.
Cupples, L., and V. M. Holmes 1987 Reading skill and interpretation of temporary structural ambiguity. Language and Cognitive Processes 2: 179-203.
1992 Evidence for a difference in syntactic knowledge between skilled and less skilled adult readers. Journal of Psycholinguistic Research 21 (4): 249-275.
Dabrowska, E. 1997 The LAD goes to school: A cautionary tale for nativists. Linguistics 35 (4): 735-766.
Dailey, C. A. 1959 Graph theory in the analysis of personal documents. Human Relations 12: 65-75.
Day, R. S. 1969 Temporal order judgements in speech: Are individuals language-bound or stimulus-bound? Paper presented at the 9th meeting of the Psychonomic Society, St Louis.
1979 Verbal fluency and the language-bound effect. In Individual Differences in Language Ability and Language Behavior, C. J. Fillmore, D. Kempler, and W. S. Wang (eds.), 57-84. New York: Academic Press.
de Saussure, Ferdinand 1990 Course in General Linguistics. London: Duckworth. Reprint; first published 1916.
Dearborn, W. F., and I. H. Anderson 1937 A new method for teaching phrasing and increasing the size of reading fixations. The Psychological Record 1: 459-475. Cited in Cupples, L., and V. M. Holmes (1987).
Deese, J., and R. A. Kaufman 1957 Serial effects in recall of unorganised and sequentially organised verbal material. Journal of Experimental Psychology 54: 180-187.
Ellegård, A. 2003 The study of language. In Dictionary of the History of Ideas, P. Wiener (ed.). http://etext.virginia.edu/DicHist/dict.html.
Epstein, W. 1961a The influence of syntactical structure on learning. American Journal of Psychology 74: 80-85.
1961b A further study of the influence of syntactical structure on learning. American Journal of Psychology 75: 121-126.
Ericsson, K. A., and W. Kintsch 1995 Long-term working memory. Psychological Review 102: 211-245.
Ferrer, R., and R. V. Sole 2001 The small world of human language. Proceedings of the Royal Society of London, Series B, Biological Sciences 268 (1482): 2261-2265.
Fletcher, C. R. 1994 Levels of representation in memory for discourse. In Handbook of Psycholinguistics, M. A. Gernsbacher (ed.), 589-607. San Diego, CA: Academic Press.
Fodor, J. A., and T. G. Bever 1965 The psychological reality of linguistic segments. Journal of Verbal Learning and Verbal Behaviour 4: 414-420.
Fodor, J. A., T. G. Bever, and M. F. Garrett 1974 The Psychology of Language. New York: McGraw Hill.
Fodor, J. A., and Z. W. Pylyshyn 1988 Connectionism and cognitive architecture: A critical analysis. Cognition 28 (1-2): 3-71.
A constructivist epistemology Frazier, L. 1995
67
Constraint satisfaction as a theory of sentence processing. Journal of Psycholinguistic Research 24 (6): 437-467. Freedle, R., and M. Craun 1970 Observations with self-embedded sentences using written aids. Perception and Psychophysics 7 (4): 247-249. Garret, Μ., T. Bever, and J. Fodor 1966 The active use of grammar in speech perception. Perception and Psychophysics 1: 30-32. Geer, S. Ε., H. Gleitman, and L. Gleitman 1972 Paraphrasing and remembering compound words. Journal of Verbal Learning and Verbal Behaviour 11: 348-355. Gibson, E., and J. Thomas 1999 Memory limitations and structural forgetting: The perception of complex ungrammatical sentences as grammatical. Language and Cognitive Processes 14 (3): 225-248. Geer, S. Ε., H. Gleitman, and L. Gleitman 1972 Paraphrasing and remembering compound words. Journal of Verbal Learning and Verbal Behaviour 11: 348-355. Gleitman, L. R., and H. Gleitman 1970 Phrase and Paraphrase. New York: W.W. Norton. Goldberg, A. 1995 Constructions: A Construction Grammar Approach to Argument Structure. Chicago: University of Chicago Press. Goldman-Eisler, F. 1958 Speech analysis and mental processes. Language and Speech 1: 5 9 75. Graesser, A. C., and Mandler, G. 1975 Recognition memory for the meaning and surface structure of sentences. Journal of Experimental Psychology: Human Learning and Memory 1: 238-248. Graesser, A. C., N. L. Hoffman, and L. F. Clark 1980 Structural components of reading time. Journal of Verbal Learning and Verbal Behavior 19: 135-151. Graf, R., and A. W. Torrey 1966 Perception of phrase structure in written language. Proceedings of the Annual Convention of the American Psychological Association 1966: 83-84. Greenbaum, S., and R. Quirk 1970 Elicitation Experiments in English. London: Longmans. Greeno, J. G. 1976 Psychological representation of structured knowledge. Journal of Structural Learning 5: 73-95.
68
Ngoni Chipere
Gross, M. 1979 On the failure of generative grammar. Language 55 (4): 859-885. Guthrie, K.S. 1988 The Pythagorean Source Book and Library. Phanes Press: Michigan. Halliday, Michael 1985 An Introduction to Functional Grammar. London: Arnold. Harrington, M. 1987 Processing transfer: Language specific strategies as a source of interlanguage variation. AppliedPsycholinguistics 8: 351-378. Hergenhahn, B., and M. Olson. 2001 Introduction to Theories of Learning. Englewood Cliffs, NJ: Prentice Hall. Hill, A. A. 1961 Grammaticality. Word Π: 1-10. Hockett, C. F. 1955 A manual of phonology. International Journal of American Linguistics 21 (4): Memoir no. 11. Holmes, V. M., and Κ. I. Förster 1970 Detection of extraneous signals during sentence recognition. Perception and Psychophysics 7 (5): 297-301. Hubey, Η. M. 1999 Mathematical Foundations of Linguistics. München and Newcastle: Lincom Europa. Hudson, R. 2000 Language as a cognitive network. In A Cognitive Approach to the Verb: Morphological and Constructional Perspectives, H. G. Simonsen, and R. T. Endresen (eds.), 49-72. Berlin: Mouton de Gruyter. Huey, Ε. B. 1968 The Psychology and Pedagogy of Reading. Cambridge, MA: MIT. Jarvella, R. J. 1971 Syntactic processing of connected speech. Journal of Verbal Learning and Verbal Behaviour 10 (4): 409—416. Just, Μ. Α., and Carpenter, P.A. 1992 A capacity theory of comprehension: Individual differences in working memory. Psychological Review 99: 122-149. Kail, M. 1989 Cue validity, cue cost, and processing types in sentence comprehension in French and Spanish. In The Crosslinguistic Study of Sentence Processing, B. MacWhinney, and E. Bates (eds), 7 7 117. Cambridge: Cambridge University Press.
A constructivist epistemology
69
Karanth, P., A. Kudva, and A. Vijayan 1996 Literacy and linguistic awareness. In Speech and Reading, B. de Gelder, and J. Morais (eds.), 303-316. London: Lawrence Erlbaum. Keenan, J. Μ., B. MacWhinney, and D. Mayhew 1977 Pragmatics in memory: A study of natural conversation. Journal of Verbal Learning and Verbal Behavior 16: 549-560. Kilbon, K., and T. Ito 1989 Sentence processing strategies in adult bilinguals. In The Crosslinguistic Study of Sentence Processing, B. MacWhinney, and E. Bates (eds), 257-291. Cambridge: Cambridge University Press. Kilborn, K., and A. Cooreman 1987 Sentence interpretation strategies in adult Dutch-English bilinguals. Applied Psycholinguistics 8 (41): 5-31. Kintsch, W., and E. Bates 1977 Recognition memory for statements from a classroom lecture. Journal of Experimental Psychology: Human Learning and Memory 3: 150-159. Kiss, G.R. 1968 Words, associations, and networks. Journal of Verbal Learning and Verbal Behavior 7 (4): 707-713. Kramer, P. Ε., E. Koff, and Z. Luria 1972 The development of competence in an exceptional language structure in older children and young adults. Child Development 43: 121-130. Lamb, Sidney 1966 Outline of Stratificational Grammar. Washington DC: Georgetown University Press. Langacker, Ronald 1987 Foundations of Cognitive Grammar I: Theoretical Prerequisites. Stanford: Stanford University Press. Le-Blanc, R. S., J. G. Muise, and J. Gerard 1971 Letter reading as a function of approximation to English and French. Perceptual and Motor Skills 33 (3): 1139-1142. Lefever, Μ. M., and L. C. Ehri 1976 The relationship between field independence and sentence disambiguation ability. Journal of Psycholinguistic Research 5 (2): 99-106. Lefton, L. Α., A. B. Spragins, and J. Byrnes 1973 English orthography: Relation to reading experience. Bulletin of the Psychonomic Society 2 (5): 281-282. Levelt, W. J. M. 1970 Hierarchical chunking in sentence processing. Perception and Psychophysics 8. 99-103.
70
Ngoni Chipere
Maclay, Η., and C. Ε. Osgood 1959 Hesitation phenomena in spontaneous English speech. Word 15: 19— 44. Maratsos, M. P. 1976 The Use of Definite and Indefinite Reference in Young Children. London: Cambridge University Press. Marcus, G. F. 2001 The Algebraic Mind: Integrating Connectionism and Cognitive Science. Cambridge, MA: MIT Press. Markov, A. A. 1913 Primer statisticheskogo issledovanija nad tekstom 'Evgenija Onegina' i 1 ljustrirujuschij svjaz' ispytanij ν tsep [An example of statistical study on the text of'Eugene Onegin1 illustrating the linking of events to a chain]. Izvestija Imp. Akademii nauk, serija 6 (3): 153-162. McDaniel, M. A. 1981 Syntactic complexity and elaborative processing. Memory and Cognition 9 (5): 4 8 7 ^ 9 5 . McDonald, J. 1989 The acquisition of cue-category mappings. In The Crosslinguistic Study of Sentence Processing, B. MacWhinney, and E. Bates (eds.), 375-396. Cambridge: Cambridge University Press. 1987 Reading skill and interpretation of temporary structural ambiguity. Language and Cognitive Processes 2: 179-203. Marks, M. R., and O. Jack. 1952 Verbal context and memory span for meaningful material. American Journal of Psychology 65: 298-300. Marks, L. E., and G. Miller. 1964 The role of semantic and syntactic constraints in the memorisation of English sentences. Journal of Verbal Learning and Verbal Behaviour 3: 1-5. Mehler, J., and P. Carey 1967 Role of surface and base structure in the perception of sentences. Journal of Verbal Learning and Verbal Behaviour 6 (3): 335-338. Mehler, J., T. G. Bever, and P. Carey 1967 What we look at when we read. Perception and Psychophysics 2 (5): 213-218. Miller, G. 1956 The magical number seven, plus or minus two: Some limits of our capacity for processing information. Psychological Review 63: 81— 97. 1962 Some psychological studies of grammar. American Psychologist 17 (10): 748-762.
A constructivist epistemology
71
Miller, G„ and S. Isard 1963 Some perceptual consequences of linguistic rules. Journal of Verbal Learning and Verbal Behavior 2: 217-228. 1964 Free recall of self-embedded English sentences. Information and Control 7: 292-303. Miller, G., and J. A. Selfridge 1951 Verbal context and the recall of meaningful material. American Journal of Psychology 63: 176-85. Miller, G., J. S. Bruner, and L. Postman 1954 Familiarity of letter sequences and tachiscopic identification. Journal of General Psychology 50: 129-39. Miller, G., G. A. Heise, and W. Lichten 1951 The intelligibility of speech as a function of the context of the test materials. Journal of Experimental Psychology 41: 329-35. Mills, J. Α., and G. D. Hemsley 1976 The effect of level of education on judgements of grammatical acceptability. Language and Speech 19 (4): 324-342. Mitchell, D. C., F. Cuetos, Μ. M. Corley, and M. Brysbaert. 1995 Exposure-based models of human parsing: Evidence for the use of coarse-grained (nonlexical) statistical records. Journal of Psycholinguistic Research 24 (6): 469-488. Moeser, S. D. 1974 Memory for meaning and wording in concrete and abstract sentences. Journal of Verbal Learning and Verbal Behaviour 13: 683—697. Motter, A. E., A. P. S. de Moura, Y. C. Lai, and P. T. Dasgupta 2002 Topology of the conceptual network of language. Physical Review Ε 65. Muise, J. G., R. S. Leblanc, and C. J. Jeffrey 1972 Letter reading by English Ss as a function of order of approximation to French and English. Psychological Reports 30 (2): 395—398. Muncer, S. J., and T. G. Bever 1984 Sensitivity to propositional units in good reading. Journal of Psycholinguistic Research 13: 275-279. Nooteboom, S. G., F. Weerman, and F. Ν. K. Wijnen 2002 Storage and Computation in the Language Faculty. Dordrecht: Kluwer. Onishi, S. 1962 The recognition of letter sequences with different orders of approximation to the Japanese language: On the eye-voice span. Japanese Psychological Research 4 (1): 43-47.
72
Ngoni Chipere
Osterhout, L. 1997 On the brain responses to syntactic anomalies: Manipulations of word position and word class reveal individual differences. Brain and Language 59: 494-522. Pearlmutter, Ν. J., and M. C. MacDonald 1995 Individual differences and probabilistic constraints in syntactic ambiguity resolution. Journal of Memory and Language 34: 521542. Piaget, J. 1970 Genetic Epistemology. New York: Columbia University Press. Pinker, S. 1999 Words and Rules: The Ingredients of Language. New York: Basic Books. Poly a, G. 1954 Mathematics and Plausible Reasoning. Princeton: Princeton University Press. Powell, Α., and R.G. Peters 1973 Semantic clues in comprehension of novel sentences. Psychological Reports 32: 1307-1310. Richardson, P., and J. F. Voss 1960 Replication report: Verbal context and the recall of meaningful English material. Journal of Experimental Psychology 60: 417-18. Rosenberg, S. 1968 Association and phrase structure in sentence recall. Journal of Verbal Learning and Verbal Behaviour 7 (6): 1077-1081. Sachs, J. S. 1967 Recognition memory for syntactic and semantic aspects of connected discourse. Perception and Psychophysics 2: 437—442. Sanders, L. J. 1971 The comprehension of certain syntactic structures by adults. Journal of Speech and Hearing Research 14: 739-745. Sasaki, Y. 1997 Individual variation in a Japanese sentence comprehension task: Forms, functions and strategies. Applied Linguistics 18 (4): 508-537. Scheerer, N. G., H. Ahola, U. Koenig, and U. Reckermann 1978 The use of oral redundancy with reading disabled children. Zeitschrift fuer Entwicklungspsychologie und Paedagogische Psychologie 10 (1): 35-48. Shannon, C. E., and W. Weaver, 1949 The Mathematical Theory of Communication. Urbana. Sharp, H. C. 1958 The effect of contextual constraint upon recall of verbal passages. American Journal of Psychology 71: 568-72.
A constructivist epistemology Singh, S. 2000 Spencer, N. 1973
73
The Science of Secrecy: The Secret History of Codes and Codebreaking. London: Fourth Estate.
Differences between linguists and non-linguists in intuitions of grammaticality-acceptability. Journal of Psycholinguistic Research 2: 83-98. Stevenson, R. J. 1960 Memory for referential statements in texts. Journal of Experimental Psychology: Learning, Memory, and Cognition 14 (4): 612-617. Steyvers, M., and J. Tenenbaum. 2003 The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth, http://www.sych. Stanford. edu/~msteyver/papers/smallworlds.pdf as at 19/03/2003 Stolz, W. 1967 A study of the ability to decode grammatically novel sentences. Journal of Verbal Learning and Verbal Behaviour 6: 867—873. Sutton, J. P., C. D Rittenhouse, E. Pace-Schott, and R. Stickgold 1994 A new approach to dream bizarreness: Graphing continuity and discontinuity of visual attention in narrative reports. Consciousness and Cognition: An International Journal 3(1): 61-88. Townsend, D. J., and T. G. Bever 2001 Sentence Comprehension: The Integration of Habits and Rules. Cambridge MA: MIT Press. Traul, G. N., and J. W. Black, 1965 The effect of context on aural perception of words. Journal of Speech and Hearing Research 8 (4): 363-369. Wilks, C. 2000 Untangling Word Webs: Graph Theory Approaches to L2 Lexicons. Unpublished PhD dissertation. University of Swansea. Wray, A. 2002 Formulaic Language and the Lexicon. Cambridge: Cambridge University Press. Zarri, G. P. 1976 A computer model for textual criticism? In The Computer in Literary and Linguistics Studies, A. Jones, and R. F. Churchhouse (eds.), 133-155. Cardiff: The University of Wales Press.
Chapter 3 'What's a nice NP like you doing in a place like this?' The grammar and use of English inversions Gert Webelhuth, Susanne Wendlandt, Niko Gerbl and Martin Walkow
1. Introduction1

Based on a large corpus of English inversion sentences collected by Betty Birner, this article will use standard linguistic and statistical tests to determine the grammatical structure and preferred use of inversions. The investigation will reveal that inversion sentences do not all behave alike but must be analyzed differently depending on the part of speech of the initial constituent. We conclude that English inversions form a construction family of the kind that has become familiar from in-depth analyses of other sentence types in English and other languages, e.g. topicalizations and there-sentences (Birner and Ward 1998), relative clauses (Sag 1997), interrogative clauses (Ginzburg and Sag 2001), subject-auxiliary inversion (Fillmore 1999), double object constructions (Goldberg 1995), complex predicate constructions (Ackerman and Webelhuth 1998), genitive constructions (Malouf 2000), and others.

Second, we will (re-)address the issue of the degree of motivation of the form of inversion sentences. In other words, we will ask whether it is natural for sentences with the function of inversion to have the form such sentences show in English.2 Naturally, we would prefer English inversions to be instances of universal laws or at least universal tendencies of language. The work on the constructions referred to in the previous paragraph and many others has shown beyond much doubt, however, that we often have to live with generalizations at the level of individual languages, dialects, or even individual constructions. Most typical, indeed, seems to be the finding that particular sentence types instantiate universal construction types that are complemented by systemic properties of individual languages and/or construction families as well as construction-individuating properties. In a sense, then, as in other sciences, the nature-nurture question in linguistics demands an answer where the truth lies somewhere in the middle. Put differently, the correct question to ask is: how many universal, language-particular, and construction-particular properties does a construction have, and which are which? With respect to English inversion, our conclusion will be that one can definitely find strong universal tendencies realized in these constructions, but that there are idiosyncrasies as well.

The remainder of this article will be structured as follows: The next section will introduce examples of the inversion construction and lay out why they are of interest to linguistic theory in the context of the overall system of English grammar. In Section 3 we will look at the tension between descriptive and explanatory adequacy. Then we will compare the grammatical behavior of English inversion sentences with other English sentence types displaying non-canonical word order. We will find similarities with English existential sentences and thus consider the results of a typological study of such sentences cross-linguistically. Out of this will develop several hypotheses about the grammatical analysis of inversion sentences, which we have successfully implemented in the grammar development system TRALE3. Section 5 will address an important empirical issue that threatens the success of our grammatical analysis of inversion sentences, namely the existence of focused subjects in English. This leads us to the formulation of an additional theoretical principle based on a statistical analysis of inversion sentences in the Birner Corpus4. Another computer simulation shows the revised system of hypotheses to be superior to the original system we had formulated. Our article concludes with some remarks about language acquisition and a summary of the overall argument.
2. English inversions and their theoretical importance

The second sentence of (1) is an example of an inversion sentence. Its non-inverted counterpart appears in (2):

(1) An important part of the day is going to class. But [AP just as important] is [NP the on-the-job training program]. (Action News)

(2) But [NP the on-the-job training program] is [AP just as important].

We can represent the difference between those two sentences schematically as follows:
(1') AP - be - SUBJ
(2') SUBJ - be - AP

In the canonical order (2'), the subject appears before the copula, which in turn is followed by a predicative adjective phrase. In the inversion sentence (1') the subject and the adjective phrase trade places. Additional attested examples of this kind appear below:

(3)
Attractive fares are important in trying to increase ridership on SEPTA's commuter rail lines, but [AP even more important] is [NP convenient and reliable service]. (Philadelphia Inquirer, p. 22-A, 10/26/83, editorial 'More trains, more riders')
(4)
Your article on disease control was fascinating, and your depiction of the AIDS crisis sensitively written. [AP More frightening than the disease] are [NP the attitudes of some people in the mainstream of society]. (Time, 7/25/83, Letters to the Editor)
All the sentences presented up to this point have an adjective phrase in first position. The Birner Corpus also contains sentences with initial phrases of other parts of speech. (5)-(7), for instance, begin with a participle phrase:

(5)
Labor savings are achieved because the crew is put to better use than cleaning belts manually; [PartP also eliminated] is [NP the expense of buying costly chemicals]. (WOODEXTRA, August 1988)
(6)
Discussion of the strategy began during this year's General Assembly and will conclude next year. [PartP Dropped from consideration so far] are [NP the approaches of the past], which The Economist recently described as based on the idea that the rules of orthodox economics do not hold in developing countries. (NYT Week in Review, November 5, 1989, p. 2)
(7)
Soviet visitors are barred from visiting California's Silicon Valley; The New York Times said yesterday that last week the State Department presented to the Soviet Union a new list of areas open and closed to Soviet journalists and diplomats. [PartP Also listed as off-limits] were [NP Houston and Dallas]. (Philadelphia Inquirer, p. 3-A, 11/21/83, column National and International News in Brief)
Prepositional phrases are possible as well and, in fact, are statistically dominant:

(8)
Ingrid Bergman won an Oscar for her portrayal of an amnesiac trained by exiled Russians to impersonate the czar's daughter in the mystery Anastasia. [PP Also in the cast] are [NP Yul Brynner, Helen Hayes, and Akim Tamiro]. (Philadelphia Inquirer, p. 6-C, Worth Watching)
(9)
Soon it was dark, and the bells tinkled contentedly that the camels were feeding not more than a hundred yards away. Then the bells rang a different tune, telling that the animals were coming hurriedly to camp, and [PP into the light of the fire] appeared [NP their heads] on a level as they waited for Bony to cut a crust for the waiting dog to serve to them. And thus was Bony taken to old Patsy Lonergan's camps, with no roads to follow, no tracks to lead him. (A.W. Upfield, Man of Two Tribes, 1956; Collier Books reprint, New York, 1986, p. 49)
We thus need to generalize the schema in (1') and (2') to the following one, which allows phrases other than adjective phrases to appear in first position:5

(1'') XP - be - Subj
(2'') Subj - be - XP

(XP stands for a category-neutral phrase.)

The theoretical importance of inversion sentences lies in the non-canonical behavior of their (logical) subjects. Clearly, when faced with the choice between (1'') and (2''), speakers of English overwhelmingly choose the latter sentence form over the former. This underlies the intuition that the word order in (2'') is "more normal" or "more canonical" than the word order in (1''), and the characterization of the latter as "inverted" relative to that in (2'') rather than the reverse.
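The contrast between the schemata (1'') and (2'') can be sketched in a few lines of code. This is our own illustrative simplification, not part of the authors' analysis: it assumes a pre-chunked three-slot representation and treats any clause-initial NP as the subject.

```python
# Toy classifier for copular clauses of the shape [first phrase, be, second phrase].
# Category labels (NP, AP, PP, PartP) are illustrative assumptions.

def classify(first_cat: str, second_cat: str) -> str:
    """Return 'canonical' for Subj-be-XP, 'inverted' for XP-be-Subj."""
    if first_cat == "NP" and second_cat != "NP":
        return "canonical"   # Subj - be - XP, as in (2'')
    if first_cat != "NP" and second_cat == "NP":
        return "inverted"    # XP - be - Subj, as in (1'')
    return "unclassified"    # e.g. NP-be-NP needs more information

print(classify("NP", "AP"))  # the on-the-job training program is just as important
print(classify("AP", "NP"))  # just as important is the on-the-job training program
print(classify("PP", "NP"))  # also in the cast are Yul Brynner, ...
```

The "unclassified" case flags exactly the situation where surface categories alone cannot decide which phrase is the logical subject.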
The preference of subject-initial over subject-non-initial word order in English is an instance of a cross-linguistic generalization that prefers subjects to appear before the pieces realizing the predicate (i.e., its head and its complements) in representative samples of the world's languages. As the following table from Tomlin (1986) shows, more than 4/5 of the languages of the world are subject-initial in their basic word order:

Table 1. The frequencies of basic constituent orders in a representative sample of the languages of the world (Tomlin 1986: 22, Table 1)

Constituent Order   Number of Languages   Frequency in Final Sample
SOV                 180                   44.78
OSV                 0                     0.00
SVO                 168                   41.79
OVS                 5                     1.24
VSO                 37                    9.20
VOS                 12                    2.99
Totals              402                   100.00
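The frequency column of Table 1 follows from the language counts by simple arithmetic, which a few lines reproduce (counts taken from the table itself):

```python
# Recompute the frequency column of Table 1 from the raw language counts.
counts = {"SOV": 180, "OSV": 0, "SVO": 168, "OVS": 5, "VSO": 37, "VOS": 12}
total = sum(counts.values())  # 402 languages in the final sample

freqs = {order: round(100 * n / total, 2) for order, n in counts.items()}
print(total)                       # 402
print(freqs["SOV"], freqs["SVO"])  # 44.78 41.79

# Share of subject-initial orders (SOV + SVO), the "more than 4/5" claim:
print(round(100 * (counts["SOV"] + counts["SVO"]) / total, 2))  # 86.57
```

The subject-initial share of 86.57% confirms the text's "more than 4/5" figure.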
3. The tension between descriptive and explanatory adequacy

It is a common observation that among the languages of the world one finds a remarkable number of grammatical differences; yet, on a more abstract level, these differences can be shown to result from factorizations of a finite number of properties. In the wake of the Principles and Parameters approach pioneered by Chomsky (1981, 1982, 1986) it became common to describe perceived regularities among languages in terms of a set of principles that the human language faculty is innately endowed with. One of the better known ones is the headedness principle, which states that all linguistic structures must be headed. One of the parameters of this principle is where the head occurs in linear order - before or after its sisters. As an example, compare the standard word orders in Arabic (10), English (11), and Japanese (12). This kind of variation is highly systematic and can easily be captured in Chomsky's approach.
(10) ra'a: mariam Ju:n.
     saw Mary-SUBJ John-OBJ
     'Mary saw John.'

(11) Mary saw John.

(12) Mary-ga John-o mi-ta.
     Mary-SUBJ John-OBJ saw
     'Mary saw John.'

However, besides systematic differences between languages, there are also idiosyncratic differences, which grammars of the Principles and Parameters type cannot adequately account for. An example of this is the WXDY Construction6 (Fillmore and Kay 1999).

(13)
a. What is that fly doing in my soup?
b. What is it doing raining?
c. What is a nice NP like you doing in a place like this?
There are observable restrictions on this construction, namely that the elements in the X and Y position must be able to stand in a subject-predicate relation. The interesting and problematic aspect of this construction is that it is completely productive: any subject-predicate relation, of which there are infinitely many, fits in. Furthermore, the words occurring in the construction are not completely fixed. Whereas the linear order of what, be and doing is fixed, be may take any inflected form, and when a verbal predicate occurs in the construction, it takes progressive aspect, but there need not be a verb at all. The pragmatic function, namely that of expressing surprise, is fixed as well. The existence of such a construction and the restrictions on it are not inferable from Universal Grammar or derivable by parameter settings.

The divide between the idiosyncratic and the universal in the grammar of an individual language has been bridged by assuming a distinction between core and periphery. The core is that part of a language's grammar which is derived from Universal Grammar; the periphery, on the other hand, is the part which is particular to a language. Often peripheral phenomena have been treated as lexically idiosyncratic, an argument unavailable in the case of the WXDY construction because of its mixture of free and fixed elements and its unlimited productivity.
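For illustration only, the mixture of fixed and free parts in the WXDY construction can be approximated with a surface pattern. This is a crude sketch of our own: a regular expression captures none of the construction's syntax or pragmatics, only its word-level skeleton (fixed what and doing, an inflected form of be, free X and Y slots).

```python
import re

# Rough surface skeleton of the WXDY construction:
# fixed "what", an inflected "be", free X slot, fixed "doing", free Y slot.
WXDY = re.compile(
    r"^what\s+(is|was|are|were)\s+(?P<x>.+?)\s+doing\s+(?P<y>.+?)\??$",
    re.IGNORECASE,
)

def match_wxdy(sentence: str):
    """Return the (X, Y) slot fillers if the sentence fits the skeleton."""
    m = WXDY.match(sentence.strip())
    return (m.group("x"), m.group("y")) if m else None

print(match_wxdy("What is that fly doing in my soup?"))
print(match_wxdy("What is a nice NP like you doing in a place like this?"))
print(match_wxdy("Mary saw John."))  # not an instance -> None
```

Note that the pattern cannot check the constructional restriction itself, namely that X and Y stand in a subject-predicate relation; that is exactly the part a constructional grammar must state.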
Turning now to the English inversion construction, we can ask to what extent it is an instance of universal properties and to what extent its properties are specific to English or even to the construction itself.
4. Inverted subjects in English

The subject in English canonically appears to the left of the predicate. Yet there also exist several constructions where this is not the case. Among these are constructions with expletives such as cleft sentences in (14) and existential constructions in (15), locative inversion in (16), and other constructions with fronted non-subject constituents, as in (17). Cleft sentences as in (14) will not be dealt with here.

(14) [ExplNP It] was [NP Johnny] that stole her money while we were away in France. (Collins 1991: 51)

(15) [ExplNP There] is [NP a dog] on the street.

(16) [PP On the street] is [NP a dog].

(17) a. [AP Just as important] is [NP the on-the-job training program].
     b. [PartP Also eliminated] is [NP the expense of buying costly chemicals].
     c. [PP Also in the cast] are [NP Yul Brynner, Helen Hayes, and Akim Tamiro].
In order to evaluate the cross-linguistic status of inversions in English, consider first the existential construction in (15). This sentence form alternates with locatives such as A dog is on the street. This alternation is common among the world's languages, as Clark's (1978) study shows. Clark found the following distributive patterns:

Table 2. Word Order Alternations (Clark 1978: 96, Table 3)

Word Order Alternations in the Hyperlocative Constructions

Language            Existential          Locative
Amharic             Loc Nom V            Nom Loc V
Bengali             Loc Nom V            Nom Loc V
Chuvash             Loc Nom V            Nom Loc V
Eskimo              Loc Nom V            Nom Loc V
Hindi               Loc Nom V            Nom Loc V
Japanese            Loc Nom V            Nom Loc V
Malayalam           Loc Nom V            Nom Loc V
Sumerian            Loc Nom V            Nom Loc V
Swahili             Loc Nom V            Nom Loc
Turkish             Loc Nom V            Nom Loc V

Syrian Arabic       Loc V Nom            Nom V Loc
Mandarin Chinese    Loc V Nom            Nom Loc
Estonian            Loc V Nom            Nom V Loc
Finnish             Loc V Nom            Nom V Loc
German              Loc V Nom            Nom V Loc
Modern Greek        Loc V Nom            Nom V Loc
Kurukh              Loc V Nom            Nom V Loc
Panjabi             Loc V Nom            Nom V Loc

English             pro-Loc V Nom Loc    Nom V Loc
French              pro-Loc V Nom Loc    Nom V Loc
Spanish             pro-Loc V Nom Loc    Nom V Loc

Hebrew              V Nom Loc            Nom V Loc
Hungarian           V Nom Loc            Nom V Loc
Luiseno             V Nom Loc            Nom V Loc

Languages that retain the same order in both constructions:
Basque              Nom Loc V
Gujarati            Loc Nom V
Kashmiri            Nom V Loc
Mundari             Nom Loc V
Tagalog             Loc Nom
Twi                 Nom V Loc
Yoruba              Nom V Loc

Total number of languages with an alternation in order = 24
Total number of languages with no change in order = 7
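Clark's totals can be re-derived from the groupings in Table 2 (the language lists are copied from the table; the dictionary layout is our own):

```python
# Tally the word-order groups in Clark's (1978) sample, Table 2.
alternating = {
    "Loc Nom V / Nom Loc V": ["Amharic", "Bengali", "Chuvash", "Eskimo", "Hindi",
                              "Japanese", "Malayalam", "Sumerian", "Swahili",
                              "Turkish"],
    "Loc V Nom / Nom V Loc": ["Syrian Arabic", "Mandarin Chinese", "Estonian",
                              "Finnish", "German", "Modern Greek", "Kurukh",
                              "Panjabi"],
    "pro-Loc V Nom Loc / Nom V Loc": ["English", "French", "Spanish"],
    "V Nom Loc / Nom V Loc": ["Hebrew", "Hungarian", "Luiseno"],
}
no_change = ["Basque", "Gujarati", "Kashmiri", "Mundari", "Tagalog", "Twi",
             "Yoruba"]

n_alternating = sum(len(langs) for langs in alternating.values())
print(n_alternating, len(no_change))  # 24 7, matching Clark's totals
```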
The grammar and use of English inversion
83
We observe that in a wide variety of unrelated languages the existential and locative constructions differ in word order. Whereas in the locative construction the subject typically precedes the locative and the verb (provided there is a verb), the subject typically is not sentence-initial in the existential construction. Furthermore, in the majority of languages, the locative precedes the subject in the existential construction, similar to the situation in English.7 We assume that this difference in word order is functionally related to the universal strategy of dividing utterances into topic and focus. The subject is canonically linked to topicality (Keenan 1976), which conforms to the word order in the locative construction. If the relation between topicality and subjecthood is broken, however, the subject needs to be realized in a non-canonical position, which we see in the existential construction. Based on Clark's patterns we postulate two universal tendencies:8 A) SUBJ[.Foc] > predicate B) X[part of the predicate] > SUBJ[+Foc] (= inversion) This means that a non-focused subject should always precede its predicate. This case would be the canonical word order. If the subject is focused, it should be preceded by another element. This element can either be the predicate, the locative, or both of them. Applying these universals to the English constructions in question, we come to the following working hypothesis: the logical subject of English inversion constructions is focused. Hence the subject must be focused in the following constructions, which all differ with respect to the element which precedes the subject: Table 3. Word Order Alternations (Clark 1978: 96, Table 3) Be [AP Just as important] Is [partp Also eliminated] Is [PP Also in the cast] Are [EXPINP
There]
[EXPINP I t ]
Is Was
focused subject ( x p t h e on-the-job training program]. [Npthe expense of buying costly chemicals] [MP Yul Brynner, Helen Hayes, and Akim Tamiro] [NP a dog] on the street. [NP Johnny] that stole her money...
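Tendencies A) and B) amount to a single licensing condition on linear order. The following is our own toy encoding, not the authors' formalism: a clause is an ordered list of labeled elements, and we check only where the subject sits relative to everything else.

```python
# Toy encoding of the two postulated tendencies:
#   A) a non-focused subject precedes (the parts of) its predicate;
#   B) a focused subject must be preceded by some other element.
# A clause is an ordered list of (label, is_focused) pairs with exactly
# one "SUBJ" element; all other elements count as predicate material.

def licensed(elements):
    idx = [i for i, (label, _) in enumerate(elements) if label == "SUBJ"][0]
    subj_focused = elements[idx][1]
    if subj_focused:
        return idx > 0   # B): something precedes the focused subject
    return idx == 0      # A): the non-focused subject comes first

# Canonical order: SUBJ[-Foc] - be - AP
print(licensed([("SUBJ", False), ("be", False), ("AP", False)]))  # True
# Inversion: AP - be - SUBJ[+Foc]
print(licensed([("AP", False), ("be", False), ("SUBJ", True)]))   # True
# Disallowed by the tendencies: focused subject in initial position
print(licensed([("SUBJ", True), ("be", False), ("AP", False)]))   # False
```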
However, stipulating that in inversion constructions the subject must be focused is insufficient. We still need to provide an account of how the inversion comes about. Empirically speaking, languages seem to have a choice between syntactic and lexical tools to achieve inversion. Adopting the first solution means (i) allowing syntactic scrambling operations to produce word order alternations involving the subject and (parts of) the predicate and (ii) introducing a linearization constraint along the lines of X[part of the predicate] > SUBJ[+Foc], saying that a focused subject has to be preceded by another element. The second possibility of getting to such a structure is a lexical constraint like the following:

word & [POS V] => [ARG-ST <[INF-ST non-focus], ...>]

Figure 1. Non-focused initial argument constraint
This constraint, which uses the formalism of Head-Driven Phrase Structure Grammar (HPSG), states the following: any word that belongs to the part of speech (POS) V(erb) is constrained to have an unfocused element heading its argument structure. This is achieved by constraining the first element on the list of arguments to have as the value of the feature Inf-St (information status) something of type non-focus. Since the first element of the argument structure is linked to the subject position9, this would prevent focused logical subjects from being realized in the canonical subject position, yielding an argument realization like the following:

[S Arg_1 [VP V SUBJ[FOC] ... Arg_n]]

Figure 2. Argument realization
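The effect of the constraint in Figure 1 can be mimicked in a few lines. This is a sketch only: the dict-based feature structures and the function name are our own stand-ins, not TRALE's or HPSG's actual formalism.

```python
# Illustrative re-encoding of the Figure 1 constraint: any word of part
# of speech V must have an ARG-ST whose first element bears INF-ST
# non-focus. Plain dicts stand in for HPSG feature structures.

def satisfies_constraint(word):
    if word["POS"] != "V":
        return True          # the constraint only targets verbs
    arg_st = word["ARG-ST"]
    return bool(arg_st) and arg_st[0]["INF-ST"] == "non-focus"

be_ok = {"POS": "V",
         "ARG-ST": [{"CAT": "NP", "INF-ST": "non-focus"},
                    {"CAT": "AP", "INF-ST": "focus"}]}
be_bad = {"POS": "V",
          "ARG-ST": [{"CAT": "NP", "INF-ST": "focus"},  # focused first argument
                     {"CAT": "AP", "INF-ST": "non-focus"}]}

print(satisfies_constraint(be_ok))   # True
print(satisfies_constraint(be_bad))  # False
```

The rejected entry `be_bad` is exactly the configuration the text rules out: a focused element heading the argument structure, and hence a focused logical subject in canonical subject position.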
This diagram shows that, when a subject is focused, it must follow the first argument, and in such a case the first argument is in the canonical subject position. Clark's (1995) study on language acquisition suggests that syntactic properties are learned specific to lexical items. Hence, we adopt a lexical approach to the English inversion construction and assume that the source of the inversion is to be found within the lexicon. A further argument for this solution derives from the absence of syntactic scrambling operations elsewhere in the grammar of English.

Under the hypothesis that focused subjects are assigned non-initial positions on argument structures, and considering the data in Table 3, which confirms that the initial position can be occupied by different kinds of constituents, a mechanism for assigning this initial position to such constituents is needed. Such a mechanism could be realized by different lexical entries for the copula, as shown in the diagrams below.

(be, [ARG-ST (NP[-F], AP)])

Figure 3. Lexical entry for be (with no inversion)

[S [NP The on-the-job training program] [VP [V is] [AP just as important]]]

Figure 4. Tree structure for (2)
In Figure 3 we see what the argument structure of be must look like to produce a sentence like the one shown in Figure 4. The argument structure is headed by an unfocused noun phrase, followed by an adjective phrase.

(be, [ARG-ST (AP, NP[+Foc])])

Figure 5. Lexical entry for be (with inverted AP)

[S [AP Just as important] [VP [V is] [NP the on-the-job training program]]]

Figure 6. Tree structure for (1)
Figure 5 shows what argument structure for be we must assume to account for sentences like the one in Figure 6. The first element on the argument
structure of this be is an adjective phrase followed by a focused noun phrase.

(be, [ARG-ST (NP[FORM there], NP[+Foc], PP)])

Figure 7. Lexical entry for existential be

[S [NP There] [VP [V is] [NP a dog] [PP on the street]]]

Figure 8. Tree structure for There is a dog on the street
To account for sentences like that in Figure 8, i.e., existential constructions, we have to assume a lexical entry for the copula as shown in Figure 7. In this case the argument structure is headed by the expletive there, which is followed by a focused noun phrase - the logical subject - and a prepositional phrase. The different lexical entries of be that we have postulated are plausibly related by lexical rules. To prove the workability of these assumptions, we have implemented them in a grammar fragment of English in the grammar development environment TRALE. Parsing a sentence with a canonical structure like The dog is clever yields the following output:

Figure 9. Tree structure for The dog is clever (TRALE output: an hs-phrase in which the dog carries a non-focus value for Inf-Str)
TRALE allows us to show just the features that are of interest. Hence Figure 9 shows the value for the feature Inf-Str (information status) under the SYNSEM path (under this path syntactic and semantic information is stored). So we see in Figure 9 that the dog, being the topic of the sentence, is non-focused. Topic is a subtype of non-focus. This is in accordance with our prediction, since in this case the dog must be the initial element of the argument structure, and thus is barred from receiving focus. Parsing the sentence with a non-canonical word order yields a different focus distribution, as is shown in the following diagram (Figure 10). We see in Figure 10 that the logical subject has changed its information status, shown under Inf-Str. It is not a topic anymore, as in Figure 9, but is focused. Apart from that, we see that the element in the canonical subject position is unfocused (its value for Inf-Str is of type non-focus). So the order of constituents seems to affect their focal status. This is in accordance with our prediction, since the initial element of the argument structure is represented by an adjective phrase in this case. Hence the logical subject the dog is the second element and may be focused.

Figure 10. Tree structure for Clever is the dog (TRALE output: the dog now carries the value focus for Inf-Str, while the element in the canonical subject position is of type non-focus)
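The core of this analysis can be paraphrased procedurally. The sketch below is illustrative Python, not HPSG or TRALE code: the feature structures, the function name `satisfies_constraint`, and the INF-ST values are simplified stand-ins invented for exposition. It checks the Figure 1 constraint against canonical, illicit, and inverted argument structures:

```python
# Illustrative sketch of the lexical analysis: an ARG-ST is a list of
# arguments, each carrying an information-status feature (INF-ST).
# These dictionaries are simplified stand-ins, not HPSG/TRALE structures.

def satisfies_constraint(arg_st):
    """The Figure 1 constraint: the first element on ARG-ST
    must be of type non-focus."""
    return arg_st[0]["INF-ST"] == "non-focus"

# (be, [ARG-ST (NP, AP)]): canonical order, unfocused subject first.
canonical = [{"PHON": "the dog", "INF-ST": "non-focus"},
             {"PHON": "clever", "INF-ST": "non-focus"}]

# A focused logical subject heading ARG-ST is ruled out ...
illicit = [{"PHON": "the dog", "INF-ST": "focus"},
           {"PHON": "clever", "INF-ST": "non-focus"}]

# ... but the inversion entry (be, [ARG-ST (AP, NP[+Foc])]) places the
# focused logical subject in second position, so the constraint holds.
inverted = [{"PHON": "clever", "INF-ST": "non-focus"},
            {"PHON": "the dog", "INF-ST": "focus"}]

for name, arg_st in [("canonical", canonical),
                     ("illicit", illicit),
                     ("inverted", inverted)]:
    print(name, satisfies_constraint(arg_st))
```

The sketch makes explicit why inversion and the constraint are compatible: the constraint inspects only the first position on ARG-ST, and the inversion entry simply reorders the list.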
5. A problem

The analysis presented in the previous section runs into problems with sentences containing focused subjects in canonical subject position. We see examples of this in (18), and others are easy to find:
(18)
Only YOU can make this world seem right Only YOU can make the darkness bright (From the song Only you by The Platters)
Our assumptions thus need to be revised to allow for focused subjects in canonical subject position in at least some cases. A similar and related problem is that we treated all postverbal logical subjects alike: we uniformly require them to be focused. However, a careful analysis shows that the inversion constructions differ from each other in how frequently the logical subject is definite. The distribution of definite and indefinite inverted subjects of adjective phrase inversion constructions in the Birner Corpus and a sample of existential sentences which we have collected from the British National Corpus is shown below:

Table 4. The distribution of definite NPs in AP-inversion and existential constructions

Construction        NP[+Def] in corpus (%)
AP-be-NP[Foc]       70.1
There-be-NP[Foc]    7.3
Figure 2. Word frequency distribution of the Gutenberg corpus.
Now, it is important to note that the basic patterns codified by Zipf's law apply even more dramatically to word combinations. The frequency of word pairs drops even more quickly than the frequency of words, and that of word triplets, more steeply yet, so that only a very small percentage of word trigrams appear with any significant frequency. And so forth. Essentially, this means that the data from which any statistically based learning takes place has to be drawn from the most frequent words and word combinations; any facts about language that cannot be learned from them arguably cannot be learned statistically with any reliability or consistency unless the samples are very large indeed - larger than most available corpora even now, when corpora containing hundreds of millions of words have become quite common. However, the word sequences which correlate with constructions are in fact among the more frequent in the language. For example, a sequence like him the or me this consists entirely of function words, the most frequent in the language, and has a very high overall frequency. Even rarer constructions such as the way construction can be reliably identified by word sequences which, though low in overall frequency, are quite frequent with some words (e.g. combinations of a possessive pronoun with the noun way). In fact, the qualitative pattern described by Zipf's law applies quite as much to constructions as to word counts:
- If we examine the distribution of alternative words that occupy the same slot in the same construction, we discover that just a few words account for most instances of the construction; yet though there is a very broad range of words that are compatible with a construction, most of them appear in it quite seldom.
- If we examine how a word is distributed among the alternate constructions with which it is associated, we discover that most instances of a word appear in just a few constructions, but that there is a large set of constructions in which a word can (but seldom does) appear.
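The Zipfian patterns just described are easy to verify on any text sample. The following sketch (plain Python over a toy corpus invented here; real results require a corpus like Gutenberg) builds the rank/frequency tables for words and bigrams:

```python
from collections import Counter

# Toy corpus; any large text sample shows the same qualitative pattern,
# and it grows sharper as one moves from words to bigrams to trigrams.
text = ("the dog chased the cat and the cat chased the mouse "
        "and the mouse ran into the house").split()

words = Counter(text)
bigrams = Counter(zip(text, text[1:]))

# Rank/frequency tables: a handful of items carry most of the mass.
word_ranks = words.most_common()
bigram_ranks = bigrams.most_common()

print(word_ranks[:3])
# Share of all tokens accounted for by the single most frequent word:
print(word_ranks[0][1] / len(text))
```

Even in this tiny sample, the single most frequent word accounts for a third of all tokens, while almost every bigram type occurs only once - the sparsity problem described above.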
The easiest way to establish this point is to examine the distribution of constructions by counting bigrams or trigrams that indicate the construction's presence. When we do this, the pattern noted above occurs again and again. Thus, if we explore the frequency of contexts associated with ditransitivity, we discover that a few verbs supply most instances of the construction (Figure 3). And similar observations apply if we examine, instead, how prominent the ditransitive construction is for particular verbs. The chart which follows assigns verbs a prominence which reflects how high in the trigram rank/frequency table we find trigrams reflecting the ditransitive construction; for instance, if a word's most frequent ditransitive context is at Zipfian rank 7, its share of the pie chart is 1/7th; thus the chart gives a fairly clear idea of how far down the frequency distribution one must go to find an instance of the construction (Figure 4). What these results suggest is that the actual process of learning the ditransitive construction essentially has to happen by extension from a relatively small set of verbs like give and get which are strongly associated with the construction both in terms of overall construction frequency and in terms of the choice of constructions available for each verb.

Similar results can be multiplied for a variety of constructions. Let us examine the way construction. If we approximate the way construction by looking for verbs that take expressions like her way, his way, their way etc. as direct object, we observe the distribution shown in Figure 5.
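Both measurements described here - construction frequency by verb (Figure 3) and the Zipfian rank of the construction's most frequent context within a verb's own context table (Figure 4) - reduce to simple counting. A sketch with invented toy observations (the cue set and counts are hypothetical, not taken from the study):

```python
from collections import Counter

# Toy list of (verb, following-bigram) observations; in the study these
# come from corpus trigrams such as "give him the" or "tell me this".
observations = [
    ("give", "him the"), ("give", "me the"), ("give", "him a"),
    ("give", "him the"), ("tell", "me this"), ("tell", "me the"),
    ("make", "it clear"), ("make", "him a"), ("get", "me the"),
]

# Hypothetical set of bigram cues signalling the ditransitive pattern.
DITRANSITIVE_CUES = {"him the", "me the", "me this", "him a"}

# (1) Construction frequency by verb (cf. Figure 3):
by_verb = Counter(v for v, ctx in observations if ctx in DITRANSITIVE_CUES)

# (2) Zipfian rank of the most frequent ditransitive context within a
# verb's own context table (cf. Figure 4):
def ditransitive_rank(verb):
    table = Counter(ctx for v, ctx in observations if v == verb).most_common()
    for rank, (ctx, _) in enumerate(table, start=1):
        if ctx in DITRANSITIVE_CUES:
            return rank
    return None

print(by_verb.most_common(1))    # the verb supplying most instances
print(ditransitive_rank("give"))
print(ditransitive_rank("make"))
```

In this toy data, give both supplies most instances of the pattern and has a ditransitive context as its top-ranked context, mirroring the asymmetry between give and verbs like make in Figures 3 and 4.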
Figure 3. Frequency of ditransitive contexts by verb in the Gutenberg corpus (bar chart, frequencies up to roughly 2500; verbs in descending order: give, tell, make, show, get, bring, hand, lend, find, ask, grant, owe, buy, take, promise, allow, leave, build, win, refuse, sell, deny, pass, blow, permit, knit, loan, kill, throw, feed, toss, bake, shoot, fling, cook, guarantee, bequeath, crush, slay, reserve, quote, earn, wire, slap, poke, kick, grab, destroy, allocate)
Word       Zipfian rank    Word        Zipfian rank
give       1               win         29
lend       2               promise     33.5
show       2               find        44
loan       2.5             permit      54
hand       3               blow        58.5
make       3               ask         68
grant      4               guarantee   101
owe        4               leave       125.5
buy        5               feed        126
build      7.5             wire        132
bring      8               cook        144
knit       11.5            fling       152
bake       12              allow       235
tell       13              throw       279.5
sell       17.5            kill        311
get        21              take        376
toss       21              pass        415
deny       24              shoot       416.5
bequeath   26.5            earn        674.5
refuse     28              slay        809.5

Figure 4. Prominence of the ditransitive construction for selected verbs (Zipfian rank of the verb's most frequent ditransitive context)
Figure 5. Frequency of way-construction contexts by verb for the Gutenberg corpus (bar chart; frequencies up to roughly 4000)
That is, a few frequent verbs, such as make and find, plus a few semantically typical verbs, such as feel, grope, fight, and push, account for the lion's share of the occurrences of the construction. And a similar pattern holds when we examine the prominence of the way construction by verb:
Word      Zipfian rank    Word       Zipfian rank
grope     1               make       30
wend      1               find       30
thread    1               nose       34.5
worm      1               bore       41.5
edge      1               squeeze    43
elbow     2               plod       43.5
wing      2               feel       64
plough    3.5             tear       66
work      5               thrust     79
fight     7               wriggle    83
push      7               bend       113
force     9               burst      138.5
hew       10              cleave     150.5
battle    10              dig        168.5
wind      12.5            steal      194.5
steer     14.5            break      253
win       15.5            eat        269
cut       18              beat       321
plow      18              trace      402.5
pick      21              shoulder   429.5
make      30              take       992.5
Figure 6. Prominence of the way construction for selected verbs (Zipfian rank of the verb's most frequent way-construction context)

What these patterns suggest is that the critical data necessary to infer the formal properties of constructions are in fact present in the high-frequency portion of natural language frequency distributions, where there is enough data to support statistical inference. Thus the story about language learning that the data suggest is that (the formal aspects of) constructions may be learnable on the basis of statistical inference using data
from the high-frequency portions of natural language frequency distributions. In other words, the technique that suggests itself is that candidates for constructions can be identified by examining frequent n-grams for semantically related words. Consider, for example, verbs of physical contact, such as hit, strike, pat, and the like. These frequently appear with such bigram contexts as her on, him on, and the like. Such a verb-plus-bigram trigram is the 3rd-ranked trigram for patted, the 4th-ranked trigram for touched, the 11th-ranked for hit, and the 15th-ranked for struck, all from the left portion of the Zipfian curve. Corpus search reveals that these trigrams are almost always (94%) followed by a noun phrase of the form the plus a body part noun; thus, typical sequences look like the following:

(10) pat her/him/them on the shoulder
pat her/him/them on the head
touched her/him/them on the shoulder
touched her/him/them on the arm
hit her/him/them on the head
hit her/him/them on the nose
struck her/him/them on the breast
struck her/him/them on the mouth

Obviously, such sequences - identifiable by such minimal word sequences as him on, her on, etc. in combination with a verb of local action - constitute exactly the data needed to induce the appropriate generalizations about this construction, up to and including the odd fact that the definite article is strongly preferred over other possible determiners such as possessives before the body part noun. Or to take another example, verbs of self-motion such as lift, raise and lower frequently appear as the first word in trigrams of the form VERB + POSSESSIVE + BODY PART NOUN.
Such sequences constitute 6 of the 10 most frequent trigrams beginning with raise, 5 of the 10 most frequent trigrams beginning with lift, and 4 of the 10 most frequent trigrams beginning with lower. Corpus search reveals that these trigrams are almost always preceded by an animate noun or pronoun; in fact 55% of the time the immediately preceding word is he or she. The data thus strongly support postulating a construction of the general form X MOVE X'S BODY PART. In general, therefore, there appears to be significant evidence favoring the view that constructions may be learnable using statistical evidence plus
inference over common word sequences instantiating each construction. The question that remains to be answered is whether it is possible to move beyond manual examination of contexts to an algorithm that directly identifies candidates for constructional status.
4. Distributional measurement of similarity

There is an extensive literature which addresses the use of cooccurrence data to measure the distributional similarity of words. While classical LSA is one such method, it may not be the most effective; within computational linguistics, a variety of other methods have been explored. Early NLP work includes Grefenstette (1992, 1994), Schütze (1992, 1993), Church et al. (1994), and Finch, Chater and Redington (1995). The state of the art is probably represented by Lin (1998); see Curran and Moens (2002) for a discussion of various issues affecting performance of these systems, and Lee (1999) for an evaluation of the efficacy of various similarity metrics within the general framework. All of these systems perform in essentially the same way: first, cooccurrence statistics are collected from a corpus. The data can be simple bigram and trigram frequency counts, word-by-document frequency counts, as in LSA, or frequencies of word-word relationships in a grammatically analysed corpus, as in Lin (1998). In some cases, such as LSA, an analytical step such as Singular Value Decomposition is applied as an intermediate step to improve the quality of the data. Finally, a similarity or dissimilarity metric, such as cosine similarity, the Kullback-Leibler divergence, or one of a number of other metrics, is applied to yield a ranking which estimates the degree to which any pair of words have similar or dissimilar distributions. Measures of this type have well known limitations and problems.
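The shared pipeline described here - collect cooccurrence counts, then apply a similarity metric - can be sketched in a few lines. This is a minimal illustration with invented counts and plain cosine similarity, omitting any intermediate step such as SVD:

```python
import math

# Word-by-context cooccurrence counts (invented for illustration).
counts = {
    "house": {"the _ was": 10, "in the _": 8, "a _ on": 3},
    "cabin": {"the _ was": 6,  "in the _": 5, "a _ on": 2},
    "cat":   {"the _ was": 4,  "the _ sat": 9, "a _ on": 1},
}

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = lambda w: math.sqrt(sum(x * x for x in w.values()))
    return dot / (norm(u) * norm(v))

sim_house_cabin = cosine(counts["house"], counts["cabin"])
sim_house_cat = cosine(counts["house"], counts["cat"])
print(sim_house_cabin, sim_house_cat)
# Words with similar context distributions score higher:
print(sim_house_cabin > sim_house_cat)  # True
```

The ranking such a metric induces over the whole vocabulary is exactly what lists like (11) and (12) below report, on real counts rather than invented ones.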
First, the results are only as good as the corpus used for training; second, the results are far more reliable for common words than for rarer words, due to data sparsity; and third, these methods gloss over important linguistic distinctions such as the difference between different senses of the same word; by default, vector space methods assign one representation to every word, so that the results for ambiguous words represent an average over all senses. In short, if one wished to use vector-space representations of word meaning as actual models of the cognitive processes going on in people's minds, there are clear problems; as indications of the structure of the data and as a baseline measurement of what human semantic memory must minimally be able to represent, they are invaluable; and as engineering approximations to semantics, they have the advantage of being easy to train and of capturing a significant portion of the information needed to simulate human semantic processing.

The output of such systems is typically noisy. For instance, the Educational Testing Service has access to a large corpus of texts such as might be read by students in grades K-12, including novels and similar reading matter, textbooks, and a range of other materials. This corpus, the Lexile corpus licensed from Metametrics Inc., contains over 400 million words. When the methods outlined in Lin (1998) are applied to this corpus, the resulting word-word similarity values are clearly useful, but not perfect. For instance, the most similar words to the word house are:

(11) Dekang Lin similarity values for house calculated from the Lexile corpus

room        0.29    window    0.18
place       0.27    street    0.18
home        0.26    face      0.18
building    0.26    wall      0.18
car         0.23    family    0.18
kitchen     0.22    shop      0.18
school      0.21    life      0.17
cabin       0.21    wood      0.17
door        0.21    city      0.17
tree        0.21    field     0.17
office      0.21    yard      0.17
town        0.20    camp      0.17
barn        0.19    village   0.17
apartment   0.19    hall      0.17
store       0.19    table     0.17
church      0.19    side      0.17
bed         0.18    garden    0.16
And for the word cat they are:

(12) Dekang Lin similarity values for cat calculated from the Lexile corpus

dog         0.21    fish      0.17
animal      0.24    cow       0.17
bird        0.23    girl      0.17
wolf        0.21    monkey    0.17
bear        0.21    lion      0.17
kitten      0.21    monster   0.16
rabbit      0.21    squirrel  0.16
creature    0.20    woman     0.16
rat         0.20    deer      0.16
snake       0.19    guy       0.16
dragon      0.18    ghost     0.16
horse       0.18    chicken   0.16
pig         0.18    brother   0.16
kid         0.18    indian    0.15
baby        0.18    soldier   0.15
boy         0.17    ones      0.15
mouse       0.17    little    0.15
beast       0.17    person    0.15
fox         0.17    old       0.15
child       0.17    spider    0.15
puppy       0.17
The top ranked values are often quite good, synonyms or near synonyms, but the reliability of the information provided tends to degrade fairly quickly as one moves down the similarity ranking. However, even with the noise issues, vector space metrics such as this are useful. Quite a large amount of information can be extracted from them, particularly if one applies data clustering techniques (cf. Pantel and Lin 2002). The purpose to be addressed here, however, is determining whether there is enough information in raw text to support bootstrapping of syntactic constructions. This purpose places a number of constraints on the type of vector space measure one would wish to use. The Dekang Lin measure is explicitly linguistic; it requires a parser to preprocess the corpus and extract grammatical relationships. If we wish to determine how much information can be extracted without any prior knowledge, however, the data source must be much simpler, e.g. counts of raw word sequences, to avoid circularity. Also, most vector space measures have focused on assessments of word-word similarity. But the issue at hand here is not just the acquisition of words, but the acquisition of constructions, which means that it is necessary not just to calculate how similar words are to each other given the
contexts they share, but to calculate how similar contexts are to each other given the words with which they characteristically appear. In order to meet these design considerations, a different vector-space method was employed, similar in overall character to the various methods discussed above (see Appendix A for details). In particular, the following points should be noted:

(i) the method was applied to the Lexile corpus, so that it is based on the same data as the Dekang Lin similarity metric discussed above; in addition, it was restricted to the same word list as the Dekang Lin metric, to make direct comparisons possible.
(ii) all data was derived from frequency counts indicating how often words appeared with particular unigram and bigram contexts; e.g., the source data included such information as the fact that the context _ themselves to appeared 156 times with the verb attach, and a total of 3761 times in the corpus as a whole.
(iii) various postprocessing steps were included, designed to improve the quality of the data, including eliminating various contexts that were not particularly informative.
(iv) a similarity vector was induced not only for term-term similarity, but also for context-context similarity. (The latter measurement is possible in almost all vector space techniques, though it does not appear to have been generally used.)
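Point (iv) amounts to applying the same similarity metric to the columns of the word-by-context matrix rather than to its rows. A minimal sketch (the counts, contexts, and words are invented for illustration):

```python
import math

# Word-by-context counts (invented). Rows: words; columns: contexts.
counts = {
    "patted":  {"_ him on": 5, "_ her on": 4, "_ the dog": 2},
    "touched": {"_ him on": 3, "_ her on": 3, "_ the wall": 4},
    "said":    {"_ that the": 7, "_ nothing": 5},
}

def column(context):
    """Transpose step: the vector of word counts for one context."""
    return {w: ctxs[context] for w, ctxs in counts.items() if context in ctxs}

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = lambda w: math.sqrt(sum(x * x for x in w.values()))
    return dot / (norm(u) * norm(v))

# Contexts that select the same words come out similar:
print(cosine(column("_ him on"), column("_ her on")))
print(cosine(column("_ him on"), column("_ that the")))
```

No parse or prior grammatical knowledge enters the calculation: two contexts are similar simply because they occur with the same words, which is exactly the property exploited in example (15) below.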
The resulting metric for distributional similarity appears to be very useful, and appears, in fact, to outperform the Dekang Lin similarity measure in important ways. The Dekang Lin measure appears, in general, to be more effective at returning synonyms and near-synonyms near the top of the similarity rankings, and its linguistic basis makes it more effective at providing useful results for words that have different uses as nouns, verbs, or adjectives; however, the metric employed in this study appears to be significantly more effective at filtering out noise and returning words from the same general semantic class, at essentially the fineness of categorization selected by grammatical constructions. For instance, the following list shows the terms most similar to house using this metric:

(13) Terms most similar to house using the cosine against the vector space representation used in this study:
store       0.835    shed         0.706
building    0.819    tent         0.686
cottage     0.799    schoolhouse  0.680
cabin       0.796    village      0.675
shop        0.794    warehouse    0.675
hut         0.783    studio       0.672
office      0.781    theater      0.670
apartment   0.778    diner        0.668
barn        0.772    parlor       0.667
chamber     0.766    bathroom     0.665
lodge       0.763    garden       0.665
mansion     0.759    bungalow     0.664
room        0.755    cubicle      0.662
shack       0.736    corral       0.661
restaurant  0.736    compartment  0.659
farmhouse   0.731    cafe         0.659
arena       0.726    palace       0.659
chapel      0.722    lobby        0.659
salon       0.718    sanctuary    0.656
factory     0.709    fireplace    0.654
auditorium  0.707
And the following are the words most similar to cat: (14)
Terms most similar to cat using the cosine against the vector space representation used in this study:

dog       0.941    pony      0.792
wolf      0.883    mouse     0.791
lion      0.871    pig       0.790
snake     0.869    colt      0.784
bear      0.865    horse     0.779
bird      0.849    moose     0.779
rabbit    0.846    monkey    0.775
mare      0.845    unicorn   0.774
squirrel  0.844    deer      0.770
dragon    0.842    coon      0.760
fox       0.832    lizard    0.755
goat      0.820    crow      0.750
shark     0.818    mule      0.740
rat       0.818    puppy     0.736
turtle    0.817    kitten    0.732
cow       0.816    frog      0.732
bull      0.807    boar      0.728
owl       0.806    tiger     0.727
whale     0.802    panther   0.727
raccoon   0.798    dinosaur  0.727
The context-context similarities appear to be equally useful. Suppose, for instance, that we take the context a brave _. Intuitively, the word that should appear next is a noun denoting a human. The context-context similarity metric should therefore return other contexts associated with humans, and in fact it does: (15)
Unigram and bigram contexts most similar to a brave _ according to the metric used in this study (similarity values between roughly 0.76 and 0.89): bigram contexts built on adjectives of personal attributes such as plump _, handsome _, clever _, stocky _, lanky _, dapper _, attractive _, and bespectacled _; and trigram contexts such as , handsome a, the handsome _, a lanky _, a stocky _, the smiling _, and , bearded _
Note that the context-context similarities are not based on structure, only on similarity with respect to the kinds of words with which the contexts combine. However, even in the current example, not particularly chosen to correspond to a constructional pattern, the majority of the high-ranking examples clearly are drawn from one grammatical type: adjective-noun constructions compatible with the structure of the original context, falling into the general category of adjectives evaluating personal attributes of human beings.
5. What contextual similarity tells us about constructions

What distinguishes a grammatical construction from a random collection of contexts is (a) that it is structured, and (b) that its open slots are subject to selectional constraints. With respect to (a), for instance, all examples of the ditransitive construction must follow the grammatical pattern [VP V NP1 NP2]; all examples of the way construction must follow the pattern [V NP's way PP]; and so forth. With respect to (b), we know, for instance, that there needs to be a possessive relationship between NP1 and NP2 in the ditransitive construction, that the verb in the way construction must denote or be interpretable as a manner of motion verb, and so forth. This suggests a very simple hypothesis:

(16) Construction correlate hypothesis
Constructions should have n-gram context correlates in which (a) the contexts are highly similar according to the vector space measure, and (b) the word plus context combinations are structurally parallel.

As we shall see in a quick survey of the constructions that have been discussed in this paper, this hypothesis appears to be borne out.
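The hypothesis in (16) suggests a concrete search procedure: take a seed context, retrieve its nearest neighbours in context-similarity space, and keep only those that are structurally parallel. The sketch below fakes both ingredients - a hand-coded neighbour list and a crude part-of-speech map - purely to make the two filtering conditions explicit; none of the values come from the study:

```python
# Hand-coded nearest neighbours of the seed context "him the"
# (similarity values invented for illustration).
neighbours = [("me his", 0.993), ("them the", 0.992), ("me this", 0.990),
              ("by the", 0.40), ("in a", 0.35)]

# Crude POS map, standing in for real structural analysis.
POS = {"him": "PRON", "me": "PRON", "them": "PRON",
       "his": "DET", "the": "DET", "this": "DET",
       "by": "PREP", "in": "PREP", "a": "DET"}

def pattern(context):
    return tuple(POS[w] for w in context.split())

SEED = "him the"

def construction_correlates(threshold=0.9):
    """Keep neighbours that are (a) highly similar and
    (b) structurally parallel to the seed context."""
    return [ctx for ctx, sim in neighbours
            if sim >= threshold and pattern(ctx) == pattern(SEED)]

print(construction_correlates())
```

Condition (a) corresponds to similarity in the vector space, condition (b) to the structural parallelism requirement; a context like by the fails both and is filtered out.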
6. Correlates of the ditransitive construction

In the analysis presented earlier in this paper, the ditransitive construction was related to sequences of the form him the, me this, and the like, which correspond to the double-NP boundary that is characteristic of ditransitives. If the construction correlate hypothesis is correct, we should be able to demonstrate the following points:
(i) that bigram contexts like him the or me this are in fact similar by the vector space metric as well as being parallel sequences in terms of part of speech labels;
(ii) that the verbs which the vector space method predicts will precede such sequences are prototypical ditransitive-construction verbs;
(iii) that the nouns which the vector-space method predicts will follow such sequences are prototypical possessions.
All of these predictions appear to be borne out. Consider the following list, which comprises the contexts most similar to him the by the vector space measure: (17)
Contexts most similar to him the by the vector space measure

her the         0.994    me your        0.982
me his          0.993    us one         0.982
them the        0.992    her his        0.982
him his         0.991    you the        0.981
me this         0.990    them his       0.981
her her         0.990    us her         0.980
him my          0.990    them their     0.980
him her         0.989    them her       0.979
me her          0.989    you my         0.979
us the          0.989    us our         0.979
her my          0.988    me stuff       0.978
us his          0.987    her anything   0.978
me something    0.987    us these       0.978
us this         0.986    you those      0.977
myself the      0.985    everyone the   0.977
me nothing      0.984    her something  0.977
them something  0.984    me all         0.976
me the          0.983    him something  0.976
me these        0.982    them what      0.975
As examination of these contexts reveals, they are all highly parallel: sequences of the form personal pronoun plus determiner or pronoun, precisely what is needed to signal the basic double-NP property of the ditransitive construction. Furthermore, if we examine the verbs which correlate strongly with these contexts, we find that they are strongly and consistently the typical
ditransitive verbs; i.e., the most strongly associated verbs include:

(18) Verbs most strongly associated with the contexts in (17)

give    tell    teach   deny
show    lend    spare   begrudge
hand    call    send    sell
teach   offer   grant   loan
bring   owe     assure  buy
cost    ask     make    grudge
Similar results follow if we examine the same sequences viewed as contexts for following words. The following list, for instance, shows some of the contexts most similar to the context her the _. What is striking about it is that those contexts which are not syntactically parallel to the original context are precisely contexts typical for a possessed object:

(19) Contexts most similar to her the _ by the vector space measure

him the _        0.957    us the _        0.798
me the _         0.944    offered the _   0.774
got the _        0.836    reads the _     0.772
receiving the _  0.833    a wrapped _     0.768
found the _      0.818    stuffed the _   0.767
received the _   0.817    bearing the _   0.763
get the _        0.810
If we examine the words which are strongly associated with these contexts, these results are confirmed; the words are prototypical possessions (which means, among other things, that they are typically inanimate nouns):

(20) Some of the nouns most strongly associated with the contexts in (19)

news     letter       courage      privilege
package  telegram     impression   gift
money    opportunity  bracelet     flashlight
message  key          password     trophy
receipt  bottle       paper        name
note     card         amulet       address
The data thus appear to confirm the construction correlate hypothesis, as there is clearly enough evidence recoverable from cooccurrence-based, vector space evidence alone to identify the key properties of the ditransitive construction.
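The verb list in (18) and the noun list in (20) can both be seen as the output of one aggregation step: summing a word's association over every context in the similar-context set. A toy sketch of that step (all counts invented):

```python
from collections import Counter

# Invented counts of verbs occurring immediately before ditransitive
# cue contexts such as "him the" or "me this".
verb_before_context = {
    "him the": Counter({"give": 30, "tell": 12, "see": 5}),
    "me this": Counter({"tell": 20, "give": 9, "show": 7}),
    "me his":  Counter({"give": 8, "show": 6, "lend": 4}),
}

def associated_words(context_set):
    """Aggregate association of preceding words over a set of contexts."""
    total = Counter()
    for ctx in context_set:
        total.update(verb_before_context[ctx])
    return total.most_common()

ranking = associated_words(["him the", "me this", "me his"])
print(ranking[:3])
```

Pooling over the whole set of similar contexts is what makes the output robust: a verb need not dominate any single context, only the set as a whole.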
7. Correlates of the way construction

The key correlates of the way construction are sequences of the form NP's way. Once again, we find the basic pattern: contexts of this form are both structurally parallel and similar by the vector space measure. In this case, the contexts with the appropriate structural pattern occupy the top of the similarity space, with the rest of the list being occupied by contexts associated with words like find or make that account for the majority of instances of the way construction. The following list, for instance, shows the contexts most similar to the context their way:
(21) Contexts most similar to their way by the vector space measure

his way        0.981    it nearly        0.862
its way        0.979    it disappear     0.855
her way        0.972    a clean          0.854
my way         0.967    it particularly  0.854
our way        0.940    it especially    0.851
it hard        0.913    some excuse      0.850
it difficult   0.901    it beautiful     0.848
it impossible  0.901    it almost        0.844
excuses to     0.885    it less          0.842
it harder      0.884    it pleasant      0.839
it easier      0.877    him stumble      0.837
it extremely   0.874    excuses          0.837
it quite       0.862    her wince        0.836
The words associated with the way contexts (their way, her way, his way, its way, my way, our way) are precisely the class of verbs that typically appear in the way construction:

(22) Some of the words most strongly associated with NP's way contexts

make     push    block     grope
find     claw    light     head
fight    pick    bar       inch
wave     wind    dig       lose
ease     edge    force     worm
wriggle  work    navigate  hack
bribe    bluff
Examining the words which follow the basic way bigrams, e.g. which follow the contexts my way _, our way _, her way _, his way _, and their way _, we find that the words most strongly associated with these contexts tend to be directional terms, such as the following:

(23) Some of the words most strongly associated with NP's way _ contexts

along      past       slowly      aft
home       northward  southward   upward
upstream   uphill     downstream  uptown
downriver  carefully  gingerly
In short, there is all the evidence necessary to infer the key properties of the way construction: its association with manner-of-motion verbs, the requirement that a directional particle or prepositional phrase follow way, and so forth.
8. Other constructions

Similar observations apply to other constructions discussed earlier. In the case of the self-motion construction, syntactically parallel contexts rank highly; for instance, they constitute 14 of the 20 most similar contexts to raise her _ and 11 of the 20 most similar contexts to raise his _, including:
(24) Syntactically parallel contexts most similar to raise her/his _ by the vector space measure

raise her _    0.982083    dropped his _   0.9125
raised her _   0.973095    lifted his _    0.903926
raise my _     0.972977    swung his _     0.884688
raising my _   0.96874     pointed his _   0.873917
raised my _    0.960822    pointing his _  0.857642
raise his _    0.960007
And as might be expected, the words most strongly associated with these contexts include body part terms like hand and finger.

The verb-of-contact construction involving sequences like touch them on the hand is harder to demonstrate results for with this database, as the construction-critical sequences are trigram contexts (e.g. _ them on the, them on the _) and the database on which context similarities are calculated currently only includes unigram and bigram contexts. However, examination of pieces of the context shows that all of the component bigrams in fact behave as one would expect. And it is possible to replicate the results fairly easily for constructions that do fit within the window used in the current study. For instance, many verbs participate in one or other variant of what Goldberg (1995) calls the caused-motion construction, in which a transitive verb which does not entail movement of the direct object will do so when a directional prepositional phrase is added:

(25) a. The tool scraped it.
b. The tool scraped it off.

Examining contexts related by the vector space measure to the key bigram context, e.g. _ it off, shows similar results to those discussed above. E.g. the most similar and syntactically parallel contexts are:

(26) Syntactically parallel contexts most similar to _ it off by the vector space measure

_ her off        0.962    _ you off       0.872
_ them off       0.960    _ us off        0.869
_ him off        0.941    _ them down     0.866
_ it away        0.923    _ them out      0.863
_ something off  0.895    _ them away     0.859
_ me off         0.883    _ it overboard  0.834
_ him overboard  0.880    _ him out       0.833
_ it down        0.880
Cooccurrence and constructions
131
And the verbs that fit best in this context are verbs which intrinsically describe motion, such as throw, drag, push, and toss, which is precisely what would be expected on a construction grammar account.
9. Some issues

Three issues should be discussed at this point. First, the emphasis in this article has been on the behavior of argument structure constructions, which are lexically headed. Such constructions naturally display the kind of Zipfian statistics noted above, in which a small proportion of the total vocabulary is strongly correlated with a particular function word sequence. This characteristic - strongly skewed lexical preferences - will not apply to all constructions; for instance, it should not apply to such minor constructions as that which underlies phrases like the more the merrier, the bigger the better, etc., where the words that fill in the slots must be characterized very broadly as comparative adjectives. However, the basic technique - looking for similarity among parallel contexts - should work well enough in this case also. All that the lack of lexical specificity in the input data should entail is a lack of lexical specificity in the open slots of the pattern.

Second, we should note that the arguments presented above depend crucially on a strategy which identifies constructions by looking for dependencies between content words and sequences of grammatical elements (e.g. function words). The reason that the method works is precisely the fact that function words are frequent, and the statistical data associated with them robust. It is worth asking whether all constructions can be characterized in this way. Obviously, simple constructions like those which involve Adjective + Noun do not require function words, though of course they are subject to the same kind of analysis. Rather more importantly, not all syntactic relationships are local. The relationship between pronouns and antecedents, and the kinds of relationships which are involved in wh-movement and other long-range binding patterns, clearly do not fall within the scope of the methods applied here.
Otherwise, though, there is every reason to expect that these techniques will apply, and that the properties of constructions can be identified using n-gram techniques that search for constructional correlates.

Third, the technique outlined thus far is not a full discovery procedure for constructional correlates, though methods for identifying such are not hard to imagine given the foundations established thus far. One could, for
132
Paul Deane
example, apply standard clustering techniques, though the sheer number of contexts might be prohibitive; alternatively, one could order contexts by some measure reflecting their frequency and/or the number of words with which they appear, and create construction hypotheses by selecting (for each unclassified context) all syntactically parallel contexts down to a cutoff value. Such issues will be addressed in future work.
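The frequency-ordered alternative just described can be sketched as follows. This is an editorial illustration, not the author's implementation: the `similarity` and `parallel` arguments are stand-ins for the vector-space cosine and the structural-parallelism test described in the appendix, and the cutoff value is arbitrary.

```python
# Sketch: order contexts by frequency, then group each unclassified context
# with all syntactically parallel contexts above a similarity cutoff.
def propose_constructions(contexts, freq, similarity, parallel, cutoff=0.8):
    """contexts   -- list of context identifiers
    freq       -- dict mapping context -> corpus frequency
    similarity -- function (c1, c2) -> float in [0, 1]
    parallel   -- function (c1, c2) -> bool (same shape, slot in same spot)
    Returns a list of construction hypotheses (lists of contexts)."""
    unclassified = sorted(contexts, key=lambda c: -freq[c])
    hypotheses = []
    while unclassified:
        seed = unclassified.pop(0)          # most frequent remaining context
        cluster = [seed]
        for other in unclassified[:]:       # iterate over a copy
            if parallel(seed, other) and similarity(seed, other) >= cutoff:
                cluster.append(other)
                unclassified.remove(other)
        hypotheses.append(cluster)
    return hypotheses

# Toy usage with hand-set similarities:
contexts = ["_ it off", "_ them off", "the _ dog"]
freq = {"_ it off": 100, "_ them off": 50, "the _ dog": 80}
sim = lambda a, b: 0.9 if {a, b} == {"_ it off", "_ them off"} else 0.1
par = lambda a, b: len(a.split()) == len(b.split())
groups = propose_constructions(contexts, freq, sim, par)
```

Each hypothesis groups contexts that are both structurally parallel and distributionally similar, which is the intuition behind treating them as instances of one construction.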
10. Conclusions

The results outlined thus far are still exploratory. They suggest an algorithm for automatic identification of construction correlates, which will be explored in future work. However, they are also highly suggestive, with many potential applications in linguistic theory, computational linguistics, and related fields. The work reported here is comparable in many ways to work on automatic lexical acquisition, particularly of verb classes and subcategorization frames (cf. Resnik 1993; Hughes 1994; Dorr and Jones 1996; Hovav and Levin 1998; Lapata 1999, among others), but differs significantly precisely because of its emphasis on the identification of constructions and its use of vector space measures to measure the similarity of contexts and not just terms.

The computational linguistic techniques deployed here may have other areas of applicability. For instance, there are obvious additional applications of the technique which measures the similarity of contexts with respect to the words with which they combine. Such a method in effect identifies the extent to which different phrases carry essentially similar information, which might make it useful as a method for identifying potential paraphrases where the syntactic structure and word order differ. The fact that a vector space analysis provides both term and context similarity is also important, and has not generally been exploited. It should be possible, for instance, to calculate contextually modified term-term similarities which indicate how similar items are in a local sentence context, rather than the purely global similarity metrics that dominate the literature currently, or to gauge how well various alternative words fit into a sentence context. Some of these applications are in fact of significant interest for the author's work at the Educational Testing Service.
Within linguistic theory, the critical implication is that the formal and many of the semantic characteristics of constructions may be learnable directly from distributional evidence. Linguistic knowledge of a basic sort is necessary for the techniques described here to work - e.g. part
of speech identification of individual words, and the ability to infer, e.g., noun phrase constituent structure from pronouns - but given that knowledge, there is reason to expect that straightforward statistical techniques can extract the information needed to characterize constructions from simple n-gram word sequences, and thus no clear motivation for more complex theories of learnability which posit universal constraints on the syntax/semantics interface.
Appendix: A method for deriving term and context relationships using a rank ratio statistic and singular value decomposition

The vector-space method used in this study to calculate context-context similarities embodies a number of innovations, both in the underlying statistic and in the application of singular value decomposition to natural language data.
1. The log rank ratio statistic

Let us define the context of a word as some word sequence or attribute of a word sequence which is associated with a particular instance of a word in a document. For example, in the first sentence of this paragraph, the word context appears in the following contexts, among others:

a. appearing after the word the
b. appearing after the sequence define the
c. appearing between the words the and of in that order
d. appearing before the sequence of a
e. appearing as the object of the verb define
f. modified by a prepositional phrase whose object is word
g. appearing in the same sentence as attribute

There are a wide range of contexts that can be associated with a word; what matters is that these contexts be defined in such a way that their presence or absence is determined by a sequence of words appearing near or around the target word within a document. In the particular data source employed here, contexts are instantiated as (a) the immediately preceding word; (b) the immediately preceding two word sequence; (c) the word immediately before and the word immediately after in combination; (d) the immediately following word; (e) the immediately following two word sequence. The following data is collected: (a) the total frequency of each word-context combination in the corpus; (b) the total frequency with which each context appears
with any word in the corpus. Then the contexts are ranked by word. Two rankings are applied.

First, a local ranking, which simply orders the contexts by their frequency with each word. I.e., if the context "appearing before barked" appeared so often with the word dog that it was the most frequent context, it would have the local rank one; if "appearing after the" were the next most frequent context with dog, it would receive local rank two, and so forth. If two contexts have equal local frequencies, they tie, and are assigned a rank equal to the average of the ranks they would occupy if their frequencies were distinct.

Second, a global ranking, which orders the contexts which actually appear with a word by the overall frequency of the contexts in the corpus. In other words, if the context "appearing before barked" appeared a total of fifty times in the corpus, and "appearing after the" appeared a total of fifty thousand times in the corpus, the latter would be assigned a higher global rank than the former. If two contexts have equal global frequencies, they tie, and are assigned a rank equal to the average of the ranks they would occupy if their frequencies were distinct.

This method assigns each context a global rank and a local rank. The log rank ratio statistic is the logarithm of the global rank divided by the local rank. This statistic is very effective at identifying contexts which are particularly characteristic of a word, since it evaluates the significance of an entire set of contexts against one another, even where they may partially overlap or be partially redundant one with another.
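The context types and the two rankings can be made concrete in a short sketch (an editorial illustration, not the study's implementation; ranking is by descending frequency, so rank one is the most frequent, and ties receive averaged ranks as described above):

```python
import math
from collections import Counter

def extract_contexts(tokens, i):
    """The five context types used in the study, for the token at position i:
    (a) preceding word, (b) preceding bigram, (c) flanking word pair,
    (d) following word, (e) following bigram. Edge positions are skipped."""
    ctxs = []
    if i >= 1:
        ctxs.append(("prev1", tokens[i - 1]))
    if i >= 2:
        ctxs.append(("prev2", tokens[i - 2], tokens[i - 1]))
    if i >= 1 and i + 1 < len(tokens):
        ctxs.append(("flank", tokens[i - 1], tokens[i + 1]))
    if i + 1 < len(tokens):
        ctxs.append(("next1", tokens[i + 1]))
    if i + 2 < len(tokens):
        ctxs.append(("next2", tokens[i + 1], tokens[i + 2]))
    return ctxs

def _rank(items, key):
    """Rank by ascending key (rank 1 = smallest); tied items receive the
    average of the ranks they would occupy if their keys were distinct."""
    items = sorted(items, key=key)
    ranks, i = {}, 0
    while i < len(items):
        j = i
        while j < len(items) and key(items[j]) == key(items[i]):
            j += 1
        for it in items[i:j]:
            ranks[it] = (i + 1 + j) / 2
        i = j
    return ranks

def log_rank_ratios(word_ctx_freq, ctx_freq):
    """word_ctx_freq: Counter over (word, context) pairs; ctx_freq: Counter
    over contexts corpus-wide. Returns {(word, context): log(global rank /
    local rank)}; a high value marks a context characteristic of the word."""
    by_word = {}
    for (w, c) in word_ctx_freq:
        by_word.setdefault(w, []).append(c)
    lrr = {}
    for w, ctxs in by_word.items():
        local = _rank(ctxs, key=lambda c: -word_ctx_freq[(w, c)])
        glob = _rank(ctxs, key=lambda c: -ctx_freq[c])
        for c in ctxs:
            lrr[(w, c)] = math.log(glob[c] / local[c])
    return lrr
```

On the text's own example, "appearing before barked" is locally frequent with dog but globally rare, so its log rank ratio is positive, while the globally ubiquitous "appearing after the" scores at or below zero.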
2. Singular value decomposition by contexts

Take the total set of contexts (as defined above) which are associated with the words in a corpus, and produce a matrix of words by contexts, with some attribute of the contexts as the value. In the simplest case, the actual frequency of each context in the corpus (or some smoothed value) will suffice. In the instantiation discussed in this paper, the value of each cell will be its log rank ratio. Then apply singular value decomposition to this matrix (or to a submatrix reduced to a size small enough to be computationally feasible). The number of factors extracted by the singular value decomposition is an open parameter; the output reported in this study identified 100 factors for approximately 20,000 common English words. Singular value decomposition results in three matrices: a term matrix, a factor weighting matrix, and a context matrix, which when multiplied together approximate the values observed in the source matrix, with generalizations induced by the compression of the information into a smaller number of dimensions.
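A toy illustration of this step (using a dense numpy decomposition in place of the large-scale computation the study required; the matrix values merely stand in for log rank ratios):

```python
import numpy as np

def decompose(matrix, k):
    """Factor a word-by-context matrix into a term matrix, a factor weight
    vector, and a context matrix, keeping only the k strongest factors."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

# Rows are words, columns are contexts; raise/lift share contexts, dog does not.
words = ["raise", "lift", "dog"]
contexts = ["_ her hand", "_ his arm", "the _ barked"]
M = np.array([[2.0, 1.8, 0.0],
              [1.9, 2.1, 0.0],
              [0.0, 0.0, 2.0]])
T, S, C = decompose(M, k=2)
# Multiplying the three matrices back together approximates the source data.
approx = T @ np.diag(S) @ C.T
```

Dropping the weakest factor is what induces generalization: the reconstruction smooths over idiosyncrasies of individual cells while preserving the dominant word-context patterns.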
3. Term-term similarity

Given the vectors in the term matrix, similarity of terms to terms can be induced by the cosine of the angle between vectors in factor space. Cosine similarity in the term matrix is the basis for the term-term similarities reported in this study; i.e., if T stands for the vector associated with a term in term space, and S is the factor weight vector, we produce the appropriate dot product as T × S × S × T.
4. Context-context similarity

Given the vectors in the context matrix, similarity of contexts to contexts can be induced similarly. If D stands for the vector associated with a context in context space, and S is the factor weight vector, we produce the appropriate dot product as D × S × S × D. Cosine similarity in the context matrix is the basis for the context-context similarities reported in this study.
5. Term-context fit

Given a context, one can estimate which terms fit a context well by taking the cosine of the context vector for that context against each term vector. This can be used to estimate the class of words most strongly associated with each context. I.e., if T stands for the term's vector in term space, and D for the context's vector in context space, and S for the factor weighting, we take the dot product T × S × D.
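The three measures in Sections 3-5 can be sketched together. The code is an editorial illustration: where the factor weights appear once (T × S × D), attaching them to the term vector is our reading, since the text leaves the grouping implicit.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def term_term(t1, t2, s):
    # T x S x S x T: weight each term vector by the factor weights,
    # then take the cosine between the weighted vectors
    return cosine(t1 * s, t2 * s)

def context_context(d1, d2, s):
    # D x S x S x D: the same measure applied to context vectors
    return cosine(d1 * s, d2 * s)

def term_context_fit(t, d, s):
    # T x S x D: a single application of the factor weights
    return cosine(t * s, d)

# Toy factor-space vectors (two factors): raise/lift pattern alike, dog does not.
s = np.array([2.0, 1.0])
t_raise = np.array([1.0, 0.1])
t_lift = np.array([0.9, 0.2])
t_dog = np.array([0.0, 1.0])
```

On these toy vectors, term_term(t_raise, t_lift, s) is near 1 while term_term(t_raise, t_dog, s) is near 0, mirroring the intended use of the measure.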
6. Estimation of term and context vectors

One advantage of this method is that an SVD analysis can be performed as training on part of a corpus (for instance, the N most frequent words) and then extended to the remaining words or contexts in the corpus by exploiting the interdependence between term and context vectors. I.e., given a vector representing the raw context data for a word not appearing in the original SVD analysis, one can multiply the vector by the context matrix to obtain a term vector for the word in factor space.
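The folding-in step is a single matrix product; the sketch below follows the text literally (LSA-style folding-in additionally divides by the factor weights, a step the text does not mention, so it is omitted here):

```python
import numpy as np

def fold_in(raw_context_vector, context_matrix):
    """Project an unseen word's raw context data into factor space by
    multiplying it through the context matrix (one row per context,
    one column per factor)."""
    return raw_context_vector @ context_matrix

# Toy illustration with three contexts and two factors:
C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
v = np.array([2.0, 0.0, 1.0])   # raw context counts for an unseen word
t = fold_in(v, C)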
7. Estimation of parallelism between contexts

For contexts with equal numbers of elements, the degree to which they are parallel can be estimated by term-term similarity of the parts. E.g., if the starting context is mrs. then one can estimate how similar another context such as dr. or john
is by taking the cosine between the factor vectors of mrs. and dr. or john; and similarly for each subsequent word in a context.
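A short sketch of this word-by-word comparison (editorial illustration; combining the per-position cosines by taking their minimum is our assumption, since the text only says each pair of words is compared):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def parallelism(ctx1, ctx2, vectors):
    """Estimate how parallel two equal-length contexts are: the minimum
    word-by-word cosine between the factor vectors of their parts."""
    assert len(ctx1) == len(ctx2)
    return min(cosine(vectors[w1], vectors[w2]) for w1, w2 in zip(ctx1, ctx2))

# Toy factor vectors: two title words pattern alike; a determiner does not.
vectors = {
    "mrs.": np.array([1.0, 0.1]),
    "dr.":  np.array([0.9, 0.2]),
    "the":  np.array([0.1, 1.0]),
}
title_score = parallelism(["mrs."], ["dr."], vectors)   # high
det_score = parallelism(["mrs."], ["the"], vectors)     # low
```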
8. Identification of constructions using these methods

Given an SVD analysis along these lines, the methods suggested in the body of the text can be extended to infer potential constructions or grammatical patterns from the context data. Essentially, given a seed context: (i) produce a list of contexts similar to the original context, down to some cutoff; (ii) eliminate any contexts from this list that do not have parallel basic structure (i.e., the same number of words with the open position in the same spot); (iii) take the cosine between component words of the original context and the word in the parallel position in the other context, and reject any contexts where any of these cosines fall below a threshold. This produces a list of contexts. Take each list of words which fill the same position in each context, sum their vectors, and use the sum to induce a context vector for that position. The result will be a sequence of context vectors which select the words that will be appropriate for each position in the construction.
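Steps (i)-(iii) and the position-vector summation can be sketched as follows (editorial illustration; the cutoff values and the toy vectors are arbitrary, and the similarity scores are supplied rather than computed from an actual SVD):

```python
import numpy as np

def cosine(a, b):
    n = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / n) if n else 0.0

def infer_construction(seed, similar, wordvec, ctx_cutoff=0.9, word_cutoff=0.8):
    """seed: tuple of words with '_' marking the open slot; similar: dict
    mapping candidate contexts (tuples) to their context-context similarity
    with the seed; wordvec: word -> factor vector. Returns the surviving
    contexts and, for each filled position, the summed word vector."""
    # (i)-(ii): keep similar contexts with parallel basic structure
    keep = [c for c, s in similar.items()
            if s >= ctx_cutoff and len(c) == len(seed)
            and "_" in c and c.index("_") == seed.index("_")]
    # (iii): reject contexts whose words diverge position by position
    keep = [c for c in keep
            if all(sw == "_" or cosine(wordvec[c[i]], wordvec[sw]) >= word_cutoff
                   for i, sw in enumerate(seed))]
    # Sum the word vectors filling each position to induce position vectors
    positions = [None if sw == "_" or not keep
                 else sum(wordvec[c[i]] for c in keep)
                 for i, sw in enumerate(seed)]
    return keep, positions

# Toy run seeded on the caused-motion context "_ it off":
seed = ("_", "it", "off")
similar = {("_", "them", "off"): 0.96,
           ("_", "the", "off"): 0.95,    # similar overall, but 'the' != 'it'
           ("_", "it", "away"): 0.92}
wordvec = {"it": np.array([1.0, 0.0]), "them": np.array([0.95, 0.1]),
           "the": np.array([0.0, 1.0]), "off": np.array([0.5, 0.5]),
           "away": np.array([0.45, 0.55])}
keep, positions = infer_construction(seed, similar, wordvec)
```

The structurally parallel but lexically divergent candidate ("_ the off") is rejected at step (iii), and the surviving contexts' words are summed into one vector per position, as the procedure above describes.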
References

Anglin, Jeremy M.
1993 Vocabulary development: A morphological analysis. Monographs of the Society for Research in Child Development 58 (10), Serial No. 238.
Bencini, Giulia, and Adele E. Goldberg
2000 The contribution of argument structure constructions to sentence meaning. Journal of Memory and Language 43: 640-651.
Bloomfield, Leonard
1933 Language. New York: Holt, Rinehart.
Brooks, Patricia, and Michael Tomasello
1999 How young children constrain their argument structure constructions. Language 75: 720-738.
Buchanan, Lori, Curt Burgess, and Kevin Lund
1996 Overcrowding in semantic neighborhoods: Modeling deep dyslexia. Brain and Cognition 32: 111-114.
Burgess, Curt, and Kevin Lund
1997a Modeling parsing constraints with high-dimensional context space. Language and Cognitive Processes: Special Issue on Lexical Representations in Sentence Processing 12: 177-211.
1997b Modeling cerebral asymmetries of semantic memory using high-dimensional semantic space. In Getting it Right: The Cognitive Neuroscience of Right Hemisphere Language Comprehension, Mark
Beeman, and Christine Chiarello (eds.), 215-244. Hillsdale, NJ: Lawrence Erlbaum Associates.
Burgess, Curt, K. Livesay, and Kevin Lund
1998 Explorations in context space: Words, sentences, discourse. Discourse Processes 25: 211-257.
Burstein, Jill C.
2003 The E-rater scoring engine: Automated essay scoring with natural language processing. In Automated Essay Scoring: A Cross-Disciplinary Perspective, Mark D. Shermis, and Jill C. Burstein (eds.), 113-122. Mahwah, NJ: Lawrence Erlbaum.
Burstein, Jill C., and Martin Chodorow
1999 Automated essay scoring for nonnative English speakers. Joint Symposium of the Association of Computational Linguistics and the International Association of Language Learning Technologies, Workshop on Computer-Mediated Language Assessment and Evaluation of Natural Language Processing, College Park, Maryland.
Burstein, Jill C., Karen Kukich, Susanne Wolff, Chi Lu, Martin Chodorow, Lisa Braden-Harder, and Mary Dee Harris
1998 Automated scoring using a hybrid feature identification technique. In Proceedings of the Annual Meeting of the Association of Computational Linguistics, Montreal, Canada, August 1998.
Burstein, Jill C., Karen Kukich, Susanne Wolff, Chi Lu, and Martin Chodorow
1998a Enriching automated essay scoring using discourse marking. In Proceedings of the Workshop on Discourse Marking and Discourse Relations, Annual Meeting of the Association of Computational Linguistics, Montreal, Canada, August 1998.
1998b Computer analysis of essays. NCME Symposium on Automated Scoring, April 1998.
Burstein, Jill C., Claudia Leacock, and Richard Swartz
2001 Automated evaluation of essays and short answers. Paper presented at the Fifth International Computer Assisted Assessment Conference, Loughborough University, UK, 2nd and 3rd July 2001.
Burstein, Jill C., Claudia Leacock, and Martin Chodorow
2003 Criterion™ online essay evaluation: An application for automated evaluation of student essays.
In Proceedings of the Fifteenth Annual Conference on Innovative Applications of Artificial Intelligence, Acapulco, Mexico, August 2003.
Burstein, Jill C., Daniel Marcu, Slava Andreyev, and Martin Chodorow
2001 Towards automatic classification of discourse elements in essays. In Proceedings of the 39th Annual Meeting of the Association of Computational Linguistics, Toulouse, France, July 2001.
Chomsky, Noam
1959 A review of B. F. Skinner's Verbal Behavior. Language 35 (1): 26-58.
Church, Kenneth, Patrick Hanks, Donald Hindle, William Gale, and Rosamund Moon
1994 Substitutability. In Computational Approaches to the Lexicon: Automating the Lexicon II, B.T.S. Atkins, and Antonio Zampolli (eds.), 153-180. Oxford: Oxford University Press.
Collins, Michael J.
1996 A new statistical parser based on bigram lexical dependencies. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, 184-191.
1997 Three generative, lexicalized models for statistical parsing. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Association for Computational Linguistics, 16-23.
Curran, James R., and Marc Moens
2002 Scaling context space. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 231-238.
Deane, Paul D.
1993 Grammar in Mind and Brain: Explorations in Cognitive Syntax. Berlin: Mouton de Gruyter.
Dorr, Bonnie J., and Doug Jones
1996 Role of word sense disambiguation in lexical acquisition: Predicting semantics from syntactic cues. In Proceedings of the 16th International Conference on Computational Linguistics, Copenhagen.
Ersan, Murat, and Eugene Charniak
1995 A statistical syntactic disambiguation program and what it learns. Brown CS Tech Report CS-95-29.
Fillmore, Charles J.
1989 Grammatical construction theory and the familiar dichotomies. In Language Processing in Social Context, R. Dietrich, and C.F. Graumann (eds.), 17-38. Amsterdam: North-Holland/Elsevier.
1998 The mechanisms of 'Construction Grammar'. In Proceedings of the 14th Annual Meeting of the Berkeley Linguistic Society, 35-55.
Fillmore, Charles J., Paul Kay, and Mary C. O'Connor
1988 Regularity and idiomaticity in grammatical constructions: The case of let alone. Language 64 (3): 501-538.
Finch, Steve, Nick Chater, and Martin Redington
1995 Acquiring syntactic information from distributional statistics.
In Connectionist Models of Memory and Language, Joe Levy, Dimitrios Bairaktaris, John Bullinaria, and Paul Cairns (eds.), 229-242. London: UCL Press.
Goldberg, Adele E.
1995 A Construction Grammar Approach to Argument Structure. Chicago: University of Chicago Press.
Grefenstette, Gregory
1992 Use of syntactic context to produce term association lists for text retrieval. In Proceedings of the Fifteenth Annual ACM SIGIR Conference on Research and Development in Information Retrieval, 89-97.
1994 Explorations in Automatic Thesaurus Discovery. Boston: Kluwer Academic Press.
Hovav, Malka R., and Beth Levin
1998 Building verb meanings. In Lexical and Compositional Factors, Miriam Butt, and Wilhelm Geuder (eds.), 97-134. Stanford, CA: CSLI.
Hughes, John
1994 Automatically acquiring classification of words. Ph.D. thesis, University of Leeds, School of Computer Studies.
Kay, Paul, and Charles J. Fillmore
1999 Grammatical constructions and linguistic generalizations: The What's X doing Y? construction. Language 75 (1): 1-33.
Laham, Darrell
1997 Latent Semantic Analysis approaches to categorization. Poster presented at the 19th annual meeting of the Cognitive Science Society.
Landauer, Thomas K.
2002 On the computational basis of learning and cognition: Arguments from LSA. The Psychology of Learning and Motivation 41: 43-84.
Landauer, Thomas K., and Susan T. Dumais
1997 A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review 104: 211-240.
Langacker, Ronald W.
1991 Concept, Image, and Symbol: The Cognitive Basis of Grammar. Berlin: Mouton de Gruyter.
Lapata, Maria
1999 Acquiring lexical generalizations from corpora: A case study for diathesis alternations. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, 397-404.
Leacock, Claudia, and Martin Chodorow
2000 Automated scoring of short-answer responses. ETS Technologies Research Report.
2003 Automated grammatical error detection. In Automated Essay Scoring: A Cross-Disciplinary Perspective, Mark D. Shermis, and Jill Burstein (eds.), 195-207. Hillsdale, NJ: Lawrence Erlbaum.
Forthcoming C-rater: Automated scoring of short answer questions.
Computers and the Humanities.
Lee, Lillian
1999 Measures of distributional similarity. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, 25-32. Los Altos, CA: Morgan Kaufmann Publishers.
Lin, Dekang
1998 Automatic retrieval and clustering of similar words. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, 898-904.
Lund, Kevin, and Curt Burgess
1996 Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments and Computers 28 (2): 203-208.
Lund, Kevin, Curt Burgess, and Ruthanne Atchley
1995 Semantic and associative priming in high-dimensional semantic space. In Proceedings of the Cognitive Science Society, 660-665.
Magerman, David M.
1994 Natural Language Parsing as Statistical Pattern Recognition. Ph.D. thesis, Stanford University.
1995 Statistical decision-tree models for parsing. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, 276-283.
Manning, Christopher D., and Hinrich Schütze
1999 Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
Michaelis, Laura A., and Knud Lambrecht
1996 Toward a construction-based theory of language function: The case of nominal extraposition. Language 72: 215-247.
Pantel, Patrick, and Dekang Lin
2002 Discovering word senses from text. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2002, Edmonton, Canada, 613-619.
Resnik, Philip
1993 Selection and information: A class-based approach to lexical relationships. Ph.D. thesis, University of Pennsylvania.
Pinker, Steven
1984 Language Learnability and Language Development. Cambridge, MA: Harvard University Press.
1989 Learnability and Cognition: The Acquisition of Argument Structure. Cambridge, MA: MIT Press.
Powers, Donald E., Jill Burstein, Martin Chodorow, Mary E. Fowles, and Karen Kukich
2001 Stumping E-rater: Challenging the validity of automated essay scoring. GRE Board Professional Report No.
98-08bP; ETS Research Report 01-03. Princeton, NJ: ETS.
Saussure, Ferdinand de
1916 Cours de linguistique générale. English translation by Wade Baskin, Course in General Linguistics. New York: Philosophical Library, 1959.
Schunn, Christian D.
1999 The presence and absence of category knowledge in LSA. In Proceedings of the 21st Annual Meeting of the Cognitive Science Society. Mahwah, NJ: Erlbaum.
Schütze, Hinrich
1992 Context space. In Working Notes of the AAAI Fall Symposium on Probabilistic Approaches to Natural Language, Robert Goldman, Peter Norvig, Eugene Charniak, and Bill Gale (eds.), 113-120. Menlo Park, CA: AAAI Press.
1993 Word space. In Advances in Neural Information Processing Systems 5, Stephen Hanson, Jack D. Cowan, and C. Lee Giles (eds.), 895-902. Los Altos, CA: Morgan Kaufmann.
Tomasello, Michael, and Patricia Brooks
1999 Early syntactic development: A Construction Grammar approach. In The Development of Language, Martyn Barrett (ed.), 161-190. Hove: Psychology Press.
Zipf, George K.
1935 The Psycho-Biology of Language. Boston, MA: Houghton Mifflin.
Chapter 5
On the reduction of complexity: Some thoughts on the interaction between conceptual metaphor, conceptual metonymy, and human memory
Rainer Schulze
1. Introduction

The following paper summarizes my reflections on what I see as one of the most controversial - and possibly frequently neglected - topics of current cognitive linguistic research. I am convinced of the fact that linguistic structures, including morphemes, words, composite structures, sentences or texts (i.e., as linguistic manifestations of concepts or conceptual complexes), can be reasonably likened to aspects of human memory or categorization. Although I do not want to enter into a discussion about the alleged validity of some Humboldtian or Sapir-Whorfian claims concerning the shaping potential of language in general, I will endorse a milder version of this claim and confine my attention to some selected non-literal (i.e., figurative) units the semantic structure of which can reasonably serve as a window on conceptual structure.
2. The architecture of memory and cognition

As has been known for some time, memory contents seem to be organized in different stores, and the contents themselves are likely to be structured by the type of representation they use. According to Atkinson and Shiffrin (1967), three different, although interacting, stores have been identified: the sensory stores with a short-lived but large capacity, the short-term memory of limited capacity and duration, and the large and durable long-term memory. From a more linguistic point of view, long-term memory has received some attention and is probably of some obvious relevance to the study of knowledge and representation: while declarative knowledge in this context can be equated with 'knowing that' (e.g. that linguists are fascinating people), procedural knowledge is rather linked to
'knowing how' (e.g. how to write an academic paper on anaphora for first-year students). Furthermore, it is to Paivio (1986) that we owe the distinction between verbal representation and imagery representation ('bigger than', 'on top of', 'by the side of', etc.) of declarative knowledge, and it is to Tulving (1972) that we owe the specification of declarative knowledge into episodic memory (i.e., time-place dependent personal facts and details) and semantic memory (i.e., general facts). Early work by Collins and Quillian (1969) suggests that semantic memory can be conceptualized as a network of interrelated nodes; each node is assumed to stand for a concept, the meaning of which is represented by its connections with other nodes. It is worth mentioning here that the distinction between type nodes (i.e., general cases) and token nodes (i.e., exemplars) helps to identify a number of different relations that may hold between nodes; in particular, the relationship between species and genus ('a linguist is a human being') can be complemented by property ('she has red hair') and affectedness relations ('she hits the linguist'). What makes this approach so attractive to linguists is the assumption that subset-superset relations or property and affectedness relations seem to be represented in memory as a cluster of descriptive propositions (i.e., units of thought that are comprised of nodes) (Bower 1981): nodes or units in memory, as they are 'propositionally' represented in 'Manu hit the linguist' and 'I feel scared', are clearly connected, with the proviso that the activation of one node can spread to adjacent nodes; i.e., activated nodes in the first sample proposition can spread to 'switch on' the nodes constituting the second sample proposition (Bower and Cohen 1982).
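The network idea just described can be given a purely illustrative sketch (this is an editorial example, not drawn from the chapter or from Collins and Quillian's model; the decay rate and activation threshold are arbitrary assumptions):

```python
# Concept nodes linked by relations; activation spreads to adjacent nodes
# and weakens with each link traversed, as in spreading-activation accounts.
def spread(network, start, decay=0.5, threshold=0.1):
    """network: dict mapping each node to its set of neighbours. Activation
    starts at 1.0 at `start`, is multiplied by `decay` per link, and stops
    propagating once it falls below `threshold`."""
    activation = {start: 1.0}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        next_level = activation[node] * decay
        if next_level < threshold:
            continue
        for nb in network.get(node, ()):
            if next_level > activation.get(nb, 0.0):
                activation[nb] = next_level
                frontier.append(nb)
    return activation

# 'Manu hit the linguist' and 'I feel scared' as connected propositions:
net = {
    "hit": {"Manu", "linguist", "scared"},
    "Manu": {"hit"},
    "linguist": {"hit", "human being"},
    "scared": {"hit", "I"},
    "I": {"scared"},
}
act = spread(net, "hit")
```

Activating the node shared by both propositions 'switches on' the nodes of the second proposition at reduced strength, which is the behaviour the spreading-activation picture predicts.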
As the deliberate analogy between processes in the mind and electrical current illustrates, knowledge pertaining to the biochemical, electrical and cognitive bases of the human memory is still in its infancy, despite some major achievements, especially in psychology, neuroscience, artificial intelligence and linguistics (Gernsbacher 1994; Anderson 1983; Baddeley 1986; Baddeley, Aggleton, and Conway 2002; Cowan 1985; Fuster 1995; Gazzaniga 2000; Kandel, Schwartz, and Jessel 2000; Schacter and Tulving 1994; Squire and Schacter 2002; Pishwa 2003). Scholars seem to be quite unanimous in accepting the following hypotheses:

The origin of information stored in the human brain is to be regarded as a triple one. They originate (1) from the history of species through the biological function of information storage in particular, as well as through fundamental types of decisions based on experience of the species, thus originating from the history of evolution; (2) the information stored in
memory originate from the history of society: knowledge of phenomena of nature and their internal correlations has been imported by instruction, through lecture and education, via language; (3) it receives information through the individual life history, through experiences made by the 'ego' in dealing with the things of perceptible reality (Klix 1980: 12).
Although an overall theory of human memory is still to be expected, both linguistic and non-linguistic, some principles, drawn from large-scale experimental results, can be posited in order to elucidate the cognitive processes involved:

1. Human memory is organized ...
2. The organization embodied in a mental structure is revealed in free recall ...
3. The organization of memory is based on experience ...
4. The tendency of a person to recall an element that occurred in an event depends on two factors: (a) the amount of elaboration of the person's mental structure, and (b) the degree to which the element is typical in events of the kind being examined ...
(Freeman, Romney, and Freeman 1987: 313-314)
3. Metaphors, metonymies, concepts, schemata, the principle of relevance, and knowledge representation

Linguists of a cognitive persuasion have constantly emphasised the fact that language principally serves two main functions in society (Tomasello 2003): language is used for communication and for cognitive representation, and both linguistic functions seem to be inextricably intertwined. This is to say that linguistic symbols (as linguistic manifestations of concepts) are employed for interpersonal communication and that these interpersonal tools, once they have been internalised, become the major representational medium for certain kinds of human cognition, including basic cognitive skills such as perception, categorisation, memory, relational understanding, problem solving, etc. If language users are endowed with the ability to cognitively represent the world, they are able to recall where things are located, to anticipate impending events, to use spatial detours and shortcuts (i.e., cognitive mapping), to categorise novel objects on the basis of perceptual similarity or to solve novel problems on the basis of some mental trial-and-error strategy or via insight. Seen from that point of view, language is very
much a human creation; in its flexibility, its efficiency to convey thought and to establish rapport, and its provision for socialisation, it surpasses by far any other vehicle of verbal exchange and interaction in the world. Tomasello gives, in distinct form, the model of communication which is either in wide acceptance or remains unquestioned by cognitive linguists (2003: 50): "Simply said, language is about sharing and directing attention, and so it is no surprise that it emerges along with other joint attentional skills". What, then, are the broad implications of human and linguistic creativity, dynamicity, efficiency, flexibility, schematicity and frequency with language? Metaphors, metonymies and schemata (i.e., the conceptual means by which the language user makes sense of the world s/he lives in) play a key role in learning, creating, remembering and communicating; metaphors, metonymies and schemata are related to well-organised clusters of concepts which are assumed to be mentally construed models of reality as we perceive it (Lakoff and Johnson 1999). We owe the notion of the embodied mind hypothesis to the latter luminaries, who view concepts as being established and organised according to the way the body and especially the brain are structured and function in the physical world. Experience is thus embodied, being based on the interaction of reality with our biology or physiology, and so are our concepts. Our human capacity to process newly perceived information (and to link it with prior knowledge) requires the language user to segment the world into relevant and irrelevant portions, with relevance being individually, culturally and socially constrained. This process of segmentation forms the basis of the mental construction of reality and the simultaneous or subsequent categorisation of the perceived world into concepts.
As a necessary preparation for exploring the ways language and cognition interact, let us assume that there is a natural tendency in the human being to link distinct domains, a perceptual process that has become known as analogical reasoning. Whatever locus analogical reasoning may have in the brain, it is ubiquitous in each of us and manifests itself through various metaphorical and metonymical operations. This is where the key role of concepts becomes crucial: concepts help to relate causes and effects; concepts enable us to identify things by categorising them; and concepts help to reduce the complexity of the environment. A concept itself may comprise different phenomena and thus presupposes a significant degree of flexibility; but there is also evidence that the flexibility of concepts has its limits, possibly grounded in their structure.
On the reduction of complexity
147
As Barsalou (1982) points out, concepts (including definitional, prototypical and functionally more important information) typically exhibit a stable core with a context-sensitive periphery (i.e., concepts may consist of context-dependent and of context-independent components). Barsalou furthermore hypothesises that context-independent components emerge from context-dependent ones through frequent combination with the concept-activating words. The more basic point here is that, depending on the communicative needs of the language user, concepts are activated only partially, which results in the vagueness of conceptual structure. Some effort has been made to provide us with relevant information concerning the internal structure of concepts. Bartlett (1967), to mention an early representative, asserts that the content of a concept does not correspond to the sum of its components, but is represented holistically in overlapping schemata; schemata in this context emerge as holistic images, as figurative, subconsciously working generic structures whose elements are mutually related. Knowledge is thus organised in schemata and interacts with reality through the observer. Such a view entails that, if concepts are usually activated only partially, i.e., only the situationally relevant meaning is highlighted, then in terms of schema theory only a particular schema is activated, not whole concepts, to serve the communicative aim. Activated schemata then contain knowledge presumably shared by speaker and hearer about what can be associated with a concept in a given context.
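Barsalou's core-versus-periphery distinction can be rendered as a small data structure: a stable core that is always activated, plus peripheral components that are activated only in particular contexts. The following Python sketch is purely illustrative; the concept, its features and the context labels are invented and are not taken from Barsalou's materials.

```python
# A concept as a stable core plus context-dependent peripheral components
# (after Barsalou 1982). All feature and context names are invented.

CONCEPT_PIANO = {
    "core": {"keyboard", "musical instrument"},   # context-independent, always active
    "peripheral": {
        "heavy": {"moving"},         # only highlighted in a moving context
        "polished": {"showroom"},    # only highlighted in a showroom context
    },
}

def activate(concept, context):
    """Return the components highlighted for a given context:
    the stable core plus any context-dependent components."""
    active = set(concept["core"])
    for feature, contexts in concept["peripheral"].items():
        if context in contexts:
            active.add(feature)
    return active
```

Partial activation then falls out directly: `activate(CONCEPT_PIANO, "moving")` highlights "heavy" alongside the core, while a context not listed anywhere activates the core alone.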
I have brought up this matter of schema activation in part to emphasise and specify the interaction of supposedly incompatible target and source domains in conceptual metaphors: independent of any truth value (Claudi and Heine 1986), correspondences between the two domains depend on a similarity of stereotype associations rather than on a similarity of attributes, and all the concomitant entailments are memorised in mental images, i.e., schemata. It goes without saying that not all schemata of a concept have equal status: some are more central, others are more peripheral. What emerges from this observation is that those schemata of a concept that are activated in a particular context become established as conventional through repeated occurrence: the more often a schema is activated by different speakers and in various contexts, the more it becomes a core element of the concept and the more likely it becomes conventional in a speech community. There is no denying the fact that schema activation also has a neurological dimension in that concepts can equally well be understood as cognitive routines or interrelated sets of sub-routines. More precisely, what triggers
the activation of a subroutine is closely linked to selective attention, which is in turn closely associated with the intensity or energy level of cognitive processes (Luck and Beach 1998). Against this background, automatic attention implies less intentional effort, fewer intentional resources, a higher speed of cognitive processing and a lower level of cognitive processing (Stamenov and Andonova 1998: 1; Pinsky 2002). Thus, from a neurological point of view, practice produces routines that require minimal effort and yield automatic processing; from a linguistic point of view, the co-text of a verbal expression suffices to trigger the selective attention that highlights some aspects of a concept and thereby neglects others. Or as Stamenov and Andonova (1998: 7) put it: "Practically every word collocation leads to focal adjustments and backgrounding of some aspects of the mental representation constructed in processing verbal expressions". In sum, we can claim that schemata represent abstract knowledge about phenomena, reducing the complexity of phenomena to their stereotypical characteristics; their holistic character, as well as the use of previous knowledge in their activation, makes information processing fast and efficient and reduces processing effort. Evidence continues to mount that there is one particular strategy underlying everyday encounters, both spoken and written: maximal benefit at minimal cost (Sperber and Wilson 1986). This strategy turns out to be a basic principle of metaphorisation or metonymisation in that relevance is seen as the result of the optimisation of resources in which high informational benefits are achieved through low processing efforts: information in this context is relevant if it interacts with existing assumptions, either with assumptions in long-term memory or with context-dependent assumptions stored in short-term memory and activated by situation and co-text.
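The relevance-theoretic principle of maximal benefit at minimal cost can be caricatured as a simple ratio: among candidate interpretations of an utterance, the most relevant one maximises cognitive effect per unit of processing effort. The sketch below is a toy rendering under that assumption; the candidate labels and the numerical effect/effort values are invented illustrations, not measurements, and nothing here is drawn from Sperber and Wilson's formal apparatus.

```python
# Toy rendering of the relevance-theoretic cost-benefit principle
# (after Sperber and Wilson 1986): pick the interpretation that
# maximises cognitive effect per unit of processing effort.

def most_relevant(interpretations):
    """interpretations: list of (label, cognitive_effect, processing_effort).
    Returns the label with the best effect-to-effort ratio."""
    return max(interpretations, key=lambda i: i[1] / i[2])[0]

# Invented example: a metaphorical reading costs more effort but yields
# far richer effects, so it comes out as the more relevant reading.
candidates = [("literal", 2.0, 1.0), ("metaphorical", 6.0, 2.0)]
```

On these invented numbers, `most_relevant(candidates)` selects the metaphorical reading (ratio 3.0 against 2.0), mirroring the claim that condensed figurative forms can be the resource-optimal choice.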
Co-text, in analogy to similar constructs in cognitive linguistics, is psychological in nature and based on the speaker's assumptions about the world, which in turn evoke assumption-schemata. Conceptual metaphors and metonymies, from a relevance-theoretical vantage point, represent condensed forms of information; they can be easily processed because of their brevity and conciseness and because of their interaction with prior knowledge and beliefs. Conceptual metaphors and metonymies can also be grasped as tools to optimise resources, rendering the comprehension of an unfamiliar domain easier by structuring it in terms of a familiar domain. More specifically, they simplify the structure of the target domain and thereby reduce its complexity and the otherwise necessary processing effort, simultaneously enabling insights into the target domain that provide maximal information (Goatly 1997). And this is
where we have come full circle again: by means of conceptual metaphors and metonymies, we are provided with a tool that serves as an efficient strategy of categorising the world, a categorising procedure which is consonant with analogical and inferential thinking. Different kinds and types of processing can be directly related to high versus low loads on working memory (Smith, Patalano, and Jonides 1998): analytic and strategic processing, for example, is assumed to be fairly slow, because it requires an extensive use of attention and working memory, which, in terms of relevance, raises cognitive costs by involving high processing effort. In contrast, the more automatic processing in similarity calculation (i.e., similarity in stereotype associations) imposes a comparatively small load on working memory by retrieving components from long-term memory. Thus, analogical thinking, as triggered by conceptual metaphors, keeps cognitive costs low by reducing processing effort in categorisation tasks considerably.
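Categorisation by similarity of stereotype associations, as opposed to slow analytic processing, can be sketched as a simple overlap measure: a new item is assigned to the stored concept whose set of schemata it shares most with. The concept names and schemata below are invented for illustration, loosely echoing the chapter's own examples; the Jaccard measure is my assumption, not a claim about the actual processing mechanism.

```python
# Sketch of fast similarity-based categorisation: match an item's
# activated schemata against each stored concept's schema set and
# pick the best overlap. All data are invented illustrations.

def jaccard(a, b):
    """Set overlap normalised by set union (assumes non-empty sets)."""
    return len(a & b) / len(a | b)

def categorise(features, concepts):
    """concepts: mapping from concept name to its set of schemata.
    Returns the name of the concept most similar to the features."""
    return max(concepts, key=lambda name: jaccard(features, concepts[name]))

CONCEPTS = {
    "NATURAL FORCE": {"passivity", "lack of control", "power"},
    "JOURNEY": {"path", "goal", "progress"},
}
```

Here an experience activating the schemata 'passivity' and 'lack of control' is categorised as NATURAL FORCE, which is the shape of the similarity calculation the text describes as imposing only a small working-memory load.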
4. A brief outline of phenomenology and neuro-connectionism

Having presented and discussed some possible contributions to the topic that are relevant to a study of the most recent findings on linguistic cognition, I would now like to make some remarks about an endeavour which relates linguistics, connectionism, philosophy and neuro-biological findings. The research activities of the past decade have concentrated on connectionism at an introductory level and have been largely within the framework of computational linguistics. We have learned something about how elements within a system communicate with one another via spreading activation and inhibition across weighted links (e.g. Pinker and Mehler 1988). We have witnessed the publication of texts (e.g. Bartsch 1998, 2003, forthcoming) which vigorously reject the view that memory can be seen as a huge storehouse or library holding ready-made images or (largely propositional) representations of past events. It is especially the latter publications which endorse a two-level approach: first, a phenomenological level of experience, combined with a theory of dynamic conceptual semantics, and second, a neuro-connectionist level which accounts for the formation and linkage of concepts. Against this background, linguists have accustomed themselves to speaking about nerve stimulations and neural activity in terms of nets of neurons with growing and strengthening or weakening and diminishing connections between them. During the learning phase of a human being, groups of
neurons (or neuronal patterns) stabilise because of the strengthening of links between neurons. 'Pre-petrified' patterns can thus be equated with more or less stable conceptual indicators which get activated by certain input; and, being activated, they confirm or corroborate this very input (Bartsch forthcoming). I suspect that conceptual indicators get associated, either simultaneously or subsequently, with neuronal indicators for linguistic expressions. So far, memory seems to have got lost somewhere along the way; yet the linguist might still feel that there is more to gain from the assumptions implied by a neuro-connectionist architecture: by receiving (and learning) input data, contiguity relationships are established; by contiguity, sequences of data are understood as being structured according to a number of basic schemata, including those pertaining to space, time, cause-effect relationships, agent-action-instrument relationships, agent-action-patient relationships, action-result relationships, etc. As a result, input data in contiguity form clusters, thus resembling semantic networks. If concept formation in the mind is based on experiences, and if the possible contiguity of experiences can be understood as some constrained ordering of data according to local, temporal, causal or other factual relationships, then the ordering of our experiences according to possible similarity and non-similarity should not be unjustly neglected. The human being seems to be fully aware of some events or episodes in his or her personal life story as resembling one another. This is to say that events or properties may share some properties and qualities (or at least one) without being exactly identical. Similarity in structure, a more complex pattern, can be called analogy and seems to be based on the human being's capacity to identify contiguity.
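The neuro-connectionist picture just outlined — links between co-activated units are strengthened, unused ones decay, so repeatedly contiguous experiences stabilise into a pattern — can be given a minimal Hebbian-style sketch. The learning and decay rates and the unit names are arbitrary assumptions made for illustration; this is not a model from Bartsch's or Pinker and Mehler's work.

```python
# Minimal Hebbian-style sketch: connections between co-active units are
# strengthened, all others slowly decay, so repeatedly co-occurring
# (contiguous) inputs stabilise into a pattern. Rates are arbitrary.
from collections import defaultdict

class TinyNet:
    def __init__(self, rate=0.1, decay=0.01):
        self.w = defaultdict(float)   # weights on links between unit pairs
        self.rate, self.decay = rate, decay

    def experience(self, active_units):
        """One episode: decay every link, then strengthen links among
        the currently co-active units."""
        for pair in list(self.w):
            self.w[pair] -= self.decay * self.w[pair]
        units = sorted(active_units)
        for i, a in enumerate(units):
            for b in units[i + 1:]:
                self.w[(a, b)] += self.rate
```

After many joint activations of, say, 'cause' and 'effect', that link far outweighs a link established by a single chance co-occurrence, which is the sense in which frequency drives stabilisation here.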
In sum, we may claim (in this attempt following Bartsch forthcoming) that two different kinds of memory can be distinguished: on the one hand, a general and semantic memory may be discerned where all the general concepts are stored, comprising more stative and generic content; on the other hand, the existence of a fact and episodic memory may be posited in which individual concepts and objects, being part of an individual's life story, can be found, storing more dynamic and referential information (and I am aware of the fact that this distinction approximates Kövecses' claim of three interrelated levels (2002: 239-241): the supra-individual level at which conceptual metaphors and metonymies are analysed on the basis of de-contextualised linguistic examples; the individual level at which these concepts are related to the knowledge of individual speakers; and the sub-individual level at which universal sensorimotor experiences operate, which in turn generate conceptual metaphors and metonymies). Furthermore, both kinds of memory interact and are linked to neural patterns pertaining to primary sensorial fields and to proprioception areas of motion and emotion. I propose that we reaffirm the relevance of metaphorical and metonymical thinking as arising from recurring patterns of embodied experience; seen against this background, concepts or clusters of concepts for metaphorical and metonymical mappings seem to pre-exist as independent entities in long-term memory. Memory as an unconscious phenomenon may be made 'visible' or more transparent by viewing experienced episodes and events as results of concept formation, remembering and understanding. More precisely, notions such as similarity, contiguity or inferencing in concept formation and concept retrieval will help to unveil some aspects of everyday experience, irrespective of some (empirical) findings made in the cognitive neuroscience realm.
5. Metaphor, metonymy, and memory in operation

In trying to comprehend the ongoing debate on figurative expressions in a language and their conceptual bases of similarity and contiguity, it is inevitable that we reconsider some of the basic findings in cognitive linguistics. Figurative expressions here are not primarily seen as a matter of language, but as a matter of thought. The early theories, far from being adequate representations of human interactions, are outstripped by those which are armed with pertinent and numerous data from disciplines as diverse as developmental psychology, social and cognitive anthropology, social cognition, etc. On the basis of what is known nowadays, figurative expressions tend to be linguistic outcomes of some very special operations and interactions of concepts, and not of words. The basic function of most figurative expressions is to better understand certain concepts, not to serve some artistic, ornamental or aesthetic purpose; and since they are used quite effortlessly in everyday life by ordinary human beings, i.e., even by those 'uncontaminated' by linguistic knowledge, it can reasonably be assumed that they are far from superfluous and that they form part of the inevitable process of human thought and reasoning. It is no accident that notions such as similarity or contiguity owe much to research focussing on aspects of mental disorder, i.e., on aspects related to selection aphasia and agrammatism. While the former case involves a disturbance of the ability to substitute words for other words, the latter is clearly linked to a disturbance of the ability to make phrases and sentences; both disorders translate readily into similarity disorder and contiguity disorder respectively. Against this background, Jakobson (2003) sketches a two-dimensional framework for conceiving of and analysing linguistic expressions. The first dimension is paradigmatic, where the principle of selection and substitution as alternative conceptualisations for the same phenomenon obtains; the second dimension is syntagmatic, where the linking or combination of phenomena which are spatially, temporally, causally, etc. contiguous is essential. Being able to accept figurative language as a reality is important for the functioning of creativity, cognition and memory in all sorts of human endeavours; so is trying to understand how the notions of similarity and contiguity necessarily combine with metaphor and metonymy as two central exponents of figurative language. The following diagram, slightly modified in order to serve our purposes, might help to elucidate the kind of interaction we should espouse (Dirven 2003: 77):

[Diagram: metonymy — syntagmatic operation based on combination or contexture, exploiting spatial, temporal, causal, etc. contiguity; metaphor — paradigmatic operation based on selection or substitution, exploiting similarity and contrast]

The role of similarity may be easily explained with the conceptual metaphor SADNESS IS A NATURAL FORCE (capitalised items are used to stand for concepts). Kövecses (2002: 92), for example, provides a brief discussion of the interactional possibilities of the target domain SADNESS and the source domain NATURAL FORCE in order to account for a linguistic expression such as Waves of depression came over him. We have to accept the fact that the correspondences between the domains do not depend on a similarity of attributes, but on a similarity of stereotype associations or schemata, including those pertaining to 'passivity' and 'lack of control'
rather than those pertaining to 'cause', 'attempt at control' and 'behavioural responses'. The schemata activated in this context exhibit and highlight relations between parts and suppress others, and the concomitant entailments or inferences are easily memorised in mental images. Substitution or selection here succeeds because phenomena in different domains are felt to be almost, but not exactly, the same. These mental operations can be summarised and applied to metaphor in the following diagram:

[Diagram: metaphor — source domain NATURAL FORCE mapped onto target domain SADNESS]
We propose two types of associative relations here that may obtain between the concept and its linguistic manifestation on the one hand and across the domains on the other: the vertical arrows display the non-random relationship between the concept and the selected, prototypical and most salient linguistic realisation, simultaneously ignoring aspects of agentivity, intentionality, willpower, and voluntary and intentional exposure to the environment, etc. The horizontal arrows display the mapping process across two experientially different domains, simultaneously indicating the partial mapping of highly schematic information and ignoring the simple mapping of attributes. But there is a more basic point in unveiling the relationship between memory and language: the role of contiguity. As is common knowledge in linguistics, a linguistic unit, such as a constituent, consists of a set of linguistic forms, these forms being in a sequential relationship to one another. While this observation and statement prove to be a mere truism at first
sight, taking the stringing together of linguistic units to form phrasal verbs, prepositional verbs, irreversible binomials, collocations, idioms, proverbs, sayings, clauses, etc. for granted, there is, undeniably and on closer inspection, a conceptual twist to the whole enterprise in that the assumption of a linguistic syntagmatic axis as one of the basic modes of speaking can be expanded into a non-linguistic one, coinciding with one of Jakobson's basic modes of thinking. In his pre-scientific theory of the mind, Jakobson complements the idea of syntagmatic relations in speaking and writing with associative elements in thinking. Thus, non-linguistic syntagmatic relationships may comprise conjunctive syntagms, based on juxtaposition, and inclusive syntagms, based on a chain of inclusion (Dirven 2003: 79-81). What is interesting about this claim is the fact that a linguistic expression such as Langacker is tough to read provides a convincing illustration of the systematic character of meaning extension from Langacker as a unique human being via the status of an eminent dignitary within the field of cognitive linguistics to Langacker as 'prolific writer' and 'progenitor of linguistic ideas'. Thus, following a survey article in one of the more reputable reference books on diverse linguistic disciplines (Dirven 2002: 80), Langacker is directly associated with notions and propositions including 'space grammar', 'linguistic meaning resides in conceptualisation', 'things vs. relations', 'construal', 'perspective', 'gestalt (profile vs. base)', 'trajector vs. landmark' or '(a-)temporal relations'. The full understanding of Langacker is tough to read, a paradigm case of an inclusive syntagm, might require that the memory include a representation of the expression itself and more schematic and relevant information pertaining to relationships such as THE PRODUCER FOR THE PRODUCT. 
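The metonymic inference just described — a source entity giving access to a contiguous target via a stored conceptual relationship such as THE PRODUCER FOR THE PRODUCT — can be sketched as a lookup against a tiny knowledge base. The knowledge base, the relation table and the function below are hypothetical illustrations of the idea, not a claim about how long-term memory is actually implemented.

```python
# Hypothetical sketch of the metonymic inference behind
# "Langacker is tough to read": a source entity gives access to a
# contiguous target via a stored conceptual relationship.
# The knowledge base is invented for illustration.

RELATIONS = {
    # PRODUCER FOR PRODUCT: from a producer to the things produced
    ("PRODUCER", "PRODUCT"): lambda kb, entity: kb[entity].get("products", []),
}

KB = {
    "Langacker": {
        "type": "PRODUCER",
        "products": ["Foundations of Cognitive Grammar"],
    },
}

def metonymic_referent(entity, relation):
    """Resolve a source entity to its contiguous target entities."""
    return RELATIONS[relation](KB, entity)
```

Resolving `metonymic_referent("Langacker", ("PRODUCER", "PRODUCT"))` returns the writings, which is the sense in which the expression's full understanding requires both a representation of the expression itself and the more schematic PRODUCER FOR PRODUCT relationship.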
We thus want to claim that each segment of a linguistic expression is likely to generate an automatic search of the long-term memory, i.e., the central repository of representations as the products of our understanding. We can represent the underlying metonymical process in the following way:
[Diagram: metonymy — source entity LANGACKER gives access to the target entity COGNITIVE LINGUISTICS]
In the example above, the understanding of one entity that is less readily or easily available (the target) in terms of something else that is more concrete or salient and closely related to it, or sufficiently contiguous (the source), clearly illustrates the basic nature of metonymy, i.e., the close connection and hence association between two sub-domains which form part of a more encompassing conceptual structure. Contiguous sub-domains, in general, are assumed to represent a functionally ordered set of entities which facilitate the performance of the required or intended inference; prototypical and most frequently encountered 'conceptual interactions' include PRODUCER FOR PRODUCT, BUILDING FOR INSTITUTION, PLACE FOR EVENT, CONTROLLER FOR CONTROLLED or OBJECT USED FOR USER. These interactions have achieved prominence and significance through sufficient contiguity and/or repetition. Thus far, we have established a number of observations which can be visualised in the following way:
Schematic metonymy
The type of schematic metonymy we want to construe is able to show two possible ways of interaction within intra-domain mapping processes: domain expansion in source-in-target metonymy and domain reduction in
target-in-source metonymy (Ruiz de Mendoza Ibáñez and Díez Velasco 2004: 297-298). Notice that notions such as source or target do not represent different domains whose initial mismatch is eventually salvaged by later assumptions concerning their alleged similarity (as in a conceptual metaphor); rather, both are seen as place-holders for sub-domains which must be sufficiently distinct and should bear some mutually beneficial activation potential. Thus, a source may be seen as a sub-domain of the target and vice versa. These are the minimal requirements for a conceptual metonymy in which, for example, the concomitant NPs are used either for referential purposes (as in The trombone has the measles, Baghdad has already seen many Londons or The MIT has redefined the structure of language) or in a predicative way (as in She's all legs). In particular, She's all legs clearly gives rise to a source-in-target metonymy in which the language user conceptualises the source as a sub-domain of the target and in which he/she gains access to the whole domain, i.e., the matrix domain. This type of domain expansion can be captured in the following way (Ruiz de Mendoza Ibáñez and Díez Velasco 2004: 298):
Source-in-target metonymy
The second type worth mentioning here views the target as a sub-domain of the source and thus operates as a target-in-source metonymy, with The MIT has redefined the structure of language as a clear case of domain reduction. Again, a relevant part of the domain is being highlighted here, thus gaining primary status:

Target-in-source metonymy
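The two intra-domain mappings can be sketched over a toy representation of matrix domains and their sub-domains: source-in-target metonymy expands from a sub-domain up to its matrix domain, while target-in-source metonymy reduces a matrix domain to one highlighted sub-domain. The domain contents below are invented placeholders loosely based on the chapter's examples, not an implementation of Ruiz de Mendoza Ibáñez and Díez Velasco's model.

```python
# Toy matrix domains and sub-domains for the two metonymy types.
# Contents are invented placeholders based on the chapter's examples.

DOMAINS = {
    "PERSON": {"legs", "arms", "face"},              # matrix for "She's all legs"
    "MIT": {"buildings", "researchers", "campus"},   # matrix for "The MIT has ..."
}

def expand(sub_domain):
    """Source-in-target metonymy: from a sub-domain up to its matrix
    domain (domain expansion)."""
    for matrix, subs in DOMAINS.items():
        if sub_domain in subs:
            return matrix
    return None

def reduce_to(matrix, sub_domain):
    """Target-in-source metonymy: highlight one sub-domain within the
    matrix (domain reduction)."""
    return sub_domain if sub_domain in DOMAINS[matrix] else None
```

So `expand("legs")` recovers the whole PERSON domain for She's all legs, while `reduce_to("MIT", "researchers")` highlights the relevant sub-domain for The MIT has redefined the structure of language.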
Both types of conceptual metonymy clearly address issues of inferential processes and simultaneously show the way metonymy motivates and constrains the selection of the relevant sub-domains. The selection of sub-domains seems to be governed by some very general cognitive principles which favour the controlling entity over the controlled entity (as in the latter example) or the part over the whole (as in the former example). This is to say that not only words (Hoey 2005), but even concepts are strongly primed to occur in very special environments and less strongly to occur in others. Priming is thus facilitated when two concepts are sufficiently contiguous (i.e., spatially, temporally, causally, etc.). As we have seen, any explanation of the pervasiveness of conceptual metonymy is required to be psychological or even mental because, as has become apparent in the brief presentation, conceptual metonymy is fundamentally a cognitive concept. What has to be accounted for is the recurrent 'co-occurrence' of concepts: if they were stored in our minds separately, the kinds of inferences activated would be inexplicable. As a concept or clusters of concepts are acquired through encounters with them in speech and writing, they become cumulatively and incrementally loaded with the contexts in which they are encountered.
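The claim that concepts become cumulatively and incrementally loaded with the contexts in which they are encountered can be given a toy operational form: every co-occurrence of two concepts in a context increments an association count, so sufficiently contiguous concepts come to prime one another. This is only an illustrative transfer of Hoey's lexical priming idea to concepts; the concept labels and counts are invented.

```python
# Toy model of conceptual priming: each shared context increments an
# association count, so contiguous concepts come to prime one another.
from collections import Counter
from itertools import combinations

class Primer:
    def __init__(self):
        self.counts = Counter()

    def encounter(self, concepts):
        """Record one context in which these concepts co-occur."""
        for pair in combinations(sorted(concepts), 2):
            self.counts[pair] += 1

    def primes(self, concept):
        """Concepts associated with `concept`, strongest first."""
        assoc = Counter()
        for (a, b), n in self.counts.items():
            if a == concept:
                assoc[b] += n
            elif b == concept:
                assoc[a] += n
        return [c for c, _ in assoc.most_common()]
```

After three shared contexts for CAUSE and EFFECT against a single one for CAUSE and PLACE, CAUSE primes EFFECT most strongly — the separate-storage alternative criticised in the text would leave such graded inferences unexplained.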
6. Conclusion

In this paper, we have analysed an area of linguistics and neuro-anatomy in which conceptual metaphors and metonymies play a central role. The deliberations presented so far provide additional evidence in favour of the claim that conceptual metaphors and metonymies are linguistically and psychologically real, as the vast amount of relevant literature additionally shows (De Knop et al. 2005); and the partial prevention of complete disorder and confusion seems to be their major raison d'être as seen from neuro-anatomy. Conceptual metaphors and metonymies are closely intertwined with two cognitive principles, i.e., the similarity and the contiguity principle, which help to account for different conceptual entities: whereas the similarity principle, operative in conceptual metaphors, accounts for the identity and/or similarity of the properties of objects and situations, the contiguity principle, operative in conceptual metonymies, accounts for the identity and/or conceptual closeness (e.g. from a part of something to the whole, from cause to effect, etc.) of individuals and events. The concept itself, or a cluster of related concepts, is seen as a stabilised set of experienced examples fit for linguistic expression. The notion of stabilisation, or indeed flexibility, carries very strong implications here, since we want to argue that the internal similarity and contiguity of some representative set of examples is neither increased nor decreased by the addition of new examples. In order to maintain and secure the stability or flexibility of some increasingly and incrementally growing representative set, a new example of use of the expression that does not match the essential requirements of the set has to be seen as constituting a new set which is metaphorically or metonymically linked to the old set. Thus, we are inclined to presuppose the existence of non-metaphoric and non-metonymic terms that can be transferred to new extended uses. In a nutshell, conceptual metaphors and metonymies presuppose an already stabilised concept in long-term memory and a conventionalised use of the expression for this very concept. Two main possibilities for the modification of both linguistic and conceptual categories (due to sufficient flexibility and despite inherent stability) are domain expansion and domain reduction, or broadening and narrowing respectively; both can be seen as conceptual processes brought about by contextually or situationally available information about the focus of attention, desires, interests, etc. Some of these observations seem to have very clear correspondences in neuro-anatomy, where the processing of nerve stimulations and neuronal activity is the prime focus of investigation. This is to say that neuronal patterns seem to stabilise because of the strengthening of connections between neurons; strengthened patterns seem to be paralleled by more or less stable conceptual indicators which get activated by certain input, and becoming activated implies that the input becomes more or less confirmed.
This is also to say that for linguistic expressions to be operative, the association of conceptual indicators with neuronal indicators is an indispensable prerequisite (e.g. Bartsch 2003). Broadly speaking, conceptual metaphors and metonymies are bodily motivated, or embodied: source and target domain correlations seem to be embodied in our neuro-anatomy, and source domains seem to arise from sensori-motor experiences; in the world of experience, repeated experiences contribute to the increased 'connectivity' of source and target domains. Frequency seems to be the relevant factor for the explanation of the conventionalisation of inference, a process applied by the language user in order to cope with old and new members of linguistic and conceptual categories. Seen from that point of view, memory seems to interact with other mental phenomena such as sensory perception, logical or inferential thinking or cognition, learning, etc. If memory can be seen as a huge associative network of semantic concepts, it can be grasped as a (possibly problem-solving) device for ordering chaos, and conceptual metaphors and metonymies help to overcome the imminent state of complete disorder and confusion.
References Anderson, John R. 1983 The Architecture of Cognition. Cambridge, MA: Harvard University Press. Atkinson, Richard C., and Richard M. Shiffrin 1967 Human memory. In Human Memory: Basic Processes, Gordon H. Bower (ed.), 89-195. New York: Academic Press. Baddeley, Alan 1986 Working Memory. Oxford/New York: Oxford University Press Baddeley, Alan, Martin A. Conway, and John P. Aggleton (eds.) 2002 Episodic Memory: New Directions in Research. New York: Oxford University Press. Barsalou, Lawrence W. 1982 Context-independent and context-dependent information in concepts. Memory and Cognition 10: 82—93. Bartlett, Frederic Charles 1967 Remembering: A Study in Experimental and Social Psychology. Cambridge: Cambridge University Press. Bartsch, Renate 1998 Consciousness Emerging: The Dynamics of Perception, Imagination, Action, Memory, Thought, and Language. Amsterdam/Philadelphia: John Benjamins. 2003 Generating polysemy: Metaphor and metonymy. In Metaphor and Metonymy in Comparison and Contrast, Rene Dirven, and Ralf Pörings (eds.), 49-74. Berlin/New York: Mouton de Gruyter. forthc. Memory and Understanding: Proust's A la recherche du temps perdu in the Light of Dynamic Conceptual Semantics. Amsterdam/ Philadelphia: John Benjamins. Bower, Gordon H. 1981 Mood and memory. American Psychologist 36: 129—148. Bower, Gordon H., and Paul R. Cohen 1982 Emotional influences on memory and thinking: Data and theory. In Affect and Cognition, Margaret Sydnor Clark, and Susan T. Fiske (eds.), 291-331. Hillsdale, NJ: Erlbaum.
160
Rainer Schulze
Claudi, Ulrike, and Bernd Heine 1986 On the metaphorical base of grammar. Studies in Language 10 (2): 297-335. Collins, Allan M., and M. Ross Quillian 1969 Retrieval times from semantic memory. Journal of Verbal Learning and Verbal Behavior 8: 240-247. Cowan, Nelson 1985 Attention and Memory: An Integrated Framework. Oxford/New York: Oxford University Press. De Knop, Sabine, Rene Dirven, Carlos Inchaurralde, and Rainer Schulze (eds.) 2005 Bibliography of Metaphor & Metonymy. Amsterdam/Philadelphia: John Benjamins. Dirven, Rene. 2002 Cognitive linguistics. In The Linguistics Encyclopedia, Second Edition, Kirsten Malmkjaer (ed.), 76—82. London/New York: Routledge. 2003 Metonymy and metaphor: Different mental strategies of conceptualisation. In Metaphor and Metonymy in Comparison and Contrast, Rene Dirven, and Ralf Pörings (eds.), 75—111. Berlin/New York: Mouton de Gruyter. Freeman, Linton, A. Kimball Romney, and Sue Freeman 1987 Cognitive structure and informant accuracy. American Anthropologist 89: 310-325. Fuster, Joaquin M. 1995 Memory in the Cerebral Cortex: An Empirical Approach to Neural Networks in the Human and Nonhuman Primate. Cambridge, MA: The MIT Press. Gazzaniga, Michael S. (ed.) 2000 The New Cognitive Neurosciences, (Section VI, Memory), 2d ed. Cambridge, MA: The MIT Press. Gernsbacher, Morton Ann (ed.) 1994 Handbook of Psycholinguistics. San Diego, CA: Academic Press. Goatly, Andrew 1997 The Language of Metaphors. London/New York: Routledge. Hoey, Michael 2005 Lexical Priming: A New Theory of Words and Language. London/ New York: Routledge. Jakobson, Roman 2003 The metaphoric and metonymic poles. In Metaphor and Metonymy in Comparison and Contrast, Rene Dirven, and Ralf Pörings (eds.), 41—47. Berlin/New York: Mouton de Gruyter. Kandel, Eric R., James H. Schwartz, and Thomas M. Jessell 2000 Principles of Neural Science (Section IX, Language, thought, mood, learning, and memory), 4th ed. New York: Mc Graw-Hill.
Klix, Friedhart
1980 On structure and function of semantic memory. In Cognition and Memory, Friedhart Klix and Joachim Hoffmann (eds.), 11-25. Amsterdam: North-Holland.
Kövecses, Zoltán
2002 Metaphor: A Practical Introduction. Oxford: Oxford University Press.
Lakoff, George, and Mark Johnson
1999 Philosophy in the Flesh: The Embodied Mind and Its Challenge to Western Thought. New York: Basic Books.
Luck, Steven J., and Nancy J. Beach
1998 Visual attention and the binding problem: A neurophysiological perspective. In Visual Attention, Richard D. Wright (ed.), 455-478. New York: Oxford University Press.
Paivio, Allan
1986 Mental Representations: A Dual Coding Approach. New York: Oxford University Press.
Pinker, Steven, and Jacques Mehler (eds.)
1988 Connections and Symbols. Cambridge, MA: The MIT Press.
Pinsky, Manuela
2002 The management of attention. MA thesis, University of Hanover.
Pishwa, Hanna
2003 Accurate fuzziness as constructive reduction in communication. In Text, Context, Concepts, Cornelia Zelinsky-Wibbelt (ed.), 299-331. Berlin/New York: Mouton de Gruyter.
Ruiz de Mendoza Ibáñez, Francisco J., and Olga I. Díez Velasco
2004 Metonymic motivation in anaphoric reference. In Studies in Linguistic Motivation, Günter Radden and Klaus-Uwe Panther (eds.), 293-320. Berlin/New York: Mouton de Gruyter.
Schacter, Daniel L., and Endel Tulving
1994 Memory Systems. Cambridge, MA: The MIT Press.
Smith, Edward E., Andrea L. Patalano, and John Jonides
1998 Alternative strategies of categorization. Cognition 65 (2-3): 167-196.
Sperber, Dan, and Deirdre Wilson
1986 Relevance: Communication and Cognition. Oxford: Blackwell.
Squire, Larry R., and Daniel L. Schacter
2002 Neuropsychology of Memory, 3rd ed. New York: Guilford Press.
Stamenov, Maxim, and Elena Andonova
1998 Attention and language. In Handbook of Pragmatics, Jef Verschueren, Jan-Ola Östman, Jan Blommaert, and Chris Bulcaen (eds.), 1-15. Amsterdam/Philadelphia: John Benjamins.
Tomasello, Michael
2003 The key is social cognition. In Language in Mind: Advances in the Study of Language and Thought, Dedre Gentner and Susan Goldin-Meadow (eds.), 47-57. Cambridge, MA: The MIT Press.
Tulving, Endel
1972 Episodic and semantic memory. In Organization of Memory, Endel Tulving and Wayne Donaldson (eds.), 381-403. New York: Academic Press.
Chapter 6
Making ends meet
Uwe Multhaup
1. Language, memory, culture, and learning

The academic training of foreign language teachers, if we wish them to have a solid theoretical background, raises a number of questions that require a realistic concept of how language knowledge is acquired, stored, and retrieved from memory for communicative purposes. Among them are:
(1) What is the relation of cognitive development and conceptual knowledge to language knowledge?
(2) Is there a fundamental difference between first and second language learning?
(3) Do (second) language learners need explicit rule knowledge, or is implicit learning supported by plenty of (comprehensible) language input all they need?
(4) What role do motivation, understanding, culture, and literature play in language learning?
These questions have been addressed by different theories of language and learning. For the present purpose I divide them into three types (because there are many variations on the theme): the first type are behaviourist theories, the second type are innatist theories, and the third type comprises approaches like the constructivist, interactionist, connectionist, or emergentist approach. If we want to compare their relative merits from a theoretical as well as a practical perspective, we must require them to be theoretically consistent and compatible
(a) with what we know from the cognitive sciences about the architecture of the human brain and the neural basis of mental processes,
(b) with the evidence from empirical studies of first and second language acquisition,
(c) with the findings of experimental studies of language processing in psychology,
(d) with linguistic models of the interplay of phonological, lexical, grammatical, and pragmatic aspects in the learning and use of language,
(e) with what we know from decades of (documented) practical experience with the pros and cons of different approaches to second language teaching and their changing scientific backgrounds.
Viewing and bringing together such different strands of research may amount to trying to make ends meet, because until recently the different approaches tended to lead lives of splendid isolation. But it may be argued,
too, that we paid dearly for letting them exist in isolation; and recent developments in psychological, neuroscientific, anthropological, and linguistic research warrant the hope that new insights will allow us to end their coexistence in isolation.1
2. Stocktaking and definitional issues

The seat of knowledge, including language knowledge, is undoubtedly the human brain. And the sets of questions that the three types of language acquisition theories mentioned above are confronted with, and give different answers to, comprise issues like the following: What do declarative and procedural knowledge have to do with explicit and implicit knowledge? And what, in turn, do they have to do with episodic memories and semantic knowledge (or memory)? What roles do imitation and cognition play, and how do they relate to conceptual and semantic knowledge, to cognitive development, and to first language acquisition? Is there a language module with an innate knowledge of grammar, or do we learn languages like other skills and culturally handed-down knowledge? How can we explain the many similarities and differences between first and second language learning? How is second language knowledge represented in the human brain? And what role do motivation and emotion play in that context? What are the roles of short-term (working) memory and long-term memory? Are the abstract algebraic rules which serial symbol processing computers use a reliable model for simulating mental processes, or are connectionist models, with their algorithms that result from associations, frequent trial-and-error processes, and pattern-finding abilities, a better model for understanding neural parallel distributed processing and language learning processes? Once again: asking all these questions at the same time may be asking too much, but from the perspective of a language teacher they represent the background against which solid research for the improvement of teaching practice must be done. Neither in practice nor in theory is trusting the wrong model a good move.
The sorry fate of teaching methods that followed the claims of early behaviourist theories, which considered mindless imitation, habit formation, and the instant correction of formal errors (pattern drills) necessary for language learning, is a good example of why one should not trust theories blindly. We do well, therefore, to subject more recent approaches to critical scrutiny; this includes innatist theories as well as approaches that underline the importance of social interactions and (real-life-like) tasks, combined with form- and meaning-oriented activities in the classroom (Doughty and Williams 1998).

Many promising new developments in the cognitive sciences started with viewing the brain as a computational device that registers, stores, and reorganizes the flux of incoming sensory data, transforming chaotic masses of information into categorized knowledge, which is the precondition for the purposive recall of knowledge and acquired skills for solving problems of all kinds. Early models of cognitive processes were, not surprisingly at the time, modeled on the architecture and processing principles of classical serial symbol processing computers, which work with propositional representations of knowledge and a predefined set of algebraic rules that are both meaningless in themselves and insensitive to the meanings of the elements they algorithmically combine. Initially less well known were so-called connectionist computers, which do not need an a priori defined set of rules but work with a probability calculus and create their own processing routines through a large number of trial-and-error processes governed by associative and frequency-based rules of learning. They exemplify the self-organizing quality of neural (connectionist) networks (Freeman 2001).2 We must remember, however, that trial-and-error processes presuppose 'something' that declares something to be an error. With regard to the brains of living organisms and language acquisition, the question is therefore who or what decides what constitutes an error. That takes us to the classical distinction between declarative and procedural knowledge as the two types of knowledge needed for computational processes. Declarative knowledge is mostly defined as factual knowledge, and procedural knowledge as rule knowledge.
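The error-driven, frequency-sensitive learning described above can be condensed into a toy sketch. It is not from the chapter; all names and values are illustrative. A single-layer associative network starts with no predefined rules, and stable input-output routines emerge from repeated trial-and-error weight adjustments; the 'error' is simply the mismatch between the network's prediction and the observed outcome.

```python
import random

def train(examples, epochs=200, rate=0.1, seed=0):
    """Delta-rule learning: no predefined rules, only weight changes
    driven by the mismatch between prediction and observed outcome."""
    rng = random.Random(seed)
    n = len(examples[0][0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(n)]
    for _ in range(epochs):
        for x, target in examples:
            out = sum(wi * xi for wi, xi in zip(w, x))
            error = target - out  # the 'something' that declares an error
            for i in range(n):
                w[i] += rate * error * x[i]
    return w

# Feature 0 reliably co-occurs with the outcome, feature 1 does not:
# frequency of successful association strengthens the first connection.
w = train([([1, 0], 1.0), ([0, 1], 0.0), ([1, 1], 1.0)])
```

The point of the sketch is the one the chapter makes: the resulting 'routine' (the weights) is learned from examples, not installed as an explicit rule.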
With regard to the question of how humans acquire the two types of language knowledge, the answer of early behaviourist theories was that they result from the successful imitation of exemplary behavior in others, or from blind trial-and-error processes which may by chance produce 'good results' that are then reinforced by repetition. In his criticism of behaviorist models of language learning, Chomsky demonstrated, however, that mere reproductive imitation of adult models cannot do the trick, because mindless imitation and habit formation cannot explain how children learn to produce grammatical sentences which they have never heard before. That, among other things, was the basis of the new generative approach in linguistics that became dominant in the latter part of the 20th century and characterized itself as a rationalistic study of language. Because generative linguistic studies are rationalistic in a very formalistic sense, they have a problem explaining how humans acquire their knowledge of grammar. They (must) argue that humans are innately endowed
with knowledge of grammar (Universal Grammar, UG) because of what they call the logical problem of language acquisition (Chomsky 1980; Pinker 1989). The logic behind their argument is that if the brain works with fixed rules (as serial symbol processing computers do), it cannot rationally be explained how children, who are constantly exposed to incorrect uses of grammar in adult speech, distil from such corrupt evidence the correct set of grammar rules of their first language without receiving 'negative evidence', that is, explicit correction of grammatically faulty sentences by their caregivers. For innatists the only way out of this logical trap is to postulate that children do not learn grammar the way they learn other things in life, but acquire it by virtue of an innate knowledge of grammar principles, and by setting UG parameters to the values of the grammar of the first language they come in contact with. Along with this goes the innatist distinction between a 'core' (grammar) and a 'periphery' (e.g. the lexicon) in language acquisition. A disconcerting aspect of that theory is, however, that it must declare that words and their meanings, as well as pragmatic aspects of language use and other 'periphery' aspects of language, are learnt on the basis of associative (connectionist) principles of learning, i.e., on the basis of exactly those principles which the theory declares useless for grammar acquisition.3 The postulated dichotomy between core and periphery in language acquisition is known as the dual process approach. An alternative to it is spelled out by critics of the innatist position, who show that a single process approach provides perfectly satisfactory answers to the question of how children learn the grammar of the language of the social group they grow up in, and who put forward phylogenetic, neuroscientific, and psycholinguistic arguments in support of their theory (Bybee 1985; MacWhinney 2002; Tomasello 2003).
They argue that language learning is not an isolated association-making and induction process carried out by a machine that disinterestedly engages in processing linguistic symbols, but is integrated with the development of other human cognitive and social-pragmatic skills. It is guided by pattern-finding abilities in ways that the early behaviorists (and Chomsky in his critique of them) did not envisage. If that is taken into account, it becomes clear that imitation and cognition are not two mutually exclusive but two mutually supportive strategies of knowledge construction. Apart from that, a number of sources of negative evidence can be made out which generative approaches declare nonexistent. Much of the discussion of the use of negative evidence has tended, for instance, to underestimate the overall information-processing abilities of the child. Generative accounts ignore that children do not rely on formal
linguistic rules, but rather on a combination of cues: contextual evidence of the intended meanings of messages, overt correction by other speakers, recasting, expansions, clarification questions, topic continuation, gesture, and intonation. If a child puts together all of this information and thus establishes an overall 'negative feedback index' for each utterance, there is enough feedback to tag specific sentences as grammatical or ungrammatical (MacWhinney 1997). Theories that take such a holistic approach to language learning do not need to assume an innate knowledge of grammar because they do not view linguistic processing as separate from the uses of language in social contexts. From that perspective the putative logical problem of language acquisition disappears. Two sets of human skills are of particular importance in this context: (1) various intention-reading skills, which emerge in children before they produce their first words and are fundamental to all human communication, and which make it possible for them to take the perspective of another person and to understand her/his intentions (a basic aspect of reading, enjoying, and understanding literature, too); (2) various kinds of pattern-finding skills that enable humans to classify the multifarious aspects of their environment and to reduce the masses of input data to a manageable number of kinds of things and events (Tomasello 2003). Guided by such basic skills, children construct their knowledge of language in interaction with the members of their social group (Slobin 2001). With regard to second language learning, the controversy between nativist and functionalist (constructivist) approaches is of interest because second language teachers, who cannot hope that a UG will do all the difficult work for their learners, wonder which theory they should rely on for guidance.
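The cue-combination idea behind the 'negative feedback index' can be sketched in a few lines. Everything numeric here is hypothetical: MacWhinney (1997) gives no weights or threshold, so the cue names, weights, and cutoff below are invented purely for illustration of how several weak cues can jointly tag an utterance.

```python
# Hypothetical weights: positive values count as negative evidence
# (signals of communicative trouble), negative values as implicit approval.
CUE_WEIGHTS = {
    "overt_correction": 0.9,
    "recast": 0.7,
    "clarification_question": 0.5,
    "topic_continuation": -0.4,  # the conversation simply moves on
}

def feedback_index(observed_cues):
    """Combine all cues observed after an utterance into one index."""
    return sum(CUE_WEIGHTS[cue] for cue in observed_cues)

def tag_utterance(observed_cues, threshold=0.6):
    """Tag the utterance that triggered these cues."""
    if feedback_index(observed_cues) >= threshold:
        return "probably ungrammatical"
    return "probably grammatical"
```

Note that no single cue is decisive; a recast plus a clarification question crosses the threshold, while topic continuation alone counts as tacit acceptance.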
They know from experience, for instance, that cognitive processes, including the processing of linguistic elements, do not take place without motivation, which is why a sound theory must contain an integrated appraisal of the emotional side of language learning rather than treating motivation as an optional extra. But UG theories are silent on that point because for them 'language contact' does the trick. Furthermore, the roles of conceptual and cultural knowledge in language learning must be (re)considered. Recent studies, conducted on a broad intercultural basis, underline that we need to take a fresh look at the concepts of linguistic universals and linguistic relativism. The evidence is that conceptual and linguistic knowledge interact in their development; there is more relativism than universalism. For example, only some languages have tense, and different systems of tense interact with very variable notions of aspect in different languages (Behrens 2001).

The central problem here is how do children, from an initially equivalent base, end up controlling often very differently structured languages? In other words, how do children successfully diverge in order to control the local language, whatever its idiosyncrasies? The Chomskyan tradition, with its emphasis on an innate syntactic ability, has led us to seriously underplay the extent and depth of semantic variation across languages. (Bowerman and Levinson 2001: 11)
Keeping that in mind, I return to the distinction between declarative and procedural knowledge and what it means for language learning. For the ordinary reader, understanding the function of declarative and procedural knowledge in language learning is complicated by the fact that many authors confound these two terms with another pair of terms, explicit and implicit knowledge, which brings into play the role of consciousness in learning (N. Ellis 1994). The confusion is reflected in a statement by J. R. Anderson, who was one of the first to distinguish declarative and procedural knowledge, and who writes: "Declarative knowledge is explicit knowledge that we can report and of which we are consciously aware. Procedural knowledge is knowledge of how to do things, and it is often implicit." (1995: 308). Correspondingly, some researchers (Bialystok 1984) argue that language knowledge must first be explicit knowledge before, through much practice, it can turn into implicit knowledge. The latter is, in turn, considered a precondition of the development of the fast activation processes (procedural knowledge) that are typical of language use. That theory, however, complicates matters rather than making them clearer. It raises, for instance, the question of how to classify knowledge that we have but cannot verbalize at a particular moment in time. It also makes us wonder whether incidental learning must be explained as resulting from conscious (explicit) knowledge that is turned into implicit knowledge by plenty of practice. Intuitively one would tend to say 'No'. But it seems better not to mix the information-processing side of the debate (that is, the declarative-procedural dichotomy) with the psychological distinction between explicit and implicit knowledge. The explicit-implicit knowledge issue is also discussed by Paradis (1994: 401-402), whose conclusion is:

It appears that what has been acquired incidentally is stored implicitly and can only be evidenced through behaviour (performance).
On the other hand, some deliberately learned tasks seem to gradually become automatic through prolonged practice. ... What is automatised is not the explicit knowledge of the rule, ... but its application. ... What is automatised is the ability to produce the correct sequence of words in their proper inflectional form, whatever the processes that have been used to reach this result.
Here we come up against a fundamental discussion in the modern cognitive sciences, one which refocuses the old philosophical debates about the dualism of mind and body. And it is noteworthy that neuroscientists have started to discuss embodied cognition as an answer to the old questions. "Instead of emphasizing formal operations on abstract symbols, the new approach foregrounds the fact that cognition is, rather, a situated activity, and suggests that thinking beings ought therefore to be considered first and foremost as acting beings." (Anderson 2003: 91). The new approach can be seen to stand in the tradition of, and to further develop, social interactionist theories like Vygotsky's (1962) social development theory, Bandura's (1977) social learning theory, and Bruner's (1986) constructivist theory. To appreciate the essence of such theories and what divides them from propositional linguistic accounts of language and learning, we must look at the architecture of the human brain and the neurophysiological nature of the cognitive processes taking place there. The point of that exercise is to show that in the brain declarative and procedural knowledge of language are not stored in separate compartments but in a way that makes (physical) forms have (cognitive) functions. That contrasts with the processing principles of serial symbol processing computers, which store abstract rules (programs) and content (personal data files) on separate partitions of the hard disk. Their computational rivals, connectionist (constructivist, emergentist) theories, focus on how various associative, pattern-finding and meaning-oriented activities enable the mind (machine) to generate knowledge of grammar. They are of interest from a pedagogical perspective because they provide a theoretical underpinning for the age-old experience that we learn from examples rather than from abstract rules. They explain why inductive approaches are mostly more successful than deductive teaching.
And they are helpful in evaluating the issues that underlie the debates over whether comprehensible input (Krashen 1985) is all that second language learners need, or whether some kind of input enhancement (Sharwood-Smith 1993) and the inclusion of some form-oriented teaching (Long and Robinson 1998) help. The experience of teachers is that visible, audible, and tangible patterns in the world facilitate the construction of patterns in the mind, and that fits a holistic approach that can look back on a very long tradition in educational theory (Herbart and Pestalozzi, for instance).
3. Forms and functions: The neural basis of cognitive processes

In the course of evolution three 'storeys' of the human brain developed: the brain stem, the limbic system, and the neocortex. The brain stem is the innermost and phylogenetically oldest part; it is responsible for the vegetative functions of the organism. The limbic system is the second oldest part; it encompasses the brain stem and developed in the mammalian phase of evolution, in which caring for the offspring and social bonds among group members became important. It correspondingly subjects all waves of neural activity that pass through it to an emotional evaluation. Some researchers therefore call it the emotional brain (Goleman 1995). The neocortex lies crescent-shaped over the limbic system, interacting with it in many ways. It is the youngest and biggest part of the brain, and is responsible for higher cognitive functions. As is well known, the modular organization of the brain does not end here. The left side of the body is controlled by the right hemisphere of the cortex, and the right side by the left hemisphere. Furthermore, visual, auditory, olfactory, somatosensory, and motoric data are processed in separate lobes (modules) which, however, are interconnected by a vast neural network and interact in many ways. The network is made up of about 10 billion neurons and the nerve fibres that connect them. And because a specific bit of experience (knowledge) may well comprise visual, auditory, somatosensory, emotional and other types of information, the corresponding network of cell assemblies that 'represents' it may be distributed over many modules of the brain. The parallel activation of the distributed cell assemblies activates that piece of knowledge. The activation spreads along the axons of the neurons and their synapses, but there are different types of axons.
Sets of specialized modular (short-range) perceptual, motor, memory, evaluative, and attentional processors (neurons) work automatically (reflex-like, with no conscious control), but they do not 'know' what is going on in the other parts of the brain. In contrast, neurons with long-range axons are mobilized in effortful tasks for which the specialized but 'stupid' modular (local) processors do not suffice. Long-range neurons selectively mobilize or suppress the contributions of modular neurons (Dehaene et al. 1998); they control cognitive activities and serve a central executive authority. That must be kept in mind if we wish to understand the roles of working memory and long-term memory, what they have to do with declarative and procedural knowledge, and why there are processing constraints that result from the interaction of working and long-term memory.
Technically, the exchange of information among the different parts of the brain is ensured by neurotransmitters and neuromodulators, i.e., electrochemical substances like acetylcholine, dopamine, epinephrine (adrenalin), norepinephrine, serotonin, glutamate, and histamine. They are instrumental in the storing and spreading of neural activation patterns along the paths of activation that develop as a result of experience and knowledge management. Different types of neurotransmitters lead to different kinds of emotional and motivational responses to input data. Dopamine, for example, is an inhibitory neurotransmitter involved in controlling movement, but it also modulates mood and plays a central role in positive reinforcement; glutamate is a major excitatory transmitter that is associated with learning and memory. And the limbic system as a whole plays a crucial role in releasing different 'cocktails' of neurotransmitters. It not only adds emotional overtones to incoming sensory data; it also controls attention and is instrumental in creating long-term potentiation (LTP) in sets of neural cell assemblies.4 LTP is the long-term strengthening of the synapses between two neurons (patterns of neurons) that are activated simultaneously. That is the basis of learning. Of course, memorization does not depend solely on the LTP of a few synapses in the hippocampus. Several studies indicate that the major neuromodulation systems in the brain also greatly influence synaptic plasticity. These neuromodulators are part of the molecular mechanisms through which factors such as motivation, rewards, and emotions can influence learning. That explains why, in teaching for instance, personal factors, stimulating learning environments, the choice of interesting topics, and similar motivational 'gimmicks' matter no less than rationally well-organized instruction.
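The Hebbian core of LTP (synapses between simultaneously active neurons are strengthened) can be written as a one-line update rule. This is a deliberately minimal sketch, not a biophysical model; the modulation parameter merely stands in, as an illustrative assumption, for the neuromodulator influence on plasticity just described.

```python
def hebbian_update(weight, pre, post, rate=0.05, modulation=1.0):
    """Strengthen a synapse in proportion to coincident pre- and
    postsynaptic activity; `modulation` scales plasticity the way
    neuromodulators (reward, motivation, emotion) are said to."""
    return weight + rate * modulation * pre * post

w_plain, w_rewarded = 0.1, 0.1
for _ in range(20):  # twenty episodes of simultaneous activation
    w_plain = hebbian_update(w_plain, pre=1.0, post=1.0)
    w_rewarded = hebbian_update(w_rewarded, pre=1.0, post=1.0, modulation=2.0)
```

Repeated co-activation potentiates both synapses, but the 'neuromodulated' one ends up stronger, mirroring the claim that motivation and reward amplify what gets learned.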
And because living beings have interests and personal likes and dislikes, they react selectively to objects and events in their environment. From a cybernetic perspective, that is the basis of selected self-organization in neural networks (Rocha 1996). In that sense neuronal structure is information, and form is function; therefore declarative and procedural knowledge must not be conceived of as 'rules' and 'contents' that are stored in two separate compartments of the brain. Procedural knowledge is contained in the LTP that subsequently makes possible the reciprocal activation of sets of cell assemblies, including sequence sets that remember the temporal sequence of events (Pulvermüller 2002). The latter are of interest for understanding the neural basis of syntax because, in speech, syntax is related to the temporal sequencing of lexical items. Because self-organization in neural networks is not without guidance from instincts, interests, goals, and intentions, memories do not contain
'copies' of real-life scenes but traces of how we viewed them. Our perceptions of the world contain personally 'distorted' views of it. When we recall stored knowledge, what we do is reconstruct views of the world, not activate 'true images' of it. Our memories are not necessarily faithful reconstructions; at times we remember what we want to remember (see), not what we really did see. In the course of a person's lifetime his/her knowledge undergoes many changes. We not only constantly add new knowledge to existing knowledge structures; we also regularly reorganize them and optimize the processes that allow their fast and efficient use (Rumelhart and Norman 1978). We may say, therefore, because cognition has its seat in the organic matter of the brain, that metamorphosis, which is typical of all organic life, is typical of cognitive processes, too. No rules are permanently resistant to modification. Against that background I next discuss what that means for the role of episodic memories in relation to semantic knowledge, and for the role of short-term (working) memory in relation to long-term memory in language learning.
4. Episodic and semantic memory

In a very general sense all learning starts with episodic memory, also called autobiographical memory. It lets us remember the events that we personally experienced at a specific time and place, including their emotional overtones. Contents of episodic memory can be transformed into semantic knowledge by categorization processes that home in on common features of various episodes and extract them from their context. These are also called decontextualization processes (Barrett 1995). There is a gradual transition from episodic to semantic memory, in which the latter reduces its sensitivity to particular events so that the information about them can be generalized. Conversely, our personal experiences are coloured by the concepts stored in semantic memory because new situations are interpreted in the light of what we know from previous experience of that type of situation. In other words, prior knowledge has a shaping influence on what and how we understand things. Episodic and semantic memory are not two isolated entities but share neural pathways via 'junctions' (which can, however, be 'blocked' by a central control). From a neuroscientific perspective it is important to note that no nerve impulse ever encounters a 'dead end' in the brain. "The point where it arrives in any part of the brain is always a potential point of departure toward other neurons. This assemblage of billions of
circuits that loop back on themselves makes it difficult to have 'entirely rational thoughts' or 'purely emotional reactions'."5 In that context it is important to note that all language learning starts with using unanalyzed chunks of language for communicative purposes, which indicates that pragmatic, usage-oriented learning strategies are at work. Such (lexical) chunks are the linguistic equivalent of episodic memory in cognitive development. The spatio-temporal contiguity of perceived objects and the words used by others to refer to them is typical of early word learning. In cognitive processes, contiguity leads to the formation of connections between a particular referent and a new name. The first 'initial mapping' is typically fast, sketchy, and tentative; however, as the child is repeatedly exposed to new instances of an old word, the semantic range of the referent slowly widens (MacWhinney 2002). The next question is how children take the step from semantic memory to knowledge of grammar. The general direction of the cognitive principle underlying that feat was indicated already by Behaghel (1932): "What belongs together mentally is placed together syntactically." (quoted in Slobin 2001: 441). From an information-processing perspective, chunks of language are, to start with, declarative knowledge. But they are simultaneously the source from which, guided by the intention to use that knowledge for communicative purposes, procedural knowledge in the sense of knowledge of grammar is developed. The construction of grammatical knowledge requires cognitive processes that abstract away from the episodic connotations of memorized chunks of language and detect, by contrastive comparison, their variably combinable parts; this leads to the discovery of grammatical and lexical morphemes, for instance.
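The contrastive comparison of stored chunks can be given a toy illustration (the chunk inventory and the whole-word granularity are simplifying assumptions, not a claim about actual child data): any two chunks that differ in exactly one position reveal a fixed frame and a variable slot.

```python
def find_frames(chunks):
    """Compare memorized chunks pairwise; chunks of equal length that
    differ in exactly one word position yield a frame with a slot."""
    frames = set()
    token_lists = [chunk.split() for chunk in chunks]
    for i in range(len(token_lists)):
        for j in range(i + 1, len(token_lists)):
            a, b = token_lists[i], token_lists[j]
            if len(a) != len(b):
                continue
            diffs = [k for k in range(len(a)) if a[k] != b[k]]
            if len(diffs) == 1:  # one variable part, the rest recurs
                frame = list(a)
                frame[diffs[0]] = "___"
                frames.add(" ".join(frame))
    return frames

frames = find_frames(["more milk", "more juice", "want milk"])
```

From three whole chunks the comparison extracts two recombinable patterns, a cartoon of how variably combinable parts, rather than innate rules, can emerge from stored exemplars.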
That means that declarative knowledge does not become procedural knowledge simply by frequent but mindless repetition of given language forms; the construction of procedural (grammatical) knowledge rather requires the cognitive analysis of the (pragmatic) functions of the lexical units detected in contexts of their use, together with memories of which other words they tend to occur with (collocations, word order). If we wish to understand what makes chunks of language (declarative knowledge) usable and what creates procedural knowledge, we must include contextual, pragmatic, and affective aspects in our considerations. Such social factors feed back into the cognitive construction processes in the individual mind, much in the sense of Vygotsky's (1962) social development theory or Bruner's (1986) constructivist theory. There is a dynamic relationship that connects social developmental conditions, which confront us with patterns in the world, and individual cognitive constructions, which
174
Uwe Multhaup
create patterns in the mind. They progressively allow us to cope with ever more complex cognitive matters. In other words, there are both mind-external and mind-internal constraints on the construction of (procedural, grammatical) knowledge of language, and at each stage pre-existing knowledge structures delimit the kind of challenges that cognition can next cope with. Against that background, I next discuss the roles of short-term (working) memory in relation to long-term memory and their respective functions in language learning and language use.
5. Capacities, competitions, coalitions, and entrenchment

Considering the roughly 10 billion neurons in the human cortex, its storage capacities are enormous. With regard to learning, skills, and problem solving, the crucial point is therefore how to keep the vast number of storable input data under control. This is where the interplay of short-term and working memory (Baddeley 1986) with long-term memory comes into play. Incoming sensory data are registered as activation patterns for milliseconds, and up to a few seconds, in ultra-short or short-term memory. Most of them decay fast without leaving any permanent traces in memory. Short-term memory may therefore be called a mere temporary receptacle of incoming data. But factors like attention and rehearsal can turn such temporary activation patterns into long-term potentiation (LTP), and thus cause learning. The stored knowledge can be retrieved from long-term memory and used in problem solving tasks. Working memory is where such a matching of present goals and intentions with stored knowledge for problem solving purposes is brought about. Anatomically, the prefrontal lobe plays a crucial role in the control of such cognitive activities (Baddeley 1986). But because it cannot do its work as central processor without enlisting the help of specialized processors in other parts of the brain, recent studies speak of cognitive activities as taking place within a global workspace (Dehaene, Kerszberg, and Changeux 1998). Neural activities in that workspace are characterized by constant competitions and coalitions among cell assemblies that receive activation from external and internal sources in parallel and vie with each other for the attention of the central processor (Bates and MacWhinney 1989). It is well known that the processing capacity of working memory is limited. It can be viewed as the total amount of information an individual can maintain in a ready-to-use state during cognitive tasks. That means that
Making ends meet
175
there is no strict line of demarcation between memories and thoughts. Thoughts are selectively activated memories, which are stored in long-term memory (with its seemingly unlimited capacities), and which can be modified by experience. For that reason, problem solving tasks that can fall back on stored procedural routines for tackling certain problems are executed faster than tasks for which no memories of how to do something exist: such routines relieve working memory of the necessity to do the computations anew (schema knowledge). Procedural knowledge cannot be restricted to the activation of existing knowledge structures, however; it must also include knowledge of how to create and modify existing knowledge structures. Here the interaction of episodic with semantic memory (as discussed above) comes into play, which in learning-psychological terms leads to transfer of learning. This, too, leads us to view cognitive phenomena like entrenchment, developmental stages, and interference from a psycholinguistic and neuroscientific perspective. Entrenchment refers to the fact that established (deeply engrained) knowledge structures and activation procedures make it difficult for new 'ways of thinking' to compete with and gain ground against established procedures. Interference results from entrenchment in a special way: it refers to unwelcome outcomes of a transfer of learning. In second language learning, for instance, interlingual and intralingual errors are distinguished. Interlingual errors are due to interference of first language processing habits with yet-to-be-developed second language processing routines. They can be found on all levels of language, from the phonological, via the lexical, to the grammatical, pragmatic, and cultural levels of language use. One obvious example is that people hardly ever (completely) lose the accent (phonological speech habits) which they acquired in childhood.
Intralingual errors are the result of overgeneralizations of 'discovered' rules, like adding -ed to irregular verb forms (*goed), or (for German learners of English) using the 'progressive form' in far too many contexts. If not attended to (corrected) in time, inter- and intralingual errors can lead to fossilization, a specific kind of entrenchment. In that context, interesting messages from empirical studies of second language learning are (a) that adult second language learners have more difficulty overcoming first language speech habits (procedures) than young children; (b) that neither explicit rule instruction nor plenty of implicit (inductive) rule learning (i.e., pattern practice) prevents errors of the above types, even if instructors rely on carefully constructed syllabi that use formal linguistic criteria for grading (including a contrastive analysis of
first and second language structures). Grading aims to take learners from 'easy to process' simple structures to 'difficult to process' complex structures; (c) that all learners, irrespective of age and target language, make similar errors in core areas of their developing grammatical knowledge, in (universally similar) developmental sequences which explicit instruction cannot prevent, at least as far as the fast processing procedures typical of spoken language in communication are concerned (Pienemann 1998). With regard to (a), nativist theories proffer biological arguments which postulate maturational constraints as an explanation for the problems of second language learners (the critical period hypothesis; Lenneberg 1967). That goes with a stipulated UG that is set to language specific values only once in life (or, up to a critical age, possibly twice). The persuasiveness of these arguments is undermined, however, by facts like the following: (1) some people, like the famous Polish-born writer Joseph Conrad (Józef Teodor Konrad Korzeniowski), who learnt English only after he had well passed the 'critical period', acquire full 'native speaker competence' in the use of a second language grammar; (2) it is well known from other fields of human life, like sports for instance, that some skills will not be acquired to perfection if they are not practiced intensively early on, which however does not seem to justify postulating a 'tennis module' in the brain. That makes attractive theories on an information processing basis, which explain phenomena like the 'critical period' by processing constraints without needing to take recourse to the strong postulation of an innate grammar acquisition device. The candidates are constructivist and emergentist theories. With regard to (b), the use of explicit rule instruction in second language learning, the nativist position practically leads to theories like the 'input hypothesis' (Krashen 1985).
That approach holds that explicit rule instruction does not help; it favors the communicative approach to language teaching and error tolerance. Twenty years of experience with communicative language teaching made clear, however, that it does not lead to better results. Apart from that, it confronts foreign language teachers with the problem that learners frequently want to know the 'rules' behind the language game. That is where the theories under (c) come into play, which, from an information processing perspective and with neuroscientific backing, explain how grammatical knowledge develops from initially fuzzy concepts and episodic memory, via flexible processes of reanalysis of memorized chunks of language and the establishment of fast (rule-like) procedures of activation among memorized lexical items, and can explain why there are universal developmental sequences in all second
language learning processes, which differ, however, from traditional 'grammatically graded' syllabi that prescribe which forms must be learnt when. The new approaches acknowledge that at specific stages of second language learning certain types of errors are unavoidable, but they also explain why that is so, and why and when something must be done about specific types of errors to avoid entrenchment (fossilization). For research, that opens new avenues for exploring the role of form-oriented instruction (Doughty and Williams 1998). Constructivist and emergentist theories also provide answers to the question how the doubtlessly existing individual differences among learners can be theoretically reconciled with the fact that there are universal similarities in second language learning processes (Pienemann 1998). In its most general form their answer is that we must distinguish route and rate in language acquisition. Learning motivation, and social and other environmental factors, bring into play forces that may individually speed up or slow down the rate at which a learner progresses along a universally similar route. Different rates also include differences in the range of registers and the number of words learnt by one learner rather than another. What is responsible for the route in second language acquisition, and for its universally similar aspects, I want to briefly outline in the following final section.
6. The hierarchy of processing procedures in second language learning processes

The use of language in normal conversational exchanges puts higher demands on the speed with which the relevant linguistic constructions must be produced than is the case in writing, for instance. In foreign language teaching, time frames play a role in some other respects, too. A well-known phenomenon is, for instance, that after a number of formal teaching units many students may successfully pass the final written test. Everyone knows, though, that this is no guarantee that they will, some weeks or months later, reproduce the tested items error-free in a new test, or - even more interestingly - that they can use them correctly if they get involved in a conversation with a native speaker immediately after the test (Legenhausen 1999). With regard to errors and the hierarchy of processing procedures discussed in this chapter, it is important to see that the focus is on what happens in communicative exchanges in spoken language (conversation), which typically requires fast activation processes in the mind. And if education
authorities declare communicative competence the main goal of foreign language teaching, they mostly have oral communication in mind. The test case is, therefore, the very fast activation processes typical of spoken language, and the demands they put on cognitive processing, that is, on the interplay of working memory, with its limited processing capacities, and long-term memory, with its potentially rich stock of lexical knowledge and 'ready-made' (grammatical) procedures for coping with tasks. The first stage in understanding developmental sequences is that we remember:

... the logico-mathematical hypothesis space in which the learner operates is further constrained by the architecture of human language processing. Structural options that may be formally possible, will be produced by the language learner only if the necessary processing procedures are available that are needed to carry out, within the given minimal time frame, those computations required for the processing of the structure in question. Once we can spell out the sequence in which language processing routines develop in the learner, we can delineate those grammars that are processable at different points of development. (Pienemann 1998: 1)
The second stage is that in conversation speakers typically start from prelinguistic concepts of what they want to say, and first look for lexical items or chunks in their mental dictionary that suit their communicative intentions. Only if they find some can they start arranging them in a syntactically correct form and order, and plan their articulation (Levelt 1989: 24). If no suitable words are found in the mental dictionary, the production procedures are blocked and the result is either silence or the activation of other, compensatory communicative strategies. The third stage requires looking at typical features of grammar and text processing. From an information processing perspective the major challenges involved in that task are: generating correct word order (syntax), mapping spatio-temporal conceptual items onto linguistic forms (singular-plural, mass-count, tense and aspect), and ensuring agreement (unification) within and beyond phrase boundaries. Phrase boundaries are important because phrases are psychologically real units of language processing, which means that unification operations that must transcend a phrase boundary put a higher processing load on working memory. With regard to the grammaticality of constructions (word order, agreement, and pragmatically appropriate topic-comment relations), the procedures must fall back on the available pathways in the neural net (including loans from first language processing habits or direct mappings of
conceptual structures onto linguistic surface forms), or stop (and be silent), or effortfully try to create new connections. Without being able to go into details here, that explains why there is a (universally similar) hierarchy of processing procedures, which Pienemann (1998: 87) has described:

This hierarchy represents a set of hypotheses about the implicational order in the development of L2 processing procedures. The implicational nature of the hierarchy derives from the fact that the processing procedures developed at one stage are a necessary prerequisite for the following stage: A word needs to be added to the L2 lexicon before its grammatical category can be assigned. The grammatical category of a lemma is needed before a category procedure can be called. Only if the grammatical category of the head of phrase is assigned can the phrasal procedure be called. Only if the latter has been completed and its value is returned can Appointment Rules determine the function of the phrase, after which it can be attached to the S-node. Only after appointment rules are refined by 'Lemma Functions' can subordinate clauses be formed - with their own structural properties.

The predictions about acquisition that can be derived from this hierarchy are based on the following logic: the learner cannot acquire what he/she cannot process. It should be noted that this proposition is different from the notion of processing complexity, which assumes that an increased number of processes increases the complexity of a structure.
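The implicational logic of the quoted hierarchy can be made concrete with a small sketch (an illustrative model only, not Pienemann's own formalism; the stage labels below are abbreviations of the procedures named in the quote): a structure at a given stage is processable only if the learner has acquired that stage's procedure and every procedure below it.

```python
# Illustrative sketch of an implicational processing hierarchy in the sense
# of Pienemann (1998): stage n presupposes stages 1..n-1. The stage labels
# are abbreviations of the quoted passage, not Pienemann's own terms.
STAGES = [
    "lemma access",          # word added to the L2 lexicon
    "category procedure",    # grammatical category of the lemma assigned
    "phrasal procedure",     # head of phrase known, phrase can be built
    "S-procedure",           # phrase function determined, attached to S-node
    "subordinate clause",    # clauses with their own structural properties
]

def processable(structure_stage, acquired_stages):
    """A stage is processable only if it and all earlier stages are acquired."""
    n = STAGES.index(structure_stage)
    return all(stage in acquired_stages for stage in STAGES[:n + 1])

# A learner with only the first two procedures cannot yet build phrases,
# no matter how the structure is taught:
learner = {"lemma access", "category procedure"}
print(processable("category procedure", learner))  # True
print(processable("phrasal procedure", learner))   # False
```

This captures the chapter's point that "the learner cannot acquire what he/she cannot process": teaching a stage-4 structure to a stage-2 learner fails the prerequisite check regardless of instructional effort.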
Note that at each of the successive stages of the developmental process, learner language is characterized by errors typical of that stage. The evidence also shows that explicit instruction (rule teaching or pattern drills) helps learners overcome their processing deficits only if the focus is on problems that they are developmentally ready to tackle. That contrasts with the grading principle of grammatical syllabi, which hoped to ensure an error-free progression from simple to complex forms - trusting that a 'rational' analysis of surface forms and their arrangement from 'simple' to 'complex' was a viable step in terms of the psychology of learning (which it is not). That does not preclude that learners proceed faster on their route to the next stage in the hierarchy of processing procedures in cases in which their first and second languages offer structurally similar procedures for coping with the verbalization of conceptual knowledge. If that is taken into account, a lot of time and effort spent on formal grammatical instruction and pattern drills may be saved and fruitfully invested in exemplar- and task-based language learning, without turning a blind eye to the fact that in well-defined cases instruction does help.
Notes

1. The publications I refer to comprise, for instance, the books by Tomasello (2003), Pulvermüller (2002), Bowerman and Levinson (2001), MacWhinney (1999), and Doughty and Williams (1998).
2. Connectionist processing principles, in contrast to classical serial symbolic processing, are discussed in Bechtel and Abrahamsen (1991) and at http://www.mind.ilstu.edu/curriculum2/connectionism/connectionism_l.html.
3. Psycholinguists like Hörmann (1978: 320) therefore called the 'language acquisition device', i.e., UG, the deus ex machina of generative linguists. And neuroscientists and biologists like Lieberman (1991) and Deacon (1997) declare it impossible from an evolutionary perspective that a language acquisition device and innate knowledge of grammar could have developed within the short period of time in which the development of homo sapiens diverged from that of other primates. Another disconcerting aspect of UG is that it leaves open whether second language learners have access to UG or not.
4. http://www.thebrain.mcgill.ca/flash/i/i_07/i_07_m/i_07_m_tra/i_07_m_tra.html
5. http://www.thebrain.mcgill.ca/flash/i/i_01/i_01_cr/i_01_cr_fon/i_01_cr_fon.html#2
References

Anderson, John R. 1995 Learning and Memory: An Integrated Approach. New York: Wiley.
Anderson, M. L. 2003 Embodied cognition: A field guide. Artificial Intelligence 149: 91-130.
Baddeley, Alan 1986 Working Memory. Oxford: Clarendon Press.
Bandura, Albert 1977 Social Learning Theory. New York: General Learning Press.
Barrett, M. 1995 Early lexical development. In Handbook of Child Language, Paul Fletcher and Brian MacWhinney (eds.), 362-392. Oxford: Basil Blackwell.
Bates, Elizabeth, and Brian MacWhinney 1989 Functionalism and the competition model. In The Crosslinguistic Study of Sentence Processing, Brian MacWhinney and Elizabeth Bates (eds.), 3-73. Cambridge: Cambridge University Press.
Bechtel, William, and Adele Abrahamsen 1991 Connectionism and the Mind: An Introduction to Parallel Processing in Networks. Cambridge, MA: Basil Blackwell.
Behrens, Heike 2001 Cognitive-conceptual development and the acquisition of grammatical morphemes: The development of time concepts and verb tense. In Language Acquisition and Conceptual Development, Melissa Bowerman and Stephen C. Levinson (eds.), 450-474. Cambridge: Cambridge University Press.
Bialystok, Ellen 1984 Strategies in interlanguage learning and performance. In Proceedings of the Seminar in Honour of Pit Corder, A. Davies, C. Criper, and A. P. Howatt (eds.), 37-48. Edinburgh: Edinburgh University Press.
Bowerman, Melissa, and Stephen C. Levinson (eds.) 2001 Language Acquisition and Conceptual Development. Cambridge: Cambridge University Press.
Bruner, Jerome 1986 Actual Minds, Possible Worlds. Cambridge, MA: Harvard University Press.
Bybee, Joan L. 1985 Morphology: A Study of the Relation between Meaning and Form. Amsterdam: Benjamins.
Chomsky, Noam 1980 Rules and representations. Behavioral and Brain Sciences 3: 1-61.
Deacon, Terence W. 1997 The Symbolic Species: The Co-Evolution of Language and the Human Brain. London: Penguin.
Dehaene, S., M. Kerszberg, and J.-P. Changeux 1998 A neuronal model of a global workspace in effortful cognitive tasks. Proceedings of the National Academy of Sciences of the United States 95 (24): 14529-14534.
Doughty, Catherine, and Jessica Williams (eds.) 1998 Focus on Form in Classroom Second Language Acquisition. Cambridge: Cambridge University Press.
Ellis, Nick 1994 Vocabulary acquisition: The implicit ins and outs of explicit cognitive mediation. In Implicit and Explicit Learning of Languages, Nick Ellis (ed.), 211-282. London: Academic Press.
Freeman, Walter J. 2001 Self-organizing brain dynamics by which the goals are constructed that control patterns of muscle actions. http://www.cse.cuhk.edu.hk/~apnna/proceedings/iconip2001/papers/324a.pdf.
Goleman, Daniel 1995 Emotional Intelligence: Why It Can Matter More than IQ. London: Bloomsbury.
Hörmann, Hans 1988 Meinen und Verstehen: Grundzüge einer psychologischen Semantik. 3. Auflage. Frankfurt: Suhrkamp.
Krashen, Stephen 1985 The Input Hypothesis: Issues and Implications. London: Longman.
Legenhausen, L. 1999 The emergence and use of grammatical structures in conversational interactions - Comparing traditional and autonomous learners. In The Construction of Knowledge, Learner Autonomy and Related Issues in Foreign Language Teaching: Essays in Honour of Dieter Wolff, B. Mißler and Uwe Multhaup (eds.), 27-40. Tübingen: Stauffenburg.
Lenneberg, Eric H. 1967 Biological Foundations of Language. New York: Wiley.
Levelt, Willem J. M. 1989 Speaking. Cambridge, MA: MIT Press.
Lieberman, Philip 1991 Uniquely Human: The Evolution of Speech, Thought, and Selfless Behavior. Cambridge, MA: Harvard University Press.
Long, Michael H., and Peter Robinson 1998 Focus on form: Theory, research, and practice. In Focus on Form in Classroom Second Language Acquisition, Catherine Doughty and Jessica Williams (eds.), 15-41. Cambridge: Cambridge University Press.
MacWhinney, Brian 1997 Rethinking the logical problem of language acquisition. Unpublished manuscript. http://psyscope.psy.cmu.edu/local/Brian/papers.html
MacWhinney, Brian 2002 Language emergence. In An Integrated View of Language Development - Papers in Honor of Henning Wode, P. Burmeister, T. Piske, and A. Rohde (eds.), 17-42. Trier: Wissenschaftlicher Verlag Trier.
MacWhinney, Brian (ed.) 1999 The Emergence of Language. Mahwah, NJ: Lawrence Erlbaum.
Paradis, Michel 1994 Neurolinguistic aspects of implicit and explicit memory: Implications for bilingualism and SLA. In Implicit and Explicit Learning of Languages, Nick Ellis (ed.), 393-419. London: Academic Press.
Pienemann, Manfred 1998 Language Processing and Second Language Development: Processability Theory. Amsterdam and Philadelphia: Benjamins.
Pinker, Steven 1989 Learnability and Cognition: The Acquisition of Argument Structure. Cambridge, MA: MIT Press.
Pulvermüller, Friedemann 2002 The Neuroscience of Language: On Brain Circuits of Words and Serial Order. Cambridge: Cambridge University Press.
Rocha, L. M. 1996 Language theory: Consensual selection of dynamics. Cybernetics and Systems: An International Journal 27: 541-553.
Rumelhart, D. E., and D. A. Norman 1978 Accretion, tuning, and restructuring: Three modes of learning. In Semantic Factors in Cognition, John W. Cotton and R. L. Klatzky (eds.), 37-53. Hillsdale, NJ: Lawrence Erlbaum.
Sharwood-Smith, Michael 1993 Input enhancement in instructed SLA: Theoretical bases. Studies in Second Language Acquisition 15: 165-179.
Slobin, Dan 2001 Form-function relations: How do children find out what they are? In Language Acquisition and Conceptual Development, Melissa Bowerman and Stephen C. Levinson (eds.), 406-449. Cambridge: Cambridge University Press.
Tomasello, Michael 2003 Constructing a Language: A Usage-Based Theory of Language Acquisition. Cambridge, MA: Harvard University Press.
Vygotsky, L. S. 1962 Thought and Language. Cambridge, MA: MIT Press.
Part 2. Select linguistic notions and memory
Chapter 7 Evaluation and cognition: Inscribing, evoking and provoking opinion Monika A. Bednarek
1. Introduction

The media - whether newspapers, the radio, the internet or the TV - arguably influence to a great extent how we view and think of the world we live in. Consequently, "[t]o study media discourse ... is to work to make sense of a great deal of what makes up our world" (Cotter 2001: 431). This is one of the concerns of this paper, whose purpose is to analyze newspaper discourse (focusing on the "hard news" [Bell 1991: 14] item), and to provide new insights into the phenomenon variously known as evaluation, appraisal and stance, and its connection to cognition. Evaluation (the expression of speaker/writer opinion) has only recently become the focus of linguistic analysis, and this mainly within studies of EAP (English for Academic Purposes) or - under the name of appraisal - within SFL (Systemic Functional Linguistics). In contrast, the approach taken here draws on a wide range of linguistic studies on evaluation to establish its own framework of evaluative parameters, which is then applied to a close manual analysis of a small corpus of British 'tabloids' and 'broadsheets' that report on the Conservative Party conference in Blackpool (UK) on 10 October 2003. The paper also investigates the relationship between memory and evaluation, and the extent to which evaluative meaning is 'inherent' in lexical items and to which it depends on the readers' application of cognitive frames. It will be shown that the interplay between evaluation and cognition is in fact highly complex, and depends both on the context and on the reader's position. The structure of this paper is as follows: I shall first introduce the parameter-based framework of evaluation in section 2, before providing an account of evaluation and its connection to cognition in section 3, analyzing evaluation in the corpus in section 4, and making some final remarks in section 5.
188
Monika A. Bednarek
2. The parameter-based framework of evaluation

2.1. Introduction: What is evaluation?

Generally speaking, there are at least three possible answers to the above question, since evaluation can be looked at from very different points of view. Consequently, we must make a basic distinction between three notions or definitions of evaluation: (a) the cognitive operation of evaluation, (b) the relatively stable evaluation attached to mental representations, and (c) evaluation as the linguistic expression of speaker/writer opinion. I shall come back to this later in more detail. However, it must be pointed out now that the parameter-based approach to evaluation focuses on evaluation as in (c), and takes as a springboard Thompson and Hunston's (2000) definition of evaluation as

the broad cover term for the expression of the speaker's or writer's attitude or stance towards, viewpoint on, or feelings about the entities or propositions that he or she is talking about. That attitude may relate to certainty or obligation or desirability or any of a number of other sets of values (Thompson and Hunston 2000: 5).1

These "sets of values" are identified here as evaluative parameters (a term adopted from Francis 1995). Broadly speaking, I suggest that there are (at least) ten parameters along which speakers can evaluate aspects of the world. Each of the proposed parameters involves a different dimension along which the evaluation proceeds and includes what I call sub-values, which either refer to the different poles on the respective evaluative scale or to different types of the parameter:2

PARAMETER / VALUES: examples

1. COMPREHENSIBILITY
   COMPREHENSIBLE: plain, clear
   INCOMPREHENSIBLE: mysterious, unclear
2. EMOTIVITY
   POSITIVE: a polished speech
   NEGATIVE: a rant
3. IMPORTANCE
   IMPORTANT: key, top, landmark
   UNIMPORTANT: minor, slightly
4. SERIOUSNESS
   SERIOUS: serious
   HUMOROUS: funny
5. EXPECTEDNESS
   EXPECTED: familiar, inevitably
   UNEXPECTED: astonishing, surprising
   CONTRAST: but, however
   CONTRAST/COMPARISON: not, no, hardly, only (negations)
6. MENTAL STATE (marginal evaluation)
   BELIEF/DISBELIEF: accept, doubt
   EMOTION: scared, angry
   EXPECTATION: expectations
   KNOWLEDGE: know, recognise
   STATE-OF-MIND: alert, tired, confused
   PROCESS: forget, ponder
   VOLITION/NON-VOLITION: deliberately, forced to
7. EVIDENTIALITY
   HEARSAY: [he said it was] "a lie"
   MINDSAY: "well done" [he thought]
   PERCEPTION: seem, visibly, betray
   GENERAL KNOWLEDGE: (in)famously
   EVIDENCE: proof that
   UNSPECIFIC: it emerged that, meaning that
8. POSSIBILITY/NECESSITY
   NECESSARY: had to
   NOT NECESSARY: need not
   POSSIBLE: could
   NOT POSSIBLE: inability, could not
9. RELIABILITY
   GENUINE: real
   FAKE: choreographed
   HIGH: will, be to
   MEDIUM: likely
   LOW: may
10. STYLE: frankly, briefly
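The parameter-value-cue structure of the framework can be represented as a simple lookup table. The following sketch is purely illustrative (it is not part of Bednarek's framework, and a manual analysis like the one reported in this paper cannot be reduced to keyword lookup, since evaluative meaning is context-dependent); the cue lists are a small selection of the examples given above.

```python
# Illustrative lookup structure for a subset of the evaluative parameters:
# parameter -> sub-value -> example cues (taken from the table above).
# A real analysis is manual and context-sensitive; this only shows the shape
# of the parameter/sub-value hierarchy.
EVALUATIVE_PARAMETERS = {
    "COMPREHENSIBILITY": {"COMPREHENSIBLE": ["plain", "clear"],
                          "INCOMPREHENSIBLE": ["mysterious", "unclear"]},
    "IMPORTANCE": {"IMPORTANT": ["key", "top", "landmark"],
                   "UNIMPORTANT": ["minor", "slightly"]},
    "EXPECTEDNESS": {"EXPECTED": ["familiar", "inevitably"],
                     "UNEXPECTED": ["astonishing", "surprising"],
                     "CONTRAST": ["but", "however"]},
    "RELIABILITY": {"HIGH": ["will"], "MEDIUM": ["likely"], "LOW": ["may"]},
}

def classify(cue):
    """Return all (parameter, sub_value) pairs whose cue list contains `cue`."""
    hits = []
    for parameter, sub_values in EVALUATIVE_PARAMETERS.items():
        for sub_value, cues in sub_values.items():
            if cue.lower() in cues:
                hits.append((parameter, sub_value))
    return hits

print(classify("but"))     # [('EXPECTEDNESS', 'CONTRAST')]
print(classify("likely"))  # [('RELIABILITY', 'MEDIUM')]
```

The nesting mirrors the two-level organization of the framework: each parameter defines a scale or a set of types, and the sub-values name positions on that scale.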
Most of the proposed parameters (but not all) are scales involving two poles, but also potential intermediate stages between them (cf. also Lemke 1998; Malrieu 1999; Hunston 1993, 2000; Bublitz 2003). For instance, the parameter of EMOTIVITY concerns the evaluation of aspects as more or less positive or more or less negative. As a consequence, most evaluative meanings can be located on a cline of low to high force/intensity (see also White 2001a: 5). However, this notion of scaling "can be seen as an interpersonal coloration or tonality across the APPRAISAL [here: evaluation] system" (White 1998: 109, small caps in the original), and is thus not considered as a 'parameter' of evaluation in the framework adopted here. Moreover, there is no appropriate methodology available for identifying the exact position of an evaluator on an evaluative scale. This is why, in the empirical analysis, the evaluators are classified as belonging to one of the two poles on the scale (e.g., as POSITIVE or NEGATIVE) rather than categorizing them according to their evaluative intensity. Only with RELIABILITY was it possible to distinguish between three positions on the scale: LOW, MEDIAN and HIGH.3 In the following section (2.2) I shall briefly comment on the parameters (more detailed information on the framework is provided in Bednarek in press) before outlining the connection between cognition and evaluation (3).
2.2. A brief outline of the parameter-based framework

As previously mentioned, there are at least ten parameters along which speakers/writers can express evaluations. I shall now discuss these in turn. Evaluations of COMPREHENSIBILITY have to do with the extent to which writers evaluate entities, situations or propositions as being within or outside the grasp of human understanding. Such evaluations are situated on a cline ranging from more or less COMPREHENSIBLE (clear, definite) to more or less INCOMPREHENSIBLE (unclear, vague, complex), because aspects of the world can be more or less understandable and complex, and we can understand them fully, partly or not at all. The parameter of EMOTIVITY is concerned with the writer's evaluation of aspects of events as good or bad, i.e., with the expression of writer approval or disapproval. Evaluations of EMOTIVITY are situated on a cline ranging from more or less POSITIVE (polished, stoutly) to more or less NEGATIVE (fanatic, perverse). Evaluations along the parameter of IMPORTANCE evaluate the world (and discourse about it) according to the speaker's subjective evaluation of its status in terms of importance, relevance and significance. Evaluations of IMPORTANCE are situated on a scale ranging from IMPORTANT (significant, importantly) to UNIMPORTANT (unimportant, minor). The parameter of SERIOUSNESS is identical to Lemke's (1998) parameter of humorousness/seriousness and has to do with the writer's evaluations of aspects of the world as situated on a cline of SERIOUSNESS, i.e., as more or less SERIOUS (serious) or HUMOROUS (hilarious). The parameter of EXPECTEDNESS involves the writer's evaluations of aspects of the world (including propositions) as more or less EXPECTED (usual, little wonder that) or UNEXPECTED (unexpected, surprising, astonishingly) (again, a cline is involved). I also regard CONTRAST (expressed e.g., by but, while, still, although, though) as well as CONTRAST/COMPARISON (expressed by negation) as sub-values of EXPECTEDNESS.
The parameter of MENTAL STATE is an instance of marginal evaluation (it is excluded, for example, by Biber and Finegan (1989) in their concept of stance). The parameter refers to the writer's evaluation of other social actors' mental states. Here the sub-values are associated with the different kinds of mental states actors can experience: emotions, wishes/intentions, beliefs, expectations, knowledge, etc. (the examples are extracted from a larger corpus of newspaper discourse):

MENTAL STATE: BELIEF - the individual suspected by the Princess
MENTAL STATE: EMOTION - appalled chiefs
MENTAL STATE: EXPECTATION - The day began with high expectations
MENTAL STATE: KNOWLEDGE - half of all players knew other pros who took recreational drugs
MENTAL STATE: STATE OF MIND - the weary PM
MENTAL STATE: PROCESS - For the conspiracy theorists who have spent six years pondering the significance of the missing white Fiat
MENTAL STATE: VOLITION - An asylum seeker who deliberately infected two women with the Aids virus
MENTAL STATE: NON-VOLITION - So how did such an intelligent, cultivated and, in her youth, extremely attractive woman end up running the world's biggest international vice ring?
The difference between the parameter of EXPECTEDNESS and the parameter of MENTAL STATE: EXPECTATION is hence the source of the evaluation: with the former it is the author's expectations that are referred to, with the latter it is the expectations of a social actor other than the author that are
involved. With the parameter of MENTAL STATE, what we are dealing with are the author's inferences about the mental states of third parties.

EVIDENTIALITY concerns writers' evaluations of the 'evidence' for their knowledge. Evidential evaluators, or evidentials, "put in perspective or evaluate the truth value of a sentence ... with respect to the source of the information contained in the sentence" (Rooryck 2001). Here, the sub-values relate to the different types of source on which the writer's knowledge is based (the examples below are invented):

EVIDENTIALITY: HEARSAY              He said they were right.
EVIDENTIALITY: MINDSAY              He thought they were right.
EVIDENTIALITY: GENERAL KNOWLEDGE    It's well-known they were right.
EVIDENTIALITY: UNSPECIFIED          It emerged that they were right.
EVIDENTIALITY: PERCEPTION           There are signs they were right.
EVIDENTIALITY: PROOF                Evidently, they were right.
The parameter of POSSIBILITY/NECESSITY deals with what has traditionally been described as deontic or dynamic modality, i.e., with the writer's evaluation of what is (not) necessary or (not) possible, of what you should, do not need to, can and cannot do. The two notions (possibility and necessity) are in fact closely connected and can be associated with just one parameter because they are logically related: 'It is not possible for you to leave' is logically equivalent to 'It is necessary for you not to leave/to stay' (on logical relations and modality see e.g., Lyons 1977: 787; Coates 1983: 19-20).

Evaluations of RELIABILITY are connected to what is generally described as epistemic modality, i.e., to matters of reliability, certainty, confidence and likelihood. The parameter of RELIABILITY goes beyond this, however, to include both the writer's evaluation of the reliability of a proposition and his/her evaluation of the 'genuineness' of an entity/entities. There are five values subsumed under this parameter: FAKE, GENUINE, LOW, MEDIAN, HIGH. The first two (FAKE/GENUINE) refer to the evaluation of genuineness - writers evaluate states of affairs as either real (genuine, real) or artificial (fake, artificial). As with other parameters, this parameter can thus be regarded as having a 'positive' (GENUINE) and a 'negative' (FAKE) value. The remaining sub-values (LOW, MEDIAN, HIGH) refer to the evaluation of the likelihood of propositions being true and have been adopted from Halliday (1994):
RELIABILITY: LOW        could
RELIABILITY: MEDIAN     likely to
RELIABILITY: HIGH       certainly
Finally, evaluations of STYLE concern the writer's evaluation of the language that is used, for instance comments on the manner in which the information is presented, or evaluations of the kind of language that is used (Biber et al. 1999: 975). In the newspaper corpus this parameter is important only in connection with reporting expressions (verbs, nouns, adjectives, adverbs), which can be classified according to the following sub-values (modified from Caldas-Coulthard 1994):

NEUTRAL (only referring to the act of saying): e.g., say, tell
ILLOCUTIONARY (mentioning the speaker's purpose): e.g., demand, promise
DECLARATIVE (dependent on a cultural-institutional setting): e.g., acquit, plead guilty
DISCOURSE SIGNALLING (marking the relation to the discourse): e.g., add, conclude
PARALINGUISTIC (commenting on prosodic/paralinguistic aspects of the utterance): e.g., whisper, scream
These parameters can also be combined to greater and lesser extents. For instance, reporting expressions such as promise, threaten or accuse simultaneously indicate that the following proposition is based on HEARSAY and express the speaker's/writer's comment on the type of illocutionary act involved (STYLE).
3. Evaluation and cognition

So far I have focused exclusively on evaluation defined as in (c) above, i.e., evaluation as the linguistic expression of speaker opinion. However, given the inherent complexity of the notion of evaluation, and in order to exemplify its connection to memory, it is necessary to discuss the relation of the proposed parameter-based framework of evaluation to evaluation as a cognitive operation (a), and to evaluation as the relatively stable evaluation attached to mental representations (b). I will attempt to offer some tentative remarks concerning the general relationship between evaluation and cognition (3.1, 3.2, 3.3), and concerning
the extent to which evaluative meaning is 'inherent' in lexical items (3.4). The latter will be demonstrated with examples from the corpus.
3.1. The parameter-based framework and evaluation as a cognitive operation

Concerning the notion of evaluation as a cognitive operation, Talmy suggests that

[a] psychological entity can perform the cognitive operation of evaluating a phenomenon for its standing with respect to some system of properties. A system of properties of this sort is typically understood as being scalar, running from a negative to a positive. Such systems of properties include veridicality, function, importance, value, aesthetic quality, and prototypicality. Thus a cognitive entity can assess some phenomenon at the positive pole of these scales as being true, purposeful, important, good, beautiful, and standard. (Talmy 2003: 476)
It will be noted that Talmy's systems of properties exhibit some common features with the parameters of evaluation established above. However, Talmy's systems are specifically related to a cognitive framework for narrative structure and, moreover, do not seem to be grounded in, or based on, language (as my framework is). In other words, Talmy assumes that "a person's assessments ... are due to the operation of a cognitive brain system whose function is to perform such assessment" (Talmy 2003: 478), whereas there is no a priori assumption in my approach that the parameters of evaluation that can be linguistically expressed necessarily reflect such operations. Though it may be assumed that some cognitive operation of evaluation must precede the linguistic expression of evaluation, no assumptions are made about the exact status and nature of this cognitive operation. In my view, more research is needed to determine the cognitive status of parameters (or systems) of evaluation before anything beyond hypothesis is achieved. In other words, I only claim that the English language allows us to express evaluations according to certain parameters; whether or not the cognitive operation of evaluation proceeds along the same kinds of parameters is as yet unknown. Talmy's theory seems to suggest that there is some overlap (veridicality is related to RELIABILITY: GENUINE/FAKE, importance to IMPORTANCE, value and aesthetic quality to EMOTIVITY, and prototypicality perhaps to EXPECTEDNESS), though he also mentions an additional 'system' (function). In this respect, it must be
stressed that even the cognitive status of evaluation as such is unclear: some authors [in social and cognitive psychology] argue that evaluative processes are affective, some argue that they are cognitive, and still others claim that they are both cognitive and affective (Malrieu 1999: 53). In any case, such evaluative processes seem to depend on the individual's memory of prior evaluations, and probably involve the activation of memorized representations of the world and the perception of contrast or deviation from these memorized representations. This can be exemplified with respect to how a culture evaluates narratives according to Talmy's category of prototypicality. As he suggests,

members of the culture at large will generally have certain norms, expectations, and forms of familiarity pertaining to [narrative] structure as a result of experiences with the historical tradition or with other exposure to narrative contexts. ... Authors ... that compose their works to deviate substantially from the current norms may be considered by contemporaries to be avant-garde and their works to be experimental. (Talmy 2003: 479-480, emphasis in the original)
3.2. The parameter-based framework and evaluation as memorized representation

Let us now turn to (b), i.e., the evaluation that we attach relatively permanently to mental representations. This relates directly to the question of how we organize our knowledge of the world cognitively, which has been discussed in artificial intelligence research, cognitive psychology and linguistics with the help of frame theory (cf. Pishwa, Introduction). Frame theory suggests that our knowledge of the world is organized in terms of mental knowledge structures which capture the typical features of the world (for an overview of research on frames and related notions such as scripts, schemas, cognitive models etc., see Bednarek 2005). Frames are part of our semantic memory, are usually shared by members of the same linguistic community (they are more or less conventionalized), and can refer both to more or less factual knowledge (spiders usually have eight legs) and to scientifically wrong folk knowledge (spiders are insects). Concerning the structure of frames, they are often assumed to consist of categories and the specific interrelations (e.g., X has a Y, X is on Y, X is a part of Y) existing between them, the categories providing default assignments (by supplying prototypes) and associated expectations (e.g., Ungerer and Schmid 1996: 212-213).
In terms of the relation between knowledge and evaluation it is of course possible to use the notion of frame to refer only to non-evaluative aspects of the entities, situations, events etc. in the world. Hence, our frame knowledge of spiders would include the fact that many people dislike spiders while excluding our own opinion on the matter (i.e., whether we dislike them or not). However, we undoubtedly have certain opinions concerning objects, events and situations in the world, and it seems reasonable to conclude that these are part of our mental representations. It would then be possible to assume that frames have one or potentially more 'slots' for such evaluations. Whereas the purely factual frame features would be intersubjectively shared across a large number of people (disregarding specialized, scientific knowledge), such evaluative frame features may be more individual or shared only within certain discourse communities (though some evaluations are perhaps also shared among many people).

For instance, most linguists will share a frame for corpus linguistics, which includes (more or less detailed) knowledge about the methodologies and assumptions of corpus linguists. However, the different linguistic schools will have contrasting opinions about the usefulness and significance of the approach, and whether and when it is necessary and possible to use such an approach. These opinions themselves depend on previous cognitive operations of evaluation in specific, individual instances (see above), which are abstracted and become part of the individual's mental representation of corpus linguistics. In assuming that frames involve both factual and evaluative features, I follow those cognitive scientists who assume that evaluative information resides in memory and is stored together with other knowledge of aspects of the world (Malrieu 1999: 53).
Concerning the relation of such memorized evaluative information to the parameter-based framework, the crucial question is what kinds of values (or what evaluative parameters) can fill the evaluative slot of frames. Presumably, this relates to the above-mentioned question of the cognitive status of parameters of evaluations, since it was assumed that the evaluative features of frames are the result of an abstraction or condensation of individual cognitive operations of evaluation. As previously mentioned, much more research is needed before any valid conclusions can be drawn.
3.3. Functions of evaluations

To sum up the discussion above, the following schematic figures demonstrate again the distinction between the three notions or definitions of evaluation, and our knowledge about the kinds of parameters that may be involved:

Evaluation (a): a psychological entity (PE) evaluates a phenomenon/phenomena.
PE 'does' evaluation.
Parameters of evaluation: how good, bad, important? (unclear how many and what kind of cognitive parameters)

Evaluation (b): a psychological entity holds a mental representation of a phenomenon/phenomena.
PE 'has' evaluation.
Evaluative frame features: how good, bad, important? (unclear how many and what kind of evaluative frame features)

Evaluation (c): a psychological entity evaluates a phenomenon/phenomena in language.
PE 'says' evaluation.
Linguistic parameters of evaluation: COMPREHENSIBILITY, EMOTIVITY, EVIDENTIALITY, EXPECTEDNESS, IMPORTANCE, MENTAL STATE, POSSIBILITY/NECESSITY, RELIABILITY, SERIOUSNESS, STYLE
The difference between these three points of view can thus be discussed in terms of 'doing', 'having', and 'saying' evaluation. A last point to be discussed, then, is the extent to which linguistic expressions of evaluation (saying evaluation) are related to the cognitive operation of evaluation (doing evaluation) and its mental representation (having evaluation), i.e., the potential functions of linguistic expressions of evaluation. In my view, there are at least four possibilities. Firstly, such linguistic expressions may be the result of spontaneous, individual operations of evaluation. Secondly, they may be regarded as reflexes of the evaluative features of our mental representations (frames). Thirdly, they may simply indicate the existence of mental representations (frames) and associated expectations. Fourthly, they may be exploited purely for rhetorical-pragmatic purposes. Let me give two examples:

(1) The evaluative utterance "Corpus linguistics is great" (involving the parameter of EMOTIVITY) may relate to an evaluation of corpus linguistics at a particular point in time (e.g., as a reaction to looking at concordance lines) or it may reflect the speaker's general positive mental representation of corpus linguistics. A third option is that the evaluation is uttered for rhetorical-pragmatic purposes (e.g., to flatter a lecturer or to be polite).

(2) The evaluative utterance "It is surprising that Paul's research is not based on corpus evidence" (involving the parameter of EXPECTEDNESS) may relate to an evaluation of the fact that Paul's research is not based on corpus evidence as surprising at a particular point in time (as a reaction to reading Paul's essay), and can then be considered as an indicator of the existence of a particular mental representation (frame) concerning Paul's research (e.g., involving the assumption that his research is usually based on corpus evidence). Additionally, the evaluation may be used to evaluate Paul's research negatively (if the speaker assumes that corpus evidence is necessary for good research).
The example in (2) points to the special role of evaluators of expectedness. Such linguistic expressions explicitly make reference to the fact that something is unexpected or expected in terms of our (factual) frame knowledge of how things are in the world, rather than making reference to an evaluative frame feature of expectedness.4 They are the results of the default assignments and associated expectations concerning the categories that make up frame structure (see above). In other words, evaluations of EXPECTEDNESS (e.g., contrastive coordinators and subordinators, negations, adjectives and adverbs such as surprisingly, astonishingly) are potential indicators for the existence of frames in speakers' minds (see also Tannen 1993). For instance, negative statements by American speakers watching the famous 'pear' film (Tannen 1993: 21) such as this road that's ... UH it's not paved, it's just sort of a dirt road are regarded by Tannen as "evidence that Americans expect roads to be paved" (Tannen 1993: 41), and of course, the relation between contrast and expectation has repeatedly been pointed out in research (e.g., Greenbaum 1969: 250; Quirk et al. 1985: 935; Biber et al. 1999: 1047). Other indicators of frames that have been identified by Tannen (1993), and that relate in some way to the parameter-based framework of evaluation, are:

- obviously, seem (EVIDENTIALITY)
- kind of (RELIABILITY/STYLE, cf. Bednarek in press)
- just, even (EXPECTEDNESS)
- must, should, may, can (POSSIBILITY or RELIABILITY)
- evaluative adjectives and adverbs such as important, beautiful, carelessly, luckily, funny, suddenly, strange, artificial (IMPORTANCE, EMOTIVITY, EXPECTEDNESS, RELIABILITY: GENUINE/FAKE)
- inferences (MENTAL STATE)
- moral judgements (EMOTIVITY)
Tannen suggests that such linguistic expressions are created by expectations or frames (Tannen 1993: 53) (and hence work as indicators of frames), but she also notes that employing evaluative adjectives and adverbs results from and reflects an evaluative process (Tannen 1993: 48). In other words, such expressions are regarded as being both the result of a cognitive operation of evaluation and indicators of the existence of mental representations. This two-fold function is in fact explicable by the assumption made above that evaluative processes depend on the individual's memory and involve the activation of memorized representations of the world (frames). In other words, evaluations in discourse may be related to memory in at least two ways: firstly, they may reflect an evaluative frame feature (corpus linguistics is great/important/difficult etc.), and secondly, they may simply point to the existence of a particular frame (Paul's research is usually based on corpus evidence). In the second case, they are probably also the result of a cognitive operation of evaluation at a particular point in time. However, to assume that evaluative language simply reflects or relates to
our frame knowledge in a variety of ways is to disregard the fact that evaluations may be used for purely rhetorical purposes, as has been shown in much linguistic research, for instance for expressions of modality (used to express politeness, for boosting etc. rather than to reflect the speaker's cognitive state of mind). It is much too narrow a view of evaluative language to regard it simply as a reflex of the speaker's cognitive operations and as an indicator of aspects of his/her memory. This, indeed, is one of the basic tenets of research on the language of evaluation, which holds that such language is polyfunctional and interpersonal.

Let me very briefly sum up this complex discussion: when we encounter aspects of the world we can perform a cognitive operation of evaluation on them (involving the activation of memorized representations); this evaluation may then become (relatively permanently) attached to our mental representations of these aspects; and linguistic expressions may (but need not) be used to reflect evaluative components of our mental representations as well as to indicate the existence of certain mental representations. Nevertheless, linguistic expressions of evaluation are also used for rhetorical and other purposes. In my view, the rhetorical-pragmatic functions of linguistic expressions of evaluation are at least as important, if indeed not more important.
3.4. Inscribing, evoking and provoking evaluation

Let us now turn to a slightly different topic, namely the question of how evaluation can be expressed (we are now back to our original definition of evaluation as the expression of speaker/writer opinion). What are the options on the part of the speaker for expressing evaluation in discourse, and how does this relate to memory? The crucial question is to what extent evaluative meaning is 'inherent' in lexical items, and to what extent it depends on the readers' application of cognitive frames. In other words, what is the role of the reader's memorized representations (frames) in evaluation?

As a starting point for the discussion we can take White's (2001a) distinction between "inscribed" (or explicit) and "evoked" (or implicit) appraisal (evaluation). Inscribed appraisal refers to evaluation that is "overtly 'inscribed' in the text through the vocabulary choice" (White 2001a: 6), i.e., with the help of explicitly evaluative adverbs such as justly, fairly, adjectives such as corrupt, dishonest, nouns such as a cheat and a liar, a hero, and verbs such as to cheat, to deceive (White 2001a: 6). With
evoked appraisal, the evaluation is triggered by "tokens" of appraisal, i.e., "superficially neutral, ideational meanings which nevertheless have the capacity in the culture to evoke judgmental responses (depending upon the reader's social/cultural/ideological reader position)" (White 2001a: 12). Evoked evaluations thus crucially rely on the reader's interpretation and, moreover, are very much context-dependent (White 2001a: 13). Examples of "tokens" of evoked evaluation are descriptions such as the government did not lay the foundations for long term growth or they filled the mansion with computers and cheap plastic furniture (White 2001a: 13), which can trigger negative evaluation in their given context. Such evaluation depends on shared socio-cultural norms and "rel[ies] upon conventionalised connections between actions and evaluations" (White 2001a: 13). In fact, as White (2004) has shown, there is a great deal of variability in emotive expressions, a variability which is crucially dependent on the context in which they occur, and which provides some evidence against assuming a strict dichotomy of explicitness and implicitness. He concludes:

I am proposing, therefore, that rather than making a clear-cut distinction between explicit [inscribed] and implicit [evoked] evaluation, we work with a notion of degrees of attitudinal saturation. The more limited the semantic variability of the term the more saturated it is, the less limited the semantic variability, the less saturated. (White 2004: 2-3)

If we talk about inscribed and evoked evaluation, then, this represents a simplification to a certain extent. Nevertheless, the distinction is theoretically valid and useful as a starting point for discussing the connection between evaluation and cognition. In cognitive terms, evoked evaluation often depends on the readers' application of cognitive frames to the discourse at hand.
For instance, the description of Iain Duncan Smith's downward look in the corpus example Mr Duncan Smith spent most of the time staring at his own feet — because the autocue was bizarrely at floor level (Mirror) seems to clearly 'evoke' negative EMOTIVITY with the help of a cognitive frame: the wider context can be regarded as opening up a 'conference frame' which includes conventional knowledge that staring at your feet is definitely not a good thing for public speaking, especially at a party conference where you are supposed to assert your authority. Additionally, the causal conjunction because gives us the reason for this behavior (the fact that the autocue was at floor level), which, in turn, causes this fact to be evaluated as negative as well. This negative evaluation seems then to be intensified by the writer's evaluation of it as UNEXPECTED (bizarrely, which probably carries negative
connotations in itself), as a deviation from the norm (perhaps aiming to prompt the reader's questioning 'Why in God's name did they do this? Can't the Tories even get this right?'). The utterance therefore not only evaluates Iain Duncan Smith as negative, but the whole organization of the conference, i.e., the Tory Party as a whole. Some other examples in the corpus are:

(1) The Tory leader's more polished performance delighted the party faithful inside the Empress Ballroom, earning him a climactic 12-minute ovation. (Guardian)

(2) The activists interrupted his speech with no fewer than 20 standing ovations - plus a final salute lasting fully eight minutes (Mail)

(3) The panic move fuelled fresh murmurs over the leadership and overshadowed a performance which won mixed reviews in the conference hall. (Sun)

(4) He defended his support for the Iraq war, while respecting the opinion of those who opposed it. (Independent)
In examples (1) and (2) the reader again applies his/her conference frame to the discourse, leading him/her to evaluate Iain Duncan Smith's performance as successful, whereas in example (3) the same frame leads to a less positive evaluation of Iain Duncan Smith. In example (4) it is a frame about 'moral' values whose application can lead to positive evaluation ('in an argument it is good to respect your adversary's opinion'). On the other hand, EMOTIVITY can be expressed by "very clearly evaluative" (Thompson and Hunston 2000: 14) lexical items such as fail:

(5) Mr Duncan Smith received 18 standing ovations in a speech marked by a ferocious attack on the government and the Liberal Democrats, but failed to see off the threat of a leadership challenge this autumn. (Financial Times)

(6) Tory leader's tirade against Blair fails to stave off revolt (Independent)
where Iain Duncan Smith is explicitly evaluated as a failure. Some other examples in the corpus are:
(7) It was meant to convince his party he's tough. Instead it evoked images of another wannabe urged to make the clenched hand his trademark — tennis ace Tim Henman (insert). (Sun)

(8) What was meant to be a roar turned into a bore as he delivered the longest speech in modern political history. (Mirror)

(9) Delegates were forced to rise to their feet 19 times to take part in "spontaneous" standing ovations orchestrated by a small group of fanatics. (Mirror)
In these examples quite explicitly negative expressions evaluate Iain Duncan Smith as a wannabe, his speech as a bore and the ovations as arranged by fanatics (note also the negative EMOTIVITY of forced to and the hedged "spontaneous").

In between inscribed and evoked evaluation we can find evaluators such as admit, which carry implicit evaluative assumptions. Admit shows that a statement was produced reluctantly (Clayman 1990: 87), carries the implied assumption that some negative act has been committed (Hardt-Mautner 1995: 13) or suggests that the content of the reported proposition is negative. It seems reasonable that its synonyms, acknowledge, concede, and confess, have similar evaluative meanings. (These evaluative assumptions might perhaps be argued to be part of a cognitive frame that we attach to the speech act of admitting.) These verbs are also all part of Thompson's (1994) group of reporting verbs which imply the writer's belief in the truth of the attributed proposition (Thompson 1994: 50). Consequently, such attributing expressions can be regarded as expressing a combination of four parameters of evaluation (when used for attributing propositions to sources other than the speaker):

- they name illocutionary acts (STYLE: ILLOCUTIONARY) and evaluate a proposition as based on hearsay (EVIDENTIALITY: HEARSAY)
- they express the writer's negative evaluation of the "Sayer" (Halliday 1994: 140) (EMOTIVITY: NEGATIVE)
- they express the writer's belief that what the Sayer says is true (RELIABILITY: HIGH).
In my corpus six out of ten newspapers (Express, Star, Sun, Financial Times, Times, Telegraph) employ admit/acknowledge. Apart from the Daily Telegraph, it is always members of the Tory Party (usually those loyal to IDS) who admit something that is negative to Iain Duncan Smith.
In the Daily Telegraph, the situation is reversed: although it is still members of the Tory Party who concede something, the Sayers are critics of IDS and what is admitted is positive to Iain Duncan Smith:

(10) And even some members of the shadow cabinet admitted yesterday that it remained to be seen whether their leader had done enough to stave off a challenge. (Express)

(11) Even shadow cabinet members acknowledged that this weekend could determine his fate. (Financial Times)

(12) But IDS's aides admitted that some influential Tories still wanted him removed. (Star)

(13) One senior loyalist admitted IDS had no more than a "50-50" chance of survival. (Sun)

(14) But the speech, though rapturously received by the hardcore Tories in the seaside resort, failed to settle the question marks over his future — as one of his Shadow Cabinet members, Tim Yeo, swiftly and ominously acknowledged within minutes of it ending. (Times)

(15) His critics acknowledged that he had gained a reprieve but said he still had to demonstrate that he could build on the momentum of the conference speech to quell the doubts in the party and the country about his leadership. (Telegraph)
Because of the evaluation of HIGH RELIABILITY that is expressed by the evaluators, what is said to be positive or negative to news actors additionally gains 'factual' status, and contributes to the evaluation of Iain Duncan Smith's performance. The use of these evaluators can thus be regarded as correlating to newspaper stance to some extent: the pro-Tory Daily Telegraph differs crucially in this respect from all the other 'anti-Iain Duncan Smith' newspapers.

In the above examples we have seen how EMOTIVITY can be evoked by more or less factual descriptions of social actors' behavior (though there is probably a cline between description and evaluation). However, evaluations of EMOTIVITY can also be triggered by evaluations along other parameters. In some of the above examples we can find evaluations of MENTAL STATE (delighted, respecting the opinion of), whereas in the following examples evaluations of RELIABILITY, EXPECTEDNESS, and MENTAL STATE trigger both positive and negative EMOTIVITY:
(16) Some delegates shook their heads and refused to rise to the orchestrated [RELIABILITY: FAKE; NEGATIVE EMOTIVITY] applause. (Sun)

(17) And while it was a carefully-choreographed show of support, there was no doubt that the warmth and enthusiasm for Mr Duncan Smith in the packed Blackpool conference hall was genuine. [RELIABILITY: GENUINE; POSITIVE EMOTIVITY] (Mail)

(18) Mr Duncan Smith delivered an unprecedented [EXPECTEDNESS: UNEXPECTED; NEGATIVE EMOTIVITY] personal attack on a serving prime minister. (Telegraph)

(19) Abandoning his "quiet man" image of a year ago, he unleashed an unusually [EXPECTEDNESS: UNEXPECTED; NEGATIVE EMOTIVITY] strong tirade against the Prime Minister. (Independent)

(20) There was nothing [EXPECTEDNESS: CONTRAST/COMPARISON; NEGATIVE EMOTIVITY] about the economy - still Labour's strongest card - but he hit the hot buttons on asylum, Europe, where he will campaign harder for a constitutional referendum, and the threat of still higher taxes, to warm applause. (Guardian)

(21) But he shocked [MENTAL STATE: EXPECTATION; NEGATIVE EMOTIVITY] observers with a savage attack on Mr Blair over the suicide of MoD Iraq weapons expert Dr David Kelly. (Sun)

(22) Mr Duncan Smith surprised [MENTAL STATE: EXPECTATION; NEGATIVE EMOTIVITY] some MPs with his personal attacks. (Financial Times)
In examples (16) and (17), evaluations of the reactions to Iain Duncan Smith's conference speech as GENUINE or FAKE can - again with the help of the conference frame - give rise to evaluative overtones. Through evaluations of Iain Duncan Smith's attacks as UNEXPECTED in examples (18) and (19), he is simultaneously evaluated as deviating from the 'normal' frame of how political debates are supposed to be held. This increases the negative EMOTIVITY that may be expressed by the lexical items which these evaluations modify (personal attack, strong tirade). In example (20), contrasting the facts with an alternative possibility that could have been expected in the context of a conference speech (economy can presumably be regarded as a matter of high interest for the British public and as central to a party platform), a negative evaluation of the Tory Party can be triggered, without the writer's actually using any explicit evaluations of negative EMOTIVITY. In examples (21) and (22), finally, evaluations of MENTAL STATE (where the mental state of surprise is attributed to social actors other than the writer) seem to work similarly to examples (18) and (19), again implying a deviation on the part of Iain Duncan Smith from the norm of political debate. At the same time, however, the evaluation that is present in these examples also depends on the lexical items that are involved: for instance, with shock the evoked evaluation appears more negative than with surprise. For the above cases, where evaluation along one parameter is triggered by evaluations along other parameters (rather than by purely factual descriptions), it would be possible to revive White's notion of 'provoked' evaluation. This concept is used by White (1998: 105-106) to refer to cases where JUDGEMENT (positive/negative moral evaluation) is triggered (provoked) by other APPRAISAL values (usually AFFECT: reference to emotions) in contrast to being evoked by experiential tokens. In connection with the parameter-based framework the difference between evoked and provoked evaluation becomes the difference between evaluation that is triggered by factual descriptions (evoked evaluation) and evaluation that is triggered by evaluations along other parameters (provoked evaluation). Lemke calls this "prosodic overlap" (Lemke 1998: 48). In most of the examples mentioned above we are therefore actually dealing with provoked evaluation rather than evoked evaluation.
The distinction between inscribed, evoked and provoked evaluation is in fact very complex in that provoked evaluation may be expressed via an inscribed evaluation. Thus, the example And while it was a carefully-choreographed show of support, there was no doubt that the warmth and enthusiasm for Mr Duncan Smith in the packed Blackpool conference hall was genuine (Mail) involves an inscribed evaluation of RELIABILITY (genuine) which provokes positive EMOTIVITY. In other words, there are two issues involved: firstly, is the evaluation more or less inscribed ("IDS is incompetent") or evoked ("IDS spent most of his time staring at his feet")? Secondly, is the evaluation along a certain parameter expressed (inscribed or evoked) by an evaluation of the same parameter, or is it provoked by an (inscribed or evoked) evaluation of a different parameter? The following figure exemplifies the difference between inscribed, evoked and provoked evaluation in connection with the parameter of EMOTIVITY:
[Figure: the cline of attitudinal saturation for EMOTIVITY. An explicit evaluation of EMOTIVITY inscribes EMOTIVITY (high attitudinal saturation); a factual description evokes it (low attitudinal saturation); and an evaluation of a parameter other than EMOTIVITY, itself expressed via linguistic devices of different degrees of attitudinal saturation (inscribed-evoked), provokes it.]

When talking about the difference between inscribed and provoked evaluation in the following, this hence represents another simplification, and obscures the question of how the respective provoked evaluation is expressed.
One might be tempted to assume that it is only EMOTIVITY - identified as "the most basic parameter" by Thompson and Hunston (2000: 25) - that can be provoked. However, this is not the case: other evaluative parameters, it seems, can also be triggered. For instance, evaluations of RELIABILITY can be provoked (rather than inscribed via modal expressions such as put in doubt, certainly, possibly or evoked by factual descriptions) by evaluations of EXPECTEDNESS: UNEXPECTED with even. These can be used to attach more RELIABILITY or give more credence to a reported proposition:

(23) But a YouGov poll of grassroots Tory members, published yesterday, put that in doubt by revealing 53 per cent thought they had made a mistake in electing Mr Duncan Smith in the first place. And even some members of the shadow cabinet admitted yesterday that it remained to be seen whether their leader had done enough to stave off a challenge. (Express)
(24) Mr Duncan Smith received 18 standing ovations in a speech marked by a ferocious attack on the government and the Liberal Democrats, but failed to see off the threat of a leadership challenge this autumn. Even shadow cabinet members acknowledged that this weekend could determine his fate. (Financial Times)
(25) Though some smart Tories watching on TV thought the performance too "mannered" to appeal to the wider audience at home, the snap verdict in Blackpool, even among sceptics, was that it was far from the feared disaster and at least good enough to "get him through the next week". (Guardian)
This effect has to do with the fact that, in order to evaluate the reliability of an utterance, a critical listener will apply his/her world knowledge and ask what the quoted speaker's interests are and how these may distort his/her statements (Du Bois 1986: 323). By pointing out that even those speakers who would not normally be expected to utter the reported proposition because of conflicting interests (shadow cabinet members, sceptics) did in fact do so, higher reliability may be attached by readers to this attributed proposition. Similarly, evaluations of MENTAL STATE might be argued to be provoked by evaluations of EVIDENTIALITY/STYLE as in example (26), where the delegate's mental state can be inferred from the paralinguistic reporting verb used:

(26) But some visibly flinched as he stooped to gutter politics with vicious personal attacks on political opponents. He said that after the death of weapons expert Dr David Kelly "Tony Blair said he'd had nothing to do with his public naming. That was a lie. He chaired the meetings that made the fatal decisions. He is responsible. He should do the decent thing and resign." One delegate muttered: "Like you." (Mirror)

As becomes evident, the connection between evaluation and cognition, as well as the distinction between inscribed, provoked and evoked evaluation, is highly complex and far from straightforward. It seems that in many cases of evoked evaluation the discourse prompts the reader to apply cognitive frames to the text, which, in turn, can give rise to evaluations. The extent to which this process (the application of frames) works subconsciously remains to be researched. However, it has been suggested that in so far as evaluations are expressed in very subtle, indirect ways, it is naturally much more difficult for readers to recognize and challenge them (Thompson and Hunston 2000: 8). Thus, whereas inscribed evaluations are more or less explicit and recognizable (and can hence theoretically be challenged by the reader), this becomes more and more difficult as we move along the cline from inscribed to evoked evaluation.
4. Empirical analysis

4.1. Evaluation in the corpus

In discussing inscribed, evoked and provoked evaluation I have used examples from the newspaper corpus mentioned in the introduction. This mini-corpus was also the basis for an analysis of evaluation in newspaper reportage on Iain Duncan Smith's (IDS - the leader of the Tory Party at that time) speech at the Conservative Party conference in Blackpool in 2003, using the parameter-based approach to evaluation introduced in section 2. The mini-corpus consists of ten (hard) news stories from the ten national newspapers in Britain: The Financial Times (FT), The Guardian (GUAR), The Independent (INDY), The Times (TIM), The Daily Telegraph (TEL), The Sun (SUN), The Star (STAR), The Daily Mail (MAIL), The Mirror (MIRR) and The Express (EXP). The word count is as indicated in Table 1:

Table 1. The corpus

Broadsheets    FT     GUAR   INDY    TIM    TEL    Total
Word count     584    983    1.204   1.017  782    4.570

Tabloids       SUN    STAR   MAIL    MIRR   EXP    Total
Word count     638    291    1.011   475    689    3.104

Overall word count: 7.674
A general comparison of evaluations along all (combinations of) parameters in this corpus showed that evaluations are more frequent in the tabloids than in the broadsheets, as was to be expected:

Table 2. Evaluations in the corpus

               Evaluations   Word count   Per 1,000 words
Broadsheets    407           4.570        89.1
Tabloids       308           3.104        99.2
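The per-1,000-words rates in Table 2 follow from a plain frequency normalization: evaluations divided by word count, multiplied by 1,000. A minimal sketch using the counts reported above (the dictionary layout and function name are illustrative, not taken from the chapter):

```python
# Frequency normalization behind Table 2: evaluations per 1,000 words.
# Counts are taken from Tables 1 and 2 above.
corpus = {
    "broadsheets": {"evaluations": 407, "word_count": 4570},
    "tabloids": {"evaluations": 308, "word_count": 3104},
}

def per_thousand(evaluations: int, word_count: int) -> float:
    """Return the evaluation frequency normalized per 1,000 words."""
    return round(evaluations / word_count * 1000, 1)

for subcorpus, counts in corpus.items():
    rate = per_thousand(counts["evaluations"], counts["word_count"])
    print(f"{subcorpus}: {rate} evaluations per 1,000 words")
```

Normalizing by length rather than comparing raw counts is what licenses the comparison, since the broadsheet sub-corpus (4,570 words) is roughly half again as large as the tabloid one (3,104 words).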
The analysis of evaluation in this small corpus provides illustrative rather than representative findings (Bednarek in press, however, analyses evaluation in British tabloid and broadsheet publications in a larger corpus, confirming that evaluations are slightly more frequent in tabloid publications).5
4.2. Evaluative prosody: evaluation and context

The examples given in section 3.2 clearly demonstrated the importance of the wider context for the analysis of evaluation. In the following I shall thus provide an illustrative analysis of a longer section of two texts, one from the tabloids and the other from the broadsheets, in order to show the interplay of evaluation in text as well as to exemplify again the distinction between different types of evaluation. In analogy to Bublitz's (2003) concept of emotive prosody (related to evaluation in terms of the good-bad parameter), we can speak of the evaluative prosody of each text. With both of these stories, most readers will intuitively recognize that they are highly evaluative, but only a linguistic framework allows us to say explicitly why this is so and how it comes about. Here is the analysis of the beginning of the news story in the Guardian (the parameter-based analysis is provided in square brackets):

1. Party critics told to [EVIDENTIALITY: HEARSAY/STYLE: ILLOCUTIONARY] put up or shut up after Duncan Smith wins [EMOTIVITY: POSITIVE] time with aggressive [MENTAL STATE: STATE-OF-MIND] speech
2. No [EXPECTEDNESS: CONTRAST/COMPARISON] more Mr Quiet Man [EMOTIVITY: NEGATIVE]
Michael White, political editor
3. Senior [IMPORTANCE: IMPORTANT] Conservatives last night launched a ferocious [MENTAL STATE: STATE-OF-MIND] counteroffensive against Iain Duncan Smith's party critics after unanimously proclaiming [EVIDENTIALITY: HEARSAY/STYLE: ILLOCUTIONARY] their leader's Blackpool conference speech to be the decisive triumph they had demanded.
4. The Tory chief whip, David Maclean, took the initiative against dissidents whose threats to [EVIDENTIALITY: HEARSAY/STYLE: ILLOCUTIONARY] trigger a leadership crisis have dominated the conference week.
5. The former treasury minister, John Maples, and four other suspects are to [RELIABILITY: HIGH] be summoned to Mr Maclean's office for a "career development interview" [EVIDENTIALITY: HEARSAY] and told to shut up, ship out to their City jobs or put up a candidate to test Mr Duncan Smith's true level of support against their own.
6. But [EXPECTEDNESS: CONTRAST] such rallying talk will [RELIABILITY: HIGH] not [EXPECTEDNESS: CONTRAST/COMPARISON] disguise the fact that [RELIABILITY: HIGH] Mr Duncan Smith's speech in which he told [EVIDENTIALITY: HEARSAY/STYLE: NEUTRAL] plotters [EMOTIVITY: NEGATIVE] in the hall that the choice is him or Tony Blair - "there is no third way" [EVIDENTIALITY: HEARSAY] - has won [EMOTIVITY: POSITIVE] him only [EXPECTEDNESS: CONTRAST/COMPARISON] enough time to regroup and see if the flatlining opinion polls improve.
7. The Tory leader's more polished [EMOTIVITY: POSITIVE] performance delighted [MENTAL STATE: EMOTION] the party faithful inside the Empress Ballroom, earning him a climactic [IMPORTANCE: IMPORTANT] 12-minute ovation.
8. His attacks [EVIDENTIALITY: HEARSAY/STYLE: ILLOCUTIONARY] on Labour's high taxation, bureaucracy and policy on Europe were also rewarded with 17 standing ovations as the speech was delivered.
9. Most pleasing [MENTAL STATE: EMOTION] to delegates was the harsh [EMOTIVITY: NEGATIVE] language directed personally [EMOTIVITY: NEGATIVE] against Tony Blair and his fantasy "Blair World" [EVIDENTIALITY: HEARSAY].
10. Deploying 11 pejorative adjectives against "the most corrupt, dishonest and incompetent government of modern times" [EVIDENTIALITY: HEARSAY], Mr Duncan Smith accused [EVIDENTIALITY: HEARSAY/STYLE: ILLOCUTIONARY] the prime minister of "a lie" [EVIDENTIALITY: HEARSAY] over his direct responsibility for the "outing" [EVIDENTIALITY: HEARSAY] of the weapons inspector, David Kelly.
11. In a judgment the Hutton inquiry is unlikely to [RELIABILITY: LOW] endorse, Mr Duncan Smith urged [EVIDENTIALITY: HEARSAY/STYLE: ILLOCUTIONARY] Mr Blair to resign.
12. "He won't of course, he won't do the decent thing, he never does," [EVIDENTIALITY: HEARSAY] he added [EVIDENTIALITY: HEARSAY/STYLE: DISCOURSE SIGNALLING].
13. In a reference to last year's much-mocked [EMOTIVITY: NEGATIVE] self-description, the Tory leader also told [EVIDENTIALITY: HEARSAY/STYLE: NEUTRAL] his own party critics: "The quiet man is here to stay and he's turning up the volume" [EVIDENTIALITY: HEARSAY] - though [EXPECTEDNESS: CONTRAST] at times he spoke in a near-whisper. (Guardian)
Not surprisingly, this extract is dominated by evaluations of EVIDENTIALITY: HEARSAY and EVIDENTIALITY/STYLE, simply reflecting its status as a news text which is based on news actors' utterances. Next most frequent are evaluations of EMOTIVITY. Although evaluations of NEGATIVE EMOTIVITY are more frequent, we can also find some evaluations of POSITIVE EMOTIVITY. Here the limits of the analysis of individual evaluators become clear: although won (6) can be classified as POSITIVE EMOTIVITY, its positive potential is very much limited by the following evaluation of EXPECTEDNESS: CONTRAST/COMPARISON (has won him only enough time to regroup and see if the flatlining opinion polls improve). Such evaluations of EXPECTEDNESS can trigger negative evaluation, as we have seen, and are also quite frequent in the above text. Similarly, the comparison in (7) implies that most other "performances" of Iain Duncan Smith were in fact not "polished". Apart from the explicit evaluations of NEGATIVE EMOTIVITY which are mostly directed against Iain Duncan Smith (Mr Quiet Man, harsh language directed personally against Tony Blair, much-mocked), other utterances with evaluations (mostly involving EXPECTEDNESS) can be said to provoke such EMOTIVITY: (6) but such rallying talk will not disguise the fact that, (11) in a judgement the Hutton inquiry is unlikely to endorse, and (13) the Tory leader also told his own party critics: "The quiet man is here to stay and he's turning up the volume" - though at times he spoke in a near-whisper. In such cases it is the whole utterance, rather than the individual evaluators, that carries the evaluation.
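The bracketed annotations used here lend themselves to simple automatic tallying. The following snippet is purely illustrative (the chapter's analysis was manual; the function name, the sample fragment and the regular expressions are my own assumptions, not the chapter's method): it counts parameter names in text annotated in the bracket notation, treating '/'-combined annotations such as [EVIDENTIALITY: HEARSAY/STYLE: ILLOCUTIONARY] as two parameters.

```python
import re
from collections import Counter

# Hypothetical fragment in the chapter's bracket notation (not a corpus quote).
annotated = (
    "Senior [IMPORTANCE: IMPORTANT] Conservatives launched a ferocious "
    "[MENTAL STATE: STATE-OF-MIND] counteroffensive after proclaiming "
    "[EVIDENTIALITY: HEARSAY/STYLE: ILLOCUTIONARY] the speech a triumph "
    "they had won [EMOTIVITY: POSITIVE]."
)

def tally_parameters(text: str) -> Counter:
    """Count evaluative parameters in bracket-annotated text.

    A bracket may combine several parameter annotations separated by '/';
    the split only applies where a new 'PARAMETER:' label follows, so that
    values such as 'CONTRAST/COMPARISON' are not broken apart.
    """
    counts: Counter = Counter()
    for bracket in re.findall(r"\[([^\]]+)\]", text):
        for annotation in re.split(r"/(?=[A-Z ]+:)", bracket):
            parameter = annotation.split(":")[0].strip()
            counts[parameter] += 1
    return counts

print(tally_parameters(annotated))
```

Run over the two full annotated texts, a tally of this kind reproduces the frequency observations made above: EVIDENTIALITY and EMOTIVITY dominate, with IMPORTANCE, MENTAL STATE and RELIABILITY trailing.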
The extract also shows how evaluation can work retrospectively: the MENTAL STATE evaluations that are attributed to the Tory delegates in (7) and (9) work to evaluate them negatively, because they are seen as responding in a positive way (they are said to be delighted and pleased and to applaud strongly) to something that is evaluated as negative by the newspaper (the harsh language directed personally against Tony Blair). Additional evaluations concern IMPORTANCE, MENTAL STATE and RELIABILITY, but none of them are as frequent as EVIDENTIALITY and EMOTIVITY. On the whole the text hence exhibits a negative stance towards Iain Duncan Smith and the Tory Party (especially in the last paragraph of the extract), without, however, accumulating solely inscribed evaluations of NEGATIVE EMOTIVITY. Apart from some provoked negative evaluations there are many evaluations of EVIDENTIALITY, and some of (albeit weak) POSITIVE EMOTIVITY, IMPORTANCE and MENTAL STATE.
Let us now look at the tabloid text (Mirror) reporting the same story:

1. GRRRRR!
2. IDS GETS TOUGH.
3. Nobody [EXPECTEDNESS: CONTRAST/COMPARISON] scared [MENTAL STATE: EMOTION]
By James Hardy, Political Editor
4. He gritted his teeth and tried his best to sound tough, but [EXPECTEDNESS: CONTRAST] the hardman image didn't quite work [EMOTIVITY: NEGATIVE] for Iain Duncan Smith yesterday.
5. What was meant to be [EVIDENTIALITY: MINDSAY/MENTAL STATE: VOLITION] a roar turned into a bore [EMOTIVITY: NEGATIVE] as he delivered the longest speech in modern political history.
6. The Tory leader droned on [EMOTIVITY: NEGATIVE] for an hour and two minutes as the Tory conference in Blackpool limped [EMOTIVITY: NEGATIVE] to a painful [EMOTIVITY: NEGATIVE] finale.
7. Delegates were forced to [EMOTIVITY: NEGATIVE] rise to their feet 19 times to take part in "spontaneous" [RELIABILITY/STYLE/EVIDENTIALITY: HEARSAY HEDGE] standing ovations orchestrated [RELIABILITY: FAKE/EMOTIVITY: NEGATIVE] by a small group of fanatics [EMOTIVITY: NEGATIVE].
8. Desperate [MENTAL STATE: EMOTION] party chiefs instructed [EVIDENTIALITY: HEARSAY/STYLE: ILLOCUTIONARY] 50 constituency party chairmen to keep the applause going.
9. Mr Duncan Smith spent most of the time staring at his own feet because the autocue was bizarrely [EXPECTEDNESS: UNEXPECTED] at floor level.
10. He claimed [EVIDENTIALITY: HEARSAY/STYLE: ILLOCUTIONARY/RELIABILITY: LOW] he would see off critics and lead the party back into power.
11. "The quiet man is here to stay and he's turning up the volume," [EVIDENTIALITY: HEARSAY] he declared [EVIDENTIALITY: HEARSAY/STYLE: ILLOCUTIONARY] in his most menacing [EMOTIVITY: NEGATIVE] croak [EMOTIVITY: NEGATIVE].
12. He added [EVIDENTIALITY: HEARSAY/STYLE: DISCOURSE SIGNALLING]: "We must destroy this double dealing, deceitful, incompetent, shallow, inefficient, ineffective, corrupt, mendacious, fraudulent, shameful, lying Government once and for all." [EVIDENTIALITY: HEARSAY]
13. And he urged [EVIDENTIALITY: HEARSAY/STYLE: ILLOCUTIONARY] his critics: "Don't work for Tony Blair, get on board or get out of our way for we have got work to do." [EVIDENTIALITY: HEARSAY]
14. Delegates cheered and clapped as he railed [EVIDENTIALITY: HEARSAY/STYLE: ILLOCUTIONARY/EMOTIVITY: NEGATIVE] against Europe, asylum seekers, taxes, the NHS and the school system.
15. But [EXPECTEDNESS: CONTRAST] some visibly [EVIDENTIALITY: PERCEPTION/RELIABILITY: HIGH] flinched [MENTAL STATE: EMOTION] as he stooped to [EMOTIVITY: NEGATIVE] gutter [EMOTIVITY: NEGATIVE] politics with vicious [EMOTIVITY: NEGATIVE] personal attacks [EMOTIVITY: NEGATIVE] on political opponents.
16. He said [EVIDENTIALITY: HEARSAY/STYLE: NEUTRAL] that after the death of weapons expert Dr David Kelly "Tony Blair said he'd had nothing to do with his public naming. That was a lie. He chaired the meetings that made the fatal decisions. He is responsible. He should do the decent thing and resign." [EVIDENTIALITY: HEARSAY]
17. One delegate muttered [EVIDENTIALITY: HEARSAY/STYLE: PARALINGUISTIC]: "Like you" [EVIDENTIALITY: HEARSAY] (Mirror)

In contrast to the broadsheet text, in the tabloid text evaluations of EMOTIVITY: NEGATIVE are clearly of the greatest significance; at times, up to three or four such evaluations occur in the same sentence (6, 7, 15), whereas in the broadsheets only up to two such evaluations co-occur - and there are no evaluations of EMOTIVITY: POSITIVE. There are also evaluations that involve the parameter of NEGATIVE EMOTIVITY in addition to the parameter of RELIABILITY: FAKE (7), and in addition to the evaluative combination of EVIDENTIALITY: HEARSAY/STYLE: ILLOCUTIONARY (14). Moreover, many of the other evaluations have the potential to provoke negative evaluation in the context of the negative prosody present in the whole text. Examples are the expression of the contrast between what the speech was meant to be and what it turned out to be (5), the contrast between what Iain Duncan Smith tried to achieve, and the fact that it 'didn't work' (4), the contrast between Iain Duncan Smith "getting tough" and "nobody being scared" (2, 3), the hedge (7), the MENTAL STATE evaluations in (8) and (15), the evaluation of EXPECTEDNESS in (9) and the evaluation of EVIDENTIALITY: HEARSAY/STYLE: PARALINGUISTIC in (17). Furthermore, an evaluation of EVIDENTIALITY/STYLE/RELIABILITY evaluates Iain Duncan
Smith's utterance in (10) as potentially unreliable. Thus, most evaluations in this text either inscribe or provoke NEGATIVE EMOTIVITY. Only some additional parameters are not connected to EMOTIVITY but are rather concerned with EVIDENTIALITY: HEARSAY or EVIDENTIALITY/STYLE, again reflecting the text type. All in all, however, this text exhibits a very clear and explicit negative prosody that extends like a wave over the text,6 showing the tendency of evaluation to accumulate, to cluster, or to "propagate or ramify through a text" (Lemke 1998: 49). Although it can thus be demonstrated that both newspapers express a negative stance towards Iain Duncan Smith (and clearly do not aim at 'objective' reporting), their means of achieving this stance are different: the negative evaluations are much more explicit and/or more frequent in the tabloid text than in the broadsheet text. To what extent this consistent negative evaluation of Iain Duncan Smith contributed to his 'downfall' as leader of the Conservative Party remains open to debate: how newspaper bias affects its readership is "the source of the biggest debate surrounding media audiences because so little has really been discovered about the way that audiences receive and make sense of media texts" (Bell et al. 1999: 17). The analysis of the texts also demonstrates the important evaluative potential of contrasts and the significance of the notion of provoked evaluation, as well as suggesting that the clustering of evaluation is more frequent in the popular press than in the quality press. It furthermore points to the pressing need to analyze the systematics of contextual influence on evaluation in more detail (along the lines of Lemke [1998] and Jordan [2000]). We still seem to know little about the actual workings of contextual influence on meaning. Evaluation is just one example where this influence becomes very obvious, e.g. when lexical items with a more or less 'neutral' dictionary meaning become evaluative in their context. Here lexical items can become a platform of negotiation and debate. All in all, the complex interplay of evaluation and context (cf. also Lemke 1998) shows that manual text analysis is an indispensable methodological tool when analyzing evaluation in discourse.
5. Conclusion

In this paper I have presented an approach to evaluation that aims to provide a synthesis of and an alternative to existing approaches to evaluation: the parameter-based framework of evaluation. I have tried to show that evaluation is a complex phenomenon that can be defined and viewed in at least three different ways, and whose relation to cognition varies accordingly. The framework was also applied to a mini-corpus of newspaper reportage, showing (1) the difference between inscribed, evoked and provoked evaluation, (2) the complexity of the interplay between evaluation and cognition, and (3) the context-dependence of evaluation. Specifically, it was demonstrated that positive/negative evaluation (EMOTIVITY) can both be evoked by more or less factual descriptions and be provoked by evaluations along other parameters such as EXPECTEDNESS, RELIABILITY and MENTAL STATE, and that evoked evaluation often depends on the reader's application of cognitive frames to the discourse. It was furthermore suggested that other parameters of evaluation can also be provoked in the same way as EMOTIVITY, and that evoked evaluation is more difficult for readers to challenge than inscribed evaluation. Finally, it was proposed that the difference between tabloid and broadsheet texts lies not so much in the stance they express as in the explicitness of the evaluation involved. At the same time, the findings of this paper clearly remain illustrative rather than representative, and many of the issues involving evaluation remain unresolved. Where evaluation is concerned nothing is settled yet: the ground is still shifting beneath our feet, and as yet it remains "relatively little explored" (Lemke 1998: 53) within linguistics.
6. Notes

* This paper is partly based on research undertaken at the University of Birmingham, where I was a visiting researcher from September 2003 to May 2004 with the support of the DAAD (German Academic Exchange Service). I wish to express my deep thanks to both the Department of English at the University of Birmingham (specifically Professor Susan Hunston) and the DAAD. I am also very grateful to Alexanne Don and Dr. Peter White for discussing the specifics of appraisal with me again and again, and to Prof. Wolfram Bublitz and Dr. Hanna Pishwa for their helpful comments. Additionally, I would like to thank Tony Bastow very much for his revision of an earlier version of this text, and Collins and the University of Birmingham for permission to use the Bank of English.
1. Attitude is here used in a pre-theoretical sense, and not in its technical sense as in psychology.
2. The parameter-based framework ultimately derives from previous research that distinguishes between different 'axes', 'systems', 'domains', 'categories', 'dimensions', 'kinds', and 'parameters' of evaluation, for instance appraisal theory (e.g., White 1998), work on stance (e.g., Biber and Finegan 1988; Conrad and Biber 2000) and research on evaluation (e.g., Hunston 1994; Francis 1995; Lemke 1998; Thompson and Hunston 2000). Cf. Bednarek in press for a detailed outline of these approaches and a comparison with the parameter-based approach.
3. The central question is whether these parameters are in general exhaustive, in the sense that "no radically different semantic features occur" (Lemke 1998: 39). As far as the corpus at hand is concerned, this seems to be the case. However, research into different genres might point to additional parameters of evaluation. The parameter-based framework of evaluation is hence to be regarded as an open-ended approach, and in its present form allows the simple addition of more parameters as research into evaluation progresses.
4. Likewise, evaluations of EVIDENTIALITY can be used to explicitly refer to a particular facet of our knowledge, namely its source. Both evaluations of EVIDENTIALITY and evaluations of MENTAL STATE clearly depend on our theory of mind constituted by our everyday mental concepts (Perner 2000: 297). Young children, for instance, are not able to express the evidential (experiential) source of their knowledge (Perner 2000: 302). Memory, as Perner also points out, crucially entails a reflection of the evidential source of past events (2000: 307).
5. Moreover, the quantified comparison is strictly limited to inscribed evaluations by the writer; other kinds of evaluations in the text are disregarded (on the complexity of evaluation in discourse see Hunston [2000: 181]). In order to identify evaluation, which is a task that is far from straightforward (Hunston and Thompson 2000: 14-15; Stotesbury 2003: 330-331), I did not rely solely on my intuition; instead, a combination of methods was used, involving corpus-linguistic methods:
   - previous (often corpus-based) research was surveyed to identify potential evaluative means;
   - native speakers were questioned: when they gave contradictory responses (as was frequently the case) the linguistic expressions were excluded (as not unequivocally evaluative);
   - the Bank of English (a general corpus of spoken and written English from Britain, the US, Canada and Australia, which stood at 450 million words at the time of the analysis) was the basis for extensive corpus research concerning the evaluative potential of individual linguistic devices;
   - a corpus-based dictionary was used to check the evaluative force of linguistic expressions (COBUILD).
6. For the wave metaphor in connection with evaluation see Hunston (1994: 200).
References

Batstone, Rob
1995  Grammar in discourse: Attitude and deniability. In Principle and Practice in Applied Linguistics, Guy Cook, and Barbara Seidlhofer (eds.), 197-214. Oxford: Oxford University Press.
Bayer, Klaus
1982  Mit Sprache bewerten [Evaluating with language]. Praxis Deutsch 53: 15-25.
Bednarek, Monika A.
2005  Frames revisited: The coherence-inducing function of frames. Journal of Pragmatics 37: 685-705.
in press  Evaluation in Media Discourse: Analysis of a Newspaper Corpus. (Studies in Corpus and Discourse.) London/New York: Continuum.
Bell, Allan
1991  The Language of News Media. Oxford: Blackwell.
Bell, Angela, Mark Joyce, and Danny Rivers
1999  Advanced Level Media. 2nd ed. London: Hodder and Stoughton.
Biber, Douglas
1988  Variation across Speech and Writing. Cambridge: Cambridge University Press.
Biber, Douglas, and Edward Finegan
1988  Adverbial stance types in English. Discourse Processes 11: 1-34.
Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad, and Edward Finegan
1999  Longman Grammar of Spoken and Written English. London: Longman.
Bublitz, Wolfram
2003  Emotive prosody: How attitudinal frames help construct context. In Anglistentag 2002 Bayreuth Proceedings, Ewald Mengel, Hans-Jörg Schmid, and Michael Steppat (eds.), 381-391. Trier: WVT.
Caldas-Coulthard, Carmen R.
1994  On reporting reporting: The representation of speech in factual and fictional narratives. In Advances in Written Text Analysis, Malcolm Coulthard (ed.), 295-308. London: Routledge.
Chafe, Wallace
1986  Evidentiality in English conversation and academic writing. In Evidentiality: The Linguistic Coding of Epistemology, Wallace Chafe, and Johanna Nichols (eds.), 261-272. Norwood, N.J.: Ablex.
Clayman, Steven E.
1990  From talk to text: Newspaper accounts of reporter-source interactions. Media, Culture and Society 12: 79-103.
COBUILD
1995  Collins COBUILD English Dictionary, editor in chief John Sinclair. New and completely revised edition.
Coates, Jennifer
1983  The Semantics of the Modal Auxiliaries. London: Croom Helm.
Conrad, Susan, and Douglas Biber
2000  Adverbial marking of stance in speech and writing. In Evaluation in Text: Authorial Stance and the Construction of Discourse, Susan Hunston, and Geoff Thompson (eds.), 56-73. Oxford: Oxford University Press.
Cotter, Colleen
2001  Discourse and media. In The Handbook of Discourse Analysis, Deborah Schiffrin, Deborah Tannen, and Heidi Hamilton (eds.), 416-436. Oxford: Blackwell.
Dixon, Robert M.W.
1991  A New Approach to English Grammar, on Semantic Principles. Oxford: Clarendon Press.
Du Bois, John W.
1986  Self-evidence and ritual speech. In Evidentiality: The Linguistic Coding of Epistemology, Wallace Chafe, and Johanna Nichols (eds.), 313-336. Norwood, N.J.: Ablex.
Fowler, Roger
1991  Language in the News. London: Routledge.
Francis, Gill
1994  Labelling discourse: An aspect of nominal-group lexical cohesion. In Advances in Written Text Analysis, Malcolm Coulthard (ed.), 83-101. London: Routledge.
1995  Corpus-driven grammar and its relevance to the learning of English in a cross-cultural situation. Manuscript.
Graham, Phil
2003  Critical Discourse Analysis and evaluative meaning: Interdisciplinarity as a critical turn. In Critical Discourse Analysis: Theory and Interdisciplinarity, Gilbert Weiss, and Ruth Wodak (eds.), 110-129. Houndmills/New York: Palgrave Macmillan.
Greenbaum, Sidney
1969  Studies in English Adverbial Usage. London: Longman.
Gruber, Helmut
1993  Evaluation devices in newspaper reports. Journal of Pragmatics 19: 469-486.
Halliday, M.A.K.
1994  An Introduction to Functional Grammar. 2nd ed. London: Edward Arnold.
Hardt-Mautner, Gerlinde
1995  'Only connect.' Critical Discourse Analysis and corpus linguistics. Manuscript. http://www.comp.lancs.ac.uk/computing/research/ucrel/papers/techpaper/vol6.pdf
220
Monika A. Bednarek
Hoye, Leo 1997 Adverbs and Modality in English. London/New York: Longman. Hunston, Susan 1993 Professional conflict: Disagreement in academic discourse. In Text and Technology·. In Honour of John Sinclair, Mona Baker, Gill Francis, and Elena Tognini-Bonelli (eds.), 115-134. Philadelphia/ Amsterdam: John Benjamins. 1994 Evaluation and organization in a sample of written academic discourse. In Advances in Written Text Analysis, Malcolm Coulthard (ed.), 191-218. London: Routledge. 2000 Evaluation and the planes of discourse: Status and value in persuasive texts. In Evaluation in Text: Authorial Stance and the Construction of Discourse, Susan Hunston, and Geoff Thompson (eds.), 176207. Oxford: Oxford University Press. Jordan, Michael P. 2000 Lexical anaphoric and/or cataphoric assessors. LACUS-Forum 26: 281-292. Lemke, Jay L. 1992 Interpersonal meaning in discourse: value orientations. In Advances in Systemic Linguistics: Recent Theory and Practice, Martin Davies, and Louise Ravelli (eds.), 82-194. London: Pinter. 1998 Resources for attitudinal meaning: Evaluative orientations in text semantics. Functions of Language 5 (1): 33-56. Lyons, John 1977 Semantics. Vols. 1, 2. Cambridge: Cambridge University Press. Malrieu, Jean Pierre 1999 Evaluative Semantics: Cognition, Language and Ideology. (Routledge Frontiers of Cognitive Science 3). London/New York: Routledge. Perner, Josef 2000 Memory and theory of mind. In The Oxford Handbook of Memory, Endel Tulving, and Fergus I. M. Craik (eds.), 297-312. Oxford: Oxford University Press. Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik 1985 A Comprehensive Grammar of the English Language. London: Longman. Rooryck, Johan 2001 State-of-the-article: Evidentiality (part 1). Glot International 5 (4): 125-133. Sinclair, John M. 1991 Corpus Concordance Collocation. Oxford: Oxford University Press. 1998 The lexical item. In Contrastive Lexical Semantics, Edda Weigand (ed.), 1-24. Amsterdam: Benjamins.
Evaluation and cognition
221
Stotesbury, Hilkka 2003 Evaluation in research article abstracts in the narrative and hard sciences. Journal of English for Academic Purposes 2: 327—341. Talmy, Leonard 2003 Toward a Cognitive Semantics. Volume II: Typology and Process in Concept Structuring. Cambridge, Mass.: The MIT Press. Tannen, Deborah 1993 What's in a frame?: Surface evidence for underlying expectations. In Framing in Discourse, Deborah Tannen (ed.), 14—57. Oxford: Oxford University Press. Thompson, Geoff 1994 Collins COBUILD English Guides 5: Reporting. London: Harper Collins. Thompson, Geoff, and Susan Hunston 2000 Evaluation: An introduction. In Evaluation in Text: Authorial Stance and the Construction of Discourse, Susan Hunston, and Geoff Thompson (eds.), 1-27. Oxford: Oxford University Press. Ungerer, Friedrich, and Hans-Jörg Schmid 1996 An Introduction to Cognitive Linguistics. London/New York: Longman. Vestergaard, Torben 2000 From genre to sentence: The leading article and its linguistic realization. In English Media Texts: Past and Present, Friedrich Ungerer (ed.), 151-176. Amsterdam/Philadelphia: John Benjamins. White, Peter R.R. 1998 Telling Media Tales: The News Story as Rhetoric. Ph.D. diss., University of Sydney. 2001a Appraisal Outline. Manuscript. http://www.grammatics.com/appraisal.. 2001b Attitude/Judgement. Manuscript, http://www.grammatics.com/ appraisal. 2002 Appraisal. In Handbook of Pragmatics, Jef Verschueren, Jan-Ola Östman, Jan Blommaert, and Chris Bulcaen (eds.), 1-27. Amsterdam/Philadelphia: John Benjamins. 2004 Attitudinal meaning and inference - semantic prosodies in text and inter-text. Manuscript. University of Birmingham.
Chapter 8
Causality and subjectivity: The causal connectives of Modern Greek
Eliza Kitis
1. Introduction

This paper deals with the main Modern Greek (MG) causal conjunctions: epeiδi, yiati and δioti. There are many connectives in Greek which, besides their other (main) function or meaning, have either causal connotations or are secondarily used as causal connectives. The prototypical, purely causal subordinating connectives in Greek, however, are the ones examined here. What is interesting to note is that these three causal subordinators are almost invariably translated as because in English.1 The last one, δioti, has in the past been considered the high version of the causal connective yiati, and since the climate at the time (after the fall of the junta [1974], when demotiki, the low variety, became the official language of the state) was unfavourable to lexical items originating from katharevousa, the high variety, δioti was not even included in grammar textbooks as a causal connective. However, it is widely used in spoken and, primarily, in written Greek. As yiati and δioti have identical etymologies, they also serve similar functions, as I have shown in previous studies (Kitis 1994, 1996). On occasion, therefore, I may refer to these two connectives in the singular. In the present study I want to focus on degrees or types of causality designated by these connectives, the interrelation between causality and temporality as exhibited by epeiδi, and issues of subjectivity relating to the use of δioti and yiati. It will be shown that the functions of these connectives correlate closely with their etymologies and that their synchronic meanings are informed by their histories. Their distinct distribution, too, seems to be a reflex of their historical evolution. This study is part of a larger project aiming to prove that conceptual domains such as temporality and causality, as evidenced by Greek connectives, rather than being characterized by discreteness, merge in intricate ways in the use of these connectives.
As has been noted, distinct uses of English because-clauses are invariably translated by either epeiδi or yiati and δioti. If there is consistency in modes of translation, this is expected to inform current theories such as
Sweetser's (1990).2 Before addressing these issues, however, let me summarize some of my previous findings, as they are pivotal in what follows. For reasons of clear exposition I will label epeiδi as because1 (bc1), on the one hand, and yiati and δioti as because2 (bc2), on the other. The examples have been drawn from a large corpus of both conversational and written language. All examples are real data unless otherwise stated. Causal connectives have been researched widely in Dutch, French and German (see Pit 2003 for references), but not in English (but see Breul 1997; Lagerwerf 1998). Because, with certain exceptions, is a glaring omission from research on connectives in the relevance-theoretic or the neo-Gricean literature, for instance (cf. Blakemore 2002; Carston 2002; Levinson 2000). The reason for the interest in causal connectives in these other languages seems to be the variety of linguistic forms that exist in them, whereas in English because (except for the infrequent for) is the typical causal subordinator. In Greek linguistics, apart from my own work, there has been scant interest in the causal connectives of MG (Kalokerinos 1999, 2004), but for Ancient Greek (AG) there is a study of causal conjunctions that may inform our accounts (Rijksbaron 1976).
2. The descriptive account

2.1. Distribution of causal connectives

As I have shown elsewhere (Kitis 1994, 1996), only epeiδi(bc1) can be preposed (cataphoric) in because p, q structures.3 Whereas epeiδi(bc1) can occur either initially (cataphoric or forward) in because p, q or finally (anaphoric or backward) in q because p structures, yiati and δioti (because2) cannot appear initially in them (*because p, q), but always follow their main clauses (q, because p). In other words, they are always anaphoric. However, these latter causal connectives can occur, and do occur predominantly, both in spoken (after a longer pause) and in written language (after a full stop) sentence-initially, or even paragraph-initially,4 but always following their main clause and complementing it. This is a most prevalent use of these two connectives. The former subordinator, epeiδi(bc1), when it appears finally, is always closely connected to its main clause, and if it appears sentence-initially, outside the because p, q structure, the adverbial clause will constitute another speaker's turn completing its main, which is a question, as in a question-answer adjacency pair; that is, it retains the structure q because p:

(1a)
yiati δen irθe o Stefanos?
why not came-3SG the Stephen
'Why didn't Stephen come?'

(1b) epeiδi (yiati) ine arostos.
because1 is ill-MS
'Because he is ill.' (fabricated; both causals possible)
The distribution of these connectives can be summarized pictorially in figure 1:

i. because p, q
ii. q(,) because p
iii. q. Because p

Figure 1. The distribution of epeiδi(bc1) and yiati(bc2)
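The distributional facts in figure 1 lend themselves to a compact encoding. The following sketch is my own illustration, not part of the chapter; the ASCII connective names and slot labels are stand-ins for the transliterated forms and the three structural positions:

```python
# Illustrative encoding of figure 1 (my sketch, not the chapter's formalism).
# Slots: "initial" = because p, q (cataphoric); "final" = q(,) because p
# (anaphoric); "sentence_initial" = q. Because p (after a full stop).

DISTRIBUTION = {
    "epeidi": {"initial", "final"},           # bc1: cataphoric or anaphoric
    "yiati": {"final", "sentence_initial"},   # bc2: anaphoric only
    "dioti": {"final", "sentence_initial"},   # bc2: anaphoric only
}

def allowed(connective: str, slot: str) -> bool:
    """True if the connective may occupy the slot, per figure 1."""
    return slot in DISTRIBUTION[connective]

print(allowed("epeidi", "initial"))  # True: only epeidi can be preposed
print(allowed("yiati", "initial"))   # False: *because p, q with yiati
```

The lookup makes the asymmetry of section 2.1 explicit: only epeidi occupies the preposed slot, while yiati and dioti pattern together as anaphoric connectives.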
2.2. Initial placement of 'epeiδi(bc1)'-clauses in spoken Greek

It is interesting to note that truly initial placement of because-clauses, preceding the main clause, is not encountered in Schiffrin's (1985, 1987) conversational data. Ford (1993: 17) also reports that in her conversational corpus, "because-clauses are never placed before the material they modify." This is not true of written English, as is well known. In written English, or in some genres (Ford 1993: 86), initial placement of because-clauses is not only possible but multiply practiced. Ford attributes the absence of initial because in the conversational genre of talk to the "lesser degree of planning that goes into it" (86), but also to "the possibility of inferring cause from sequence" (89). My explanation, however, differs from Ford's (1993), as it draws on the connective's etymological make-up and its evolutionary meaning (see section 10). In Greek, on the other hand, initial placement of epeiδi(bc1)-clauses is very frequent and absolutely typical of conversational data, too:

(2)
epeiδi(bc1) i kolitiδa sas ine malon
because1 the colitis your is rather
'Because your colitis is rather'
pjo ektetameni ap' oti siniθos
more extensive than what usually
'more extensive than (what is) expected'
ke epeiδi(bc1) to zaharo sas
and because1 the blood-sugar your
'and because your blood-sugar'
δen mas epetrepe na sas δosoume kortizoni
not us(acc.) allowed-3SG to you give-1PL steroids
'did not allow us to give you steroids,'
y' afto sas kratisame mesa liyo parapano.
for this you(acc.) kept-1PL in a little longer
'that's why we kept you in a bit longer.'
(telephone conversation between patient and doctor)

This frequent initial placement of epeiδi(bc1) in conversational Greek is considered to be a reflex of aspects of its meaning which derive from its etymological make-up and its diachronic use (see sections 8 and 10).
2.3. Morphological/phonological reduction

The connective yiati(bc2) appears in reduced form as yia in some dialects and in demotic songs (Tzartzanos, V. II: 138): Strates mou, kaθarisete, milies mou, fountoθite, yia θa perasi o yambros, 'Paths, clear up, apple trees, send forth leaves, for the bridegroom will pass'. It can also appear in reduced form as yia in fast speech, just as the English because, although not so frequently, can be both phonologically and morphologically reduced if it is epistemic or performative (as because is reduced to 'cause). It is interesting to note that yiati(bc2) cannot bear the main stress in a construction, whereas epeiδi(bc1) can. The reason for this potential of epeiδi, as well as for the reduced form of yiati, will become evident further down.
3. Grounding the discourse

In an earlier paper (Kitis 1996), I argued that epeiδi(bc1)-introduced preposed clauses, just like initial conditional if-clauses (Ford and Thompson 1986; Haiman 1978), "serve a general framework- or background-creating function for the discourse that follows them" (Ford 1993: 14). The following example comes from an interview with NATO's ex-Secretary General. The interviewer is not quite happy with the answers he got from Solana regarding his conversion from a staunch anti-NATO campaigner to NATO's Secretary General, and he decides to rub it in by grounding his argument in a completely new perspective. This grounding and perspectivization is in this case effected by the use of a very long epeiδi(bc1)-introduced clause (containing 59 words). Solana's response to it is the second longest in the specific interview:
Because\!tpei5i
I come from a country which, like yours, had and
has many prejudices as regards NATO [... etc.] for this (reason) (that's why) I would also very much like you to tell me [... etc.]
The pro-form 'for this [that's why]' substitutes for the topical causal construction. As Haiman (1978: 577) writes, "left-dislocation apes the discourse situation in which topics are generally established: initial mention in a full form is followed by subsequent mention in reduced form." Thematic left-dislocation or a pro-form encapsulating the reason/cause clause is not a prerequisite for the initial placement of epeiδi(bc1)-clauses. Such preposed clauses can preface (un)shared information that serves as background knowledge for the interpretation of more topical discourse (Schiffrin 1987: 205), as in the following example, a memo that circulated at our Department:
Becausey/epeiöi
the loss of the OHP that was in Mr. X's office will
be reported as a theft, we request that whoever took it or forgot it lying around should report it at 308A by Wednesday 9/11/04. Indeed, preposed epeiSi(because]), just like temporal markers, can function, not only as a topic builder, but also as a topic shift builder and a segmentation marker in continuous discourse (Bestgen and Vonk 2000; Virtanen 1992); but this claim has to be substantiated by a corpus analysis. In sum, the initial findings of these sections can be stated as follows: Table 1. Functions of epeiSi(bcj) clauses i.
epeiSi(bcj), but not yiati or öioti, can occur in sentence-initial position in
ii.
because p, q structures in all genres, including casual conversation.5 Sentence-initial epeiöi(bcj) is discourse dependent in that its completion (or
iii.
iv. v.
part of it) is to be sought in the following discourse. Preposed because ρ clauses in because p, q structures have thematization potential. In this function, the presented proposition is assumed to be given or undisputed. On account of the above, and in particular (ii), we can claim that epeiSiibcj) introduces dependent clauses. On account of the facts summarized in figure 1, it becomes clear that, while yiati/öioti{bc2) can be only anaphoric and it therefore has a backward or retrospective function, epeiSi(bc j) can be both anaphoric, and hence can have a backward or retrospective function, but it can also be cataphoric, and hence it can have a forward or prospective function.
It is often stated in the literature that subordinate adverbial clauses are grouped in language with the "logically" presuppositional clauses (Givon 1982). This claim, though not explicitly stated, must concern only propositional uses of connectives.6 However, as Givon (1982) stresses, because-clauses are "logically presuppositional" but they can also pragmatically rather than logically background information which functions as a topic. Although he does not restrict the topic-function potential of because-clauses to initial placement, that is, to forward causal because, this point is implicit in his examples. He (1982: 102) concludes: "Logical presupposition, involving 'truth-values', is thus a more limited phenomenon, often corresponding to - but never identical with - extreme cases of pragmatic backgroundedness". It is worth stressing, then, that epeiδi(bc1) preposed adverbial clauses do not necessarily (re-)introduce presupposed information, but rather information that needs to be taken on trust, as having been established in previous discourse, as given or as indisputable. Owing to this connective's (because1) potential for (back-)grounding the following discourse, I dubbed epeiδi(bc1) 'polyphonic' or 'heteroglossic' (Bakhtin 1981); its adverbial clauses can be likened to 'quotations from previous discourse' (Akatsuka 1986; Van der Auwera 1986).7
4. The factual causality of 'epeiδi(bc1)'

In this section, I will demonstrate that only epeiδi (because1) introduces factual propositions. To this end I will consider Haiman's (1978) claim that because-clauses are asserted rather than presupposed (573). Although he is not explicit about it, what Haiman has in mind are q because p constructions rather than because p, q ones. In other words, he examines backward or retrospective because. Since in MG we have the option between at least two causal subordinating conjunctions, it is interesting to see how English because would each time be translated into Greek depending on whether it introduces presupposed or new information. Let it be noted that both connectives are acceptable (except in 5a) if the presuppositional issue is not heeded. Haiman considers the timeworn sentence:

(5)
Do you beat your wife because you love her?,
which is three-way ambiguous: it can question, depending on where the focus stress occurs, either (i) the causal relation between the two clauses, or (ii) either of the two clauses:

(5a) Do you beat your wife because you love her?
(5b) Do you beat your wife because you love her?
(5c) Do you beat your wife because you love her?
(stress indicated)
In (5a) the question is whether the causal relation holds between the two clauses, which represent material or information that is regarded as given. In (5b), on the other hand, the speaker questions the validity of the content of the causal subordinate clause as being adequate grounds for the given action of beating (Given that you beat your wife, do you beat her because you love her?); whereas in (5c) the speaker questions the validity of the content of the main clause as being the effect of the causal proposition (Given that you love your wife, do you beat her because you love her?). Of course, Haiman's concern is with topical constructions, but topical constructions need to represent given information. In MG now the situation is quite different, because given causal material is best structured in epeiδi(bc1)-clauses rather than in yiati(bc2)-clauses, as can be readily shown in translating Haiman's sentences:

(6a)
xtipas ti yineka sou epeiδi(bc1)/*yiati(bc2) tin ayapas?
beat-2SG the wife your because(bc1)/*(bc2) her love-2SG

(6b) xtipas ti yineka sou *epeiδi/yiati tin ayapas?
beat-2SG the wife your because *(bc1)/(bc2) her love-2SG

(6c) xtipas ti yineka sou epeiδi/?yiati tin ayapas?
beat-2SG the wife your because (bc1)/?(bc2) her love-2SG
(stress indicated)
In (6a), which is the equivalent of (5a), the acceptable causal conjunction is epeiδi(bc1), but not yiati(bc2), which, moreover, quite expectedly, cannot carry the main stress. The reason for this situation seems to be that the causal construction must carry presupposed information, since what is questioned is not the contents of the two propositions presented in the two clauses, but rather the causal connection between them. As has been
claimed, epeiδi(bc1), but not yiati(bc2), can signal presupposed
assumptions. Moreover, since the focus of the question is the causal relation between the two propositions, epeiδi(bc1) is the only choice, as it is the exponent of 'pure' or 'direct' causality and can, unlike yiati(bc2), be topicalized or become the focus of cleft-constructions (Kitis 1994). On the contrary, in (6b), which translates (5b), yiati(bc2) sounds better, while epeiδi(bc1) seems to be outright unacceptable (unless [6b] is a rhetorical question or a quotational one echoing someone else's claim); the reason for its unacceptability is the questioning of the causal clause's proposition. Similarly, in (6c), which translates (5c), epeiδi(bc1) is the natural choice, while yiati(bc2) does not seem to presuppose the truth of the causal clause's proposition. Rather, the speaker's choice of yiati(bc2) will probably indicate his/her doubts about the truth of the state of affairs presented in the causal clause (a sign of subjectification of causality, see below).8 The claim made regarding the choice of causal connectives in translating Haiman's examples is corroborated if we try to paraphrase them, making explicit each time the presupposed material:

(6bi) δeδomenou oti xtipas ti yineka sou, ti xtipas *epeiδi(bc1)/yiati(bc2) tin ayapas?
'Given that you beat your wife, do you beat her because(bc2) you love her?'

The unacceptability of epeiδi(bc1)
becomes obvious in an alternative question in which the two alternants are contradictories:

(6bii) Given that you beat your wife, do you beat her because (yiati bc2) you love her or because (yiati bc2) you hate her?

Let it be noted that epeiδi(bc1) in (6bi) is unacceptable inasmuch as the question is about whether your loving her or not constitutes the grounds for beating her, which (the beating) is the topic (given), and not just about the causal relation; (6c) will be paraphrased as follows:

(6ci) δeδomenou oti ayapas ti yineka sou, ti xtipas epeiδi(bc1)/?yiati(bc2) tin ayapas?
'Given that you love your wife, do you beat her because(bc1) you love her?'
In (6ci) epeiδi(bc1) is acceptable if the question is about whether you beat her or not on the grounds that you love her, whereas yiati(bc2) is questionable in this case. It appears that whenever the proposition of the causal clause is presented as an undisputed presupposed fact, epeiδi(bc1), rather than yiati(bc2), seems to be the preferred causal connective. We can conclude that only epeiδi(bc1), but not yiati(bc2), can bear both the presuppositional (factual) reading and the focused causal relation.
5. Syntactic and semantic constraints on causal connectives

In previous studies (Kitis 1994, 1996), I identified both syntactic and semantic constraints characterizing the use of (because1)-clauses: epeiδi(because1)-introduced clauses have potential for topicalization and can be left-dislocated (see section 3), can be embedded, and are included within the scope of the negative operator of the main clause. These constraints do not affect yiati/δioti(bc2)-introduced clauses. The latter can be neither topicalized nor left-dislocated, nor can they be included within the scope of the negative operator of the main clause; yiati/δioti(bc2)-introduced clauses also resist embedding. Moreover, these constraints seem to affect, or rather reflect, the type of relation that holds between the conjoined clauses. As we have seen, the two causal connectives (because1, because2) are not freely interchangeable, even in structure (ii) (fig. 1), that is, when they occur in postposed clauses. In many cases, quite apart from presuppositional aspects of the introduced clause, the substitution of epeiδi(bc1) for yiati(bc2) is unacceptable:

(7)
anapsan fotja yiati/*epeiδi vlepo kapno
lit-3PL fire because2/*bc1 see-1SG smoke
'They lit fire, (be)cause I see smoke.' (fabricated)
While epeiδi(because1) cannot occur in (7), in some other cases its substitution for yiati(because2) or δioti(because2) incurs a dramatic change in the function of the adverbial clause it introduces:

(8)
an δen tis etroyes, θa peθenes.
if not them ate-2SG would died-2SG
'If you did not eat them, you would die.'
δen ihe simasia an i patatoflouδes itan vromikes,
not did matter if the potato-peel was dirty
'It did not matter if the potato peel was dirty,'
yiati δen tis eplenan.
because2 not them washed-3PL
'because they did not wash them.'
It is interesting to note that in (8) the because-clause, as each time it follows a comma or a significant pause, occurs as an intonationally separate unit. As such, this adverbial clause can admit yiati or δioti (because2) but not epeiδi (bc1). If the latter causal subordinator replaces yiati (bc2), then it has to be intonationally incorporated within the main clause, and the meaning of the structure q because p is completely changed: the causal adverbial epeiδi(bc1)-clause is included within the scope of the negative operator. So, what is negated is the causal connection obtaining between the main and the subordinate clause, as can be shown in the following notation:

(a)
epeiδi: ~((p → q) because1 ~r)
As I have demonstrated here and elsewhere (Kitis 1996), the proposition of the adverbial epeiδi(bc1)-clause is considered factual. In the actual example, (8), however, what is negated is the proposition of the main clause only, while that of the yiati(because2)-clause is excluded from the negative scope, as its function is simply to explain why the peel was dirty. It focuses on the lexical item 'dirty':

(b)
yiati: ~(p → q) because2 ~r
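The scope contrast between (a) and (b) can be made concrete with a crude truth-functional approximation (my own illustration, not the chapter's formalism): 'A because B' is modelled as the conjunction of A, B, and a link proposition standing for the causal connection itself; the epeiδi reading negates the whole causal complex, while the yiati reading negates only the main clause:

```python
# Crude truth-functional sketch of the scope contrast (my approximation,
# not the chapter's formalism). "A because B" is modelled as A & B & link,
# where link stands for the causal connection itself.

def because(a: bool, b: bool, link: bool) -> bool:
    return a and b and link

def reading_a(main: bool, cause: bool, link: bool) -> bool:
    # (a) epeidi: negation scopes over the whole causal complex
    return not because(main, cause, link)

def reading_b(main: bool, cause: bool, link: bool) -> bool:
    # (b) yiati: negation scopes over the main clause only;
    # the causal clause stays outside the negative scope
    return because(not main, cause, link)

# The readings come apart when both clauses hold but the causal link fails:
print(reading_a(True, True, False))  # True: the causal connection is denied
print(reading_b(True, True, False))  # False: only the main clause is negated
```

The divergence at main = cause = True, link = False mirrors the point in the text: with epeiδi the negation can deny the causal connection itself, whereas with yiati it cannot.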
234
Eliza Kitis
yiati(because2)i therefore, seems to introduce in a separate intonational unit a clause that functions as an explanation, justification or metalinguistic comment. In all cases of the use of epeiSi(bcj) the causal connection is pronounced or strong. In other words, the function of the construction q epeiöi(bcj) ρ is to actually assert the causal connection between ρ and q. This is at least partly the reason why an epeiöi(bc/^-introduced clause can be included within the scope of the negative operator. This also explains why thus introduced clauses can be embedded in reporting verbs. What is embedded or reported is the causal relation between the two propositions of the clauses (Kitis 1996). So the question that seems to emerge regards the kind of causal connection that is signified by the two types of causal connective. Or else, if a causal connection is asserted by epeiöi(bcj), what is the function of yiati/dioti(bc2)l Kitis (1994: 313) writes in this connection: The causal connection in all cases [of yiati(bc2)\ has to be inferred because it is indirect. By 'indirect' I mean that the explanation arrived at on the basis of what is stated is not subsumed under a regularity or a generalization. On the face of it, therefore, sentences do not exhibit the same degree of explanatory coherence as they would if the accession of the explanation did not require any extra inferential effort.
And Kitis (1996) underlines that only epeiδi(bc1), but not yiati(bc2), can signify factuality (see Keller 1995), presuppositionality (see Chafe 1984) and causal relations. The claim that the former connective (because1) only signifies a causal relation between the two clauses is also proven by its potential to be embedded and to be included within the scope of the negative operator. That epeiδi-clauses only connote factuality and presuppositionality is shown by their intolerance of epistemic modalizers suspending the factual character of the epeiδi(bc1)-introduced clause, such as modal verbs and adverbs, while they can be modified by adverbs strengthening the causal relation (Kitis 1994):

(9)
Ti δerni yiati/?epeiδi malon δen tin ayapa
her-ACC beat-3SG bc2/?bc1 rather not her-ACC love-3SG
'He beats her, because2 he rather doesn't love her.'
(10) Tis δini δora sinehos
her-ACC give-3SG presents constantly
yiati/*epeiδi prepi(epist.) na tin ayapa
bc2/*bc1 must[epist]-3SG to her-ACC love
'He gives her presents all the time because2 he must(epist.) love her.'
(fabricated; epeiδi(bc1) in (10) will render 'must' deontic)

As the propositional content of yiati(bc2)-clauses is not considered factual or presuppositional, but is rather asserted, these clauses have acquired a more or less autonomous status; as a result, yiati(bc2) clauses can at times be regarded as (near-)paratactic rather than subordinate. In conclusion, yiati/δioti(bc2) clauses resist embedding, intonational incorporation with the main clause, and inclusion within the main clause's negative scope. Moreover, as yiati/δioti(bc2) is not the prototypical exponent of a causal relation between the two clauses, it cannot be topicalized. In short, yiati/δioti(bc2) clauses cannot be considered presupposed or factual and, as they are independently asserted and their causal connection to the main is weak, they are not dependent on the nucleus (main) sentence. It follows then that while epeiδi(bc1)-clauses must be truth-evaluable within the complex sentence, yiati/δioti(bc2) clauses need not affect the truth-conditions of the conjunction.
6. Sweetser's solution

In this section I will turn to Sweetser's (1990) account as an obvious source for a solution to our problem. She claims that we can explain the function of causal connectives in terms of the three domains she identifies, at which language supposedly functions: content-world, epistemicity and speech acts. The examples she cites in relation to causal connectives will be best translated as follows:

(11a) John came back because he loved her.
(11b) John loved her, because he came back.
(11c) What are you doing tonight, because there's a good movie on.

(11a) o yianis yirise epeiδi(bc1) tin ayapouse
(11b) o yianis tin ayapouse, yiati(bc2) yirise
(11c) ti kanis to vraδi, yiati(bc2) ehi ena kalo eryo
Let it be noted that (11a) will accept yiati(bc2), too. However, the content-world reading is clearly rendered only with epeiδi(bc1); yiati(bc2) signals a rather subjective (the speaker's) interpretation of the situation (see below). At first sight it appears that epeiδi(bc1) functions as a content-world connective whereas yiati(bc2) has an epistemic and speech-act use. Indeed, Sweetser (1990: 82) writes: "My final argument for the existence of these domains is that there are languages whose vocabularies distinguish more clearly among the domains than is the case in English." However attractive Sweetser's solution may appear to be, it loses its explanatory rigor as soon as we apply her trichotomization to real data (Kitis 1996). The reason for the collapse of the theory is that there can hardly be any neat trichotomization of domains, as postulated by Sweetser, especially between the content and the epistemic domains. As was shown in Kitis (1994, 1996), in many cases yiati(bc2) functions as an attenuated causal connective introducing a reason explanation or a justification, or even a comment that is very loosely connected with the main clause or with some lexical item; it can even have a meta-linguistic or metadiscursive commentary function. For example, yiati(bc2)-clauses are nowadays overwhelmingly used in advertisements, and in most cases there is no clear connection between the yiati(bc2)-clause and the preceding main one. It usually functions as a comment justifying why we should buy a product, as admonished in the main clause or even in a preceding NP:

(12)
Beauty shop, yiati i aniksi θeli ananeosi
Beauty shop, because2 the spring wants change
'Beauty shop, because2 spring calls for change.'

The connection between a directive speech act, which is presumably performed in the elliptical NP-main clause, and the adverbial clause hardly warrants a speech-act interpretation of the function of this connective, as its proposition does not refer to "the relevance or irrelevance of a state of affairs as causing or impeding the speaker's action" (Sweetser 1990: 81). Consider the gloss:

?We are inviting you to come round to the Beauty shop because spring calls for change.
If we concede that the main clause performs the speech act of inviting (directive), then one of the (rather peripheral) felicity conditions for this speech act might be the stating of a reason for the speaker's performing the act (Searle 1975). However, whereas in speech act theory this reason is incorporated in the speech act performed, either directly ("Why don't you be quiet?"), or by embedding the speech act clause ("You ought to be more polite to your mother"), or by making it dependent as an object-clause on the main clause expressing the reason ("It might help if you shut up"),9 in our case we would have to stretch the notion of 'reason' beyond any generally acceptable, patterned connection between the two clauses. This 'peculiarity' is by no means characteristic of Greek advertising only:

(13)
Everybody recognizes status. Because not everybody has it. (American Bank advertisement)
(13) admits yiati(bc2), but not epeidi(bc1), in its Greek translation.10 If we read this as a content-world because, which seems the only option available in this case, then our world will become a very strange place to live in. Moreover, as we read this advertisement, we indeed feel an authorial authoritative voice (persona). Is it then an epistemic or speech-act because? We seem to be on a wild-goose chase. Another 'peculiar' function of the same connective, which does not seem amenable to a Sweetserian interpretation, is the following: (14)
Tha psakso na δo mipos eho kamia fototipia eyo,
will search-1SG to see whether have-1SG any copy I
'I'll have a look just in case I have a copy',
yiati to iha persi.
because2 it-clitic had-1SG last year
'because2 (yiati[bc2]/*epeidi[bc1]) I had it last year'.
(a colleague and I looking for a document; she suggests that she should have a look in her office; 'unmarked' [falling] intonation is assumed; for rising intonation, see Kitis 1994)
(14) cannot be a content-world connective (if it were, its paraphrase would run as follows: Because I had it last year, I may have it this year too; but not: ?Because I had it last year, I'll have a look inside), neither can it be an epistemic one.11 The because-clause cannot be the premise for the main as
238
Eliza Kitis
conclusion. (Gloss: ?Since/Because I had it last year, I can conclude that I'll have a look inside! "The because-clause is fully sufficient as a cause for the act of concluding", Sweetser 1990: 80.) The only viable interpretation might be the speech-act one; but then the adverbial clause does not justify the speech act performed (stating), but rather its propositional content, and therefore it can be better explained as providing a reason explanation (a non-nomic cause, Itkonen 1983) for it (for the action which is described as imminent). If (14) can be somehow made to fit Sweetser's thesis, consider (15): (15)
ihame, δilaδi, ti hiroteri θea,
had-1PL namely the worst view
'We had, that is, the worst view,'
yiati i kaliteri θea evlepe stin ayia sofia.
because2 the better view looked-3SG to the St. Sofia
'because2 the best view was over St. Sofia (church)'.
(writer talking on the radio about his house in Thessaloniki)

epeidi(bc1) is totally unacceptable in this context precisely because there is no causal connection that is being asserted between the two clauses;12 yiati(bc2) does little more in (15) than actually conjoin the two clauses in an argumentative or elaborative manner. The yiati(bc2)-clause elaborates on the main one. Whatever tests we may apply to (15), we will not detect any causal or evidential, or reason relation between the two clauses in whatever domain we look for its interpretation:

(15) a. *The reason for having the worst view was that the best view was over St. Sofia.
     b. *Because the best view was over St. Sofia, we had the worst view.
     c. *Since the best view was over St. Sofia, we had the worst view.
     d. *I am in a position to say/assert/know/conclude that we had the worst view, because the best view was over St. Sofia Church.
Such examples that do not fit Sweetser's thesis, even at the propositional level considered here, are legion in my corpus. Initially, her thesis seems to make sense in the case of the Greek causal connectives: epeidi(bc1) is the causal connective used for the content-world
domain, and yiati(bc2) is the connective used for the other two domains, epistemicity and speech acts. However, Sweetser's thesis collapses under the strain of real-data occurrences of the Greek causal connectives. Although we might initially claim that the identification of the domains is a way of getting off the ground, the problem with the meaning and function of yiati/dioti(bc2) looms large. Moreover, epeidi(bc1) can be used as a speech act connective in initial placement even when an explicit performative verb is absent (Kalokerinos 1999; Kitis 1994, 1996). Quite clearly, we have to look in a different direction for an adequate explanatory account. Before turning to other sources that might provide an explanatory account, though, it would be useful to state the findings of the above sections:
7. The findings

To conclude this rather descriptive section, we may summarize the main findings in the following table:

Table 2. Functions of MG causal connectives

i. epeidi(bc1), on the one hand, and yiati/dioti(bc2), on the other, are not freely interchangeable, even in post-posed position.
ii. yiati/dioti(bc2) only can occur sentence/paragraph initially, but always following the main clause.
iii. yiati/dioti(bc2) only can be considered near-paratactic connectives.
iv. epeidi(bc1) only is factual.
v. epeidi(bc1)-clauses only can be cleft-constructed.
vi. epeidi(bc1)-clauses are always subordinate to the main clause.
vii. epeidi(bc1)-clauses only can be included within the scope of the negative operator, can be embedded and intonationally integrated within the main.
viii. epeidi(bc1) in q because p structures always asserts a causal relation between the two clauses.
ix. epeidi(bc1) predominantly can assert causal relations that are considered direct (also see section 9).
x. epeidi(bc1) only can be the focus by bearing the main stress.
xi. epeidi(bc1) signals a cause or a reason irrespective of its placement (initial/final).
xii. yiati/dioti(bc2) can signal a cause or a reason, but it predominantly appears to signal a rather attenuated form of an indirect causal relation, or sometimes it appears to have an argumentative or elaborative function (also see section 9).
8. The explanatory account

8.1. The grammaticalization of 'epeidi(bc1)'

While these findings are interesting, we have not yet explained why we have two distinct types of causal subordinating connective, each characterized by distinct formal and functional characteristics. How can we explain features such as the assertiveness, but non-factuality, of yiati/dioti(bc2), or the factuality and initial placement of epeidi(bc1) even in conversational data? Tracing back the history of epeidi(bc1), we see that this connective was primarily a temporal conjunction and a causal one already in Homer. But originally it probably was a deictic expression having a demonstrative-local meaning (Schwyzer 1939: 659):

επ-εί (= dar-auf 'there-upon') > επεί > επεί δή > επειδή13
ep-ei > epei > epei δi > epeidi

There seems to have been a unidirectional course in the evolution of the meaning of epeidi from the more concrete domains of locality and temporality to causality:

demonstrative-local > temporal > causal
Heine, Claudi, and Hünnemeyer (1991: 50) write that "spatial concepts are more basic than other concepts and therefore provide an obvious template for the latter." As has been amply demonstrated in the literature, meanings that were initially conveyed as conversational implicatures were later established as conventional ones and later entrenched as semantic meanings. It can be surmised that AG epei, which was a temporal connective, also acquired meanings of causality, initially by conversational implicature, that were later mutated to stable conventional implicatures and semantic meaning. Rijksbaron (1976), who claims that epeidi-clauses of
AG have to be interpreted along the same lines as epei-clauses (95), reports causal implications of temporal epei and epeidi ('seeing that', 'now that') in Herodotus, although he claims that this observation does not necessitate assignment of causal meanings to the conjunction (76, 93) (cf. Powell 1949 for a different view). Besides, the most frequently used causal connective of AG was γαρ/yar (Rijksbaron 1976: 185).
This evolution of epeidi's meaning, then, may explain a great deal of its characteristics, namely its initial placement, its discourse-grounding function, its factuality, its signifying a causal relation between the two clauses, its potential for presuppositionality and the speaker's uncommitted14 status regarding the introduced clause. The etymology of yiati/dioti(bc2), on the other hand, is quite different:

δι' ότι ('deshalb weil', i.e. 'for the reason that') > διότι > γιατί
We must also note that ότι/oti in AG is not only a conjunction serving as a non-factive complementizer to assertive verbs, but primarily to mental and psychological verbs (Monro 2000: 242). In other words, the origin of ότι/oti leads us onto paths of subjectivity. As a causal conjunction, ότι/oti appears to function in AG in exactly the same fashion as does yiati/dioti(bc2) in MG.19 All this can explain why yiati/dioti(bc2) may function as a subjectivity marker introducing arguments (which of course are asserted but subjective). This subjectivity element inherent in dioti(bc2)'s evolutionary history and etymological make-up can account for its use as a speech-act connective or an epistemic one, since in both cases there is no explicitation of the inferential premises drawn upon by the speaker; rather, the interpreter needs to retrieve implicit inferential configurations for its interpretation. This causal connective is the one to modulate causes, internalize and mutate them to reasons, and introduce illocutionary forces. Its argumentative assertiveness is also explained on the grounds of its subjectivity. What is filtered through the speaker's cognitive interface comes out as an argument or is looked upon as having the force of one. Its argumentative force also explains its tendency to become a paratactic connective:
subordinate > hypotactic > paratactic (Hopper and Traugott 1993)
Since the yiati/dioti(bc2)-clause's primary function is to assert the speaker's belief or to state his/her argument, the causal relation is accorded secondary importance. Moreover, the causal connection can be modalized in a number of ways, since the proposition does not iconically reflect the out-there order of the real world or necessarily a nomic proposition. Since yiati/dioti(bc2)-clauses are not 'objectively' or necessarily causal, the conjunction enters a process of desemanticization, shedding part of its meaning narrowly described as causal. And as the causal relation is not focused, the conjunction cannot be focused upon (no clefting potential). Indeed, yiati/dioti(bc2) may best be described as a broadly functioning argumentative discourse marker rather than a causal connective.20 Moreover, yiati/dioti(bc2), just like another temporal/causal connective of MG, αφού/afou (since) (cf. Kitis 2000b), exhibits a high degree of position variability, as shown in the examples below: (16)
δose mou ta yramata, fevyo yiati.
give-IMP to-me the letters, leave-1SG because2
'Give me the letters, because2 I'm leaving'.

(17)

A: to afises yia na to plino [to sakaki]?
   it-clitic left-2SG to it-clitic wash-1SG [the jacket]
   θa to pas ston rafti?
   will it-clitic take-2SG to the tailor
   'Did you leave it [the jacket] for me to wash? Will you take it to the tailor?'
B: mpori
   'Maybe'
A: ine yiati skismeno kses [kseris].
   is-3SG because2 torn know-2SG
   'Because2 it is torn, you know'.

(18)

S' ayapao afou.
you-ACC love-1SG since
'But/Because I love you'.
yiati, just like afou (since), can occur medially (17), or can be 'appended' at the end of a clause (main clause, 16), transforming it into an explanation, justification or argument, thus communicatively incorporating the clause in which they occur into the preceding discourse. This behavior of these two connectives, which is identified initially at the conceptual or propositional level of their function, can thus be interpreted as evidence for a process of grammaticalization and desemanticization, whereby they emerge as discourse particles. Reanalysis of these connectives will show a shift in both semantic and syntactic categories affecting the status of both the proposition and the sentence in which they occur. Thus marked (by appending or interposing yiati and afou) utterances seem to have an echoic status, in that they do not further the discourse in respect of its informational increment. All this leads us to regard epeidi(bc1) as the prototypical causal connective, on the one hand, and, on the other, to view dioti(bc2), but primarily yiati(bc2), not only as an expressive marker, but also as a device (particle) for organizing conversation, since it functions as a semantically rather empty co-ordinating connective (Kitis 1994). So Traugott's (1989) schema is not only applicable in the case of yiati, but it would also have to reflect yiati's organizational potential as a discourse marker and as a conversation-organizing device:21

propositional > textual > expressive22 > interactional (or conversation-organizational) (Traugott's schema extended)
While epeidi seems to merge the temporal with the causal domains, yiati/dioti(bc2), on the other hand, is a clear case where the supposedly 'out-there' content-world is caught in the net of the subjective world that filters everything that comes its way, internalizing external stimuli as causes and reason explanations, or simply co-ordinating them as barely relevant. Indeed, yiati(bc2) can be said to be a purely 'relevance connective' (Blakemore 1987; Moeschler 1993), with procedural function, orienting the hearer to access the clause it introduces as relevant to what has preceded it, or as relevant to the main directionality of the discourse (Koutoupis-Kitis 1982). The acute question that emerges is whether in our linguistic world, which both reflects and is reflected by the extra-linguistic world, we can have clear-cut cases and neat categorizations such as causality and temporality, epistemicity and factuality, subjectivity and objectivity.
If Keller (1995) is right in his claim that we must distinguish only between epistemic and factual causal connectives,23 then we can assume that we are left with the onus of having to account for these two latter types of connectives. Surely, Sweetser's model, neat and tidy as it may be, is not adequate for explaining their function, as I have shown. What needs to be taken on board is their evolutionary course, their etymological make-up and a careful examination of the versatility of their functions, which will have to be extracted from a large corpus. What is most essential, though, is to blot out a neat demarcating line between a content-world domain and an epistemic one, because, as we have seen with respect to MG causal connectives, the postulation of these domains cannot afford us an adequately explanatory account of real-data occurrences of these connectives. As I have shown here by way of considering the main MG causal connectives, cognitive domains such as the factual or objective and the internal-subjective rub shoulders, producing disconcerting friction, as do domains such as temporality and causality, which cannot be ripped apart. In the following section I will discuss the issue of objectivity-subjectivity that is brought to bear on the use of the connectives under discussion. The notions of subjectivity, speaker involvement and perspective have often been identified as underlying factors regulating the use and function of causal connectives (Pander Maat and Degand 2001; Pit 2003; Verhagen 2000, amongst others). However, my notion of subjectivity is quite distinct, as it relates to types of knowledge and reasoning. In what follows I will elaborate on this issue and will propose a principled (non-discrete) distinction between the two domains of subjectivity and objectivity.
9. Subjectivity vs. objectivity I would now like to draw our attention to a well known and well appreciated fact of more recent linguistic research, that audible and "visible language is only the tip of the iceberg of invisible meaning construction" as Fauconnier (1997: 1) put it so succinctly. The main bulk of the iceberg that stays underground, or rather underwater, to carry the metaphor further, is the more or less well organized system of structured background knowledge in memory on the basis of which both language and reasoning function. This background knowledge, as I claimed elsewhere (Kitis 1982, 1987a, b, 1995, 1999, 2000a), can be represented in frame and script structures stored in memory (Minsky 1975; Nelson 1996; Schank and Abelson 1977).
In my view, the generic informational content (the flesh) of these structures can be stated primarily, if not exclusively, in the form of the 'primordial' conditional. It should be recalled that conditionals are assumed to be incorporated in the formal structure of universal statements, too ('For every x, if x is S, x is P'). Both law and non-law statements may be expressed in this general form. But law statements warrant inferences of the form 'if a, which is S, then a is P'. The latter inference is an instantiation, or a token, with specific values of the former general law-like statement, or the type, which is expressed in variable form.24 Now, because-clauses can be considered to be the reverse of if p ... then q (p → q) conditional statements that are structured in thematic chunks in our encyclopaedic memory. Moreover, they might on occasion be regarded as instantiations of more or less general (law-like) statements, that is, statements regarding specific cases: (i) The water boiled because it reached 100 °C (If water reaches 100 °C, water boils [generic]). It does not seem unfathomable to assume a principled, but fuzzy, division between two types of reasoning, one firmly based on general background knowledge-and-belief systems as stored in memory (example [i]), and another based on more individualized subjective quasi-knowledge and beliefs. If background knowledge-and-belief systems are generally accepted, they form the backdrop of language use and language comprehension, as such systems support inferencing processes. This type of knowledge can be called, rather schematically, 'objective knowledge'. What the term 'objective knowledge' needs to bring to the fore is the intersubjective character of this type of knowledge and beliefs that is the subject matter of semantic and broad encyclopaedic memory.
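The type/token relation and the because/conditional reversal just described can be set out as a compact schema (the notation is my formalization of the reasoning pattern, not the chapter's own):

```latex
% Generic law-like statement (the type), stored in memory:
\[
\forall x\,\bigl(S(x)\rightarrow P(x)\bigr)
\qquad\text{e.g. if water reaches }100\,^{\circ}\mathrm{C}\text{, it boils.}
\]
% Its instantiation (the token) for a specific case a:
\[
S(a)\rightarrow P(a)
\]
% The because-clause as the reverse of the conditional:
\[
\text{``}q\ \textit{because}\ p\text{''}\;\approx\;\text{reverse of}\;p\rightarrow q
\qquad\text{e.g. the water boiled \textit{because} it reached }100\,^{\circ}\mathrm{C}.
\]
```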
In the French (post)structuralist tradition intersubjectivity would be construed as a kind of intertextuality (Barthes 1970; Kristeva 1969), but not all knowledge and beliefs need to be the product of other texts (Lyons 1979). In Koutoupis-Kitis (1982) I postulated two types of background knowledge and beliefs: Standing Background Knowledge and Beliefs (SBKBs), corresponding mostly to rather intersubjective, objectivized knowledge, and Current Mutual Contextual Assumptions (CMCAs), which include local knowledge and rather subjective beliefs (cf. Kitis 2000a; see also Allen 1995 and Nelson 1996 for similar distinctions). This type of more individualized quasi-knowledge and beliefs can be called 'subjective knowledge'. Further, we can represent these two domains of knowledge and beliefs as two bipolar extremes on an epistemic continuum. At one extreme there will be placed what might be called the objective or intersubjective domain, and at its antipode the subjective domain. Indeed, Searle (1995, 1997, 2002,
2004) distinguishes between objective and subjective epistemic claims. These two extremes are the end points of a conceptual epistemic continuum connecting the knowledge-substrata of our propositions and, more generally, of reasoning. Figure 2 is a schematic representation of the proposed cline. The top level of the cline reflects the world level, both objective and subjective; the bottom level represents the epistemic continuum between the two types of knowledge that interact (dotted lines) with their counterparts at the world level (domains), both feeding them and being fed by them:

[Figure 2 depicts a cline whose world level spans the objective domain and the subjective domain, linked by dotted lines to objective knowledge and to subjective quasi-knowledge and beliefs, respectively, at the epistemic level.]

Figure 2. The epistemic subjectivity-objectivity cline
The 'objective' polar extreme of the proposed cline would 'attract' objective knowledge, nomic regularities and common knowledge, all expressed in the form of law-like conditional statements (Kitis 1995, 1999, 2000a). In short, at this extreme there would cluster common knowledge as objective configurations of subjective knowledge, and, consequently, statements of direct causality, interpretable as nomic or non-nomic regularities, norms and internalized rationality principles. Objective knowledge, then, can be captured in if ... then conditional clauses. Interpretation of linguistic contributions supported by objective knowledge would also require a more direct type of reasoning. In short, it would not ordinarily involve inferential leaps that would require active inferencing procedures on the part of the interpreter. At this end of objective knowledge, one would expect both such if ... then general statements, but also their instantiations concerning specific cases. Because-clauses expressing what I called direct causality (Kitis
1994) are expected to be converted into such general conditional statements: because p, q or q because p statements would be directly paraphraseable as if p, (then) q26 ones, where both p and q would retain their propositional form and content, or would be their generalized entailments.27 Consequently, these structures can be instantiated both in epeidi(bc1) and yiati/dioti(bc2) clauses (the latter only in postposed position). At the other extreme, labeled subjectivity, one would expect a clustering of non-general knowledge (subjective quasi-knowledge, or beliefs, not supported by common knowledge); also, uncertain knowledge and individualized configurations of reasoning patterns which have not acquired a general status and are not, therefore, regarded as objective or factual, or at times even acceptable. This type of non-'objective' (in the sense of non-intersubjective, non-generalized and non-recurring)28 quasi-knowledge and beliefs cannot be represented in generic p → q statements. Intentional causality (Searle 1983) tends to be supported by this latter type of knowledge structures. Capturing this type of knowledge as conceptually underpinning the use of causal statements often entails reasoning that requires maximal inferential leaps on the part of the interpreter; such inferential leaps are not readily predicted or anticipated on account of systematically stored-in-memory knowledge types. In other words, this proposal subsumes epistemic and speech act uses of connectives under an inferential procedure that involves inferential leaps, which, however, once explicitated, will reveal conceptual links at the semantic propositional level. On this view, then, epistemic and speech act uses of connectives can be regarded as short-circuited conceptual ones. It is this type of short-circuited or 'gapped' reasoning, then, that would not admit epeidi(bc1) but license yiati/dioti(bc2) (cf. examples 8, 12, 14, 15).
Quite apart from speech act and epistemic uses of causal connectives, this proposal will also encompass many other cases similar to the ones identified here that do not fall within the one or the other category. Therefore, this account seems to be more adequate and elegant since it subsumes under its purview all cases of causal connection. The suggested cline between the two domains of objectivity and subjectivity (adopting rather general fuzzy terms) may recall Sweetser's (1990) postulation of the domains of content-world and epistemicity. However, quite apart from the absence of discreteness in the postulated continuum, Sweetser's domain of epistemicity is the product of mapping operations from the sociophysical world, whereas the proposed subjectivity-objectivity dimension relates to the content of our linguistic contributions, its knowledge-substrata residing in memory, and the brand of reasoning used.
Returning to the case of the Greek causal connectives, I would like to propose, in a rather programmatic fashion, that epeidi(bc1) is the causal connective functioning primarily at the objectivity end, whereas both yiati and dioti(bc2) can handle cases that would cluster around the subjectivity pole, although they could also occur at or near the objectivity end, especially in fuzzy cases of causality, factuality and presuppositionality. Indeed, I suggest that this latter connective (yiati/dioti(bc2)) should be seen as a lexical development indexing a domain of subjective individualized quasi-knowledge. It can be further proposed that not only its morphology, but also its syntactic behaviour (final placement or post-item placement [following the item it qualifies]; examples 16, 17) is a development of the conceptual subjective domain from which it derives its function. Subjective, individualized, non-general quasi-causation is rendered as explanation rather than as objective causation. The latter (objective causation) need not be filtered through the subjective interface, as it has been amassed and entrenched in 'public' memory. It need not be expanded into an inferential chain that is assumed but not directly stated in our linguistic contributions. Objective causation can be linguistically rendered as directly linked clauses of conditional statements. Moreover, the former (subjective explanation or justification) ordinarily follows its explicandum, as it relies on subjective internalized quasi-causation precepts, not readily translated into conditional statements. It appears, therefore, that all the characteristics of the two connectives discussed here fall out of the bipolar continuum between subjectivity and objectivity proposed here.
More specifically, epeidi(bc1) can be preposed in an iconic fashion, just as causes in the objective domain (representing the 'out-there' world) are conceived and perceived as preceding their effects, while yiati/dioti(bc2) always follow their main clause, mirroring our subjectively interpreted causation or reason explanation, which ordinarily follows the explanandum; epeidi(bc1)'s potential for grounding the discourse, for factuality and presuppositionality falls out of its firm capacity to frame propositions in a 'speaker-detached', so to speak, fashion and present them as objective and factual states or events, unrelated to the subjectivity of the speaker. In sum, what lies behind the notion of a content or propositional causal connective (primarily epeidi(bc1)) is our ability to subsume the relation between the two clauses within a nomic or non-nomic patterned, or less patterned, regularity. Such a regularity can in principle be symbolized, as has been noted, in the form of a conditional statement, p → q, stored in
memory; epeidi(bc1), therefore, indexes causally 'compact', coherent propositions; epeidi(bc1) guarantees, so to speak, causal and explanatory coherence. Explanatory coherence entails that there are no huge inferential leaps to be made that will not be readily warranted by broad encyclopaedic knowledge stored in or mutated into memory. This statement also explains the occurrence of epeidi(bc1) in both speech-act and epistemic cases when the speech act or the epistemic verb is explicitly articulated. If the speech act verb or the epistemic predicate is encoded, then there is no need for reasoning by the interpreter that would require inferential leaps. Moreover, enablement uses of because, which require longer inferential chains, as in He will come and spend Christmas with us because I'm paying for his flight, where a free ticket enables him, but does not cause him, to spend Christmas with us, are rendered with yiati/dioti(bc2), but not with epeidi(bc1). The reason for this seems to be the gapped causal inferential chain that is necessarily generated by the propositions encoded in the two linked clauses and their entailments.29 Now, if we take on board Keller's suggestion regarding speech-act uses of connectives, then we can identify the main problem in the data we examined at the intersection of what Sweetser calls the content-world and epistemic domains. In this section, I suggested that a division of labour between an objective and a subjective world as a basis for an initial account of the function of causal connectives might be a better solution than Sweetser's domains. The transition from norms to nomic p → q regularities, to non-nomic regularities, to more individualistic, interiorized and internalized inferencing patterns interpreted as explanation or justification does not warrant its theoretical translation into the Sweetserian postulated discrete domains of objective and epistemic worlds.
Moreover, we have seen that the postulation of a subjectivity-objectivity cline not only accounts for, but also seems to be necessitated by, the distinct functions of causal connectives in Greek. In the next section, I will compare English because with each of the two Greek connectives. The findings are expected to explain some of the facts that have been pointed out.
10. Comparison of 'because' with dioti(bc2) and epeidi(bc1)

Whereas because in English has not been reported to occur in initial position in because p, q structures in conversational data, epeidi(bc1), as I have
already noted, occurs very frequently in this position in the genre of conversation. On the contrary, just like because, yiati/dioti(bc2) does not occupy initial position in a cataphoric relation to the main clause either in spoken or written language. That is, it does not occur in because p, q structures.30 I would like to propose now that this similarity in distribution between dioti(bc2) and the English causal connective because (in the genre of conversation) reflects a similarity in function, which in its turn is explained by the similar etymological make-up of the two connectives. Because originates from the preposition by and the substantive cause; originally it was used as a phrase, by cause, followed by the cause or purpose expressed by a substantive governed by of or a subordinate clause introduced by that or why (OED). According to the same source, such subordinate clauses fell into two classes: (a) they expressed cause or reason, and (b) they expressed purpose. When the subordinate clause signified cause or reason, that was often omitted, whereas in (b) cases that prevailed in modern usage.
dioti(bc2) (yiati being its low-variety variant, but both used interchangeably without significant stylistic variation in MG) originates, just like because, from the conjunction of the preposition δia (= by) and the neuter of the demonstrative pronoun in the accusative, touto, which takes the place of a substantive, followed by the conjunction oti, the equivalent of that, nowadays called the complementizer:

δia + touto + oti > δia + oti > δioti

Whereas in the case of dioti the substantive touto gets omitted, in the case of because it is that that is left out in cause or reason clauses:

by + cause + that > by + cause > because

epeidi(bc1), on the other hand, originates from the compounding of the preposition epi/επί (meaning '[up]on') and the Doric ei/ει, a locative adverb meaning 'where' (this point, however, is not confirmed or discussed in the literature); δή/δi is an emphatic particle which was occasionally appended for emphasis and later lost its emphatic function and became a constant part of the conjunction:

epi + ei > epei, epei + δi > epeidi
Whereas epeidi grounds its reason or cause as a temporal anteriority in its spatial anchoring, this grounding is non-existent in both dioti and because; rather, these conjunctions have been the product of a substantive (English: cause, Greek: touto) introducing a cause or purpose clause (English: that/of, Greek: oti) by way of a preposition (English: by, Greek: δia). And thus introduced, the propositions of these clauses were probably placed closer to the sphere of teleology than that of factivity;31 due to their teleological character, then, such because or yiati/dioti structures would ordinarily be postposed, since they signified final cause, purpose or design (teleology