English Pages 311 [316] Year 1995
Focus and Coherence in Discourse Processing
Research in Text Theory
Untersuchungen zur Texttheorie

Editor
János S. Petőfi, Macerata

Advisory Board
Irena Bellert, Montreal
Antonio Garcia-Berrio, Madrid
Maria-Elisabeth Conte, Pavia
Teun A. van Dijk, Amsterdam
Wolfgang U. Dressler, Wien
Nils Erik Enkvist, Åbo
Robert E. Longacre, Dallas
Roland Posner, Berlin
Hannes Rieser, Bielefeld
Hartmut Schröder, Vaasa

Volume 22
Walter de Gruyter · Berlin · New York 1995
Focus and Coherence in Discourse Processing Edited by Gert Rickheit and Christopher Habel
∞ Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.
Library of Congress Cataloging-in-Publication-Data
Focus and coherence in discourse processing / edited by Gert Rickheit and Christopher Habel. p. cm. - (Research in text theory ; v. 22) Includes bibliographical references and index. ISBN 3-11-014466-2 (alk. paper) 1. Discourse analysis - Psychological aspects. 2. Focus (Linguistics) 3. Cohesion (Linguistics) 4. Cognitive science. I. Rickheit, Gert. II. Habel, Christopher. III. Series. P302.8.F63 1995 401'.41-dc20 95-22835 CIP
Die Deutsche Bibliothek - Cataloging-in-Publication-Data
Focus and coherence in discourse processing / ed. by Gert Rickheit and Christopher Habel. - Berlin ; New York : de Gruyter, 1995 (Research in text theory ; Vol. 22) ISBN 3-11-014466-2 NE: Rickheit, Gert [Hrsg.]; GT
ISSN 0179-4167
© Copyright 1995 by Walter de Gruyter & Co., D-10785 Berlin
All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.
Printed in Germany
Typesetting: Utesch Satztechnik GmbH, Hamburg
Printing: Oskar Zach, Berlin
Binding: Lüderitz & Bauer-GmbH, Berlin
Contents

Introduction

Part I: Focus

Simon Garrod
Distinguishing between Explicit and Implicit Focus during Text Comprehension

A. J. Sanford and L. M. Moxey
Notes on Plural Reference and the Scenario-Mapping Principle in Comprehension

Jochen Müsseler, Martina Hielscher and Gert Rickheit
Focussing in Spatial Mental Models

Jochen Müsseler
Focussing and the Process of Pronominal Resolution

Andrea Schopp
Focussing and the Use of German beide

Part II: Coherence

Simon Garrod and Gwyneth Doherty
Special Determinants of Coherence in Spoken Dialogue

Hans-Jürgen Eikmeyer, Walther Kindt, Uwe Laubenstein, Sebastian Lisken, Hannes Rieser and Ulrich Schade
Coherence Regained

Jutta Kreyß
Comprehension Processes as a Means for Text Generation

Gert Rickheit, Lorenz Sichelschmidt and Hans Strohner
Economical Principles in Coherence Management: A Cognitive Systems Approach

Christina Hellman
The Notion of Coherence in Discourse

A. J. Sanford, S. B. Barton, L. M. Moxey and K. Paterson
Cohesion Processes, Coherence, and Anomaly Detection

Udo Hahn
Distributed Text Structure Parsing - Computing Thematic Progressions in Expository Texts

Östen Dahl
Causality in Discourse

Jörg Brummel
Focus and Anaphora. A Selected Bibliography (1985-1993)

Subject Index
Introduction

This volume comprises papers presented at two conferences held in the ZiF, the Centre for Interdisciplinary Research at the University of Bielefeld, in November 1990 and in July 1991. Five of them are contributions on the topic of "Focussing in Text Comprehension and Generation" and eight of them on questions of "Coherence in Discourse Processing". Accordingly, the book consists of two main parts: problems of focus in discourse processing are discussed in the first part, and aspects of coherence in discourse processing are outlined in the second. Finally, there is a selected bibliography on these topics, which gives an overview of current research.

Focus and coherence processes are fundamental to all discourse processing, the basis of which is a mental model, i. e. a dynamic cognitive representation of discourse worlds. The function of the mental model is to enable the construction of sense by integrating verbal information and individual knowledge. Mental models can be modified by means of utterances. Theories of discourse comprehension based upon the concept of mental models attempt to embrace the whole world of discourse in a single step. In the course of processing, a model of the text, i. e. a propositional representation of the verbal expression in question, and a model of the world, i. e. a set of individual schemata from the knowledge domain in question, are condensed and integrated to produce the current model of the discourse, i. e. a representation of the state of affairs conveyed by the discourse. Consequently, readers' or listeners' mental models are open to further specification, differentiation or modification. For the creative recipient the processing of a continuous text implies the ongoing and dynamic maintenance of his or her current state of knowledge. On the basis of the semantic information available, the recipient keeps updating his or her current mental model and creating new semantic information above and beyond what has actually been said.

In a mental-model framework, reference is handled by treating discourse models as interfaces between language and the world, because discourse models evolve from linguistic as well as extralinguistic sources of knowledge. In addition, semantics is treated in an integrative, procedural way. In contrast to many linguistic approaches, the concept of mental models emphasises the interaction of the language-based and knowledge-based components of meaning.

In those sciences concerned with language production and discourse processing, focus is a well-established and rather commonplace notion. Derived from optics, the term focus is frequently used to denote that part of a sentence or a text which carries the most relevant semantic information.
Researchers widely agree that relevance is signalled verbally by word order, stress, or other means of emphasis; dependence on the context is also generally recognised. There is, however, considerable disagreement as to which part of an utterance is actually the most relevant. According to some researchers focus is to be understood as new information, i. e. the information least available to a reader or a listener at a given time. In contrast, others argue that focus is to be understood as the information most prominent in the recipient's mind, i. e. the information most readily accessible at a given time.

In view of these differing opinions, a colloquium in the Centre for Interdisciplinary Research at the University of Bielefeld was organised to provide researchers with an interdisciplinary forum for an exchange of views on focus. The aim of the colloquium was to discuss the various definitions of focus and to explore the use of the concept of focus in linguistics, cognitive psychology, and artificial intelligence. As these disciplines are at present engaged in the establishment of an integrative cognitive science, the opportunity to thoroughly examine one of the most fundamental concepts in cognition during a three-day colloquium met with a great response.

Most of the papers read during the colloquium dealt with focussing in discourse comprehension. The topics were not restricted to structural aspects such as the surface form of particular verbal utterances or the format of individual mental representations, but also embraced cognitive aspects such as the mental operations required to establish those structures. Altogether, three major issues were examined during the colloquium.

(i) How does focussing become manifest during text comprehension? In connection with the results of several psychological and linguistic studies, the methodological problems involved in an empirical assessment of focussing processes repeatedly came under discussion. It was agreed that a variety of empirical techniques can effectively be employed in studies of focus, since different approaches require different methods.

(ii) How do people maintain focus during discourse processing? A number of contributions concentrated on coreferential mechanisms in language processing in general and on the cognitive strategies readers employ in the resolution of anaphora in particular. During the colloquium, psycholinguistic experiments were presented in which contextual variation demonstrated that topicalisation and foregrounding are important determinants of the cognitive effort required in reference resolution.

(iii) How should focussing processes be modelled? To answer this question, contributions from the artificial intelligence domain primarily dealt with the problem of whether or not focussing phenomena could be simulated in an all-or-nothing fashion. Moreover, the theoretical status of the focus concept was examined from different points of view: approaches which emphasised the explanatory value of focus were presented side by side with approaches which considered focus as a construct requiring further explanation.
Perhaps the most important results of the colloquium were specific suggestions for future research. On the one hand, it was argued that it might be helpful to distinguish between focussed representations in the working memory and nonfocussed representations in the long-term memory. Similar distinctions are made in artificial intelligence to keep the search of referent domains within manageable proportions. On the other hand, it was suggested that a distinction should be made between those focussed representations that arise from explicit information and those that arise from implicit information. While explicit focus is based on the discourse itself, implicit focus is based on the recipient's individual knowledge, on presuppositions and inferences.

The distinction between explicit and implicit focus was made by Sanford and Garrod (1981) in their focus framework for discourse comprehension. In the first contribution to this volume Garrod discusses this distinction in the light of more recent evidence and considers its broader implications for the comprehension process.

The theory of text comprehension presented by Sanford and Garrod (1981) is also the basis for Sanford and Moxey's investigation of plural reference and the scenario-mapping principle in text comprehension. There is a special part in this model where the most prominent objects are represented and can be accessed by anaphora. The representation gives an adequate solution in the singular, but applied to the plural it leads to difficulties. Therefore, the authors propose additional construction processes which can build up complex objects during text comprehension.

In the third contribution, Müsseler, Hielscher, and Rickheit discuss the function of spatial focussing in establishing a mental model. From a psycholinguistic point of view, a dynamic model of focus adaptation is favoured, which describes the focussed parts of the mental text model as the most prominent parts, which are easy to refer to.

Müsseler discusses different ways of pronoun processing in discourse. On the one hand, pronominal resolution can be accomplished by means of a search mechanism. On the other hand, he postulates a pro-active process which accompanies reception of the potential antecedent, so that the connection between reference concept and pronoun already exists when the pronoun is encoded.

A special problem of language production is examined by Schopp, who focusses on the selection of some plural pronouns, which are a means of referring to different kinds of complex objects. The construction of a complex entity can be triggered by a plural antecedent phrase or a coordination of singular entities, or it can be reached by making inferences when a plural pronoun occurs. She works out different functions of the German anaphor "beide" ("both") in the production of discourse.

The second part of this volume depicts aspects of coherence in discourse processing. Coherence is a term often used in the linguistic community to refer to various facets of linguistic connectedness. However, there is a considerable lack of agreement about the exact definition of the term. While some researchers
differentiate between the notions of 'coherence' and 'cohesion' and confine the former to semantic aspects, others prefer a broader application of the term. Coherence, in the broader sense, means the connectedness between parts of the linguistic system at all levels, i. e. at the phonological, morphological, syntactic, semantic and pragmatic levels. The conference aimed at an integrated definition which allows for the investigation of coherence processes within a cognitive framework. In accordance with this position it is possible not only (i) to treat all aspects of coherence as aspects of the language processing system, but also (ii) to relate the linguistic concept of coherence to a general concept of coherence in cognitive systems. Only if the linguistic aspects of coherence are described as specific instances of the general coherence phenomena can the specific characteristics of linguistic coherence be understood precisely.

Garrod and Doherty describe some differences between coherence processes of spoken dialogues and those of written texts. Whereas coherence in a written text reflects the way in which the sentences stick together and connect in a consistent way, coherence in spoken dialogue reflects the way in which the utterances from each participant mesh together to form a co-ordinated exchange.

Eikmeyer, Kindt, Laubenstein, Lisken, Rieser and Schade outline within the framework of dynamic systems how to regain coherence in special cases of spoken dialogue. They illustrate four applications for their theoretical conception of coherence, two referring to language production, one to parsing and the last one to a dynamic theory of grammar. All applications mentioned concern a special class of construction found in spoken discourse.

The interaction between comprehension processes and processes of text generation is studied by Kreyß. According to Grice's conversational maxim that the hearer has to pay full attention to the speaker, she attempts to maintain attention by violating the expectation of the hearer. The violations used in the study are of such a type that the hearers were able to re-establish the coherence of the discourse by special inferences. To realise this, she chose the anticipation feedback loop as a means of integrating the violation of expectations into a planning component of a generation system.

Rickheit, Sichelschmidt, and Strohner describe some theoretical, methodological, and empirical aspects of discourse processing within a 'systems' framework. According to a cognitive view of linguistic coherence, coherence is part of the dynamics of a cognitive system which processes language. Such a system consists of sensorimotor, syntactic and pragmatic components. These components are modular but highly interactive parts of the language processing system. New discourse information is stored in the working memory and connected to preceding discourse information and general knowledge. The discourse information seems to be immediately processed not only at the sensorimotor and syntactic levels, but also at the semantic and pragmatic levels of linguistic knowledge. The cognitive system is able to solve problems of linguistic coherence by means of constructing coherent information networks in the working memory.
Processes of cohesion, coherence and anomaly detection are described by Sanford, Barton, Moxey, and Paterson. They discuss some studies which show that the impact an element of a discourse has on establishing a coherent representation can be minimal. On the basis of the empirical data, they investigate how selective processing works. While a soft-constraint satisfaction framework is capable of arbitrarily producing results of this sort, the details of constraints are still an empirical question. They suggest that far from computing coherence relations, which correspond to a system yielding deductive truth values, much of coherence in discourse may be an illusion. In their opinion the principles differ from those of present computational-linguistic methods of establishing the process of coherence.

Hahn considers the computing of global patterns of thematic text organisation within the methodological framework of a distributed model of text understanding. As an alternative to processing the macro structure of a text, which has a long tradition in descriptive text linguistics, he develops a formal characterisation of thematic progression patterns within an integrated conceptual framework of text coherence parsing. He shows how the recognition of this class of macro structure depends upon the resolution of cohesion patterns at the local level of text organisation. An integrated model of text structure computation, which incorporates both the micro and macro structure parsing of text phenomena, is therefore required. Hahn presents an implemented system which works in terms of a lexically distributed text parser and which is capable of analysing a written text with respect to text cohesion as well as to text coherence. This model belongs to the strongly interactive models of natural language understanding.

The article by Dahl concerns a special type of coherence. Dahl points out that causal relations tend to play an important role in theories of discourse structure and understanding. He classifies and characterises different types of causality in discourse, whereby two broad types of causality are distinguished. In the first type, an explicit or implicit causal relation between two events or states of affairs is fully developed in a defined rhetorical function. In the second type, which is especially frequent in narratives, a causal relation between two events does not have a well-defined rhetorical function, but is rather part of the general episode structure built up during the discourse. In these cases, it is often difficult to determine exactly the nature of the relations involved.

The editors would like to thank the authors for allowing their papers to be published in this volume. We are indebted to the Centre for Interdisciplinary Research for financing the two workshops. The comfortable atmosphere of the Centre stimulated an intensive and profitable discussion of individual contributions as well as general aspects of focus and coherence in discourse processing. The editors would be pleased if this volume were to stimulate further research in the domain of focus and coherence in discourse processing.

Bielefeld and Hamburg, April 1995
Gert Rickheit and Christopher Habel
Part I: Focus
SIMON GARROD
Distinguishing between Explicit and Implicit Focus during Text Comprehension
1. Introduction

One of the goals of a skilled narrative writer is to direct the reader's flow of attention. The reader is presented with a text which typically portrays a number of characters interacting in various situations, and as the text unfolds his focus of attention needs to reflect the changing relevance of the characters and situations within the narrative. It was in order to capture this intuition that Sanford & Garrod (1981) outlined what they called a Focus Framework for discourse comprehension. This psychological framework was influenced in some ways by existing proposals in linguistics (e.g. Chafe's 1976 theory of Foregrounding) and in Artificial Intelligence (e.g. Grosz's 1977 theory of Focus) which were also motivated by the same kind of intuitions about the role of attention in understanding. However, unlike in the linguistic and computational accounts, we argued for two dynamic attentional components which were termed Explicit and Implicit Focus. The purpose of this chapter is to review and refine this distinction in the light of more recent evidence and consider its broader implications for the comprehension process.

The two partitions in the Focus Framework were designed to reflect two ways in which a text (prototypically narrative) unfolds. In the first place there are the various characters and other discourse entities around which the story revolves. At any point, some of these characters will be in the foreground of the story, and others in the background, and it is part of the dynamic of narrative that the state of foregrounding will change from time to time. To follow the story the reader therefore has to keep track of the foregrounded characters and other prominent discourse entities and so be able to anchor the various parts of the story to them, typically by resolving anaphoric links.

The second way in which a narrative unfolds is in relation to the scenes portrayed. Just as attention to the different characters may wax and wane as the story unfolds, so there is a flux of changing scenes (or scenarios) in which these characters play different roles. Again the reader will have to be able to keep track of both the currently relevant scenarios and the roles that the characters are playing within them in order to draw the intended inferences about the characters and their setting.
The Focus Framework takes account of both these dynamic aspects of narrative structure by recognising a partition. On the one hand there is EXPLICIT Focus, which tracks the currently relevant discourse entities, and on the other IMPLICIT Focus, where the currently relevant scenarios are represented. Though distinct, the two systems are nevertheless related to each other by mappings between the discourse referent representation for the characters and the representation of the scenarios. So the whole structure corresponds to a 'model' of the current state of the discourse world, distinguishing both the relevant entities in that world and the situations in which they play a part.

The first way in which the two focus partitions are differentiated is therefore in terms of the type of information which they contain at any time. A second basis for distinguishing them is in relation to their psychological properties. Thus we argued that Explicit Focus, unlike Implicit Focus, is capacity limited, that is, it is only capable of holding a small number of distinctions at any one time. This means that only a very limited number of characters can be held "in foreground" at any time. In contrast it was assumed that the only limitation on 'scenarios' available through Implicit Focus comes from constraints on their logical compatibility and the mapping possibilities they afford for the entities represented in Explicit Focus.

The third basis for the distinction, and in many ways the original rationale behind it, is in terms of the function played by the two in referential processing. It was suggested that Explicit Focus plays a key role in the interpretation of pronouns whereas fuller definite descriptions are interpreted mainly in relation to distinctions represented in Implicit Focus.

In this chapter I intend to explore in more detail the original basis for discriminating between the two focus partitions and consider how more recent evidence supports the distinction. In the process of doing this I will suggest that the main function of the focus system is to support text inference and it is in relation to this that the two parts of the system can be seen to play very different roles.
2. The anatomy of the focus system

The basic organisation of the Focus system can be illustrated by considering how a piece of text might establish a particular focus. Consider the following text fragment:

(1) John was late for school as usual.
(2) He was worried about the math lesson.

The focus representation set up by sentence (1) would among other things contain one main character token in Explicit Focus corresponding to John and a pointer in Implicit Focus to a 'school' scenario. The token corresponding to 'John' would then be mapped into the 'school' scenario with the default that John be identified with the role slot 'school-boy'. We know this because when you give a continuation sentence like (3) below readers typically encounter problems:
(3) He always had trouble controlling the class.

This seems to be due to the fact that there has been an inference that 'John' is a schoolboy, and being in control of the class is inconsistent with our expectations about that role. When the second sentence is encountered it is then assumed that 'He' maps onto the same main character token in Explicit Focus and 'the math lesson' is identified as instantiating another role slot in the 'school' scenario, and so the process continues as more references are encountered. Text referents are anchored to the entity representation through Explicit Focus and then the entity tokens are anchored to the situational representation through the role mappings in Implicit Focus.

This simple example illustrates how a text may serve to set up a focus state, but it also points to the functionality of the system for text understanding. The Focus system is basically required for the efficient control of textual inference and it is for this reason that we need to distinguish its two components. In understanding sentences (1) to (3) two types of inference are called for: first, anaphoric inferences which establish in this case that the two 'he' pronouns refer to the same character John, and then role assignment inferences of the "John = 'schoolboy'" type which give a basis for our expectations about what this character is likely to do in that scenario.

The distinction between Explicit and Implicit Focus is ultimately motivated by differences in the way in which these two kinds of inference seem to be dynamically controlled. The content of Explicit Focus, in the form of entity tokens, is essentially episodic in that it comes from the particular information that has recently been foregrounded in the text, whereas the content of Implicit Focus is essentially semantic/pragmatic information reflecting the reader's knowledge of the types of situations and types of entity being portrayed.
This also means that the way in which the same text referent may be represented in the two systems is rather different. Explicit Focus is basically entity or character individuating, in that what is being distinguished is the identity of the characters and other entities referred to in that section of text. So anything that is mapped onto the same token, such as the two 'he' pronouns in sentences (2) and (3), must identify one and only one individual in the story. In contrast, mappings between tokens and role slots do not carry the same force. 'John' in our story fragment could fill many different roles as the story unfolds, and the same role can be filled by many different individuals in the story. To this extent Implicit Focus is role or type individuating for the referents in the story.

A processing consequence of this representational distinction arises in the interpretation of definite descriptions versus pronouns. Whereas referential pronouns seem to require a unique token representation for their interpretation, definite descriptions will commonly be used to identify discourse roles. Consider for instance the following text:

(4) Mary usually goes to Valentino's restaurant for her lunch. She fancies the
waiter there. For dinner she prefers to eat at La Grande Bouffe where the food is better but the waiter/*he is not nearly so handsome.

It is clear that the italicised definite description 'the waiter' is not intersubstitutable with the pronoun 'he'. The reason for this seems to be that the definite description can be contextually resolved through the role slot 'waiter' in the currently active 'restaurant scenario' for La Grande Bouffe, so it does not have to identify the antecedent waiter in the first sentence. This role resolution for the definite description is not however available for the pronoun, which has to identify the only matching antecedent referent. What is particularly striking about such examples is how the role identifying function afforded by the definite description readily overrides any discourse referent identifying function it may have. Of course this is consistent with what is known about the distribution of definite descriptions in text, where they most commonly occur as first mentions without any discourse antecedents (see Fraurud, 1990).

This example also illustrates why it is important to recognise two aspects of Focus. Clearly the interpretation of the role 'waiter' is conditioned by the particular restaurant scenario which is active at the time of encountering the description. This means that in relation to the role representation for 'waiter', 'La Grande Bouffe' is currently focussed whereas in relation to the entity representation it is only 'the waiter at Valentino's' which is in focus. Hence we get the discrepancy between the pronominal interpretation and that of the full description.

A similar effect occurs with demonstrative noun phrases, which tend to be resolvable only in relation to Explicit Focus representations. Consider for instance the following examples from Webber (1988):

(5) Some files are superfiles. To screw up someone's directory, look at the files/*those files/*them. If one of them is a superfile, delete it.

versus

(6) Some files are superfiles. To screw up someone's directory, look at those files/them/*the files. They will tell you which of his files is absolutely vital to him.

Again it seems that the demonstrative NP, like the pronoun, can only be resolved against explicitly mentioned discourse antecedents, corresponding to tokens in our Explicit Focus system. The definite description on the other hand is resolved in relation to the role 'file' in the currently active scenario 'computer directory'. So the interpretation of the definite description seems to be controlled very much more by 'scenario focussing' than 'entity focussing' and any dynamic text representation system will have to capture these differences in interpretation.

The nub of the dual Focus system is therefore in the form of information represented in the two partitions and the way in which different types of referring expression have primary access to the two different representations. To this
extent a two-part Focus system accounts for differences in the interpretation of the referring expressions. However, we would also want to argue that it has wider ranging consequences for the control of text inference, and that is the issue to which we now turn.
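The contrast between pronouns and definite descriptions in example (4) can be made concrete with a small sketch. It is only an illustration under stated assumptions, not an implementation taken from the Focus Framework literature: the names `Token`, `Scenario`, `FocusState`, `resolve_pronoun` and `resolve_definite` are hypothetical, gender matching stands in for all the constraints on pronoun resolution, and scenarios are reduced to a table of role slots.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Token:
    """An entity token in Explicit Focus (one token per individual)."""
    label: str
    gender: Optional[str] = None  # "m" or "f"; a simplification for pronoun matching


@dataclass
class Scenario:
    """A knowledge-based scenario that Implicit Focus points to."""
    name: str
    roles: Dict[str, Optional[Token]] = field(default_factory=dict)


@dataclass
class FocusState:
    explicit: List[Token] = field(default_factory=list)  # foregrounded entities
    scenario: Optional[Scenario] = None                   # currently active scenario

    def mention(self, token: Token, role: Optional[str] = None) -> Token:
        """Add an explicitly mentioned entity and optionally map it to a role slot."""
        self.explicit.append(token)
        if role is not None and self.scenario is not None:
            self.scenario.roles[role] = token
        return token

    def resolve_pronoun(self, gender: str) -> Optional[Token]:
        """Pronouns consult only the tokens in Explicit Focus (most recent first)."""
        for token in reversed(self.explicit):
            if token.gender == gender:
                return token
        return None

    def resolve_definite(self, role: str) -> Optional[Token]:
        """Definite descriptions consult role slots in the active scenario."""
        if self.scenario is None or role not in self.scenario.roles:
            return None
        filler = self.scenario.roles[role]
        if filler is None:
            # The role is known from the scenario but has no antecedent yet,
            # so a new role filler is created (assumption of this sketch).
            filler = Token(f"the {role} at {self.scenario.name}")
            self.scenario.roles[role] = filler
        return filler


# Example (4): lunch at Valentino's, then a scene shift to La Grande Bouffe.
state = FocusState()
mary = state.mention(Token("Mary", "f"))
state.scenario = Scenario("Valentino's", {"waiter": None})
waiter_v = state.mention(Token("the waiter at Valentino's", "m"), role="waiter")
state.scenario = Scenario("La Grande Bouffe", {"waiter": None})

he = state.resolve_pronoun("m")               # antecedent token from Explicit Focus
the_waiter = state.resolve_definite("waiter")  # role slot of the active scenario
```

In this sketch 'he' can only pick out the waiter at Valentino's, while 'the waiter' is resolved through the role slot of the active La Grande Bouffe scenario and so identifies a different individual, which mirrors the discrepancy described above.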
3. Psychological properties of focus and inference control
From a psychological viewpoint the Focus system is a kind of working memory and as such must be subject to severe capacity constraints. However, we have argued that the capacity constraints are realised in different ways for the two parts of the system (Sanford and Garrod, 1981). Information in Explicit Focus has to be established locally to reflect the particular configuration of discourse entities encountered in that stretch of text and so only a few distinctions can be held at any one time. In relation to Chafe's notion of foregrounding this means that only a few characters could be in foreground at any moment.

The situation with Implicit Focus is somewhat different. Implicit Focus can be thought of as giving privileged access to certain partitions of our pre-existing knowledge, knowledge which in itself does not have to be held in working memory. The only locally established information which has to be explicitly represented is the configuration of pointers which relate tokens to knowledge-based scenarios. To this extent the information limit on the Focus system as a whole applies to the number of tokens held at any time and the number of role or type mappings that they can sustain (a more detailed discussion of this aspect of Focus is given in Sanford & Moxey, this volume).

This difference in the capacity constraint on the Explicit and Implicit Focus systems carries over to a difference in how the two systems may constrain textual inference. To explain this difference it is helpful to think of text inferences as involving two components: a topic of inference, typically the entity that the inference relates to and is about, and a content, the information that is inferred about that topic which goes beyond what is explicitly expressed in the text itself. Basically, I will be arguing that Explicit Focus plays a role in constraining the topic of the inference, whereas Implicit Focus plays a role in constraining its content.
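The division of labour just described can be sketched in a few lines of code. This is a minimal illustration under several assumptions that do not come from the text: the capacity value, the rule that the least recently foregrounded token is displaced, the all-or-nothing treatment of being in focus, and the names `ExplicitFocus`, `foreground`, `infer` and `SCHOOL` are all hypothetical.

```python
from collections import OrderedDict
from typing import Dict, List, Optional

# Hypothetical scenario knowledge held in long-term memory: role -> expectations.
SCHOOL: Dict[str, List[str]] = {
    "schoolboy": ["attends lessons", "is taught by a teacher"],
    "teacher": ["controls the class", "gives lessons"],
}


class ExplicitFocus:
    """Capacity-limited store of entity tokens, each with a pointer to a role."""

    def __init__(self, capacity: int = 3) -> None:
        self.capacity = capacity
        self.pointers: OrderedDict = OrderedDict()  # token label -> role name or None

    def foreground(self, token: str, role: Optional[str] = None) -> None:
        # Re-mentioning a token keeps its old role unless a new one is given.
        if token in self.pointers:
            old_role = self.pointers.pop(token)
            role = role if role is not None else old_role
        self.pointers[token] = role
        # Assumption: the least recently foregrounded token is displaced first.
        while len(self.pointers) > self.capacity:
            self.pointers.popitem(last=False)


def infer(focus: ExplicitFocus, scenario: Dict[str, List[str]], topic: str) -> List[str]:
    """The topic must be in Explicit Focus; the content comes from the scenario role."""
    if topic not in focus.pointers:
        return []  # not foregrounded, so no inference is drawn about this topic
    role = focus.pointers[topic]
    return scenario.get(role, []) if role is not None else []


focus = ExplicitFocus(capacity=2)
focus.foreground("John", role="schoolboy")
john_expectations = infer(focus, SCHOOL, "John")

focus.foreground("Mary")
focus.foreground("Bill")  # exceeds the capacity, so 'John' is displaced
after_displacement = infer(focus, SCHOOL, "John")
```

Only the tokens and their role pointers occupy the limited store; the scenario knowledge itself stays outside it, which is the asymmetry between the two partitions that the sketch is meant to bring out.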
3.1 Inference topic control
Text inference has always presented something of a paradox for processing accounts. On the one hand it is apparent that readers will often draw far-reaching inferences on the basis of what they read, but on the other hand there is also evidence that on many occasions rather straightforward inferences are just not drawn. We have already seen an example of the former in the inference that John in sentences (1-2) is taken to be a schoolboy, even though this information is in no way expressed by the sentences themselves. An example of the latter occurs with the following sentence, taken from Barton and Sanford (1990):
(7) If there is a plane crash, where should the survivors be buried?
About 70% of skilled readers do not notice the anomaly in this question. In other words they have not drawn the very straightforward inference that survivors, being alive, should not be buried at all. Furthermore it does not need much modification of the text for the inference to come to the fore. Consider for instance the following variant of (7):
(7') Imagine that there is a plane crash with many survivors. Where should they be buried?
Presented with (7') the majority of readers will immediately draw the correct inference and notice the anomaly. What I want to argue is that such inferences are subject to an inference topic constraint, which reflects the state of that topic in Explicit Focus. Readers fail to notice the anomaly in (7) because 'the survivors' are not sufficiently focussed, while they draw the far-reaching inference about 'John' in (1-2) because he is a highly focussed principal character in the story. The situation with sentence 7' lies somewhere between these two extremes. Introducing the survivors at the end of a sentence and using a subsequent pronoun to refer to them seems to be sufficient to secure them a more prominent place in Focus and hence improve the chance that they will be taken as the topic of inference.
Going beyond these observations there is some experimental evidence which confirms the inference topic control function of Explicit Focus. This comes from investigating what I will call inference attribution. Consider the following text:
(8) Bill tried to attract the waitress' attention. The atmosphere was hot and sticky...
In the second sentence there is a comment about the state of the restaurant which reflects a certain perspective on the scene. Yet there is no explicit indication of whose, if anyone's, perspective is being taken.
Now if you subsequently ask readers whether 'Bill' or 'the waitress' found the atmosphere hot and sticky they will typically only be prepared to attribute the judgement to 'Bill', the main character of the story. In fact Sanford and Al Ahmar (reported in Garrod & Sanford, 1988) found that there was a systematic relationship between the likelihood of attributional inference and the degree to which the character was introduced as a thematic subject of the story. Thus introducing a character with a proper name, a key focussing device (see Sanford, Moar & Garrod, 1988), ensured a high probability of attributional inference. However, it might be argued that probing for the inference with an explicit question after the text has been read is in some way forcing the issue. For instance, the reader's confidence in the attribution might reflect the memorability of the main character once the text has been read. A more direct test of attributional inference was therefore designed which involved measuring the reading
time for a subsequent sentence whose interpretation should be differentially affected by the inference. The materials used were based on those from the earlier Sanford and Al Ahmar study but contained a critical sentence after the atmosphere statement which described an event which was consistent with the character noticing the state. For instance following a passage such as (8) above there would be this sentence: (9) He / She mopped his / her brow. The event described in (9) is consistent with either the waitress or Bill in (8) finding the atmosphere in the restaurant hot and sticky. The experiment therefore involved measuring the reading time for such sentences which either referred to the main or secondary character of the story, in a context which either contained the atmosphere statement or did not (see the examples of the materials in Table 1). The results, reported in Garrod and Sanford (1988), clearly indicate that the attributional inference only occurs for the named main character of the story. The time spent reading the sentence when it referred to a secondary character was unaffected by the presence of the atmosphere statement, while its presence had a dramatic effect on reading time for the sentence referring to the main character.
Table 1: Example of a set of materials and reading time results for the 'atmosphere statement' attribution experiment.
Lunch at the cafeteria
Alistair hung up his coat and picked a tray. The waitress smiled as she poured the coffee. (a) [He took] [She offered] the cup. He/She mopped his/her brow. (b)
Reading times for the critical target sentence (b) above, in msecs:
                                Thematic subject    Secondary character
With atmosphere statement       1379                1430
Without atmosphere statement    1650                1463
(a) = atmosphere statement.
(b) = inference target sentence.
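The size of the effect in Table 1 can be read off by simple subtraction. The short Python sketch below does only that arithmetic on the four means copied from the table; the variable and function names are illustrative assumptions, and the differences are raw mean differences, not the statistical analysis reported in Garrod and Sanford (1988).

```python
# Mean reading times (msec) for the target sentence, copied from Table 1.
reading_times = {
    "thematic subject": {"with": 1379, "without": 1650},
    "secondary character": {"with": 1430, "without": 1463},
}


def atmosphere_effect(times):
    """Reading-time saving (msec) when the atmosphere statement is present."""
    return times["without"] - times["with"]


# One raw difference per character type (hypothetical helper names).
effects = {who: atmosphere_effect(t) for who, t in reading_times.items()}

for who, saving in effects.items():
    print(f"{who}: {saving} msec faster with the atmosphere statement")
```

On these means the saving is 271 msec for the thematic subject and 33 msec for the secondary character, which is the asymmetry described in the text.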
These studies therefore suggest that a character's state of Focus determines the likelihood that that character will be the topic of inference. To this extent it seems that what we have called Explicit Focus has a role in controlling inference topic although it does not of course constrain the content of the inference.
3.2 Inference content control
As I have already argued one of the main reasons for proposing an Implicit partition of Focus was to capture the fact that readers are able to keep track of the relevant background information associated with the changing text scenario. At the most general level this relevant background information is going to be the source of situational inferences. Once the reader has identified the type of situation portrayed this will constrain the content of many inferences about the various individuals in the story and the events that they take part in. However, the nature of the constraint on the content of inference is rather different from the inference topic constraint discussed above. The difference can be illustrated by contrasting the way in which the following two sentences indicate that a vehicle is used in travelling to London:
(9) Keith took his car to London.
(9') Keith drove to London.
There is a sense in which both 9 and 9' indicate that Keith used a vehicle to travel to London, but in 9 this is explicitly stated whereas in 9' it is only implied as a result of our understanding of what is involved in driving to London. From a purely referential point of view there seems to be little difference between these two ways of indicating that a vehicle was used since giving readers a subsequent sentence containing an anaphoric reference to 'the car' causes no problems for interpretation. Thus no difference in overall comprehension time can be detected for 10 when it follows 9' as opposed to 9 (Garrod & Sanford, 1982; Cotter, 1984).
(10) The car kept overheating.
On the basis of this and other related findings (see Garrod & Sanford, 1981) it might seem that the reader is able to extend the domain of potential antecedent referents on the basis of knowledge of the type of situation being portrayed. However, there is evidence that 9 and 9' are not completely equivalent in this respect. In the first place, it is only after 9 that one can use a pronoun to refer to 'the car'. Thus 10' is not felicitous following 9', but quite acceptable after 9.
(10') It kept overheating.
Similarly with 10" readers spend substantially longer reading the sentence following 9' as opposed to 9.
(10") The engine kept overheating.
Thus it seems that both pronouns and indirect anaphors, such as 'the engine' in relation to 'the car', require explicit antecedents for their interpretation. So how does the situational knowledge facilitate the interpretation of situational anaphors such as 'the car' in sentence 10 while not establishing an antecedent discourse referent? To explain these results Garrod and Sanford (1981) proposed that the 'scenarios' in Implicit Focus contained a limited number of 'role slots' which could accommodate references fulfilling the role mapping constraints. So for instance once the reader has established that Keith is driving somewhere this will focus on a 'scenario' which contains a role slot for the vehicle used. However, the only references that can be accommodated by this role slot will be those that refer to things that comply with the mapping constraint on 'vehicles used for driving', e.g. 'the car', 'the Porsche' etc. Neither pronouns nor indirect references to a vehicle such as with 'the engine' in 10" will comply with the role mapping constraint since they do not identify vehicles as such. In relation to the inference process this kind of role constraint is rather different from the inference topic constraint proposed for Explicit Focus. Whereas constraining the topic of inference is very much an active process reflecting what the reader is actively attending to at the time, role constraints are more deeply embedded as part of the background knowledge that the process can call upon when it is needed to resolve problems of text integration. In this way Implicit Focus can be seen as playing a rather passive role in interpretation as compared to Explicit Focus which in some sense drives the reader's expectations.
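The role-slot idea can be made concrete with a toy data structure. The Python sketch below is only an illustration under stated assumptions: the class names `RoleSlot` and `Scenario`, and the tiny hand-listed set of admissible descriptions, are hypothetical and stand in for whatever knowledge actually defines 'vehicles used for driving'. It is not the mechanism proposed by Garrod and Sanford, just one way of picturing a mapping constraint.

```python
from dataclasses import dataclass, field


@dataclass
class RoleSlot:
    """A role in a scenario plus the descriptions it can accommodate."""
    name: str
    admissible: frozenset  # assumed toy lexicon standing in for the mapping constraint

    def accepts(self, noun_phrase):
        return noun_phrase in self.admissible


@dataclass
class Scenario:
    """A scenario in Implicit Focus with a limited number of role slots."""
    name: str
    slots: list = field(default_factory=list)

    def accommodate(self, noun_phrase):
        """Return the name of the first slot that accepts the NP, else None."""
        for slot in self.slots:
            if slot.accepts(noun_phrase):
                return slot.name
        return None


# 'Keith drove to London' is assumed to activate a driving scenario.
driving = Scenario(
    "driving",
    [RoleSlot("vehicle", frozenset({"the car", "the Porsche"}))],
)

for np in ("the car", "it", "the engine"):
    print(np, "->", driving.accommodate(np))
```

In this sketch 'the car' is accommodated by the vehicle slot, whereas the pronoun 'it' and the indirect reference 'the engine' find no slot, mirroring the contrast between 10, 10' and 10" after 9'.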
The different consequences of Explicit versus Implicit Focussing are clearly seen in the results of a recent eye-movement study (Garrod, O'Brien, Morris & Rayner, 1990) which explored the effects of role restriction constraints on the time taken to interpret subsequent references. The basic rationale behind the study was similar to that in the Garrod and Sanford experiment just mentioned. Various contexts were constructed which could impose a potential restriction on the nature of an antecedent referent. For instance a subject could be presented with one of the following four variants:
(11) He assaulted her with his weapon
(12) He stabbed her with his weapon
(13) He assaulted her with his knife
(14) He stabbed her with his knife
Subsequently they would then read the following possible target sentences:
(15) He threw the / a knife into the bushes, took her money and ran away.
The point of interest here is the extent to which the different types of contextual restriction on the antecedent knife affect the subsequent reading time for the reference to the knife in sentence 15. In sentences 13 and 14 the antecedent is explicitly constrained to be a knife through its lexical specification, whereas in sentences 12 and 14 there is an implicit restriction imposed through the use of the specific verb stabbed as opposed to the less specific assaulted. So one question
that the study addressed was how these two forms of restriction affect the amount of time the reader actually fixates on the subsequent NP 'the knife'. However, there was also a control condition included to partial out any general priming advantage that the context might give to the subsequent identification of the word knife. Hence the inclusion of the non-anaphoric matching NP a knife on half of the trials. An example of one material item in all its conditions is shown in Table 2.
Table 2: Materials in Garrod, O'Brien, Morris and Rayner (1990)
All the mugger wanted was to steal the woman's money. But when she screamed, he [stabbed] [assaulted] her with his (knife/weapon) in an attempt to quieten her down. He looked to see if anyone had seen him. He threw {the} {a} knife into the bushes, took her money, and ran away.
Factors manipulated:
(1) Restricting versus non-restricting context for the antecedent (i.e. stabbed v. assaulted)
(2) Explicitly matching antecedent for the target noun knife (i.e. knife v. weapon)
(3) Target in definite or indefinite NP (i.e. a v. the)
So subjects were presented with a number of such materials and their eye-movements were recorded while reading. The resulting fixation durations on the critical noun-phrase are shown in Figure 1. As can be seen the pattern of results is very interesting. First if we consider the non-anaphoric control items, there is only a reading advantage when the antecedent exactly matches the lexical specification on the target noun. Contexts containing either 13 or 14 lead to shorter reading times than contexts containing either 11 or 12, but the implicit restriction from the verb has no effect whatsoever. However, with the anaphoric target sentences fixation duration is equally reduced by either implicit restriction from the verb as in sentence 12 or lexical specification on the antecedent NP as in 13. So the only case where there is a reliably longer reading time is when neither restriction applies as in 11. The basic conclusion from these results is that the implicit role inference afforded by having a role restricting verb only has any processing consequences when this information is actively called upon in order to resolve a subsequent anaphoric reference. To this extent the role restriction can be seen as a kind of passive inference only brought to the fore when it is explicitly triggered by encountering the definite noun-phrase.
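The three manipulated factors cross to give eight versions of each item. As a rough illustration, the Python sketch below enumerates those versions for the mugger passage from Table 2. The dictionary names, the condition labels and the `build_item` helper are assumptions introduced here for illustration; only the passage wording and the three factors come from the table.

```python
from itertools import product

# Factor levels taken from Table 2; the label names are illustrative.
VERBS = {"restricting": "stabbed", "non-restricting": "assaulted"}
ANTECEDENTS = {"matching": "knife", "non-matching": "weapon"}
DETERMINERS = {"definite": "the", "indefinite": "a"}


def build_item(context, antecedent, target):
    """Assemble one version of the mugger passage for a given condition."""
    return (
        "All the mugger wanted was to steal the woman's money. "
        f"But when she screamed, he {VERBS[context]} her with his "
        f"{ANTECEDENTS[antecedent]} in an attempt to quieten her down. "
        "He looked to see if anyone had seen him. "
        f"He threw {DETERMINERS[target]} knife into the bushes, "
        "took her money, and ran away."
    )


# Full crossing of the three two-level factors.
conditions = list(product(VERBS, ANTECEDENTS, DETERMINERS))

for condition in conditions:
    print(condition)
print(len(conditions), "versions per item")
```

Crossing the factors in this way makes it easy to see that sentence 11 corresponds to the non-restricting, non-matching cell, the one case with reliably longer reading times for the definite target.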
In relation to the general organisation of the Focus system this accords well with the idea that Implicit Focus plays a passive role with respect to inference content which is quite different from that played by Explicit Focus in controlling the topic of inference.
[Figure 1: Gaze durations on the target noun, from Garrod, O'Brien, Morris & Rayner (1990). Context: restricting v. non-restricting. Key: explicit v. implicit antecedent, definite v. indefinite target NP.]
One final piece of relevant evidence for this contrast comes from another eye-movement study by O'Brien, Shank, Myers & Rayner (1988). This study used similar materials to impose implicit restrictions on antecedents but with an optional intermediate sentence to focus attention explicitly on the inference. An example of one material set is given in Table 3 with the critical target noun being zebra.
Table 3: Materials from O'Brien, Shank, Myers and Rayner (1988)
It was little Alex's first trip to the zoo and he was amazed at the exotic sights. He giggled with delight when he saw a [pony sized (zebra/animal) with black and white stripes] [funny looking (zebra/animal) slowly chewing on the grass]. He asked his mother what it could be. Seeing a zebra was really a special event.
Factors manipulated:
(1) Restricting versus non-restricting context (i.e. pony sized ... with black and white stripes v. funny looking etc.)
(2) Presence or absence of explicitly focussing sentence (i.e. He asked his mother what it could be)
So these materials again contained a contrast between implicit and explicit restriction on the nature of the antecedent but on half the occasions also included a focussing sentence to make the reader explicitly concentrate on this specification, as in 16 below:
(16) He asked his mother what it could be.
Finally, the critical target sentences predominantly (in 22 out of 28 cases) contained non-anaphoric references to the antecedent. The results from this study (experiments 2 and 3 in O'Brien et al. 1988) showed that in the context of the focussing sentence there was a reading advantage even for the predominantly non-anaphoric NPs associated with the implicit restriction on the antecedent, but no such advantage occurred in its absence. I would suggest that this advantage comes from the way that explicitly focussing on the nature of the antecedent promotes this as a topic of inference but one whose content comes from the Implicit Focus constraint. In other words it is only when there is some form of explicit trigger in the text that the Implicit Focus constraint on inference content is realised.
4. Summary and conclusions
I began this chapter by arguing that Focus theories are an attempt to formalise the way in which a reader's focus of attention shifts from character to character and scene to scene as a story unfolds. These shifts in attentional focus are not just an epiphenomenon of reading but seem to play an important role in controlling inference and establishing a coherent and cohesive mental representation of the story. The Sanford-Garrod focussing framework which I discussed revolves around two aspects of attentional focus, what we called Explicit and Implicit
Focus. Explicit Focus, as we envisage it, is similar in many ways to what Grosz called Focus, and Chafe Foregrounding. It keeps track of a limited set of discourse entities and acts as a kind of filter in selecting antecedents for pronouns and certain other anaphors such as demonstratives. Implicit Focus on the other hand is very different in that it keeps track of a limited amount of relevant background information about the situations in which the characters play their changing roles. To the extent that Implicit Focus affects anaphoric interpretation it is more in terms of giving a context to anchor the referents of definite descriptions. Table 4 shows the main differences between the two systems discussed in this chapter. The top two sections of the table summarise the differences in relation to the content of the Focus systems and in relation to the referential processing consequences. However, the key difference which I have tried to concentrate on here concerns how the two systems are implicated in the control of text inference processes. I have argued that it is important to recognise a distinction between the topic of any inference and its content. Explicit Focus is then assumed to restrict the likely topic of inference at any time during reading whereas Implicit Focus imposes restrictions on inference content. In this way the two systems represent the attentional foreground and background of the reader and so impose rather different restrictions on what is going to be inferred.
Table 4: Differentiating Explicit from Implicit Focus
By Contents
  Explicit:
    1. Contains foreground information which arises directly from the interpretation of the text. TOKENS
    2. Specific Entity or Character individuating
  Implicit:
    1. Contains background information which reflects general knowledge of the situations and types of entity portrayed. SCENARIOS
    2. Role and Type individuating
In terms of Referential Processing
  Explicit:
    1. Implicated mainly in the interpretation of Pronouns and demonstrative descriptions and, in a somewhat different way, Proper Names and certain Quantifiers.
  Implicit:
    1. Implicated mainly in the interpretation of Definite Descriptions.
By Psychological Properties
  Explicit:
    1. Limited capacity in terms of the number of distinctions that can be encoded at any time (e.g. Tokens)
  Implicit:
    1. Not capacity limited as such (only limited by the logical compatibility of different scenarios)
References
Barton, S. B. & Sanford, A. J. 1990 Failures to notice semantic anomalies in discourse: incompleteness of processing in the machinery of cohesion establishment. Paper presented at the Meeting of the European Cognitive Psychology Society. Como, Italy.
Chafe, W. 1976 Givenness, contrastiveness, definiteness, subjects, topics and points of view. In C. N. Li (Ed.), Language comprehension and the acquisition of knowledge. Washington: Winston.
Cotter, C. A. 1984 Inferring indirect objects in sentences: Some implications for the semantics of verbs. Language and Speech, 27, 1, 25-45.
Fraurud, K. 1990 Definiteness and the processing of NPs in natural discourse. Journal of Semantics, 7, 395-435.
Garrod, S., O'Brien, E. J., Morris, R. J., & Rayner, K. 1990 Elaborative inferencing as an active or passive process. Journal of Experimental Psychology: Learning, Memory and Cognition, 16, 250-257.
Garrod, S. & Sanford, A. J. 1982 Bridging inferences in the extended domain of reference. In A. Baddeley & J. Long (Eds.), Attention and performance IX. Hillsdale, NJ: LEA, 331-346.
Garrod, S. & Sanford, A. J. 1983 Topic dependent effects in language processing. In G. B. Flores D'Arcais & R. Jarvella (Eds.), The process of language comprehension. Chichester: John Wiley & Sons, 271-295.
Garrod, S. & Sanford, A. J. 1988 Thematic subjecthood and cognitive constraints on discourse structure. Journal of Pragmatics, 12, 357-372.
Grosz, B. 1977 The representation and use of focus in dialogue understanding. (Technical Note 15), SRI International Artificial Intelligence Center.
O'Brien, E. J., Shank, D. M., Myers, J. L. & Rayner, K. 1988 Elaborative inferences during reading: Do they occur on-line? Journal of Experimental Psychology: Learning, Memory and Cognition, 14, 410-420.
Sanford, A. J. & Garrod, S. 1981 Understanding written language: Explorations in comprehension beyond the sentence. Chichester: John Wiley & Sons.
Sanford, A. J., Moar, K. & Garrod, S. 1988 Proper names as controllers of focus. Language and Speech, 31, 1, 43-56.
Sanford, A. J. & Moxey, L. M. 1995 Notes on plural reference and the scenario-mapping principle in comprehension. In G. Rickheit & C. Habel (Eds.), Focus and coherence in discourse processing. Berlin: de Gruyter (this volume).
Weber, B. L. 1988 Discourse deixis and discourse processing. (Technical Report MS-CIS-88-75), University of Pennsylvania.
A. J. SANFORD and L. M. MOXEY
Notes on Plural Reference and the Scenario-Mapping Principle in Comprehension*
1. The basic problem
While much research into singular anaphora has been carried out in the disciplines of cognitive science, rather less work has been carried out on plural anaphora. A particularly tricky problem is the case of split antecedents, where two or more antecedents are introduced separately, but are subsequently referred to together by means of a plural pronoun. The problem is difficult, because as (1-4) show, singular and plural reference may sometimes be equally possible:
(1) John and Mary took the train to town.
(2) He wanted to buy some new clothes.
(3) She wanted to buy some new clothes.
(4) They wanted to buy some new clothes.
The general problem is to determine under what conditions it is or is not possible to use plural and singular references. Not all examples allow both sorts of reference to be made with equal ease, as we shall see. Also, there is a specific problem of how plurals might get represented within the focussed memory theory of Sanford and Garrod (1981). We shall begin with a sketch of the scenario-mapping account of text comprehension.
1.1 Scenario-mapping as a basis of comprehension
Schema-based theories of text comprehension are based on the idea that a text serves to identify situational descriptions in memory of which the discourse is a partial description. New facts, derived from the discourse, are then interpreted with respect to the current situational description. If a suitable situational description is known, then new information merely adds to or modifies the information in the situational description. This continues until signals within the text lead to a new situational model being identified. Such theories have been suggested in AI (Schank and Abelson's script account, 1977), and in psychology (van Dijk & Kintsch, 1983; Sanford & Garrod, 1981). The Sanford and Garrod version of the account terms the situational information scenarios, and it is this theory which will be discussed in some detail.¹
* This research was initiated under ESRC research grant number R0002315592 awarded to the first author, and was stimulated by ZIF seminars at The University of Bielefeld organised by Gert Rickheit and Christopher Habel. We are grateful to Joy Aked, Maria Eschenbach, Kari Fraurud, Simon Garrod, Fiona Lockhart, Klaus Rehkämper, and Keith Stenning for comments.
Sanford and Garrod (1981; Garrod & Sanford, 1982; also Garrod, this volume) propose a comprehension system based upon a functional partitioning of memory. Scenarios are assumed to be datastructures in long-term memory (LTM) which, when they become active through relevant discourse, are said to be in implicit focus. The term focus is used to portray the essentially dynamic property of implicit focus, and the fact that it serves to provide a restricted (or focussed) background against which utterances are interpreted. Discourse itself is temporarily held in a short term store labelled explicit focus. This store is assumed to be able to hold a limited number of distinctions, or entities of interest. Thus, we might suppose the sentence John hit the girl as activating the scenario for hitting (a verb schema), and forming, in explicit focus, a structure as in figure 1(a), in which a token for John and a token for the girl are mapped into specific roles (in this case agent and patient roles) in a scenario. The scenario token is only intended to point to the appropriate part of LTM (implicit focus), not to represent all of the distinctions (properties) associated with it. In short, within the Sanford-Garrod framework, explicit focus contains tokens of explicitly mentioned entities with mapping relations onto scenario tokens. No further justification for this part of the theory will be given here; it is given in Sanford and Garrod (1981) and Garrod (this volume). Now consider a second sentence added to the situation depicted in figure 1(a): Then he ran away. This calls up another scenario in which X is running away, and maps the token for John into the role-slot for X in the new scenario, adding the structures indicated in figure 1(b).
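The John hit the girl example can be pictured as a small data structure. The Python sketch below is a minimal illustration under stated assumptions: the class name `ExplicitFocus`, the default capacity of four tokens, and the use of plain strings as pointers to scenarios in implicit focus are all hypothetical choices made here, not details given by the theory. Fading is modelled crudely by dropping the least recently mentioned token once the capacity is exceeded.

```python
from collections import OrderedDict


class ExplicitFocus:
    """Toy limited-capacity store of tokens for mentioned entities."""

    def __init__(self, capacity=4):
        # The capacity value is an illustrative assumption; the text only
        # says that a limited number of distinctions can be held.
        self.capacity = capacity
        self.tokens = OrderedDict()  # entity -> list of (role, scenario name)

    def mention(self, entity, role, scenario):
        """Map an entity token into a role of a scenario held in implicit focus."""
        mappings = self.tokens.pop(entity, [])
        mappings.append((role, scenario))
        self.tokens[entity] = mappings  # most recently mentioned token goes last
        while len(self.tokens) > self.capacity:
            self.tokens.popitem(last=False)  # the oldest token fades


focus = ExplicitFocus()
# "John hit the girl."
focus.mention("John", "agent", "hitting")
focus.mention("the girl", "patient", "hitting")
# "Then he ran away." (the pronoun is assumed to have recovered the John token)
focus.mention("John", "X", "running away")

print(focus.tokens["John"])
```

After the second sentence the John token carries two role mappings, one into the hitting scenario and one into the running away scenario, while the scenarios themselves are represented only by name, in the spirit of pointers into implicit focus.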
So processing continues, with old distinctions in explicit focus fading as new information is added. Sanford and Garrod (1981; Garrod and Sanford, 1985, 1988) claimed that (unstressed) pronouns serve to recover tokens of entities in explicit focus, while fuller noun phrases tend to identify structures in implicit focus. Although pronouns do not exclusively recover tokens in explicit focus, they are tailored for doing this. Furthermore, tokens are most likely to occur in explicit focus through an entity being explicitly mentioned, though again, not exclusively through this means (cf. Yule, 1982; Sanford, Garrod, Lucas & Henderson, 1983; Bosch, 1988). Rehkämper (personal communication) has suggested that these details produce a problem for the Sanford-Garrod theory in that it is difficult to see what is explicitly introduced in split antecedents introducing two individuals, like John and Mary. Are they separate individuals who have been introduced, or are they a plural complex (or both)? As Rehkämper notes, the problem is not trivial because sometimes singular anaphors only, sometimes plurals only, and sometimes both, will be felicitous in the face of an antecedent. The solution to this problem which we would like to offer here relies upon two mechanisms: distinction through role-mapping, and constraints on focus. The remainder of the paper will elaborate upon these two ideas.
[Figure 1: Explicit focus structure. Circles denote tokens for either individuals (left) or states or scenarios (right). Left and right tokens are linked by role relations. Scenario and state tokens can be thought of as pointers to information in implicit focus. Thus the scenario 1 token points to a scenario for hitting. Similarly, the "meaning" of being John, or female, can only be understood by recourse to implicit focus.]
¹ Although we have presented this section in terms suggesting schemas to be conventional datastructures, this is not a presupposition which we are making. Problems of inflexibility and modularity inherent in older views of schemata are being overcome through treatment in parallel distributed processing (e.g., Rumelhart, Smolensky, McClelland, & Hinton, 1986).
2. Individuation, plurality, and role-mapping
In the world at large there are situations where it makes sense to distinguish individuals through the roles which they play, and other situations where a distinction would not be useful. For example, it makes sense to distinguish between the individuals in an interaction where someone (an agent) hits someone else (a patient). However, when we wish to assert that two people make a winning team in a doubles game, we do not wish to differentiate them for many purposes; rather, we think of them as playing a common role in a situation. In any model of long-term memory where information is organised along situationally-determined lines, there will be structures which tend to put individuals into a common role, and those which tend to put them into different roles, simply because for practical purposes sometimes it matters to distinguish between or amongst them, and sometimes it does not. Sometimes individuals are treated as undifferentiated from the outset, being labelled by a set-denoting name, like the class or the partners or the House of Lords. In these cases, it is scarcely surprising that we do not wish to differentiate amongst individuals. Indeed, these specify sets which may have unknown numbers of elements.
2.1 Differentiation and discourse
In a discourse setting, similar considerations of differentiation apply. There is no point in differentiating amongst things which we want to think of in the same way: we treat them as a group of identical elements, to all intents and purposes. This lays the foundation for our first and main claim that plural pronominal reference is possible to more than one independently introduced individual if and only if those individuals play a common role in a scenario elicited by the discourse. Let us reflect briefly upon what this might mean. It might mean a representation at the level of role-mappings in the sense of thematic roles. So if John hits Mary, then John -> agent and Mary -> patient, and plural anaphora is not possible. But such a view would be overrestrictive. The psychological impact of utterances like this is to lead to questions like why John would do such a thing, and one answer might be that they often fight. If this possibility is realised as a mental structure, then both Mary and John are mapped into the same role of often fighting. On that basis, plural anaphora is possible. In this way, rather than restrict the role-mapping notion to structures close to the sentences of a discourse, it may be necessary to include more elaborated representations, which we shall term "inferential fields". In psychology, the extent of such elaborative inference-making is a matter of debate and ongoing enquiry (see, e.g. Sanford, 1990). For the present, our claim is that some utterances will produce relatively rich inferential fields, and some relatively poor ones. The plausibility of this point of view can be illustrated by comparing (5) and (6):
(5) John kissed Mary.
(6) John kissed Mary in the woodshed.
We would argue that because of the stereotype attached to the location in the woodshed, in the context of a man and woman being there, (6) invites a more substantial inferential field than (5).
In general, we would argue that some sentences for reasons of experience will produce rich inferential fields in which individuals play multiple roles. So when evaluating the common role-mapping condition for plurals, it is this richer inferential field notion which we have in mind. Let us take as a concrete example possible inferential activity in the wake of the message:
(7) Harry met Mary at the bar before the Opera. Some of the possible inferences to which this might give rise are indicated in table 1.
Table 1: Possible inference field resulting from encountering "Harry met Mary at the bar before the Opera"

Consequences of labels used:
  1. Harry is called "Harry" (role 1)
     Harry is a male (role 2)
     Mary is called "Mary" (role 3)
     Mary is a female (role 4)
     (Differentiates Harry and Mary)
Basic action using agent/patient syntax:
  2. Harry (role 5): agent in meeting
     Mary (role 6): patient in meeting (is met)
     (Differentiates Harry and Mary)
Result of basic action occurring:
  3. Harry, Mary (role 7): in same place for some reason
     (Harry and Mary undifferentiated)
Reason for action?:
  4. Harry, Mary (role 8): Joint plan to go to the opera?
     (Harry and Mary undifferentiated)
Bold letters denote tokens.

Some of these possible mappings have Harry and Mary mapped into common roles, and some have them in separate roles. Such a situation means that it is possible to refer to these two characters with respect to a common role-mapping (plural), or separately (singular). Furthermore, we might even conjecture that the relative dominance of each of the role-mappings corresponds to the degree to which those mappings are foregrounded (Chafe, 1972; Sanford & Garrod, 1981), so although both singular and plural reference patterns may be possible, they are not necessarily equally easily carried out. Our second principle, then, is that singular and plural reference patterns may be simultaneously licensed without requiring extra "plural" tokens in explicit focus: this is made possible by mapping relations into preexisting structures.
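The role-mappings of Table 1 can be made concrete with a small sketch. The Python fragment below is an illustration only, under the assumption that an inferential field can be approximated as a dictionary from role-slots to the sets of tokens mapped into them; the names `role_map`, `plural_licensing_roles` and `differentiating_roles` are hypothetical and form no part of the theory's own statement.

```python
# Illustrative sketch only: the eight role-slots of Table 1, each paired with
# the set of tokens mapped into it. The dictionary encoding is an assumption.
role_map = {
    "role 1: called 'Harry'": {"Harry"},
    "role 2: male": {"Harry"},
    "role 3: called 'Mary'": {"Mary"},
    "role 4: female": {"Mary"},
    "role 5: agent in meeting": {"Harry"},
    "role 6: patient in meeting": {"Mary"},
    "role 7: in same place": {"Harry", "Mary"},
    "role 8: joint plan to go to the opera": {"Harry", "Mary"},
}


def plural_licensing_roles(mapping):
    """Role-slots filled by more than one token (common role-mappings)."""
    return [role for role, tokens in mapping.items() if len(tokens) > 1]


def differentiating_roles(mapping, token):
    """Role-slots filled by this token and no other."""
    return [role for role, tokens in mapping.items() if tokens == {token}]


# Both reference patterns are licensed by the same structure:
# no separate "plural" token has been added to the mapping.
print(plural_licensing_roles(role_map))
print(differentiating_roles(role_map, "Mary"))
```

On this encoding, singular and plural reference are read off the same mapping, which is the point of the second principle; relative foregrounding of the mappings is not modelled here.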
Notes on a Theory of Plural Reference
3. Some constraints on plural reference
Having indicated how a mapping theory might license plural and singular "representations", we are in a position to see what is needed to explain observed and new constraints on when plural reference is possible. One approach to this set of problems has been produced by Eschenbach, Habel, Herweg, and Rehkämper (1989). They cast the problem as one of discovering the constraints under which a complex reference object (complex RefO) is formed or blocked from forming. A complex RefO is assumed to result from the combination of two or more atomic reference objects through a process described as sum-formation. Thus, the expression Bill and Ben, containing atomic reference objects Bill and Ben, may produce a third object, Bill + Ben. Eschenbach et al. note that some syntactic forms, like complex noun phrases using the conjunction and, seem to be candidates for licensing complex RefO formation.
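Sum-formation as described above can be sketched in a few lines. The fragment below is a minimal illustration, assuming that an atomic RefO can be modelled as a one-element set and a complex RefO as the union of its parts; this set encoding and the names `atom` and `sum_formation` are our assumptions, not the formalism of Eschenbach et al.

```python
def atom(name):
    """An atomic reference object, modelled here as a one-element frozenset."""
    return frozenset([name])


def sum_formation(*ref_objects):
    """Combine two or more RefOs into a complex RefO (union of their atoms)."""
    if len(ref_objects) < 2:
        raise ValueError("sum-formation needs at least two reference objects")
    return frozenset().union(*ref_objects)


bill, ben = atom("Bill"), atom("Ben")
bill_and_ben = sum_formation(bill, ben)   # the third object, Bill + Ben
print(sorted(bill_and_ben))
```

The sketch only builds the complex object; whether its formation is licensed or blocked in a given discourse is exactly the question the constraints below address.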
3.1 Method of conjoining

Consider the following case, in which two individuals are introduced:

(8) Mr. Smith dictated very rapidly.
(9) The secretary had difficulty keeping up.

While there is nothing to prevent a plural continuation (e.g., They had been working hard for hours), the incidence of such continuations tends to be rather low. In a study by Sanford, Moar, and Garrod (1988), the average incidence of a plural continuation was less than 5%. We might conclude that the most likely content of continuations was not compatible with having the two individuals play a common role. Rather, they were kept differentiated. However, there are constructions for introducing individuals which do indeed seem to allow higher incidences of plural continuations. Thus Sanford and Lockhart (1990) showed that a sentence like (10) led to more continuations using plural pronouns than did a version like (11):

(10) John and Mary went to the cinema.
(11) John went to the cinema with Mary.
It is apparent that these sentences are not mere syntactic variants, but admit of different semantic interpretations, with (11) allowing separate roles for John and Mary, which is consistent with the scenario-mapping theory (if trivially so). However, even in the and condition, with materials designed not to exclude common role-mapping interpretations, the proportion of spontaneously generated plural references never exceeded 50%, so there is nothing necessary about plurals; they are merely licensed. Other work by Hielscher and Müsseler (1990) compared the continuations produced when individuals were conjoined by a variety of connective devices. These were and, as well as, neither... nor, with, without, and instead of. The continuations to each of the
sentences were restricted in that they had to begin with the pronoun Sie, which has the interesting property of being either a plural or a singular pronoun, depending on the inflection of the verb which it precedes. Continuations were thus classifiable as singular or plural. A very high number of continuations with plural interpretations of Sie occurred for all connectives except with, without, and instead of; with was an intermediate case, while without and instead of produced mainly singular interpretations. Within the present framework, we would argue that without is a connective that tends to differentiate individuals, while instead of assumes a high level of differentiation.
3.2 Ontological similarity and CABs

The major requirement for the formation of a complex object, according to Eschenbach et al., is that the atomic elements have a common association base, or CAB (following work by Lang, 1984). An example of a feature which leads to a CAB is if the atomic objects are of the same "ontological type". Eschenbach et al. do not define ontological type in their paper, but simply illustrate it by saying that two humans are more similar than a man and an animal, so are more likely to lead to a CAB and hence to a complex RefO. They argue that it is difficult to see how a plural pronoun could be used to refer to two things as different as Harry and his frisbee, as in:

(12) Harry took his frisbee to the park.
On the face of it, this does appear to be a real constraint. However, on the scenario-mapping account, the question is not really about ontological sameness. Sentence (12) describes Harry and his frisbee in a typical situation where they both might typically be mentioned. But, typically, one would not want to treat them as undifferentiated: stereotypically, they will play totally different roles. However, it clearly is possible to refer to them in an undifferentiated fashion, where they occupy a common role-slot in a model. Indeed, in the present paragraph, there turn out to be five instances in which Harry and his frisbee are referred to using a plural pronoun, with no resultant strain. So in Eschenbach et al.'s terminology, atomic elements have a CAB if they can occupy a common role slot in a model. Indeed, in our view the Common Association Base which Eschenbach et al. require to form a complex object is easily explained within a scenario-mapping account. On our formulation, the ontological type constraint thus arises because "similar" things easily play common roles in normally-encountered scenarios. Thus it is easier to put Jack and Jill in a common role than Jack and a dog, or Harry and his frisbee. But there is nothing intrinsic about the difference between Harry and frisbee; it is just that it is very difficult to find a situation where one would wish them to be undifferentiated. Consider the following examples which have been used in relation to the present problem:
(13) I am very interested in Nelson Goodman and his theory.
(13') They are both short and to the point. (Source: Klaus Rehkämper)
(14) Mary likes her new boyfriend and her job too.
(14') They are both important in her life. (Source: Kari Fraurud)
(15) Harry polished his car lovingly.
(15') They had had many adventures together. (Source: Kari Fraurud)
What these examples show is that with enough imagination, even things which are ontologically quite different can be referred to by a plural pronoun. However, the problem becomes much more acute when the number of entities which we wish to consider as antecedents goes up. Thus, it is hard to see how to combine an assortment of individuals like a tin whistle, ghostly shrieks, chewing gum, and the hypothetical king of France. They are just too different. But the last sentence shows how in a natural text even these can be captured by a plural reference! The only common role they can play is to form a set of different things. It is just a variant on possible answers to the question "what have x, y, and z got in common?". On the scenario-mapping perspective, plural reference is not possible if a common role-mapping cannot be found, and of course, some sets of individuals are unlikely ever to play a common role in reality or in fantasy, and will not have a memory representation in which they do either. The final example given above, which holds for any size of set with any variety of elements, is a reductio showing that rather than thinking of plural reference as ever being impossible because of ontological differences in atomic individuals alone, perhaps we should think in terms of it becoming increasingly difficult until it is virtually impossible. Thus the problem is one of there being a vanishingly small probability of finding a scenario with a common role-slot as differences in type and the number of different elements increase. It is apparent that the scenario-mapping theory has no direct, explicit need of the ontological type constraint. Indeed, when we ask what ontological types might be, the scenario-mapping theory might provide an explanation: ontological similarity is defined by the relative density of scenarios in which a common role for the two elements in question is actually possible.
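The proposal that ontological similarity is a relative density of scenarios can be illustrated numerically. The sketch below rests entirely on an invented toy inventory: the scenarios, role-slots and entity kinds in `SCENARIOS`, and the function name `common_role_density`, are hypothetical, and real scenario knowledge would of course not be enumerable in this way.

```python
# Hypothetical toy inventory: for each scenario, the kinds of entity that
# could fill each role-slot. Everything here is invented for illustration.
SCENARIOS = {
    "going to the cinema": {"viewer": {"person"}},
    "walk in the park": {"walker": {"person", "dog"}, "toy": {"frisbee"}},
    "restaurant meal": {"diner": {"person"}, "dish": {"food"}},
    "things someone owns": {"possession": {"dog", "frisbee", "food"}},
    "assortment of unlike things": {
        "member": {"person", "dog", "frisbee", "food"}
    },
}


def common_role_density(kind_a, kind_b, scenarios=SCENARIOS):
    """Share of scenarios having at least one role-slot open to both kinds."""
    hits = sum(
        any(kind_a in fillers and kind_b in fillers
            for fillers in roles.values())
        for roles in scenarios.values()
    )
    return hits / len(scenarios)


for pair in [("person", "person"), ("person", "dog"), ("person", "frisbee")]:
    print(pair, common_role_density(*pair))
```

In this toy inventory two people share a role-slot in most scenarios, a person and a dog in fewer, and a person and a frisbee only in the catch-all "assortment" scenario, so the measure falls without ever reaching zero, in line with the reductio above.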
Since scenario structure is based on representing real-world interactions, such a solution is not arbitrary. The alternative way of dealing with ontological similarity as a constraint on complex formation would seem to require an explicitly accessible metric for such similarity. The scenario-mapping account does not need such explicit machinery. At the time of writing, there is no continuation data with which to scrutinise the empirical effect of different types on continuation. As we shall argue later, the results of such experiments will likely be a function of the interaction between atomic individuals in any case. But there is some data by Sanford and Lockhart (1990) which has a bearing on the issue. These authors carried out continuation tasks using materials in which two atomic referents were people who could be described by either a noun-phrase or a proper name, as in:

(16) Mr. Bloggs/The boss was dictating a letter.
(16') Mary/The secretary was taking shorthand.

Sanford and Lockhart found that if both referents were names, or both were roles, then the likelihood of a plural continuation was higher than when mixtures were used. So similar description seemed to be a variable in determining plural reference. However, it is arguable that this is because a proper name serves as a cue to differentiate the individual so introduced, specifically as a main character (see Sanford, Moar & Garrod, 1988, for details of this idea), thus serving as a constraint against structures in which there is a common role-mapping. Indeed, it tended to be the proper named individual who was referred to in the singular continuations. Different descriptions may therefore influence the likelihood of plural reference, even when the stated interaction is carefully controlled. Furthermore, it appears that the mixed description effect occurs precisely because of the processor detecting that a proper name has a special status, and using this as a constraint in determining focus. Although capitalising on the fact that main and secondary characters usually play different roles in scenarios, the differentiation is not a result of the mapping process itself (Sanford & Lockhart, 1990). It is clearly of interest to determine whether other description-type differences influence the tendency to form plural or singular representations, but it should be noted that this question is different from the broad notion of ontological types. Finally, it should be noted that many other questions about what favours plurals can be readily understood in terms of common role-mapping.
One instance is the idea that two things which are together in space are more likely to be referred to together (Müsseler, Hielscher & Rickheit, this volume). We doubt whether simple Euclidean space would have anything to do with this: rather, the notion of close proximity would seem to be conditional on situations (for instance, a 'plane near the airport', and a 'mouse near the mousetrap') and control relations (see, e.g., Herskovits, 1986; Garrod & Sanford, 1988). Although there may be a correlation between proximity descriptions and plurals, it may well turn out to have more to do with common role opportunities than Euclidean distance.

3.3 Asserted plurals and constraints on anaphora
The discussion to date has been concerned with the development of mappings which are the result of introducing individuals as singular entities. The situation becomes more complex when one turns to the introduction of plural pronoun anaphors through assertion. Within the terms of the scenario-mapping theory, it should be entirely possible to make a plural reference to a split antecedent if
the sentence introducing it puts the individuals in the same role, even if there was no existing representation of this sort. However, it is more complicated than this, because the result may not be felicitous. Many of the examples cited by Rehkämper and others to show the impossibility or difficulty of plural reference may be analysed within such a framework as this. Let us consider, for example, the following case, suggested by Kari Fraurud (personal communication) as demonstrating the extreme awkwardness of plural reference with two particular ontologically different antecedents:

(17) In the garden, I saw a young girl kicking a tree.
?(17') I looked at them for a while.

The difficulty here is that them does not seem to sound right. But first note that there is no question about what it is to which them refers. So reference of this sort is possible; it is just grossly infelicitous. We believe that to understand this and many other tricky examples, it is necessary to consider more carefully some aspects of text cohesion. Consider first a case where there is a strongly foregrounded plural mapping on the basis of a first sentence, and then a sentence containing a plural anaphor and an action consistent with the scenario is presented. Integration is easily achieved, so we would expect no problem with the plural anaphor. Now suppose instead that a singular reference was made, and that a scenario with the singular mapping was absent from the prior representation, or only weakly foregrounded. In this case, we might expect problems. It is a simple matter to argue for all possible combinations in this way. Let us now turn to some of Rehkämper's examples which illustrate various interesting problems.

(18) Fiona went to meet John and Lucy at the station.
His observation was that although John is mentioned explicitly, this character does not seem to be available for singular reference, since it is infelicitous to follow (18) with (19), although the referent of he is quite clear:

?(19) He had moved to London recently.

We think this sounds peculiar because in (18) we have put John and Lucy in a common role, and to differentiate John specifically with respect to where he lives is not a coherent thing to do after the common role assertion. For such a differentiation to make sense, it is necessary to have some purpose for doing it within the standing framework offered by the text. For instance, the following is quite reasonable in comparison to (19):

(18) Fiona went to meet John and Lucy at the station.
(20) When the train arrived, he was in the first carriage and she was in the back one.
Here, the differentiation admits to being new (since both John and Lucy are
mentioned in a coordinated fashion), and it makes sense, since it invites development: there must be a reason for them being separated. The possibility of differentiation, and its dependence on scenario-development, is shown clearly in the following variation:

(21) John and Lucy went to meet Fiona at the station.
?(22) He had recently moved to London.

John may have moved to London, but the only way this can have any real interpretation in the present context is that it has something to do with why he was at the station. But within that framework, there are no grounds for differentiating John and Lucy, so the utterance is strained in terms of cohesion. Rather, it only makes sense to say:

(23) They had recently moved to London.
However, it is not singling John out per se which is the problem, as shown in the following:

(21) John and Lucy went to meet Fiona at the station.
(24) He parked the car near the concourse.

What differentiates (22) from the more acceptable (24)? John taking the car to the car park is a perfectly acceptable differential highlighting of John within the general setting of meeting someone at the station. Only one person can drive, and parking is coherent with respect to the prior scenario. These cases show that with a little thought, it is possible to produce singular and plural references to antecedents which depend for their acceptability on the content of the second sentence, specifically how it relates to the first sentence. But can this argument handle Fraurud's example (17)? We believe that it can: the essential problem is to find a scenario in which the girl and the tree could be placed in a common role slot. This seems to be remarkably difficult on the basis of the asserted situation (watching girl kick tree). But this is not because of anything intrinsic about girls and trees. A scenario in which they might be allocated a common role slot with some future utility is one where the on-looker is surmising about the situation in a rather detached manner:

(17) In the garden, I saw a young girl kicking a tree.
(25) I suspected that they were both in some way victims of life's cruel blows, but my sympathies rested firmly with the tree.
Another continuation might be:

(26) I could sense the frustration, their relative sizes making it quite clear which of them would win.
In (25) the on-looker makes a comparison between the tree's misfortune and the misfortune which causes the girl's action. Note that the latter is founded in the
'kicking' scenario, and that the comparison de-humanises the girl to draw attention to the on-looker's pity for the tree. In (26) the on-looker is again detached. He uses the girl's aggression as a trigger for a 'fight' scenario in which she and the tree play the role of the fighters. As Fraurud pointed out (personal communication), in the original example, one might suppose that my seeing a girl kicking a tree already puts them into a common role slot as part of the representation, thus licensing the use of they, in terms of the current theory. This is true, but although they share a common role-mapping, they will also be differentiated with respect to the action, and this may well be the dominant set of relationships. As has just been demonstrated, the anaphor-bearing continuation has to be consistent with a dominant scenario that makes it reasonable to consider the individuals together in the future. This is what happens in the case of the misfortune scenario and in the case of the fight scenario, but not in the simple watching setting.

3.4 Other constraints on plurals

As Eschenbach et al. point out, the number of feasible groupings of atomic objects for plural referents seems to be in practice smaller than the mathematical maximum, and to be somewhat systematic. In examples (22-24) there is a differentiation between pairings and individuals. According to Eschenbach et al., just how pronominal attachment works is dependent on two principles: the principle of permanence and the principle of maximality. Permanence is expressed in the following way:

Unless a text explicitly requires it, do not link a pronoun to a proper sub-RefO of a complex RefO in focus. Reference to a sub-RefO is only possible if it was introduced into the discourse model by a previous inference.
The other principle, maximality, is expressed in the following way:

The plural anaphoric pronoun should be linked to the maximal sum of the appropriate RefOs with respect to a CAB unless the text contains evidence to the contrary.

The idea behind permanence seems to be that once a complex object is formed, it is not to be further subpartitioned without some form of special cueing from the text (this is unspecified, however). Behind maximality is the idea that if elements are bound by a common CAB, one should not refer to subsets of these without special cueing. These principles seem reasonable when related to the scenario-mapping theory. Assuming that a CAB is actually a common role, then the maximality principle makes sense. If two or more individuals are mapped into a common role, then we would not wish to distinguish them except by some special process. From a processing point of view, we would want to claim that starting from role-slot pointers, all atomic referents (tokens) having a common role slot are recovered as constituting the referent of a plural pronoun (equivalent to maximality). To pick out one element, it should be necessary to use a differentiating mapping (such as the proper name of one of the elements). Consider the following case:

(27) Charles and Victoria saw Andrew and Lucy talking at the station.
(27') None of them liked Lucy, so they couldn't understand why he was bothering to talk to her.
{them → C + V + A; they → C + V; he → A; her → L}

Simple examples can be used to show that (27) lets plural reference be made readily to C + V, A + L, and C + V + A + L. By differentiating Lucy through use of a proper name, it becomes possible to use the pronoun her to refer to her. This example also serves to show how it is possible to refer to three atomic elements originally mapped two into one common role, and one (with another element) into another. This occurs only because of the way Lucy is differentiated by use of her proper name, in a scenario where it makes sense to differentiate her in this way. In our way of thinking, the principles of permanence and maximality seem to correspond to the retrieval mechanism for plurals which we are suggesting, and the question of what constitutes a special text feature calling for refocussing and remapping remains to be established, though either a proper name or situational differentiation would seem to be good places to start looking for such a mechanism.
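The retrieval mechanism suggested for (27) and (27') can be sketched as follows. This is a minimal illustration under stated assumptions: the role-slot labels ("observer", "talker", "at station"), the pointer list and the function `resolve_plural` are hypothetical, and the choice of the "at station" slot as the basis for them in (27') is our reading rather than something the examples fix.

```python
# Hypothetical role-slot pointers for example (27): (token, role-slot) pairs.
POINTERS = [
    ("Charles", "observer"), ("Victoria", "observer"),
    ("Andrew", "talker"), ("Lucy", "talker"),
    ("Charles", "at station"), ("Victoria", "at station"),
    ("Andrew", "at station"), ("Lucy", "at station"),
]


def resolve_plural(role_slot, differentiated=(), pointers=POINTERS):
    """Recover every token pointing at role_slot (maximality), leaving out
    any token singled out by a differentiating mapping such as a proper name."""
    tokens = {token for token, slot in pointers if slot == role_slot}
    return tokens - set(differentiated)


# (27') "None of them liked Lucy, so they couldn't understand ..."
them = resolve_plural("at station", differentiated=["Lucy"])  # C + V + A
they = resolve_plural("observer")                             # C + V
print(sorted(them), sorted(they))
```

The whole bunch of pointers on a slot is returned unless something differentiates one of its members, which is how the sketch approximates maximality; what counts as a sufficient differentiating cue is left open here, as it is in the text.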
4. Conclusions and implications

We have attempted to show how the idea of mapping tokens for descriptions of individuals into scenarios might be used as a basis for understanding the constraints on the occurrence of plural reference. The main motivation behind this is that the scenario-mapping theory is intended to represent a coherent and broad account of fundamental processes of interpretation which is useful in several domains (Sanford, 1987). Elsewhere, a similar basis has been suggested for the interpretation of spatial expressions (Garrod & Sanford, 1988), and is being developed as an account of quantification and measurement properties of aspects of situations (Moxey and Sanford, 1993, (a) and (b)). In exploring the usefulness of the scenario theory for plurals, we have tried to address the problems raised by Rehkämper (personal communication), which vary from concerns that the theory is in principle unable to treat plurals, to more general issues about what is and is not acceptable plural anaphora. This stimulus has led us to postulate a set of principles which are summarised below. First, we assume that plural reference is possible if either a plural is asserted, or two or more singular entities map into a common role-slot in a scenario. These two things are structurally different, in that the elements of the former enjoy no individuation, while those of the latter do (even if only by the labels through which they were introduced). Notice that the procedure for finding a plural representation in the former case is through a plural tag, and in the latter case through a discovery of role pointers having the same role slot but different tokens. It is possible that individuals may have mappings into more than one scenario, in which case, some may enable plural reference and some singular.
So, a major component of our view is that of a many → one mapping between "tokens" and role-slots. This way of looking at things raises many questions. For instance, while it is possible to understand how two people can play the same role in a representation of an action like "having a meal", we at once appreciate that two people having a meal implies that they put different tokens of (possibly) the same food-type into their mouths. It is likely therefore that scenarios for having a meal (singular) and having a meal (plural) are not quite the same. Now this is reasonable, since they are in fact very different situations. Some verbs are likely to induce plural interpretations at the mapping stage. For instance, X talked to Y, although essentially an actor-patient split, is likely to invoke a common-actor scenario for the conversation act. The mechanism by which ontological status plays a role in the account is through restrictions on the actions which may be described and the scenarios which may be invoked in which the atomic individuals in question could possibly play a common role. Thus the impact of ontological status on the ease of plural reference is a function of the scenario underlying the interaction depicted, and cannot be determined from the atomic individuals themselves. In our account, ontological status is secondary rather than primary. The greatest "ontological difference" is where the probability of finding a common role structure is lowest. We believe this account to be compatible with the view that representations stored in long-term memory reflect useful distinctions made about the world. In discourse, we may introduce individuals that may at some time have to be differentiated, but at the moment might be treated together. This can be achieved by introducing the individuals with distinct labels, while at the same time cueing the common role aspect by using a suitable coordinating term (like and, neither... nor..., and so on).
However, there is no sense in which these exclude differentiation made possible through the individual labels.²

² Garrod and Sanford (1982) showed that in some circumstances, differentiation may be sufficiently difficult that a proper name might have to be used to felicitously cue one of a pair, however.

In a discourse, a plural reference may be put forward for interpretation, and sometimes this is acceptable, and sometimes not. Going through a variety of examples led us to suggest that making an assertion which puts two or more individuals into the same sentence-based case-role is not enough. It is necessary that the assertion being made is coherent with what was already said, and that it maintains distinctions already made between individuals, or else motivates common role-mapping. This principle, or something like it, is required for many of the contrasts discussed above. So, in making claims about what it is that constrains plural reference, it is not just individuals and their ontological status, and it is not just the way they interact. It is also the coherence of subsequent discourse, including a reason for changing the differentiation of individuals so as to form a coherent message. This, clearly, has computational implications and has implications for the interpretation of psychological studies in which the prevalence of plural or singular representations is assessed by anaphoric reference probe.

In terms of earlier accounts of pronoun reference resolution produced by Sanford and Garrod (1981; Garrod & Sanford, 1982), we should make the following points. First, we are suggesting a scheme in which plural reference resolution does not depend upon the existence of plural tokens (except for conceptual plurals like "the class", which we do not treat here). Rather, it depends upon the mapping of differentiable individuals into common role-slots. Mapping plural pronouns onto these representations depends upon finding a bunch of pointers terminating on a common role-slot. The entire bunch identifies the referent (maximality). With clearly defined focus, there is no reason why this should not be an immediate process, but unless the new or developed scenario is a good development of the current state of differentiation, there may well be an unsatisfactory (though correct) assignment, as in (17) above. This means that the main task for the future is to say what kind of system accepts a particular assertion as "satisfactory". From a computational standpoint, a working system based on the principles described here would rely centrally on scenario-mapping, and so require an implementation with scenarios in LTM, and for utterance-to-scenario mapping machinery to be in place. In addition, it relies upon some calculation of the coherence of successive statements in which references to individuals are plural or singular, and this in turn relies upon the utilisation of situational knowledge. Without such coherence-establishing machinery, it is still possible to make intelligible plural references which may happen to be felicitous, but it is not possible to guarantee felicity.
This heavy reliance not only upon background knowledge, but also upon computationally-expensive coherence operations may appear to be a disadvantage over an account which operates at a more superficially linguistic level. For example, it may be possible to get part-way with the kinds of principles being suggested by Eschenbach et al., although more implementational details are needed there too. However, the ultimate utility of the present type of approach depends upon what else it buys. We believe that the present approach has a certain psychological viability, which would be one point in its favour! More specifically, scenario-mapping as a starting point has the attraction of providing a potential basis for the explanation of many phenomena, including situation-based anaphora (Garrod & Sanford, 1983), conditional reasoning (e.g. Cox & Griggs, 1982), judgement by representativeness (Kahneman, Slovic & Tversky, 1982) and other aspects of problem solving (Sanford, 1987). Returning to the present problem, we are attempting to characterise aspects of quantification and vague expressions within the same framework. Although problematic in many ways, the scenario-mapping approach may provide an ultimately unifying model within which some very messy linguistic and cognitive phenomena may become simpler.
References

Bosch, P. 1988. Representing and accessing focussed referents. Language and Cognitive Processes, 3, 207-231.
Cox, J. R. & Griggs, R. A. 1982. The effects of experience on performance in Wason's selection task. Memory and Cognition, 10, 496-502.
Chafe, W. 1972. Discourse structure and human knowledge. In J. B. Carroll & R. O. Freedle (Eds.), Language comprehension and the acquisition of knowledge. Washington: Winston.
Eschenbach, C. C., Habel, C., Herweg, M. & Rehkämper, K. 1989. Remarks on plural anaphora. Proceedings of the Fourth Conference of the European Chapter of the Association for Computational Linguistics, 161-167, Manchester.
Garrod, S. C. 1995. Distinguishing between explicit and implicit focus during text comprehension. In G. Rickheit & C. Habel (Eds.), Focus and coherence in discourse processing. Berlin: de Gruyter (this volume).
Garrod, S. C. & Sanford, A. J. 1982. The mental representation of discourse in a focussed memory system: implications for the interpretation of anaphoric noun phrases. Journal of Semantics, 1, 21-41.
Garrod, S. C. & Sanford, A. J. 1983. Topic-dependent effects in language processing. In G. B. Flores D'Arcais & R. J. Jarvella (Eds.), Processes and language understanding. Chichester: John Wiley & Sons.
Garrod, S. C. & Sanford, A. J. 1985. On the real-time character of interpretation during reading. Language and Cognitive Processes, 1, 43-61.
Garrod, S. C. & Sanford, A. J. 1988. Discourse models as interfaces between language and the spatial world. Journal of Semantics, 6, 147-160.
Herskovits, A. 1986. Language and spatial cognition. Cambridge: Cambridge University Press.
Hielscher, M. & Müsseler, J. 1990. Anaphoric resolution of singular and plural pronouns: The reference to persons being introduced by different coordinating structures. Journal of Semantics, 7, 347-364.
Kahneman, D., Slovic, P. & Tversky, A. 1982. Judgement under uncertainty: Heuristics and biases. Cambridge: Cambridge University Press.
Moxey, L. M. & Sanford, A. J. 1993. Communicating Quantities: A Psychological Perspective. Lawrence Erlbaum Associates, UK.
Moxey, L. M. & Sanford, A. J. 1993. Prior expectation and the interpretation of natural language quantifiers. European Journal of Cognitive Psychology, 5, 73-91.
Müsseler, J., Hielscher, M. & Rickheit, G. 1995. Focussing in spatial mental models. In G. Rickheit & C. Habel (Eds.), Focus and coherence in discourse processing. Berlin: de Gruyter (this volume).
Rumelhart, D. E., Smolensky, P., McClelland, J. L. & Hinton, G. E. 1986. Schemata and sequential thought processes in PDP models. In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel distributed processing, 2. Cambridge, MA: MIT Press.
A. J. Sanford, L. Μ. Moxey
Sanford, A. J. 1987 The mind of man. Brighton: Harvester Press.
Sanford, A. J. 1990 On the nature of text-driven inference. In D. Balota, G. Flores D'Arcais & K. Rayner (Eds.), Comprehension processes in reading. Hillsdale, NJ: Erlbaum.
Sanford, A. J. & Garrod, S. C. 1981 Understanding written language. Chichester: John Wiley & Sons.
Sanford, A. J., Garrod, S. C., Lucas, A. & Henderson, R. J. 1993 Pronouns without antecedents. Journal of Semantics, 2, 303-318.
Sanford, A. J. & Lockhart, F. 1990 Description types and method of conjoining as factors influencing plural anaphora: A continuation study of focus. Journal of Semantics, 7, 365-378.
Sanford, A. J., Moar, K. & Garrod, S. C. 1988 Proper names as controllers of discourse focus. Language and Speech, 31, 43-56.
Schank, R. & Abelson, R. 1977 Scripts, plans, goals and understanding. Hillsdale, NJ: Erlbaum.
Yule, G. 1982 Interpreting anaphora without identifying reference. Journal of Semantics, 1, 315-322.
JOCHEN MÜSSELER, MARTINA HIELSCHER and GERT RICKHEIT
Focussing in Spatial Mental Models
1. Introduction
While reading a text, people develop a mental representation of it that does not have much in common with the linguistic surface structure. Instead, it can be seen as a constructive representation resulting from the interaction between the text and the reader's linguistic and pragmatic abilities as well as world knowledge. In theories of text processing this view is described as the construction of a mental model (e.g. Johnson-Laird, 1983, 1989; Garrod & Sanford, 1988a). Such a model can be regarded as a surrogate of the world being described, consisting of concepts that represent the entities of the real world and their relations. During text reception the model is continuously updated: every piece of new information modifies and, as the case may be, reorganizes the model established so far, and the model itself guides referential and inferential processes.
It is assumed that cognitive structures within a mental model more or less reflect the underlying physical structures. The topological structures of space and time are modelled in the mental representation. In this respect spatial models represent the spatial relations between the entities of the real world and their qualities. In this context the central question is: To what extent does this spatial representation have a quasi-pictorial quality, and does the mental metric correspond to the physical one? If so, the concept of a mental model can be seen here as a cognitive map (McNamara, 1986; Wender & Wagener, 1990). The assumption of an analogue format of representation is, for example, deduced from the so-called symbolic distance effect (Shepard, 1984): the time needed for a mental comparison between two objects seems to depend on the distance between the objects on the corresponding dimension. This effect was also found to be induced by language (Bower, 1981).
Taylor and Tversky (1992) recently concluded that readers construct a similar spatial model irrespective of whether they receive a verbal description or a map of a spatial scene. Moreover, other empirical results show that the spatial structure of a current mental model influences text comprehension and production (e.g. Sanford & Garrod, 1981; Anderson, Garrod & Sanford, 1983; Oakhill & Johnson-Laird, 1984; Oakhill & Garnham, 1985; Wagener & Wender, 1985; Morrow, Greenspan, & Bower, 1987; Garrod & Sanford, 1988a; Morrow, Bower, & Greenspan, 1989; Byrne &
Johnson-Laird, 1989; Bower & Morrow, 1990; Morrow, 1990; Herrmann & Grabowski, 1994, Chapter 3; Rinck & Bower, 1994). These spatial relations can be demonstrated by the difference in the accessibility of focussed objects and persons in such a mental model. In psycholinguistics 'focus' is conceptualized as a part of the mental discourse representation which is limited in size and accessible for referential consultation.¹ It results from the text comprehension process and is subject to continual adjustment on the part of the receiver (Sidner, 1983a, b). Various related concepts of focussing are, for example, the discourse pointer described by Carpenter and Just (1977), the buffer (working memory) of Kintsch and van Dijk (1978; also see van Dijk & Kintsch, 1983; Glanzer, Dorfman, & Kaplan, 1981; Monsell, 1984), the focus theory of Sidner and Grosz (Sidner, 1983a, b; Grosz & Sidner, 1986) or the explicit focus of Sanford and Garrod (1981; Garrod & Sanford, 1982, 1985; Sanford, Garrod, Lucas, & Henderson, 1983). All of these concepts assume a specialized reference domain of higher activation and preference for access within the constructed text representation.
This contribution considers mental models under the aspect of focussing by making use of the assumed spatial relations within a mental model. In line with our own research program, we will proceed as follows: first, we will ask whether the accessibility in a momentarily existing spatial model facilitates the processing of the relevant text information. Then we will turn to the question of whether and, if so, when objects are grouped within the mental representation. Finally, we discuss inferences in spatial models that allow for conclusions about spatial relations not explicitly mentioned.
¹ This use of the term 'focus' differs from the linguistic one. Hajicova and Sgall (1984) and others try to isolate the focus on the text surface. In contrast to the definition given above, they understand 'focus' as new information that cannot be derived from the preceding discourse context and which explicates the topic, i.e. the subject of a discourse passage.
2. Accessibility of text concepts in spatial mental models
To integrate newly received text information into an existing mental model, links are needed, which could, for example, be provided by the spatial structure. It was found that spatial classifications of objects as well as anaphoric (coreference) processes were easier when the object in question was already part of the spatially associated mental model (Glenberg, Meyer, & Lindem, 1987; Morrow, Greenspan, & Bower, 1987; Rinck & Bower, 1994). The spatial structure seems to be one component that is able to produce focussing or foregrounding.
In a study by Morrow, Greenspan, and Bower (1987; cf. also Wilson, Rinck, McNamara, Bower, & Morrow, 1993) subjects first had to memorize the layout of a building (a diagram of a research center) with objects in different rooms. Then they read narratives describing the protagonist's tour around the building in order to accomplish a task (e.g. 'Joan made sure everything was arranged for her presentation... Then she walked from the laboratory to the library...'). To obtain a measure of the accessibility of the critical objects, subjects had to decide whether or not the objects were present in the room mentioned in the last sentence. Results showed an advantage for access to objects from the room mentioned in the last sentence as opposed to objects from other rooms in the building. These findings confirm the assumption that the focus reflects spatial relations of the described situation: "Protagonist location and the temporal development of the protagonist's actions help organize understanding by governing the accessibility of information from the developing situation model" (Morrow, Greenspan, & Bower, 1987, p. 185).
In another study Glenberg, Meyer, and Lindem (1987) presented short passages like the following:
Example 1:
1 setting: John was preparing for a marathon in August.
2' associated: After doing a few warm-up exercises, he put on his sweatshirt and went on jogging.
2" dissociated: After doing a few warm-up exercises, he took off his sweatshirt and went on jogging.
3 filler: He jogged...
The first sentence was a setting sentence followed by either a critical associated sentence or a dissociated sentence. The two differ only with respect to the protagonist 'putting on his sweatshirt' or 'taking off his sweatshirt'. In the first case, the critical noun 'sweatshirt' was associated with the further scenario; in the second case, the critical noun was dissociated. Subsequently the text was completed with filler sentences. Using a referential task or an item recognition task, Glenberg, Meyer, and Lindem (1987) found increased response times for the critical noun 'sweatshirt' or the fitting pronoun 'it' when it was dissociated in the second sentence; it was recognized faster in the associated text version. To be more precise, this response time advantage became significant after one or two filler sentences.
With as yet unpublished data we replicated and extended this general result in our own laboratory (Hielscher, Müsseler, Reuther, & Rickheit, 1990). We used associated vs. dissociated texts similar to those of Glenberg, Meyer, and Lindem (1987). In contrast to their texts, however, we always introduced two persons, either one of them or both associated with the spatial scenario. This variation was attained exclusively by different verbal relations between the two persons.² Imagine a situation in a restaurant where 'John is flirting with Mary' as compared to 'John is thinking of Mary'. Only the variation of the verb in the dissociated version ('thinking of') makes 'Mary' an indirect part of the spatial model; she may be anywhere but in the restaurant. In the associated sentence ('flirting with') both persons are interacting as explicit parts of the spatial scene. Following Glenberg, Meyer, and Lindem (1987), we expected a facilitated reference resolution for 'Mary' in the associated case.
² The fact that verbs are involved, for example, in reference processes has been known for some time (e.g. Caramazza, Grober, Garvey, & Yates, 1977; Garnham & Oakhill, 1985; McKoon & Ratcliff, 1993). Also, in the above example by Glenberg, Meyer, and Lindem (1987) it is only the verb that varies, but it does not become obvious if the associated/dissociated variation is established by the verb only. This is the case in our texts.
One associative vs. dissociative verb³ was embedded in five-sentence passages like the following one:
Example 2:
1 setting: The restaurant is quite empty.
2' associated: At the table in the corner Mary is flirting with John.
2" dissociated: At the table in the corner Mary is thinking of John.
3 filler: She eats some vegetables with meat.
4 filler: The drink tastes very good.
5 coreference: He/John wants to go to the theatre with her.
³ About 100 different verbs were tested in a verb inventory, a questionnaire with sentences containing those verbs which we assumed to associate or dissociate the two persons. Subjects had to rate the spatial distance between the two interacting people on 5-point Likert scales. The results of this study allowed us to select extreme groups of spatially associative verbs (e.g. 'to kiss s.o.', 'to meet s.o.' or 'to dance with s.o.') and dissociative verbs (e.g. 'to characterize s.o.', 'to write to s.o.' or 'to look for s.o.'). Of course, nothing could be said about the cognitive relevance of this spatial verb dimension in text comprehension.
The first sentence provided the setting information determining the scenario, in this case a restaurant. The second sentence was either an associated or a dissociated one; to avoid ambiguous coreference relations even for the pronoun (in sentence 5), it always introduced a female and a male person. The two text versions differed only in the use of the verb in sentence 2. In the two following filler sentences the female person was focussed on. Here it was important that she stayed part of the scenario. In the critical fifth sentence a coreference to the male person mentioned in sentence 2 had to be established. This co-referential process was initiated either by a pronoun or by repeating the Christian name.
As the results of Glenberg, Meyer, and Lindem (1987) and others suggest, the male person is still in focus if he is spatially associated. Thus, the prediction would be that the resolution of the reference pronoun 'he' should be easier if 'he' is part of the spatial surrounding (as compared to the condition where 'he' is not). According to Sanford and Garrod (1981; Garrod & Sanford, 1982, 1985) 'he' is still present in the explicit focus. However, in the dissociated version a time-consuming focus shift should be necessary to establish the pronominal reference.
We used a word-incremental reading technique to gather time latencies which are assumed to reflect underlying cognitive processes: a text is displayed on a computer screen incrementing word by word, with reading times individually controlled by the reader (for the validity of this technique compare, e.g., Günther, 1989). The reading times of the critical reference words⁴ were statistically analysed (Fig. 1).
[Figure 1 plot not reproduced. Conditions: noun, associated; noun, dissociated; pronoun, associated; pronoun, dissociated. Horizontal axis: the critical reference 'He/John' (Er/John) and the following verb 'wants' (wollte). Vertical axis: tick marks at 450, 500, 550 and 600.]
Figure 1: Word reading times of the critical reference (pronoun or noun) and the following verb (here 'wollte') depending on the associative/dissociative introduction of the reference person (cf. text example 2).
⁴ In this experimental design we included the reading times of the anaphor and the verb of the critical reference sentence. This additional independent variable tests whether the anaphor is assigned to the referent on reading the pronoun (e.g. Just & Carpenter, 1980) or whether even the reading time of the verb is influenced by the different experimental conditions. The latter would imply a cognitive lag between reception and processing of the pronoun (e.g. Rayner, 1978; Ehrlich & Rayner, 1983; also cf. the 'postponation' hypothesis by Vonk, 1985; Sanford & Garrod, 1989; Sichelschmidt & Günther, 1990). As one can see from the results, this does not seem to be the case in Figure 1 (but cf. Figures 2 and 3).
In line with our hypothesis, the reading times of the pronouns showed significant differences. As predicted, the pronoun in the associated version was read about 50 msec faster than the pronoun in the dissociated version. Here, our predictions were confirmed.
Our assumptions for noun references (with the full definite noun phrase, i.e. the repetition of the Christian name 'John' in the above example) were less clear-cut. Following the same reasoning, one could expect results similar to pronominal coreference. However, according to Garrod and Sanford (1982; 1988b; Sanford, Garrod, Lucas, & Henderson, 1983), a full definite noun phrase induces a longer search right from the beginning in the explicit and implicit focus. Moreover, if the reference person is only loosely connected with the spatial scenario, it is more than likely that after two filler sentences he/she is no longer represented in the explicit focus anyway. In that case a pronoun would be linguistically inadequate. The repetition of the Christian name would then be a more appropriate form in the dissociative version than in the associative version. If 'John' is not part of the focus in the spatial model, it may be more common to repeat the whole noun phrase (i.e. his name). As a consequence, the pronominal resolution should be even slower than the resolution by repetition of the name in the dissociative text.
The first striking result in this respect was the enhanced reading time for the Christian name; less time was used for the pronoun. Similar results are reported by other researchers (Garrod & Sanford, 1982, 1988b; Sanford, Garrod, Lucas, & Henderson, 1983). At first sight the result is astonishing because pronominal resolution should be an additional and therefore time-consuming process in text comprehension. On the contrary, using a pronoun seems to facilitate comprehension. It is beyond the scope of this paper to explain this phenomenon fully, but a simple explanation would be the difference in the number of letters: pronouns are shorter than names, and longer words usually need longer reading times (see Rickheit, Günther, & Sichelschmidt, 1992). Another explanation starts from the assumption that there are different processes involved in resolving pronouns and noun anaphors (Cloitre & Bever, 1988). A further explanation is related to a functional point of view: a pronoun may signal focus maintenance whereas a noun anaphor may signal a focus shift (Schnotz, 1988; Müsseler, 1994). This distinction could have produced the reading time differences, too.
Taking all data into account, we cannot definitely decide between the alternative explanations at this point in time.
Let us now turn to the difference between the associated and dissociated version. Repeating the Christian name was found to facilitate the resolution in the dissociated version as compared to the associative version. As expected, the reverse pattern of results was found for pronouns. This difference was expressed in a significant three-way interaction between the factors 'spatial reference' (associated vs. dissociated), 'antecedent' (noun vs. pronoun), and the 'critical word' in the fifth sentence (noun/pronoun vs. verb). Based on this result we assume a prominent difference between the processes involved in pronominal and nominal coreference, indicating again the inadequate use of a pronoun to refer to a non-focussed object.
To summarize, the experiment confirmed the results of former studies and indicated that the spatial model influenced processes of reference resolution. It made a difference whether a person or an object was an explicit part of the spatial mental model. This spatial relation seemed to determine the status of the person in focus. Being in focus was an important criterion for how adequate, and how easy to interpret, a noun or pronoun referring to this person was.
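The qualitative predictions for example 2 can be summarised in a small Python sketch. It is only an illustration under simplifying assumptions, not a model proposed in this chapter: the class `SpatialModel`, its method names and the boolean outcome `facilitated` are hypothetical, and being part of the spatial scene is simply equated with being in explicit focus.

```python
from dataclasses import dataclass, field


@dataclass
class SpatialModel:
    """Toy discourse model (hypothetical): persons in the scene count as focussed."""
    scene: str
    known: set = field(default_factory=set)
    in_focus: set = field(default_factory=set)

    def introduce(self, person: str, in_scene: bool) -> None:
        """Add a person; only spatially associated persons enter the focus."""
        self.known.add(person)
        if in_scene:
            self.in_focus.add(person)

    def resolve(self, person: str, form: str) -> dict:
        """Predict, qualitatively, how a later reference to `person` behaves.

        `form` is 'pronoun' or 'name'. Assumption: a pronoun is facilitated
        when its referent is in focus, a repeated name when it is not.
        """
        if person not in self.known:
            raise KeyError(person)
        if form not in ("pronoun", "name"):
            raise ValueError(form)
        focussed = person in self.in_focus
        return {
            "focus_shift": not focussed,
            "facilitated": focussed if form == "pronoun" else not focussed,
        }


# Associated version: 'Mary is flirting with John' puts both in the restaurant.
associated = SpatialModel("restaurant")
associated.introduce("Mary", in_scene=True)
associated.introduce("John", in_scene=True)

# Dissociated version: 'Mary is thinking of John' leaves John outside the scene.
dissociated = SpatialModel("restaurant")
dissociated.introduce("Mary", in_scene=True)
dissociated.introduce("John", in_scene=False)

print(associated.resolve("John", "pronoun"))
print(dissociated.resolve("John", "pronoun"))
print(dissociated.resolve("John", "name"))
```

The sketch returns qualitative predictions only; it says nothing about the size of the reading time differences or about the word at which they appear.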
3. Complex concepts in spatial mental models
In the text example above, reference had to be resolved for only one person. Yet another question results from the concept of plural reference: it is possible to refer to both, i.e. 'John' and 'Mary', via the plural pronoun; how is this pronominal coreference influenced by the associative vs. dissociative introduction of both persons? To test this we modified the filler and the critical reference sentences:
Example 3:
1 setting: The restaurant is quite empty.
2' associated: At the table in the corner Mary is flirting with John.
2" dissociated: At the table in the corner Mary is thinking of John.
3 filler: He is a very good-looking man.
4 filler: Sometimes he is a little bit shy.
5 coreference: They/She want(s) to go to the theatre. (German original: Sie wollte(n) noch ins Theater gehen.)
In the critical reference sentence either the singular pronoun 'she' was used or the plural pronoun 'they' referring to both 'Mary and John'. Note that in German the meaning of the pronoun 'sie' is ambiguous: 'sie' can refer to 'Mary' alone or to both 'Mary and John'. Only the verb inflexion assigns the reference to the singular or plural concept. To comprehend the plural pronoun the reader has to integrate the two singular persons into a single plural entity (that fits the pronoun). We will refer to the process yielding such a plural entity as 'installing a complex concept' (Müsseler & Rickheit, 1990a; Kaup, 1994; for a more formal description cf. Eschenbach, Habel, Herweg, & Rehkämper, 1990; Schopp, 1995). The process of installing such a concept is an additional component in the process of text comprehension which should therefore take additional time, as opposed to the resolution of the singular version.
Over the last few years we have carried out several studies concerning this question (Hielscher & Müsseler, 1990; Müsseler & Rickheit, 1990b). Instead of associative or dissociative spatial contexts we used different coordinations in these experiments to introduce the persons, such as 'John and Mary...', 'John as well as Mary...' or 'Neither John nor Mary...'. For the point of time in the comprehension process when the complex concept is established, our predictions were analogous to those for these experiments: Firstly, it can be argued that the plural complex is not installed until it becomes unavoidable for text reception, i.e., as soon as a person reads the pronoun 'sie' followed by the plural verb, the pronoun has to be related to both 'John and Mary'. Thus, at this point in time at the latest, the complex concept has to be generated. Consequently, one can assume that the singular pronoun is processed more quickly than the plural pronoun because there is no additional component for installing a plural reference complex.
An alternative prediction implies that establishing the complex concept is independent of reading the pronoun. The plural complex is installed while reading about the two persons in the introductory second sentence, especially in the associated text version. If this is the case, one can think of two possible alternatives for the pronominal resolution process: the first one claims that the singular concept fits the pronominal resolution just as well as the plural one. No processing time differences should occur. The second possibility claims that by installing the plural concept the singular concepts are deactivated. In the latter case, a refocussing of the singular concept is necessary for resolving the singular pronoun. This additional mechanism should be expressed in a processing time disadvantage for the singular pronoun.
The results showed absolutely no effect for our critical variation, the variation of the associated vs. dissociated version. Thus, contrary to the experiments reported above, the resolution of the critical pronoun is not influenced by the spatial context. This may be due to a difference between examples 2 and 3: In example 2 the male person within the verbal phrase of sentence 2 is used as coreference, whereas in example 3 this is the female person within the nominal phrase, or both persons. It seems that spatial effects are modified by such syntactical positions.
Independent of text version, there is a tendency towards an advantage for the plural pronoun at the verb and the following word if no filler sentences were used (Fig. 2).⁵ Although the plural pronoun has to be related to both 'John and Mary', the resolution seems to be facilitated. This points to an installation of the plural concept while reading the prepronominal sentence. Otherwise, the reverse effect should be observable. The difference here is small, but it points in the same direction as the findings of preceding experiments in our laboratory.
⁵ Note that in this experiment the reading times of the following words were also affected by the experimental manipulation. This is evidence for the cognitive lag or postponation hypothesis, cf. note 4.
What we found with a completely different method, using decision times instead of word-incremental reading, was that the advantage of the plural pronoun depended on the coordination we used (Hielscher & Müsseler, 1990; Müsseler & Rickheit, 1990b). With coordinations like 'John and Mary' or 'John as well as Mary' there is an advantage in reaction time for the plural pronoun of about 50 msec. This contrasts with coordinations like 'John without Mary' or 'John instead of Mary', where no such differences were found. In the present experiment the dependent variable was word-reading time, and even here the facilitation for the plural pronoun was found. These results imply that additional processing capacity for installing the plural concept was expended in the prepronominal sentence. Unfortunately, we have no means to test this idea on the basis of our data because our prepronominal sentences always introduce two persons. Still, it would be necessary to compare the processing times of singular and plural prepronominal sentences. In a recently published experiment Sichelschmidt and Günther (1990) did exactly
[Figure residue: plot not reproduced. Legend entries: plural pronoun, no filler sentences; singular pronoun, no filler sentences.]
This ordering on a scale provides an explanation for the distribution of uses of beide and sie, which was observed in the section above. Here, beide is definitely specified with regard to the semantic feature of distributivity, which says that the predication can be applied to the proper parts of the denoted set of referents. Beide is therefore restricted to combine only with distributives and certain reciprocals. Less restricted is the pronoun that follows on the scale. Sie is open to a combination with collective, reciprocal and distributive predicates as well. With respect to predicates that are underspecified as to having a collective versus distributive reading, the described scalar arrangement of the pronouns allows the listener to make a prediction about the reading that should be preferred for the subsequent predicate:
36a) Beide brachten ein Geschenk mit.
Both brought along a present.
36b) Sie brachten ein Geschenk mit.
They brought along a present.
On the one hand, sentence 36a implies that sentence 36b is also valid. On the other hand, 36b forces a collective reading, i.e. one where there is only one present introduced into the discourse universe. So the distributivity implicature provides a pragmatic overlay on the semantic content of sie: whenever possible, the predicate that follows the pronoun is interpreted collectively. Since it is only a pragmatically justified conclusion but not a semantic implication, it can always be cancelled.
37) Sie brachten ein Geschenk mit, und zwar jeder einzelne von ihnen.
They brought along a present, in fact each one of them.
The analysis proposed for beide so far does not favour any one of the approaches to distributivity, because the character of beide, which by now seems to be distributive for semantic reasons, is compatible with any of the mentioned theories. The pragmatic mechanism suggested for guiding preferable readings of a sentence, however, is best explained in the framework of Link's proposal. It can be assumed that the selection of a pronoun already gives an indication to the listener on how to interpret the following predicate. Thus the applicability conditions of a distributivity operator could be formulated.
Intonation also plays a crucial role in the determination of a suitable reading of a sentence. In the examples discussed so far, beide always occurred in sentence-initial position as a singleton constituent. In Löbner's (1987) discussion of indefinites, where he distinguishes a quantificational and a non-quantificational reading of indefinite determiners, several characteristics that correspond to the respective uses are discussed. One of them concerns the stress pattern of indefinite determiners, which receive obligatory stress when used as a quantifier. To draw a parallel to the uses of beide, one can argue that it must also be stressed
Α. Schopp
to be interpreted as a quantifier. To test the behaviour of beide in this case, an unstressed use should be forced. When a focus-introducing movement such as topicalization occurs, the phrase in the topicalized position receives the main stress of the sentence. In analogy to Löbner's observations, a sentence like
38) EINIG waren sich aber beide,...
[Agree were REFL but both,...]
But both agreed that...
is perfectly well-formed, although beide is combined with a collective predicate. Here, beide is deaccented, and the implication that the sentence has a distributive reading does not occur. This observation suggests the hypothesis that accent plays a crucial role in determining the semantic content of beide. So in addition to the use of beide as a quantifier, which was discussed exclusively in the literature (cf. Barwise & Cooper, 1981; Bhatt, 1990; Roberts, 1987; Eschenbach, 1993), it is also possible to use it as a pronoun that just expresses definiteness and a cardinality constraint.¹⁶ This also explains the difference between using beide and jeder under the contextual restrictions mentioned above. For jeder, it is definitely impossible to suspend its quantificational properties. This is also reflected in its syntactic agreement restriction to the singular. In the final section, the proposal discussed so far will be taken as a starting point for the discussion of some discourse functional implications that a use of beide versus sie might have.
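The combinatorial restrictions discussed in this section can be collected in a small lookup. The following Python sketch is illustrative only: the names `Reading`, `admits` and `preferred_reading` are hypothetical, 'certain reciprocals' is simplified to all reciprocals, and deaccented beide is assumed to admit the same readings as sie.

```python
from enum import Enum
from typing import Optional


class Reading(Enum):
    COLLECTIVE = "collective"
    RECIPROCAL = "reciprocal"
    DISTRIBUTIVE = "distributive"


def admits(pronoun: str, reading: Reading, stressed: bool = True) -> bool:
    """Can the pronoun combine with a predicate under this reading?"""
    if pronoun == "sie":
        return True  # open to collective, reciprocal and distributive predicates
    if pronoun == "beide":
        if not stressed:
            return True  # assumption: deaccented beide is not quantificational
        return reading in (Reading.DISTRIBUTIVE, Reading.RECIPROCAL)
    raise ValueError(f"unknown pronoun: {pronoun}")


def preferred_reading(pronoun: str, stressed: bool = True) -> Optional[Reading]:
    """Preferred reading of an underspecified predicate after the pronoun."""
    if pronoun == "beide":
        # Stressed beide is treated as semantically distributive; for
        # deaccented beide no preference is claimed, so None is returned.
        return Reading.DISTRIBUTIVE if stressed else None
    if pronoun == "sie":
        # Pragmatic preference only; it can be cancelled (cf. example 37).
        return Reading.COLLECTIVE
    raise ValueError(f"unknown pronoun: {pronoun}")


print(admits("beide", Reading.COLLECTIVE))                  # stressed, quantifier use
print(admits("beide", Reading.COLLECTIVE, stressed=False))  # cf. example 38
print(preferred_reading("sie"))
```

Keeping admissibility and preference in separate functions mirrors the distinction drawn above between the semantic restriction on beide and the merely pragmatic, cancellable preference associated with sie.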
¹⁶ I will not address the question of whether it is a matter of lexical ambiguity that explains the different uses of beide.
Focussing on the Use of German beide
6. Cognitive aspects of pronoun production
In addition to the semantic content of pronominal beide, which implies a cardinality constraint and restricts its use to referents that are either already introduced into the discourse universe or otherwise definitely specified by uniqueness, the pragmatic relation between this pronoun and the personal pronoun sie contributes to some discourse-specific functions. It was already discussed how the use of beide is marked in contrast to other anaphorical devices. This linguistic markedness serves to select this pronoun in the course of language production when special requirements have to be considered. The behaviour discussed earlier not only serves to disambiguate possible sentence readings, but may also be used to explain mechanisms which are at work in a longer discourse segment. During the process of language production, the entities in the mental model must be structured to account for special verbalisation requirements. When something has to be predicated about a complex of entities, it might be necessary to create this complex out of some individuals present in the mental model. This inferential process of building a complex referent out of individuals is called grouping. The distributivity constraint on beide can be taken as a first hint at how the strength of the grouping¹⁷ of referents on a cognitive level that have to be picked up anaphorically by this pronoun is reflected by linguistic devices. A first hypothesis, which accounts for the capacity of beide to express a certain status of the denoted complex of referents, is given in the following statement:
H1) In contrast to the personal pronoun sie, the distributivity implication on beide allows the referents that have to be denoted to be grouped only temporarily. So expressing distributivity is a means of indicating that the objects form only a temporary complex in the mental model.
To demonstrate this behaviour, particularly as a contrast to other possible anaphors, an example from a newspaper report will be considered:
39)
Beide, die Mafia und Amerika, haben ihre Wurzeln in Europa. Im Grunde halten sich beide für wohltätige Einrichtungen und beide haben ihre Hände mit Blut besudelt, bei dem, was notwendig war, um ihre Macht und ihre Interessen zu schützen. Wir sehen in unserem Land unseren Beschützer und es hält uns zum Narren und belügt uns. (F. F. Coppola's comment on "The Godfather", Part 1)
Both (of them), the Mafia and America, have their roots in Europe. Fundamentally, both consider themselves to be charitable institutions and both have stained their hands with blood for that which was necessary to protect their power and their interests. We see our protector in our country, and it makes a fool of us and lies to us.
What can be found in this example is a grouping of two referents that stand in some way in opposition to each other. The coordination of the two entities is at first strange or at least unexpected. If beide is replaced by the personal pronoun in all three cases, the text becomes a little odd. One reason for this is that the preferred collective interpretation, which goes along with the use of the personal pronoun, would lead to inferences concerning, for example, the uniform roots of the Mafia and America, which strongly contradict our knowledge about the world. The complex that is built out of two conceptually more or less remote entities endures only with respect to the following predicate. So even if only the second and third occurrences of beide are replaced by the personal pronoun, the text remains bad. What causes this acceptability difference is that, when beide is used, the complex is suspended once the whole sentence has been processed. This leads to another hypothesis concerning the presupposed salience of referents that can be denoted by using beide:
17 A discussion of the semantic and cognitive implications of the formation of complex referents and the restrictions that guide their anaphorical capacity can be found in Herweg (1988).
H2) The disposition for grouping is provided by the predicate. So the grouping takes place only for processing the corresponding sentence, but the salience of the described entities remains equally strong.
This salience presupposition points to a crucial discourse structural function of beide, concerning the quality of the referents that can be picked up by using this pronoun. To illustrate this issue, central parts of a newspaper report about a tennis match are presented:
40)
... Furios der Beginn. Beide spielten auf unglaublich hohem Niveau, keine Spur mehr von betulichem Damenspiel, bei dem unglücklicherweise Schläger und Bälle mit von der Partie sind. Graf dominierte...
... The beginning was furious. Both (of them) played on an incredibly high level, not a sign of a leisurely ladies game, in which unfortunately rackets and balls are involved. Graf dominated...
(2 sentences about Seles; 3 sentences about Graf; 3 sentences about Seles)
Kein Zweifel: Es war wohl nichts als Schrei ohnmächt'ger Empörung, daß beide in frauenbewegten Kurzröcken antraten.
No doubt: It was nothing but a cry of indignation that both (of them) entered the court in short skirts.
(1 sentence situation description; 2 sentences about Seles)
Und wieder beharkten sich beide...
And again, both let each other have it...
(4 sentences about Graf; 2 sentences about Seles; 3 sentences situation description)
Es deutet nichts darauf hin, daß beide in Geberinnenlaune sein werden.
Nothing indicates that both (of them) will be in the mood to give gifts.
Here beide picks up the main referents that the text is about, even when one of them was not mentioned over a distance of five sentences. Substituting the personal pronoun sie for beide in this text is nearly impossible, or would require a complicated search in memory for the listener to find a suitable referent. While the personal pronoun is able to express that a complex referent is already present in the mental model, or that grouping of referents that have just been mentioned is to take place, beide can also express the grouping of entities that have not been mentioned in a longer section of text. Beide can therefore typically be used for the description of the main characters of a discourse. This implies some cognitive
constraints which go along with some possible uses of beide. For the text type of narratives, it has often been pointed out that a distinction can be drawn between different kinds of actors involved, e.g. main versus minor or subsidiary characters (cf. Rumelhart, 1975; Garrod & Sanford, 1988). It is assumed that the organizational principles that determine the structure of narratives correspond in some way to the organizational strategies that individuals use to memorize the story information. Background knowledge of the structure of narratives has to be activated in order to comprehend and encode new information. As a cognitive counterpart to the textual distinction between main and minor characters, a distinction has been assumed between key entities and dependent entities in the cognitive representation of a story's content. Key entities are assumed to be relatively stable and therefore immune from topic shifts during the processing of at least one larger discourse segment. A third hypothesis relates this distinction to a property that distinguishes the use of beide from other anaphorical devices.
H3) The pronoun beide with its previously described grouping facilities is a way of expressing thematic subjecthood, in case there are two important characters who may be interacting, or are compared or contrasted.
So the use of beide is a way of conveying the global coherence relations between different parts of a text. This finding expresses another contrast between beide and the personal pronoun sie; the latter cannot be used in the same way to convey the global coherence relations in a discourse. The use of beide in the course of producing a text provides a way to refer to salient entities which have to be compared or contrasted, thereby establishing coherence.
7. Conclusion
With this brief survey of the semantics, pragmatics and resulting discourse functional features of a certain plural pronoun, the interactive role that different levels of linguistic representation play in the use of pronominal beide was shown. In addition to the semantic content of pronominal beide, which implies a cardinality constraint and restricts its use to referents which are somehow present in the discourse universe, a pragmatic markedness relation between beide and the proper personal pronoun sie was established to account for the distribution of uses dependent upon the character of the corresponding verbal predicate. A particular variant of the notion of Scalar Implicature was used to model this markedness relation between different anaphorical devices with respect to their possibilities to be combined with certain kinds of predicates. The assumption that different linguistic levels interact can be supported by the psychologically validated fact of incremental processing in the course of language production. While the processing of the initial part of a sentence may
be nearly finished, the pragmatic implications of the already processed part may still influence the semantic processing of what follows. According to this hypothesis it can be assumed that uttering the personal pronoun sie in an early stage of sentence processing leads to the possibility that its pragmatic features influence the semantic processing of the following predicate, so that an inference forced on a pragmatic level triggers the applicability of a semantic operator. It was suggested that the ability to guide the applicability of a distributivity operator also has some discourse functional implications. On the one hand, the use of beide is dependent on the status of the entity that has to be described, in that a complex is formed only temporarily. Moreover, this grouping facility allows the main characters of a discourse to be taken up. So beide is a means of making some global coherence relations explicit, because it is able to pick up referents that have not been mentioned for some time in the discourse.
References
Bhatt, C. 1990 Die syntaktische Struktur der Nominalphrase im Deutschen. Tübingen: Günter Narr Verlag.
Eschenbach, C. 1993 Semantics of number. Journal of Semantics, 10, 1-31.
Eschenbach, C., Habel, C., Herweg, M. & Rehkämper, K. 1989 Remarks on plural anaphora. In Proceedings of the Fourth Conference of the European Chapter of the Association for Computational Linguistics, 161-167, Manchester.
Garrod, S. & Sanford, A. J. 1988 Thematic subjecthood and cognitive constraints on discourse structure. Journal of Pragmatics, 12, 519-534.
Gillon, B. 1987 The readings of plural noun phrases in English. Linguistics and Philosophy, 10, 199-219.
Herweg, M. 1988 Ansätze zu einer semantischen und pragmatischen Theorie der Interpretation pluraler Anaphern. GAP-AP 2, Hamburg.
Heim, I., Lasnik, H. & May, R. 1991 Reciprocity and plurality. Linguistic Inquiry, 22, 63-101.
Hobbs, J. 1978 Coherence and coreference. SRI Technical Note 168, SRI International, Menlo Park.
Langendoen, D. T. 1978 The logic of reciprocity. Linguistic Inquiry, 9, 177-197.
Levinson, S. C. 1983 Pragmatics. Cambridge: Cambridge University Press.
Link, G. 1984 Plural. Manuscript to appear in: D. Wunderlich & A. v. Stechow (Eds.), Handbuch der Semantik.
Link, G. 1984b Hydras. On the logic of relative constructions with multiple heads. In F. Landman & F. Veltman (Eds.), Varieties of formal semantics. Proceedings of the 4th Amsterdam Colloquium, 245-258.
Löbner, S. 1987 Natural language and generalized quantifier theory. In P. Gärdenfors (Ed.), Generalised quantifiers. Dordrecht, 181-201.
Roberts, C. 1987 Distributivity. Proceedings of the 6th Amsterdam Colloquium, 291-309.
Rumelhart, D. E. 1975 Notes on a schema for stories. In D. G. Bobrow & A. Collins (Eds.), Representing and understanding: Studies in cognitive science. New York, NY: Academic Press.
Sanford, A. J. & Lockhart, F. 1990 Description types and method of conjoining as factors influencing plural anaphora: A continuation study of focus. Journal of Semantics, 7, 365-378.
Part II: Coherence
SIMON GARROD and GWYNETH DOHERTY
Special Determinants of Coherence in Spoken Dialogue
"Communication" comes from the latin "communico" meaning "to share"... Communication is essentially a social process. Sharing does not mean simply passing something, some sign, from one person to another, it implies also that this sign is mutually accepted, recognized and held in common ownership or use by each person.
Colin Cherry, 1971
1. Introduction
This paper addresses certain questions about discourse processing and coherence in relation to spoken dialogue. As a starting point we take the position that a dialogue can only be judged coherent with respect to the interactional context of communication. Whereas coherence in a written text reflects the way in which the different sentences stick together and connect in a consistent fashion¹, coherence in spoken dialogue has to reflect the way in which the utterances from each participant mesh together to form a co-ordinated exchange. A coherent dialogue is therefore one which requires co-ordinated language processing where the participants achieve the kind of mutual understanding which Cherry identifies as the hallmark of communication. The first section of the paper explores the notion of co-ordinated language processing and discusses one of the co-ordination principles identified in some of our earlier work (Garrod & Anderson, 1987). In the remainder of the paper, the application of this principle is explored in relation to three dialogue experiments and we conclude that it can be seen as a general processing principle for achieving coherent dialogues and possibly also as one of the principles underlying the social origins of semantic conventions in the language.
1 Chambers dictionary defines coherent as "sticking together; connected; consistent in thought or speech".
2. Co-ordinated processing and the output/input co-ordination principle
The idea of co-ordinated language processing can be illustrated in a simple way through the schematic communication cycle diagram in Figure 1. The cycle represents the various processes involved in producing and interpreting an utterance at a number of levels. It is assumed that the speaker starts out with an intention to communicate something which is then given a linguistic realisation as a result of making semantic decisions in relation to a particular view of the relevant context. This series of processes then leads to the production of a string of sounds that make up the utterance. For his part the listener then has to recover the form of the utterance from these sounds and derive via his view of the context the semantic decisions made by the speaker and hence the speaker's intention in making that utterance in that context. If the cycle is to succeed, there must therefore be a degree of co-ordination between the production processes and those involved in comprehension. Communication failures can arise at any level. If speaker and listener fail to co-ordinate on the intended form of the utterance (i.e. it is misheard) then failure occurs at the top level of the system. But even if the speaker and listener co-ordinate their processing at the form level this does not rule out misunderstandings through failure to co-ordinate on the same view of the context (conceptual disco-ordination) or on the referential meaning (semantic disco-ordination) or on the intention of utterance (pragmatic disco-ordination). We suggest that a coherent dialogue is one which reflects co-ordinated processing among the participants at these various levels. In turn any processing principles which might underlie the execution of coherent dialogues must in some way promote this kind of co-ordinated language processing.
Figure 1: A schematic diagram of the processing cycle underlying spoken communication. (Diagram labels: Utterance; Sentence; Meaning + Context; Intention; Perceived Intention; Mutual Intention.)
It was with such considerations in mind that Garrod and Anderson (1987) proposed what they call the output/input co-ordination principle (O/ICP). The principle states that in generating an utterance the speaker should always aim to match as closely as possible the pertinent lexical, structural, semantic and conceptual decisions that were made in interpreting the most recent relevant utterance from the interlocutor. The rationale behind the principle is straightforward: the co-ordination of production with comprehension processes within an individual communicator should promote the related co-ordination of these processes between communicators. In processing terms this can lead to efficiencies both in terms of the individuals concerned and their collaboration. Individual efficiency comes from simplifying the production and comprehension processes. For instance, lexical selection in speaking and lexical interpretation in listening are made more straightforward if there already exists a single co-ordinated lexico-semantic procedure to map a description to a referent. Levelt and Kelter (1982) also make this point in relation to syntactic processing, where they have observed a high degree of syntactic entrainment in formulating answers to questions.
In fact it is reasonable to expect that there will be individual processing efficiencies of this kind that arise at all levels of the system. Benefits in terms of collaborative efficiency deserve more careful examination. As Clark and Wilkes-Gibbs (1986) have argued, it is more appropriate in dialogue to think of processing efficiency as applying across a complete exchange, rather than an isolated utterance. They illustrate this idea in relation to establishing reference. It has sometimes been argued that the optimal reference is made through a single description which is the briefest one sufficient to differentiate the intended referent from any other potential referents in the contextual domain, so respecting Grice's conversational maxims of quantity and quality. Furthermore, such optimal references are considered to be those which expend least processing effort. However, in the heat of real conversation this is very difficult to achieve, if it is possible at all. For instance, it may be that the first words that come to mind overspecify the referent, or more seriously that the speaker simply does not know the potential domain of referents that the listener might have in mind from which the intended one is to be differentiated. So it is not surprising that on most occasions reference is established in dialogue not through a single utterance but over an extended cycle of initiation and repair only completed when there is mutual acceptance of the description by both parties. This means that efficiency in relation to the whole collaboration can only be properly defined in
terms of balancing the effort of initial formulation against the cost of subsequent repair. Put somewhat differently, collaborative efficiency should be measured in relation to the overall effort involved in achieving mutual understanding, and it is here that the O/ICP has a bearing. Matching as closely as possible your partner's choices in lexicon, form, meaning and conception is a good heuristic for ensuring mutual intelligibility without having to engage explicitly in the kind of recursive inferencing required to establish mutual knowledge (Clark & Marshall, 1981). In general, conforming to the O/ICP should lead to a considerable gain in collaborative efficiency. However, as we shall see later in the paper, the co-ordination principle only minimises this kind of collaborative effort when applied at the deepest level possible. The O/ICP is therefore motivated from a functional point of view as a possible dialogue coherence principle. In the remainder of the paper we explore the empirical basis for this claim, first in relation to individual adult dialogues and then dialogues from children of different age and skill. Finally, we consider how the O/ICP might act as a mechanism for fixing conventional meanings within a larger group.
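The matching heuristic described in this section can be made concrete with a toy simulation. The Python sketch below is purely illustrative and is not a model reported by the authors: the four scheme labels anticipate the description types introduced in Section 3, while the function names `simulate_dialogue` and `agreement_rate`, the `match_probability` parameter and the random fallback choice are all assumptions made for the example.

```python
import random

# Scheme labels borrowed from the description types discussed in Section 3.
SCHEMES = ["path", "co-ordinate", "line", "figural"]

def simulate_dialogue(turns, match_probability, seed=0):
    """Alternate two speakers for `turns` descriptions (turns >= 2).

    On each turn the current speaker either copies the scheme of the
    partner's most recent description (output/input matching) or, with
    probability 1 - match_probability, picks a scheme at random.
    """
    rng = random.Random(seed)
    last = rng.choice(SCHEMES)        # scheme of the opening description
    history = [last]
    for _ in range(turns - 1):
        if rng.random() < match_probability:
            current = last            # match the partner's last choice
        else:
            current = rng.choice(SCHEMES)
        history.append(current)
        last = current
    return history

def agreement_rate(history):
    """Proportion of descriptions that match the immediately preceding one."""
    pairs = list(zip(history, history[1:]))
    return sum(a == b for a, b in pairs) / len(pairs)
```

Under these assumptions, a speaker pair that always matches stays on whatever scheme opened the exchange, whereas lowering `match_probability` lets the schemes drift apart. The sketch only captures matching at the level of the scheme label, not the deeper semantic co-ordination that the paper goes on to discuss.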
3. Analysis of co-ordination in task constrained dialogue
3.1 Co-ordination among isolated dyads
3.1.1 The task
The original investigation which prompted Garrod and Anderson (1987) to propose the O/ICP was of a large corpus of dialogues elicited through having pairs of speakers engage in a collaborative game. The essence of the game is as follows. Each player is seated in a different room but with an audio communication link and each is confronted by a VDU on which a maze is displayed. The mazes consist of small box-like elements (nodes) connected by paths along which the players move their respective tokens (see Fig. 2). The purpose of the game is for the players to move their position tokens alternately through the maze (one path link at a time) until they have both reached their predetermined goals. In doing this each player can only see on the screen his or her own start position, goal position and current token position. The co-operative nature of the game arises from two additional features of the mazes. First, each contains obstacles in the form of gates which block movement along the paths, and second, there are a small number of special nodes marked as switch positions. Both the gates and the switch positions are distributed differently across the two mazes, and it is in overcoming these obstacles that cooperation and communication are required. If a given player (say A) moves into a node where his or her partner (B) has a switch, then all of B's open paths become gated and all the gated ones open. So when a player requires a gate to be
Figure 2: Examples of the maze configurations that subjects (A and B) see in the maze game experiment. Nodes containing an S indicate switch nodes, whereas those containing an O indicate the players' current positions and an * the players' goal positions.
opened they have to enlist the help of their partner, find out where he or she is located and then guide him or her into a switch node only visible on their own screen. Typically a game involves players attempting to move towards their goal with dialogue intervening between moves. This dialogue contains repeated exchanges about each player's location on the maze, switch positions, goal position and so on, and it is these exchanges which are analysed. The advantages of using such a game are that it is possible to elicit 'natural dialogue' but in a situation where the speakers produce repeated spatial descriptions of points known to the investigator. This enables one to carry out a semantically transparent analysis of the descriptions which arise and observe how these develop as part of the larger dialogue. In particular it is a relatively straightforward matter to establish the degree to which the speakers co-ordinate their language use both in terms of the form of description and its underlying semantics. In this paper we consider three studies which explore the development of co-ordinated description schemes when speakers engage in the maze game. The first study looked at dialogues from 22 pairs of adult players and was used to classify descriptions into various schemes and then use the classification as a basis for analysis of inter-speaker co-ordination.
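The switch rule of the game can be written down compactly. The following Python sketch is only an illustration of the rule as described above: the class name `Maze`, the function `move`, the node labels and the representation of paths as pairs of nodes are assumptions, not details of the original software.

```python
from dataclasses import dataclass, field

@dataclass
class Maze:
    """One player's maze: every path is either open or gated."""
    # Maps a path, written as a (node, node) pair, to True if it is gated.
    paths: dict = field(default_factory=dict)
    # Switch nodes marked in THIS player's maze.
    switches: set = field(default_factory=set)

    def toggle_gates(self):
        # Open paths become gated and gated paths become open.
        for path in self.paths:
            self.paths[path] = not self.paths[path]

def move(mover_position, partner_maze):
    """Apply the switch rule after a move.

    If the mover has entered a node where the partner has a switch,
    all of the partner's gates are toggled. Returns True if that happened.
    """
    if mover_position in partner_maze.switches:
        partner_maze.toggle_gates()
        return True
    return False
```

Because the switch nodes of one maze are, according to the task description, visible only on the partner's screen, the mover cannot know in advance whether a move will toggle anything; this is what makes the exchange of location descriptions necessary.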
3.1.2 Study 1: Co-ordination in isolated dyadic interactions
This first study is described in detail in Garrod and Anderson (1987) so we shall only concentrate on its main conclusions and the basic principles behind the analysis which are central to the remaining two studies to be discussed. Each of the 22 pairs played two maze games in sequence and the dialogues were transcribed at the word level. On the basis of these transcriptions, sequences of description exchanges were extracted for further analysis, where an exchange would include everything from the first initiation (e.g. a question such as "Where are you?") to the final indication of mutual acceptance. This yielded a complete corpus of about one and a half thousand location descriptions. The first stage in the analysis was to establish a classification of the description schemes used in the dialogues. At the top level this yielded four main types of scheme, each based upon a somewhat different way of conceptualising the overall maze configuration. Examples of each type and their distribution in the corpus are shown in Table 1. The most popular scheme was what we called a path description where the communicators treat the maze as having a series of paths linking the nodes and describe locations by taking the listener on a tour through the maze from a prominent point to the position to be described. The next most popular were co-ordinate descriptions where the communicators define each point in the maze as at the intersection of two co-ordinates usually defined as rows and columns. Then come line descriptions where the maze is configured as a series of lines arranged in the vertical, horizontal or even diagonal orientation. The description then involves first identifying the line of nodes containing the position and then defining the target node relative to that line. Finally, there were figural descriptions where the maze is viewed as containing sets of different, possibly overlapping, patterns or figures.
So for instance it might be broken down into T-shapes, limbs sticking out to the side, corners etc. A description of any position is then given in relation to the breakdown of one of these figures.
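To see how one and the same node can surface under different schemes, the co-ordinate and line examples given in Table 1 can be generated from a single position. The Python sketch below is a hedged illustration: the function names are hypothetical, rows are simply taken as given, and the maze width of six nodes is an assumption chosen so that "fifth column" and "second from the right" pick out the same node; the paper does not state the dimensions of the maze.

```python
ORDINALS = ["first", "second", "third", "fourth", "fifth", "sixth"]

def coordinate_description(row, column):
    """Co-ordinate scheme: name the row and the column (both 1-based)."""
    return f"I'm on the {ORDINALS[row - 1]} row and {ORDINALS[column - 1]} column"

def line_description(row, column, width):
    """Line scheme: pick out the horizontal line, then count along it
    from the right-hand end. `width` is the assumed number of nodes per line."""
    from_right = width - column + 1
    return (f"I'm on the {ORDINALS[row - 1]} level, "
            f"{ORDINALS[from_right - 1]} from the right")

# Assumed position and width (not given in the paper).
print(coordinate_description(3, 5))
print(line_description(3, 5, width=6))
```

Path and figural descriptions are left out of the sketch because they depend on the layout of paths and on perceived shapes in a particular maze, which cannot be reconstructed from the text.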
Table 1: Examples of different location description schemes (for the same point on a maze)
(1) Path network (36%): "See the bottom left, go along four and up one that's where I am"
(2) Co-ordinate (24%): "I'm on the third row and fifth column" or "I'm at D five"
(3) Line (23%): "I'm on the third level, second from the right"
(4) Figural (17%): "See the middle right indicator, well I'm on the left of it"
This initial breakdown is therefore based mainly on the underlying spatial model that is being assumed of the maze. However, the model only constitutes one component of the description scheme; there is also the critical issue of how the description language (i.e. the lexical and semantic structure of the description) maps onto locations in the maze via the underlying model. To take just one example, in giving a line type of description it is possible to refer to the horizontal lines as rows, levels, layers, columns, or even floors while still relying on basically the same underlying spatial model of the maze. Similar considerations apply in fixing the structural semantics of a description, so the same row may be described as the third from the bottom, the third bottom row or just row three and in each case the precise semantic interpretation of terms such as row or bottom may be rather different. So there are at least two dimensions of variation that can be observed in the description corpus, first that of the underlying spatial model and second that of the precise mapping between the description language and elements in the model. Across the corpus as a whole this combined variation leads to a multitude of subtly different description sub-schemes (see Garrod & Anderson, 1987, for details of the different schemes). The second stage in the analysis was more directly concerned with establishing the degree of inter-speaker co-ordination with respect to the descriptions actually used in any dialogue. In contrast to the wide variation of schemes across the sample of dialogues as a whole it was apparent that isolated conversational partners were amazingly consistent about their choice of scheme.
At the level of the underlying model, speakers would tend to converge on the same description within the first game played (50 % agreement across the game) and by the second game there was 95 % agreement in choice of scheme. At the same time the descriptions generated did not remain the same across the dialogue, so that there was a steady move from the more concrete descriptions depending upon the perceptual salience of the maze configuration (path and figural) toward more abstract line and co-ordinate descriptions. Looking in more detail at the convergence on a common description language there was also strong evidence of mutual entrainment. Both at the level of lexical choice and in terms of the more subtle structural semantics of the descriptions there was a striking correspondence between each speaker's utterances (see Garrod & Anderson, 1987, for a detailed analysis). Thus dialogue partners could be seen to be collaborating to establish a common unambiguous description language which was usually quite distinct from the languages that were being forged by the other dyads. So the question arises as to how such co-ordination is being achieved. One possibility is that it comes about through explicit negotiation between the participants as to the scheme that they should use. For instance, one might expect to find the players starting out with a discussion of how they can go about
describing where they are and then sticking to the agreed scheme. However, the pattern of exchanges is not consistent with such a hypothesis. In the first place there were only a small number of cases when any discussion of the scheme occurred (in only 15 out of the 44 dialogues) and then in the majority of these cases it was in the second game played. So negotiation seems to follow the initial co-ordination effort rather than preceding it. Furthermore, in 12 of the 15 cases the explicit negotiation was only associated with convergence on one of the many possible schemes, the co-ordinate. Finally, such brief negotiations had relatively little impact on how the speakers subsequently formulated their descriptions. Thus only 59 % of subsequent descriptions in the dialogues containing negotiation exactly matched the explicitly negotiated scheme. The whole process of co-ordination appears to be much more dynamic and subject to the continuous interaction between the two speakers. For the large majority the co-ordination mechanism seems to reflect in a rather transparent way the O/ICP. Thus the sequence of exchanges in most of the dialogues can be modelled on the assumption that each speaker matches as closely as possible what their partner has just said and in general this leads quite directly to convergence on a mutually acceptable description scheme (see Garrod, Anderson & Sanford, 1984, for a discussion of a computer simulation operating along these lines). There are, however, limits on the simple application of the O/ICP as a heuristic for achieving mutual intelligibility. This is nicely illustrated in the opening section of the following dialogue which comes from the beginning of a second game:
B: ... Tell me where you are?
A: Ehm : Oh God (laughs)
B: (laughs)
A: Right: two along from the bottom one up¹:
B: Two along from the bottom, which side?
A: The left: going from left to right in the second box.
B: You're in the second box.
A: One up : (1 sec.) I take it we've got identical mazes?
B: Yeah well: right, starting from the left, you're one along:
A: Uh-huh:
B: and one up²?
A: Yeah, and I'm trying to get to... etc.
[28 utterances later]
B: You are starting from the left, you're one along, one up³? (2 sec.)
A: Two along : I'm not in the first box, I'm in the second box:
B: You're two along:
A: Two up (1 sec.) counting the : if you take : the first box as being one up :
B: (2 sec.) Uh-huh :
A: Well: I'm two along, two up⁴ : (1.5 sec.)
B: Two up?:
A: Yeah (1 sec.) so I can move down one:
B: Yeah I see where you are:
A: [Right: o.k.: where are you, did you say?
B: I'm ehm : see the right hand side : (1 sec.)
A: The right: yeah:
B: One along : and : one up⁵.
A: So you're on ground level:
B: No:
A: (laughs) You're doing this differently from me (laughs)
B: When I mean one up, I mean : one :
A: One above the bottom?
B: Uh-huh:
A: Oh well, I mean one up when I say the bottom :
B: Oh I see:
A: O.K., right, call that: call that the bottom zero, right?
B: Right, so where are you now?
A: I'm in zero, two.
This extract contains a number of descriptions for the same point in the maze (descriptions in bold numbered 1-4 all describe point X in Figure 2 while description 5 describes point Y). The first thing that can be said about the various descriptions is that they all share a number of superficial features. Thus A's description 1 ('two along... one up') is similar in form to B's 2 and 3 ('one along, one up') and A's 4 ('two along, two up'), and even description 5 from B ('one along and one up') exactly matches 2 and 3 although it is defining a different position on the maze (see Y in Figure 2). So there is some indication of a general superficial co-ordination between the respective descriptions. However, it is also apparent that the descriptions differ at the deeper levels of the scheme. The first descriptions by A and then B both constitute path descriptions. In A's case (description 1) he is counting the nodes along the path: two horizontally and then one up. In B's case (descriptions 2, 3 and 5) the strategy involves counting the path links rather than the nodes traversed: one horizontal link then one vertical. However, by A's second attempt (description 4) the scheme has moved toward a co-ordinate one. Hence A is now characterising the same position as at the intersection of the second horizontal co-ordinate and the second vertical co-ordinate from the bottom left hand side of the maze. The overall pattern of alternate description is therefore one where speakers maintain a superficial consistency while at the same time introducing minor changes which seem to reflect a lack of confidence as to the exact interpretation of the partner's utterance. It is only at the point when it becomes apparent to both of them that their descriptions are not properly co-ordinated at this deeper
level that they eventually fall back on explicit negotiation. Yet even then the particular version of the co-ordinate scheme established in the negotiation is not retained for all subsequent descriptions in the dialogue (see above). Examples such as this force us to recognise a distinction between the O/ICP as a formally specified goal of dialogue and the various processing strategies brought to bear in achieving this goal. Simply adopting a process of co-ordinating output with input at a superficial level may on occasion be quite sufficient to ensure deeper co-ordination at the level of interpretation. However, this is by no means guaranteed over a short period of time. For instance, in establishing a co-ordinate description scheme the same description (say 'three, three') of the same point in the maze may be consistent with 8 different co-ordinate schemes². Achieving deep co-ordination therefore may require additional processing strategies. One such process relies on monitoring the input and deciding whether it leads in a straightforward way to an unambiguous interpretation. If it does not, then the strategy is to introduce a minor change in the description and then give this as a kind of repair feedback to your partner. A repeated application of such a strategy will usually serve to establish a mutually acceptable scheme within a reasonable period. However, if this "bid - counter bid" process fails, as in the example above, then the only solution is to enter some form of explicit negotiation in order to patch up the co-ordination failure. Yet even in this rare eventuality the speakers usually solve the problem in an essentially distributed fashion. Hence, in the example above B opens with "When I mean one up I mean: one :" and is then interrupted with A's completion "... one above the bottom?". So the proposed scheme is itself established through a process of co-ordinated speech.
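The point about 'three, three' being compatible with eight co-ordinate schemes can be checked by enumeration. The Python sketch below follows the construal given in the footnote (four possible origins, two possible orders of the numbers, a 5 x 5 structure); the function names `resolve` and `readings` and the convention of reporting cells as (row, column) counted from the top left are assumptions made for the illustration.

```python
from itertools import product

ORIGINS = ["top-left", "top-right", "bottom-left", "bottom-right"]
ORDERS = ["horizontal-vertical", "vertical-horizontal"]

def resolve(description, size, origin, order):
    """Map a two-number description to a (row, column) cell, counted from
    the top left, under one reading of the origin and the number order."""
    first, second = description
    if order == "horizontal-vertical":
        across, vertical = first, second
    else:
        across, vertical = second, first
    column = across if origin.endswith("left") else size - across + 1
    row = vertical if origin.startswith("top") else size - vertical + 1
    return (row, column)

def readings(description, size):
    """All eight scheme readings of a description, keyed by (origin, order)."""
    return {(origin, order): resolve(description, size, origin, order)
            for origin, order in product(ORIGINS, ORDERS)}

# 'three, three' in a 5 x 5 structure: every scheme yields the centre cell,
# so hearing it cannot tell the listener which scheme the speaker is using.
print(set(readings((3, 3), 5).values()))
```

By contrast, a description such as 'one, two' resolves to a different cell under each of the eight readings, so a single successful exchange about an off-centre point carries far more information about the partner's scheme than one about the centre.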
What we are suggesting is that the O/ICP as a formal principle describes the conversational processing goal, while precisely co-ordinating output with input within any exchange only constitutes the lowest level strategy for achieving co-ordination. Achieving the goal clearly requires more sophisticated interactional processes, including, at the limit, explicit negotiation of the semantics of the language. The idea that dialogues may be co-ordinated and hence coherent at a number of levels is explored in more detail in relation to the study of children's maze conversations described below.

3.1.3 Study 2: The development of co-ordination among school aged conversationalists
The second study which we discuss used exactly the same elicitation procedure for the dialogues, but with a large sample of child subjects of varying ages. The subject pairs were organised into three groups: 31 pairs taken from primary 4 classes (7-8 years), 24 from primary 6 (9-10 years) and 25 from secondary 2 (11-12 years). Each pair played three games involving different maze configurations. The details are reported in Garrod and Clark (1993). Again the resulting dialogues were transcribed and each description exchange coded at various levels. However, in addition to tagging the respective description schemes underlying an exchange, these were also coded for overall success, judged in relation to how the exchange was terminated in the dialogue. A successful exchange was one judged to have been properly resolved with a clear indication of mutual acceptance of the description by both speakers, as opposed to one which was clearly abandoned before any kind of resolution had occurred. Two coders independently classified all the exchanges in this way, leading to a clear agreed classification in about 80 % of the cases, which were then used in the analysis. In many respects the results from the study were surprisingly similar to those from study 1. First, all the various description schemes found in the adult corpus occurred in some proportion in the dialogues of children of all ages, although there was a tendency for the line and co-ordinate types to be used more often by the older children. Secondly, children of all ages showed clear evidence for convergence on a common underlying description scheme as measured both by the general type (i.e. path, figural... etc.) and by the choice of common lexical items. In fact the inter-speaker consistency of choice is surprisingly high, with around 60 % of A's descriptions exactly matching the subsequent description produced by B (this is against a chance baseline of around 30 %).

² For instance, in a 5 x 5 structure 'three, three' would describe the same point with origin top-left, top-right, bottom-left or bottom-right, and with the numbers construed in at least two ways (e.g. horizontal-vertical, vertical-horizontal) in each case.
However, the pattern of description exchanges also shows a particularly striking developmental effect in relation to the co-ordination process, an effect which has some bearing on the relationship between the goal of satisfying the O/ICP and the processing mechanisms used to achieve it. In discussing the result of the first study we outlined three increasingly sophisticated co-ordination strategies: (1) superficial co-ordination of output and input, (2) monitoring with minimal repair, and finally as an extreme measure (3) explicit negotiation following the repeated failure of strategies 1 and 2. There is some evidence that for the youngest group of players only the first strategy was being consistently employed. This comes from a detailed analysis of the distribution of failed exchanges in the dialogues from the various groups. Not surprisingly there was a marked decrease with age in the incidence of failed as opposed to successful exchanges in terms of the criteria outlined above. Whereas the youngest group of players only managed to clearly succeed on 53 % of their exchanges, the two older groups succeeded on 89 % and 93 % respectively. But it is not this that is important in terms of overall co-ordination strategy. The important difference that separates the youngest group from the two older ones relates to the consequences of success or failure for the scheme adopted in the next exchange. Whereas for the two older groups an exchange which immediately followed failure was almost twice as likely to involve a shift in scheme, for the youngest conversants it had no effect whatsoever on the
likelihood of shifting. This means that the youngest group of players did not seem to be able to overcome the pressure for superficial co-ordination even when their descriptions were inadequate in terms of the overall communication (see Garrod & Clark, 1993). In relation to the three co-ordination strategies, it seems that the younger and less skilled speakers are only able to engage in the low level strategy of superficial co-ordination, and only to the extent that this works will they achieve a deeper co-ordination. Hence the relatively low number of truly successful exchanges in the younger group.

3.1.4 Discussion of studies 1 and 2 in relation to dialogue coherence

The two studies outlined above can be used to illustrate the relation between co-ordination and coherence in dialogue. In general we have argued that coherence reflects the degree to which the processing is co-ordinated, and as we have seen this co-ordination may be present at different levels. One way of characterising such differences in level of coherence and co-ordination is in relation to the overall goals of the dialogue. As Brown and Yule (1983) have pointed out, different types of conversation may serve very different goals. For instance, certain types of exchange are primarily used to promote social interaction. Thus a "How are you today?" followed by "Fine thanks" is hardly aimed at imparting significant factual information. Such exchanges are essentially interactional in nature and as such only need to be interactionally coherent. Hence the question does require an answer if it is to form part of a coherent exchange, but establishing the precise interpretation of 'being fine' is hardly likely to be at issue. On the other hand, task oriented dialogues of the sort examined in these studies do serve a fundamental information transferring function. The players need to know exactly where the other person is.
These dialogues are therefore what Brown and Yule refer to as transactional and accordingly require a level of transactional coherence. Here it is important that speakers co-ordinate on a precise 'agreed' interpretation of the words that they are using. The appropriateness of adopting co-ordination strategies of varying degrees of sophistication may therefore depend very much on the degree of coherence required for that kind of dialogue. For dialogues that serve a primarily social function, interactional coherence may be all that is necessary or even appropriate, and so adopting low level strategies such as superficial O/ICP will be quite appropriate. However, when the communication requires transactional coherence we would expect the more sophisticated deeper co-ordination strategies to be used. To this extent any notion of coherence in dialogue can be related in a systematic way to processing mechanisms designed to achieve the corresponding level of co-ordination.
3.2 Co-ordination among a community of conversational dyads
Studies 1 and 2 concentrated on how isolated pairs of individuals work together to establish a co-ordinated understanding of the language they are using in a conversation. The results indicated that such pairs working in isolation converge on idiosyncratic uses of the language. For instance, what is taken to be a 'row' by one pair may be treated as a 'column' or 'level' or 'layer' by another. It is tempting to conclude that what such interactions establish is a local set of language conventions which only hold for that particular conversation. However, this raises the question of how such local language conventions might relate to the more global conventions holding in the wider linguistic community. In his influential treatment of convention, David Lewis (1962) suggests that linguistic conventions serve as a community wide solution to the recurrent coordination problems presented in communication. It is therefore sensible to ask whether the same processing principles which seem to underlie coherence in isolated instances of conversation also have a direct bearing on how larger communities establish and maintain language conventions. The final study which we discuss was designed to explore the relationship between the local conventional systems set up by isolated communicators and those that might be established within a larger community. One of the problems with explaining global linguistic conventions as resulting from individual conversational interactions is that any one person's conversations will only give him or her a fragmentary and limited exposure to the linguistic community as a whole. A child 'learning' a language may only seriously interact with a handful of conversational partners, who in turn only interact with a few more and so on. How could such restricted exposure serve to establish and maintain language conventions in the larger community? 
The experiment was designed to explore this issue through a small scale simulation of a linguistic community who only have this kind of fragmentary conversational experience of each other. This was done by contrasting the development of convergent description schemes in isolated dyads playing several games together with that observed when the games are played with pairs repeatedly drawn from the same pool of players.

3.2.1 Study 3: Co-ordination in isolated dyads versus dyads drawn from a group
Again this experiment used the same maze game procedure as that in the first two. However, the subjects were split into two groups of ten players. The first group (control) were paired into five dyads who then played nine games in sequence, so they were essentially like the pairs in the first study but with a much more extended experience of the game. The second group (experimental) also played nine games each, but on each occasion with a different partner. In this way by the end of the study everyone in group 2 had played one and only one game with each other member of the group.
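The pairing constraint for the experimental group (ten players, nine games each, every player meeting every other player exactly once) is that of a round-robin schedule. The sketch below builds such a schedule with the standard circle method. The source does not say how the pairings were actually drawn up, so the method and the player labels are assumptions made purely for illustration.

```python
def round_robin(players):
    """Circle method: return len(players) - 1 rounds, each a list of pairs,
    such that every player meets every other player exactly once."""
    n = len(players)
    if n % 2:
        raise ValueError("an even number of players is required")
    rotation = list(players)
    rounds = []
    for _ in range(n - 1):
        # Pair the i-th player from the front with the i-th from the back.
        rounds.append([(rotation[i], rotation[n - 1 - i]) for i in range(n // 2)])
        # Keep the first player fixed and rotate everyone else one place.
        rotation = [rotation[0], rotation[-1]] + rotation[1:-1]
    return rounds


if __name__ == "__main__":
    schedule = round_robin([f"P{i}" for i in range(1, 11)])  # hypothetical labels
    for game, pairs in enumerate(schedule, start=1):
        print(game, pairs)
```

With ten players this yields nine rounds of five simultaneous games, covering all 45 possible pairings once each, which matches the exposure pattern described for group 2.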
The basic questions of interest are first how the choice of preferred scheme compares between dyads in the two groups and second how the overall history of co-ordination compares. Again this can be determined by classifying the different description schemes underlying each exchange. Looking first at the control group, their pattern of choice of scheme over the first two games resembles that found in study 1, with instances of path, line and co-ordinate description. For the remaining seven games the dyads fall naturally into two groups, with three of them predominantly using line type descriptions and the remaining two pairs using co-ordinate descriptions. Again this is exactly as might be expected from the first study, where players tended to shift to the more abstract schemes as they became familiar with the game. The picture which emerges for the experimental group is interestingly different. If we consider the first two games played by any individual we find a pattern not dissimilar to that in the control group: although both co-ordinate and line descriptions predominate, there is also a proportion of path and figural descriptions being used. However, beyond the first two games the experimental and control groups diverge. In all of the last seven games played by any pair in the experimental group only one particular version of the co-ordinate scheme is being used. Furthermore, it is a version of the co-ordinate scheme only seen in one pair from the control group. So the distribution of schemes reflects a somewhat different situation for the two groups. While the controls converge on idiosyncratic schemes as in study 1, the experimental group speakers all rapidly converge onto exactly the same scheme. The second question concerned the co-ordination process in the two groups. Again to examine this we considered the degree to which any two consecutive exchanges in a dialogue exactly matched each other.
The interesting measure here concerns consecutive exchanges initiated by different speakers in the pair. A running average of such inter-speaker consistency was calculated for the first three games, then the middle three and finally the last three played by any pair in the two groups. The results are shown in Figure 3. As can be seen in the figure there is a striking and statistically reliable interaction between group and game experience. Thus, while the experimental group starts out being marginally less consistent with their partners, by the time they have played six games they are reliably more consistent even though they have not encountered that partner in any previous game. This result almost certainly reflects the fact that after only being exposed to three members (i.e. 30 %) of the group all speakers have converged on a single consistent description scheme, more consistent even than the schemes used by the isolated pairs. It therefore seems that the kind of linguistic convergence observed in isolated pairs also occurs within a group on the basis of only fragmentary conversational exposure to the group as a whole. But how does it come about? Is it purely through the same mechanisms that apply in the isolated dyads, i.e. through applying one or more of the co-ordination strategies seen in studies 1 and 2, or does it involve special additional processing strategies? The answer to this question is complicated.
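One plausible reading of the inter-speaker consistency measure is sketched below: among consecutive exchanges where the initiating speaker changes, it is the proportion in which the second exchange exactly matches the first. The data representation (a list of speaker and scheme-code pairs) and the scheme labels are assumptions for illustration; the source does not specify the coding format or the exact averaging procedure.

```python
def inter_speaker_consistency(exchanges):
    """exchanges: list of (speaker, coded_description) in dialogue order.

    Returns the proportion of consecutive exchange pairs initiated by
    different speakers in which the two coded descriptions match exactly,
    or None if the speaker never changes."""
    cross_speaker = [
        (prev, nxt)
        for prev, nxt in zip(exchanges, exchanges[1:])
        if prev[0] != nxt[0]
    ]
    if not cross_speaker:
        return None
    matches = sum(prev[1] == nxt[1] for prev, nxt in cross_speaker)
    return matches / len(cross_speaker)


if __name__ == "__main__":
    # Hypothetical coded dialogue, not data from the study.
    dialogue = [
        ("A", "path"), ("B", "path"), ("B", "line"),
        ("A", "line"), ("A", "co-ordinate"), ("B", "line"),
    ]
    print(inter_speaker_consistency(dialogue))
```

Averaging this value over the dialogues in each block of three games would give one point per group on a plot like Figure 3, against a chance level that depends on how many schemes are in use.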
Figure 3: Graph showing the running average of consistency in sequences of descriptions where the speakers change. The chance level of consistency is about .3 and maximum consistency is 1. Data are taken from games 1-3, 4-6, and 7-9, and are shown for both the experimental group and the control.
On the one hand it is clear that at a formal level speakers within the group comply with the O/ICP even more rigorously than speakers in the isolated dyads. On the other hand, at a more mechanistic level, there seem to be differences in the strategies employed. Figure 4 shows the proportion of conversations containing instances of explicit negotiation, organised both by game and condition. Looking first at the control group, there is very little evidence of explicit negotiation, just as we would expect on the basis of study 1. Furthermore, all the conversations containing it occur in the first three games played. The situation with the experimental group is very different. Here, the incidence of explicit negotiation seems to increase from a low level in the early games to a peak (100 %) in the middle three, followed by a slight fall off towards the end. So it might be argued that the experimental group manages its linguistic co-ordination in a rather different way from the isolated speakers in the control group. But it is not necessarily as straightforward as it seems. For instance, in the first study we found that explicit negotiation was used almost exclusively to fix details of co-ordinate description schemes, and of course this is the type of scheme adopted by all the players in the experimental group. So on that count one might expect the incidence of such explicit negotiation to be much higher for this group. Also, when one looks more carefully at the form of the negotiation, it is extraordinarily similar across the dialogues. In each case, one of the speakers begins with some statement such as "It's letters across and numbers down, O.K.?" which is then affirmed by the partner. So the negotiation that occurs constitutes in every case a simple confirmation of a scheme that seems to be in some way presupposed by the pairs in the group. In other words, the negotiation itself depends upon a prior assumption about the scheme, at least to the extent that it should be a form of co-ordinate description. In fact, the circumstances in which the negotiation occurs are in most respects rather different from those in study 1, since it always happens at the beginning of the game and so does not serve the kind of last ditch repair function discussed earlier. On the surface there therefore does seem to be some indication that the details of the co-ordination mechanism may be rather different within a group than between isolated conversants. However, we really need to have more evidence before coming to a stronger conclusion. What is striking is the degree to which being part of a group, albeit with only partially overlapping exposure, does seem to lead to an increase in inter-speaker co-ordination and associated transactional coherence in these dialogues. So there is every reason to believe that the O/ICP as a formal principle may play an important role in maintaining and establishing linguistic conventions, which in turn obviously underpin coherent communication.

Figure 4: Histogram showing the distribution of instances of explicit negotiation across the three sets of games (game blocks) for both the experimental and control groups.

4. General conclusions and extensions of the definition of dialogue coherence
At the outset we argued that the notion of coherence requires special treatment in relation to dialogue. The fact that dialogues are by definition interactional means that the connections between utterances and their consistency necessarily reflect the relationship between what each contributor is saying. This leads naturally to the view that coherence in dialogue reflects in some way the degree to which the language processing is co-ordinated between the interlocutors. The paper concentrated on just one low-level co-ordination principle which
we call Output/Input Co-ordination. The basic idea behind this is that speakers should aim to match as closely as possible their processes of production and comprehension at all levels. We argued that such a principle has considerable processing utility both for the individuals concerned and the collaboration as a whole. Exploring the consequences of the O/ICP in relation to the maze game dialogues, we found some evidence to indicate that co-ordination may operate at different levels. Thus, younger conversationalists seem to be more liable to co-ordinate their utterances at a superficial level, whereas older speakers will attempt to converge on a common language which supports co-ordination at all levels. In turn, this led to the idea that dialogues may be considered coherent in different ways. Interactionally coherent dialogues only need to hang together at a superficial level, whereas transactionally coherent dialogues can only come about through a deep co-ordination of the language processing system. Although we have concentrated exclusively on one form of co-ordinated processing, it is clear that executing coherent dialogues requires more than just matching input and output. For example, another side to dialogue co-ordination is seen in the management of what are sometimes called dialogue games. A number of conversational analysts have argued that the utterances in any stretch of dialogue can be likened to moves in a game, where the game can be thought of as a dialogue exchange unit associated with one conversational goal. For example, if a speaker's goal is to check their understanding of an ambiguous utterance they would use a CHECK game. This would be initiated with a CHECK move which would then elicit a response move - typically an affirmative REPLY-Y move or possibly a negative REPLY-N. In turn, if all was well the game might be terminated by the initiator with a simple ACKNOWLEDGE move.
So in this example, the dialogue game consists of three moves which if successfully executed would satisfy the goal of finding out whether or not the CHECKer's interpretation of the original message was correct. A full explanation of this kind of dialogue game analysis can be found in Kowtko, Isard and Doherty (1991). Clearly managing dialogue games of this sort represents a form of co-ordinated language processing and as such may serve as a basis for enriching any definition of coherence. Dialogue is coherent at the game analysis level if appropriate responses are given to initiating moves, and appropriate feedback is given to them. If this does not happen then games are said to have been abandoned. So in terms of interactional coherence simply making legal conversational moves is quite sufficient. However, it is noticeable that even poor communicators may well answer questions that are asked of them and receive some feedback. A stronger criterion of communicative competence would take into account the appropriateness of the question and quality of the answer given. This will depend on how co-ordinated the question answerer is with their dialogue partner, which in turn will affect how well they design their answer. A second determinant of communicative competence may be how efficiently speakers use the
Dialogue Game repertoire that they have. For example, a speaker may know how to initiate a CHECK game and what to expect as a response to that initiator, but they might not recognise the conversational situations in which that CHECK game should be applied. Thus, it appears that dialogue game structure is more relevant for interactional coherence than transactional coherence. Put another way, it is possible to sustain an interaction as long as you don't violate certain game structures (e.g. by ignoring a question that is asked of you). However, this may not be sufficient to sustain a dialogue at the transactional level, which would require using the dialogue games as tools to satisfy the transaction. Thus, it would not be sufficient to give a REPLY-Y as a response move to a CHECK if the respondent did not fully understand the contents of the CHECK move or in some way disagreed with it. Fuller discussion of the relationship between dialogue game structure and coherence is beyond the scope of this paper, but it should be apparent that the general idea of relating coherence to co-ordination has potentially interesting theoretical consequences in this area. Whether one looks at dialogue purely in terms of the content of what is being said or the more deeply embedded intentions underlying dialogue games, it is apparent that any notion of coherence is going to have to take into account the fundamental interactive and collaborative nature of the activity.
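The CHECK game discussed above can be sketched as a small state machine over moves. The move names come from the text; the state names, the transition table and the treatment of ACKNOWLEDGE as optional (the text says the game "might be" terminated this way) are assumptions for illustration and do not reproduce the coding scheme of Kowtko, Isard and Doherty (1991).

```python
# Legal move sequences for a CHECK game, as a transition table.
CHECK_GAME = {
    "start": {"CHECK": "awaiting_reply"},
    "awaiting_reply": {"REPLY-Y": "replied", "REPLY-N": "replied"},
    "replied": {"ACKNOWLEDGE": "closed"},
}

# States in which the initiating move has received a response.
RESOLVED = {"replied", "closed"}


def run_game(moves, transitions=CHECK_GAME):
    """Return 'completed' if the moves form a legal CHECK game in which the
    initiating move received a response, otherwise 'abandoned'."""
    state = "start"
    for move in moves:
        next_state = transitions.get(state, {}).get(move)
        if next_state is None:  # illegal move for this state
            return "abandoned"
        state = next_state
    return "completed" if state in RESOLVED else "abandoned"


if __name__ == "__main__":
    print(run_game(["CHECK", "REPLY-Y", "ACKNOWLEDGE"]))
    print(run_game(["CHECK"]))
```

A checker of this kind only tests whether legal moves were made, which corresponds to interactional coherence. It cannot tell whether a REPLY-Y was given with full understanding of the CHECK, which is what transactional coherence would require.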
References

Clark, H. H. & Marshall, C. R. 1981 Definite reference and mutual knowledge. In A. K. Joshi, I. A. Sag & B. L. Webber (Eds.), Elements of discourse understanding. Cambridge: Cambridge University Press, 10-64.
Clark, H. H. & Wilkes-Gibbs, D. 1986 Referring as a collaborative process. Cognition, 22, 1-39.
Garrod, S. & Anderson, A. 1987 Saying what you mean in dialogue: A study in conceptual and semantic coordination. Cognition, 27, 181-218.
Garrod, S., Anderson, A. & Sanford, A. J. 1984 Semantic negotiation and the dynamics of conversational meaning. Tech. Report 1, Glasgow Psychology Department.
Garrod, S. & Clark, A. 1993 The development of dialogue coordination skills in school-aged children. Language and Cognitive Processes, 8(1), 101-126.
Levelt, W. J. M. & Kelter, S. 1982 Surface form and memory in question answering. Cognitive Psychology, 14, 78-106.
Lewis, D. K. 1986 Convention: A philosophical study. Cambridge, MA: Harvard University Press.
Kowtko, J. C., Isard, S. D. & Doherty, G. M. 1991 Conversational games within dialogue. Proceedings of the ESPRIT/DANDI Workshop on Discourse Coherence, University of Edinburgh, 4-6th April.
HANS-JÜRGEN EIKMEYER, WALTHER KINDT, UWE LAUBENSTEIN, SEBASTIAN LISKEN, HANNES RIESER and ULRICH SCHADE
Coherence Regained

1. Methodological remarks
Coherence is regarded here as a property of dynamic systems such as the language production system or the language reception system. Only the linguistic entities produced or received by these systems can be directly observed. However, one cannot expect that these entities reflect all of the complicated processing going on during their production or reception. They certainly reflect some of the processing, and it is assumed here that structural properties of the linguistic entities in some sense mirror their own processing. When dealing with dynamic systems it is generally agreed that one should also study the system in a state of disorder or under perturbation. With respect to these states one might be able to find out more about the system than in cases where the system is operating smoothly. In order to examine the nature of matter, physics normally examines systems at phase transitions, e.g. a gas becoming a liquid or a liquid becoming a solid body or the other way round. Analogously - in order to examine the nature of language - it is wise to concentrate upon linguistic data and phenomena which incorporate perturbations or incoherences as integral parts. Such data can be easily found in spoken language; among them are the so-called repairs.

2. Incoherence in linguistic entities
The primary data dealt with had been elicited in blocks-world experiments (see Forschergruppe Kohärenz, 1987). The design of these experiments was as follows: Two subjects had to interactively solve an instruction task. They could not see each other, but they could freely communicate. One of the subjects had to instruct the other to build a configuration of building blocks. The resulting spoken interaction was tape recorded and transcribed. These data contain interesting phenomena on both the syntactic and semantic level, i.e. data incorporating perturbations. On the syntactic level one finds different kinds of repairs such as covert repairs, as well as the prototypical member of the class, error repairs. The class of covert repairs contains pauses, hesitations and repetitions
as well as combinations of them as shown in (1). Error repairs are shown in (2) and (3). Syntactic incoherence is also found in connection with pivot constructions, e.g. (4). An interesting phenomenon on the semantic level is semantically wrong but pragmatically successful descriptions, as in (5) and (6). Although the subjects in the experiments had to manipulate three-dimensional blocks, they frequently refer to them with two-dimensional notions. Sometimes one subject corrects the other (cf. (7)), but most of the time the two-dimensional notion is simply accepted.

(1) der is also nich nich nich wie die . Wie die eh von denen mehrere da sind
    [it is I mean not not not as those . As those eh of which several are there]
(2) und den linken eh Quatsch den roten stellst du links hin
    [and the left one eh nonsense the red one put you to the left]
(3) die die Grund die Grundform sind is nich is nich eckig
    [the the base the base form are is not is not angular]
(4) und jetzt is eh oben auf dem . grünen . beziehungsweise blauen viereckigen eh Säulenteil? is oben noch ein gelbes Dach
    [and now is eh on top of the . green . respectively blue quadrangular eh pillar? is on top a yellow roof]
(5) auf dem roten Zylinder hab ich den grünen das grüne Quadrat
    [on top of the red cylinder have I the [masculine] green the [neuter] green square]
(6) und dann zwei . grüne Klötze die kleinen dicken . quadratischen Würfel
    [and then two . green blocks the small fat . quadratic cubes]
(7) P: so dann die nächst kürzeren Klötze das sind Quadrate
       [so and then the next shorter blocks they are squares]
    Q: n Würfel
       [n cubes]
    P: ja Würfel
       [yes cubes]

We selected sentence-internal self-repairs as our cases of interest. (2) and (3) belong to this class; (7) is from the syntactic point of view an other-repair, but also sentence-internal. According to Levelt (1983) the structure of a prototypical repair is the following:

(8a) Go from left again to uh..., from pink again to blue
     reparandum: left
     editing phase: uh...,
     alteration: pink
This structure had some impact on the parsing strategies for repairs touched upon in section 4.3. Apart from this structure Levelt (1983) proposes a classification of repairs into several subclasses, some of which have already been mentioned. This classification, however, is both inhomogeneous and incomplete. It is inhomogeneous since it uses formal categories for the definition of some subclasses of repairs and semantic categories for others. The latter categories, moreover, heavily depend on interpretations of the production process. Since quite a few of the repairs we found in the blocks-world data could not be correctly classified, the Levelt classification is also incomplete. Thus, we propose here a classification subdividing repairs by operational criteria only. It contains two subclasses, namely bridging repairs and supplement repairs (cf. Kindt & Laubenstein, 1991). The two main subclasses of repairs directly correspond to the syntactic classification used in the formulation of the respective repair. Roughly speaking, bridging repairs are characterized by syntactic perturbations found in what we call the reference sequence or in the repair initiation (cf. (8b)), whereas supplement repairs show no syntactic perturbation at all (cf. (9)).

(8b) a bridging repair:
     Go from left again to uh..., from pink again to blue
     reference sequence: Go from left again to
     repair initiation: uh...,
     attempt at repair: from pink again to blue
     repair sequence: uh..., from pink again to blue

(9) a supplement repair:
    die stehn nur die Tiefe des Steins . ausnnander . also die Dicke
    [they stand only the depth of the brick . apart . I mean the thickness]
    reference sequence: die Tiefe des Steins
    repair initiation: also
    attempt at repair: die Dicke
    repair sequence: also die Dicke
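The operational segmentation of a repair can be written down as a small record type, shown here for the bridging repair in (8b). This is only an illustrative data structure: the class and field names are assumptions, and whether a syntactic perturbation is present is supplied by hand as a coding judgement rather than detected automatically.

```python
from dataclasses import dataclass


@dataclass
class Repair:
    reference_sequence: str
    repair_initiation: str
    attempt_at_repair: str
    # Hand-coded judgement: is there a syntactic perturbation in the
    # reference sequence or in the repair initiation?
    syntactic_perturbation: bool

    @property
    def repair_sequence(self) -> str:
        """Repair initiation followed by the attempt at repair."""
        return f"{self.repair_initiation} {self.attempt_at_repair}".strip()

    @property
    def subclass(self) -> str:
        """Bridging if a perturbation was coded, otherwise supplement."""
        return "bridging" if self.syntactic_perturbation else "supplement"


if __name__ == "__main__":
    example_8b = Repair(
        reference_sequence="Go from left again to",
        repair_initiation="uh...,",
        attempt_at_repair="from pink again to blue",
        syntactic_perturbation=True,
    )
    print(example_8b.subclass, "|", example_8b.repair_sequence)
```

Keeping the perturbation judgement as an explicit field reflects the point that the classification rests on operational criteria applied to the utterance, not on an interpretation of the speaker's production process.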
3. A theoretical conception of coherence

The theoretical conception of coherence advocated here is based on properties of dynamic systems. The stability of such systems serves as the central explicans for coherence. We call the state of a system a stable state if minor perturbations do not affect the system's behaviour. A reception system, e.g., is in a stable state if it analyzes even moderately noisy inputs successfully. In general, a dynamic system assuming a stable state after some processing is said to be coherent, a system assuming an unstable state is said to be incoherent. The instability or incoherence of the system is caused by some problem which has arisen in the system. What sort of a problem this can be depends on the system looked at, especially whether it is a production or a reception system. Consequently, we will take a closer look
at several problems in section 4. If nothing were to happen, the system would perhaps stay in the incoherent state. We assume that besides the system talked about thus far - the object system - there is a second system - the meta-system - which interacts with the object system in the following way: Firstly, the meta-system detects the incoherent state of the object system and, secondly, both the meta-system and the object system try to transform the unstable, incoherent state into a new stable and coherent state (cf. figure 1).

Figure 1: A system in an incoherent state.
With respect to the prototypical repair of Levelt (1983) things will roughly look as depicted in figure 2. Coherence is lost by the object system at 'Go from left again to', but - in cooperation with the meta-system - coherence is regained at 'from pink again to'. It has to be mentioned, however, that this picture is oversimplified. The location of the incoherent state in this example depends on whether one describes language production or language reception processes. A speaker, i.e. a production system, will detect the incoherent state earlier than the listener, i.e. the reception system. Moreover, the exact location of the incoherence cannot be determined in either case. The same holds for the regained coherent state as well.

Figure 2: Coherence is regained by cooperation of the object and the meta-system.
This dynamic view of incoherent states being transformed into coherent ones can be projected onto the linguistic entity involved. Consequently, we call the initial part of an utterance like the one above (including the reference sequence) incoherent. When the repair sequence is completed we call the corresponding part of the utterance coherent. Thus, coherence or incoherence of a linguistic entity - the so-called object coherence or incoherence - is derived from the
Coherence Regained
dynamic coherence or incoherence of the system the entity is embedded in. In other words, a dynamic notion of coherence is prior to a structural one. As a consequence, however, it is not possible to say that a linguistic entity E is coherent in general. One can only say that the entity E is coherent in a system S. Thus, there is no objective coherence of a linguistic entity per se.
4. Exemplifications
In the following sections several exemplifications of the theoretical conception of coherence developed above will be discussed. In all cases both the object and the meta-system will be briefly characterized. It will be described what exactly the incoherent state of the object system looks like and what the reason for assuming such a state is. In sections 4.1 and 4.2 the conception will be applied to the process of language production. In section 4.1 the reason for the system changing into an incoherent state is a problem of search; in section 4.2 it is a problem of correctness. In both cases it will be shown how the meta-system succeeds in restoring coherence by inducing different kinds of repairs. In section 4.3 the conception will be applied to the process of language perception. Parsing an utterance with a repair leads the system into an incoherent state, the problem being nonwords and/or corrupted syntactic constructions. Coherence is regained in this case by applying special parsing strategies. In the final section 4.4 it will be demonstrated how sentence-internal repairs can be modelled with grammatically regular constructions in a special kind of grammar.
4.1 The process of language production and problems of search
One type of problem which might occur during the production of an utterance is a search problem resulting in an incoherent state of the production system. A problem of search is at stake, if some subsystem of the production system requires another subsystem to deliver a result, and there is no such result available. A typical example for a problem of search is the "tip of the tongue-effect" (cf. Brown & McNeill, 1966; see also Brown, 1991, Burke et al., 1991, Jones, 1989, Jones & Langford, 1987, Meyer & Bock, 1992). In order to overcome problems of search, the production system might be assumed to simply wait until the required result is available and not produce anything in the meantime. In a dialogical setting, however, a speaker who does not produce something runs the risk of losing his turn, i.e. of losing the right to speak. If the speaker wants to keep the floor, he has to use a "better" strategy. This is where the meta-system comes into play. The incoherent state of the object system in such situations is due to antagonistic requirements: (i) produce something in order to keep the floor and (ii) use all resources in order to solve the problem of search. The meta-system can overcome this problematic situation by initiating the production of a so-called covert repair. The production of
"something", i.e. a covert repair, is likely to be sufficient for keeping the floor. Additionally, the object system can use more of its resources for solving the problem of search. Covert repairs include repetitions, hesitations, pauses, and combinations of these three (cf. (10), (11)). Among these three, repetitions are assumed to be most efficient in keeping the floor. Hesitations are less likely to do this job and pauses are still worse. They can be used only in combination with repetitions and hesitations or in situations where it is very unlikely that the speaker might lose his turn at all (for more details cf. Eikmeyer, 1987 and Schade & Eikmeyer, 1991).
(10) wahrscheinlich sind meine Beispiele soo sprunghaft und und und eh ehm zu zu telegraph [probably are my examples so erratic and and and eh ehm too too telegraf-]
(11) und jetzt müßteste . nja gleichzeitig . eh . auf die linke Seite auf das lin linke Ende dieses blauen Steins [and now should you . yeah simultaneously . eh . on the left side on the le left end of this blue block]
We have modelled this behaviour of the production system in two different ways: in a symbolic processing model using parallel communicating processes and in a connectionist model exploiting properties of network dynamics. In the symbol processing model the object system consists of several subsystems:
- the turn-taking system, which realizes the turn-taking rules of Sacks, Schegloff and Jefferson (1974),
- the syntactic-semantic planning system,
- the phonological-coding system, and
- the motor-programming system (cf. Bock, 1982, for these subsystems).
Each of these subsystems is supposed to operate sequentially, but they communicate via buffers with the other subsystems, all operating in parallel. That means that one subsystem is required to write its result into a specified buffer where it can be picked up by another subsystem which will further process this result.
Consequently, the syntactic-semantic planning component sends its result to a buffer which will be read by the phonological-coding system. This system, in turn, writes its results into a buffer where they will be read by the motor-programming subsystem (cf. figure 3). Communication between the subsystems is supposed to be asynchronous, i.e. a process sending its result to a buffer does not wait until the receiver has read it and a receiver does not wait until the sender has sent its result. However, in a stable state a subsystem expecting an input should always find something in the buffer it reads. Thus, the problem giving rise to the instability is modelled as a synchronization problem between the subsystems involved. More specifically, the problem which finally leads to the production of covert repairs results
Figure 3: Communication of subsystems. (Components shown: turn-taking component, syntactic-semantic planning, phonological coding, motor programming, linked by write and read operations.)
from a faulty communication between the planning- and the coding subsystem via buffer (1). The coding system expects a result of the planning process in this buffer, but in case there is no such result, i.e. in case buffer (2) is empty, the problem is there (cf. Eikmeyer, 1989). In other words, the system of parallel communicating subsystems is in an unstable and incoherent state. This incoherent state can be easily detected by the meta-system since it simply has to check whether a buffer is empty when a subsystem tries to read this buffer. In a connectionist model there are no such things as symbols sent from one process to another, there are no buffers and the like. The process of language production is modelled by a flow of activation in a network of nodes or processors. These nodes represent linguistic entities relevant for the production process. Each node is characterized by its activation value which so to speak represents the strength of its being involved in a specific stage of the production process. The activation of the nodes changes in time according to the laws of an activation function which takes into account the current activation of the node in question and the activation of all nodes it is connected to. These connections can either be excitatory, i.e. one node stimulates another one, or they are inhibitory, i.e. one node impedes another one. According to standard techniques in the field of connectionist modelling, the network of nodes is arranged in levels encompassing nodes which represent linguistic entities of the same "size" and type. Thus, the usual linguistic levels as e.g. the syntactic level, the semantic level, the level of syllables or syllable parts,
and the level of phonemes all appear in such networks. Let us concentrate on the level of phonemes here, because the result of a language production process is - under a certain perspective - a sequence of phonemes. All nodes of the network representing phonemes are collected in the phoneme level and we assume the network to operate in such a way that at the end of a certain time interval one of the phoneme nodes is always highly active whereas the others have a much lower activation value. The phoneme represented by the node with the highest activation counts as being produced by the model. Thus, under this perspective the process of language production is a process of continued selections of the relevant phoneme node. What has been discussed with respect to the phoneme level analogously applies to all the other levels. Thus, we can talk about syllable selection, word selection and so on. The networks we use are constructed in such a way that, with respect to all levels and under normal conditions, i.e. in a stable state, there will always be one node with a considerably higher activation than the other nodes of the level in question. Moreover, stability requires the activation value of this node to exceed a special value, the so-called selection threshold. Now, this concept ultimately allows for a formulation of a problem of search: there is no node in a level with an activation exceeding the selection threshold. In other words, the network has not yet decided upon a certain linguistic entity (cf. Schade & Eikmeyer, 1991). Such a state of the network is correspondingly unstable and it gives rise to the production of a covert repair. So far we have discussed two possible models for incoherent states resulting from problems of search: in the symbolic model a buffer is empty and in the connectionist model there is no node exceeding the selection threshold. In either case the meta-system initiates the production of a covert repair in order to keep the floor (see below).
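The two detection conditions just summarized can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the buffer object and the numeric activation values are assumptions made for illustration, and the additional requirement that the winning node be considerably more active than its competitors is left out.

```python
from collections import deque


def buffer_search_problem(buffer):
    """Symbolic model: an empty input buffer at read time signals a problem of search."""
    return len(buffer) == 0


def select_node(activations, threshold):
    """Connectionist model: return the most active node of a level if its
    activation exceeds the selection threshold, otherwise None (problem of search)."""
    winner = max(activations, key=activations.get)
    return winner if activations[winner] > threshold else None


# Hypothetical buffer between the planning and the coding subsystem.
planning_to_coding = deque()
print(buffer_search_problem(planning_to_coding))

# Hypothetical activation values on the phoneme level.
print(select_node({"m": 0.9, "n": 0.2}, threshold=0.5))
print(select_node({"m": 0.4, "n": 0.3}, threshold=0.5))
```

In both cases a meta-system would only need the boolean outcome (empty buffer, or no selected node) to decide that a covert repair should be initiated.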
At the same time, however, the object system keeps on working. If this finally leads to success, i.e. if the buffer in question has a value or if some node reaches the selection threshold, it is assumed that the object system is in a coherent state again. The meta-system will then stop producing covert repairs. The open question is how the meta-system determines which variant of covert repair, i.e. pause, hesitation or repetition, it should produce. We determine the variant produced by the relation of a parameter p with respect to two threshold values t1 and t2 (0 ...
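[Listing of grammar G and its lexicon; only fragments survive. Lexical entries: kleine, tapfere, ruhig, tief, und, der, mann, schläft. Right-hand-side fragments: vp; adj_p; adj conj; n; n; adj; adv; adv_p; conj; adv.]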
L is the subset of G that consists of [L1]-[L8]. The categories used in this context-free grammar have the following interpretation: s = sentence, np = noun-phrase, vp = verb-phrase, det = determiner, adj_p = complex adjective phrase, n = noun, adj = adjective, conj = conjunction, v = finite verb, adv = adverb, adv_p = complex adverb-phrase. Using G we get two 'non-deviant' parses (i.e. parses without word-segmentation problems) of string (16) up to 'aeh', which can be analyzed according to the following two sequences of rules:
A: [1], [2], [L6], [4], [L1]
A': [1], [3], [L6], [L1]
We now give an outline of the strategy adopted by the parser to establish these two parsing alternatives. The first step is an unrestricted search for 'first words', i.e. words the forms of which match an initial segment of the input string. This task is performed by the segmentation procedure described above. In our example the only segmentation obtained in this way is that of the determiner "der". This result is used to start upon a classical top-down strategy which requires a partial syntax-tree to derive from it a categorial hypothesis for the segmentation of following words. Thus we have to build such a partial syntax tree for every parsing alternative pursued, hence the notion of a 'parse forest'. In order to build these partial syntax trees we must find 'syntactical embeddings' of the segmented first words, i.e. sequences of rules that start with the given lexical rule for the first word, continue with rules that have the left-hand side of the preceding rule as their 'left corner' (first element on the right-hand side), up to a final "s"-rule. Different possible syntactical embeddings can act as another source of non-determinism. This is the case in our example, since the category "det" is the left corner in rules [2] and [3], the left-hand side of both being "np", which in turn is the left corner in s-rule [1].
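The search for first words and for their syntactical embeddings can be sketched in a few lines of Python. This is an illustrative sketch, not the parser's actual code. The rule table is a hypothetical partial reconstruction of G: the rule numbers, the left-hand sides and the left corners follow the description in the text, while the remaining right-hand-side categories are assumptions. The input string used in the example is likewise invented and is not string (16).

```python
# Hypothetical partial reconstruction of grammar G (assumption, see lead-in):
# rule number -> (left-hand side, right-hand side)
RULES = {
    "1": ("s", ["np", "vp"]),
    "2": ("np", ["det", "adj_p", "n"]),
    "3": ("np", ["det", "adj", "n"]),
    "4": ("adj_p", ["adj", "conj", "adj"]),
    "5": ("vp", ["v"]),
}

# Two of the lexical rules: name -> (category, word form)
LEXICON = {"L1": ("adj", "kleine"), "L6": ("det", "der")}


def first_words(text):
    """Unrestricted segmentation: lexical rules whose form is a prefix of the input."""
    return [name for name, (_, form) in LEXICON.items() if text.startswith(form)]


def embeddings(category):
    """All rule sequences that lead from `category` as left corner up to an
    s-rule, listed top rule first. Assumes the grammar is not left-recursive."""
    chains = []
    for name, (lhs, rhs) in RULES.items():
        if rhs[0] != category:
            continue
        if lhs == "s":
            chains.append([name])
        else:
            chains.extend(upper + [name] for upper in embeddings(lhs))
    return chains


for rule in first_words("derkleine"):       # invented input string
    category = LEXICON[rule][0]
    print(rule, category, embeddings(category))
```

Each chain returned for "det" corresponds to one partial syntax tree, so the number of chains is the number of parsing alternatives opened by the first word.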
Consequently we arrive at a first 'parse forest' shown in figure 7. The two alternatives A and A' have parsed the word "der" according to the sequences [1], [2] and [1], [3], respectively. They are represented in the implementation by pointers to the two "np"-nodes. Generally speaking, a parsing alternative is implemented as a pointer to that node under which the word last segmented has just been placed. The node pointed at by a parsing alternative is called an 'active node'. (Another item stored for each parsing alternative is the number of graphemes segmented by it from the input.) What follows now is the 'normal procedure', the main part of the object system. First we have to compute a categorial hypothesis for the segmentation of the following word, which is equivalent to finding a place in the syntax tree to place the next segmented word under. The simplest way to achieve this is to look up the next category in the 'active rule', the rule represented by the currently active node. (We should remark here that the nodes in our 'syntax trees' are actually labelled with rules, not just with the categories normally used in a syntax tree and shown in the illustrations.) For alternative A' no more action needs to be taken.
Figure 7: A parse forest. Note how we have attempted to organize the parse forest in an economic way, allowing both "np"-nodes dominated by the same "s"-rule [1] to share a single dominating "s"-node. The double line linking the two "np"-nodes indicates this and is best imagined as indicating a third dimension, since a standard syntax tree already occupies two dimensions. It should further be noted that we distinguish 'syntactical' nodes, depicted by filled circles, from 'lexical' ones (hollow circles).
But in the case of alternative A the next category is "adj_p", a non-lexical category, and actually one would expect to see an "adj_p"-node before agreeing that we are pointing at the correct node to place the next segmented word under. We must therefore find all "adj_p"-rules - another source of non-determinism - and, for each rule found, we must add an "adj_p"-node under the currently active "np"-node and finally establish a parsing alternative pointing at that node, multiplying, as one could say, the hitherto pursued alternative A. In the case of our example only one "adj_p"-rule is found, and we could simply say that alternative A 'relocates its activity' from the node "np" to a single new "adj_p"-node. After adding that node, both alternatives have the lexical category "adj" as their categorial hypothesis and we can proceed to the second step of the normal procedure, the segmentation procedure restricted by a categorial hypothesis. Matching the lexicon and the categorial hypothesis with the input string after "der", we find the word "kleine" for both A and A'. We add a lexical node just under the active nodes and arrive at the parse forest shown in figure 8. Alternative A points at the node denoted as "adj_p", A' points at "np". The first step of the normal procedure is applied again and consists now, for both alternatives, of a simple switch to the next category in their respective active nodes, yielding the categorial hypotheses "conj" and "n" for A and A' respectively. But in both cases no successful segmentation is found. The meta-system starts to work and recognizes the hesitation "aeh" which activates the repair handling procedure. This procedure applies the following strategies:
(1) continue to use the currently active rule
(2) go back to previous constituents step by step up to the beginning of the active rule; further, go back (in one single step) to the beginning of the rules at all nodes dominating the currently active node
Figure 8: Another parse forest. We mark the 'active' parts of the parse forest - i.e. active nodes, their dominating nodes up to the top level and lexical nodes added since the last illustration - in black and the remaining parts in grey.
(3) reactivate previously discarded parsing alternatives
(4) look for structurally similar starting points at all levels.
The idea is to generate a sufficiently large number of parsing alternatives according to these strategies and then to let the normal parsing procedure select those alternatives which are correct for the given input: The incorrect alternatives will, at a point soon after the current position in the input, run into a second word segmentation problem. Due to the lack of a second hesitation or fragment in the input at that point, these alternatives will be discarded. For alternatives A and A', new alternatives with the following categorial hypotheses are computed according to grammar G and strategies (1) to (4):

strategies   A           A'
(1)          [4] conj    [3] n
(2)          [4] adj     [3] adj
(2)'         [2] det     [3] det
(2)''        [1] np      [1] np
(3)          -           -
(4)          -           -
Strategies (1), (2), (2)' and (2)'' create new parsing alternatives which are added to the set of currently active parsing alternatives. In the case of (2)'' the computed categorial hypothesis is non-lexical, but that problem is handled as described for the normal parsing procedure, "multiplying" both A(2)'' and A'(2)'' by the number of "np"-rules in G (i.e. doubling them). (3) and (4) cannot be applied, however: (3) because no parsing alternatives have been discarded so far and (4) because G does not specify structural parallels such as "der" vs. "die kleine..." in the role of a subject noun phrase. (2) is successful for both A and A' for the following reason: A(2) predicts that the next word to be segmented will be an adjective because of the "adj_p"-rule [4], and similarly for A'(2) because of "np"-rule [3]. Ultimately, only A'(2) will succeed, since it correctly specifies the subsequent adjective and noun (cf. rule [3]). After the parsing alternatives according to strategies (1)... (4) have been computed, the parse forest looks as in figure 9. With this significantly extended parse forest, control is again handed on to the object system (the 'normal procedure') and we are back on the familiar ground of the object system. As mentioned before, only A'(2) survives the next segmentation steps; indeed it takes only one word segmentation to eliminate all parsing alternatives but A(2) and A'(2), and one more segmentation to single out A'(2) as the only 'survivor'. A'(2) is the only alternative to complete a noun phrase analysis after parsing "mann", and in the subsequent process of 'switching to the next category' it 'goes up' to the dominating "s"-node, looks up the category "vp" there and expands it as already described. As there are three rules for "vp" in G we multiply the parsing alternative again by three, all of which continue to parse the verb "schlaeft". The alternative using the simple rule [5] reaches the end of its "s"-rule, checks that the input is used up as well, and reports a successful analysis. The other two try another segmentation according to the respective categorial hypotheses "adv" and "adv_p", but recognize their failure as there is no further input to segment. The state of the parse forest after the successful analysis is shown in figure 10 below.
(We leave out the two unsuccessful "vp"-alternatives, which would be located on a double line next to the "vp" node shown.) It should be obvious that the usual parsing systems do not have strategies like (1) to (4) above. The development of the repair parser led to the following, perhaps surprising result: A classical parser with additional segmentation and lexical access facilities plus a suitable control component is all one needs to handle certain kinds of repairs. The system can handle bridging repairs in the sense explicated in section 2, but not long distance repairs like 'The cat caught many mice uh the red cat oh sorry, shit, the tiger cat.' Even if non-regimented data similar to this invented example are hard to come by (cf. Nooteboom, 1980 and Levelt, 1983), one would nevertheless like to have a system strong enough to handle such cases showing scope ambiguities and multiple recursion. In principle this could be done at a technical level along the lines of the existing repair parser, but the repair strategies for constructions like these, especially the semantic and pragmatic aspects involved, are as yet not very well understood.
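How strategies (1) and (2) yield the categorial hypotheses tabulated for alternatives A and A' can be made concrete with a small Python sketch. It is an illustration under assumptions, not the repair parser itself: the right-hand sides in `RHS` are a hypothetical partial reconstruction of G, a parsing alternative is simplified to the list of rules from the top "s"-rule down to the active rule plus a position in that rule, and strategies (3) and (4) are not modelled.

```python
# Hypothetical right-hand sides for rules [1]-[4] of G (assumption, see lead-in).
RHS = {
    "1": ["np", "vp"],
    "2": ["det", "adj_p", "n"],
    "3": ["det", "adj", "n"],
    "4": ["adj", "conj", "adj"],
}


def repair_hypotheses(path, position):
    """Return (rule, category) pairs for a parsing alternative.

    path     -- rule numbers from the top s-rule down to the active rule
    position -- index of the next expected category in the active rule
    """
    active = path[-1]
    # Strategy (1): continue to use the currently active rule.
    hypotheses = [(active, RHS[active][position])]
    # Strategy (2), first part: step back through the active rule to its beginning.
    for index in range(position - 1, -1, -1):
        hypotheses.append((active, RHS[active][index]))
    # Strategy (2), second part: jump to the beginning of each dominating rule.
    for rule in reversed(path[:-1]):
        hypotheses.append((rule, RHS[rule][0]))
    return hypotheses


# Alternative A has parsed "der kleine" via [1], [2], [4]; A' via [1], [3].
print(repair_hypotheses(["1", "2", "4"], position=1))
print(repair_hypotheses(["1", "3"], position=2))
```

Under these assumptions the two calls reproduce the rows (1), (2), (2)' and (2)'' of the table above; discarding the wrong hypotheses is then left to the normal parsing procedure.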
Delta-X Delta-X
What is still lacking is a representation facility which interprets this sequence of single text propositions as constituting a coherent whole characterized by the constant reference to a single topic (Delta-X) in the text under consideration. Recognizing linguistic forms of text coherency and providing appropriate thematic grouping operators for text knowledge bases is essentially what text coherence parsing is about. Even if parsers were to recognize and normalize all occurrences of text cohesion phenomena in texts perfectly, missing recognition capabilities for text coherence phenomena would nevertheless produce understructured, incoherent text knowledge bases in the sense that global pragmatic indicators of discourse bracketing would be lacking. As a corollary, ignoring text cohesion phenomena in texts not only produces invalid, insufficient, and incohesive representation structures of the contents of the text, but actually precludes any computation of text coherence patterns.
After the motivation for text structure oriented parsing of texts given in this section, we shall turn in the following section to an in-depth consideration of the text macrostructures dealt with in our model. Section 4 gives an outline of the methodological requirements for distributed text structure parsing. In section 5 basic patterns of thematic progressions are considered on the formal level of technical specifications and further illustrated by sample parses. As a by-product, the text coherence phenomenon discussed informally in this section ([1]-[4]) will be shown to be covered by the text grammar formalism and its associated text parser implementation.
3. Basic text coherence patterns
In this section, we informally describe the basic patterns of text coherence focused on in this article. Following the original classification given by Daneš (1974), three prototypical categories of thematic progressions have to be distinguished (Grimes, 1978, considers only slightly different forms of thematic movements in texts - topic expansion, topic shift, and topic splitting, according to the order of Daneš' types below - but essentially converges on the same basic patterns for coherence description):
- Constant Theme. This pattern is characterized by the constant elaboration of one specific topic in a text (passage) by considering several of its conceptual facets. The following two paragraphs serve to illustrate this major pattern of
Distributed Text Structure Parsing
thematic progression (the reference points to the constant theme (Delta-X) are indicated by italics):
[T1.1] The Delta-X from ZetaMachines Inc. is a multiuser, multitasking computer system that runs Unix V.3 and comes complete with most of the software needed for business applications. The combination host computer/workstation is based on a 68020 processor, with dual 68000 processors providing peripheral processing. It has a 12-inch monochrome display and an integrated telephone handset and built-in modem. Internally, there's a 40-megabyte hard disk, a 1.2-megabyte 5¼-inch floppy disk drive, 4.5 megabytes of RAM, a network controller, three RS-232C ports, and an ST-506 port. ...
- Continuous Thematization of Rhemes. In contrast to constant themes, this pattern realizes a continuous shift of topics (depicted by bold italics; non-bold italics indicate fragmentary and, thus, invalid patterns of the constant theme type which are ruled out by the formal definitions supplied below). The first topic is elaborated by considering one of its conceptual facets. This facet then is taken as the next topic and elaborated by considering one of its facets, etc.:
[T1.2] The $12,000 Delta-X host/workstation (which can accommodate up to 32 workstations at the maximum) together with additional workstations each $1200 can be supplied from ZetaMachines Inc., 2999 State St., Santa Barbara, CA 93105. ZetaMachines' sales manager, Brian Wilson, says that they also plan to market the Gamma-Z, a dedicated CAD/CAM workstation based on a connection machine architecture. The underlying theoretical foundations are due to D. Hillis, a former M.I.T. student who first developed an experimental prototype based on connectionist principles.
- Derived Theme. Global textual structure can also be introduced by a variety of topics which share conceptual commonalities (facets) without necessarily stating the general concept explicitly.
Technically, this is realized by a set of subordinates or instances of a common (often only implicit) superordinate/generic concept class. Suppose the illustrative text [T1 = T1.1 + T1.2] is augmented by several paragraphs dealing with AlphaBooster and Sigma-P, i.e., related machines on a similar level of detail as those passages which consider the Delta-X in [T1]:
[T2] The Delta-X from ZetaMachines Inc. ... runs Unix V.3 and... is based on a 68020 processor... It has a 12-inch monochrome display and an integrated telephone handset and built-in modem. Internally, there's a 40-megabyte hard disk, a 1.2-megabyte 5¼-inch floppy disk drive, 4.5 megabytes of RAM, a network controller, three RS-232C ports, and an ST-506 port... [= T1] AlphaBooster is another UNIX machine. Peripheral devices include an 8-inch color display, a laser printer, and a keyboard. Based on
U. Hahn
a 68000 an 80-megabyte hard disk and a CD-ROM disk drive serve as external storage devices. ... The Sigma-P system makes available a lot of desirable application software. Customers may choose among a communication package, a database system, a word processor, and a variety of games. All this is for $8,000 and can be supplied from MagicTronics. ...
The text implicitly has workstation as a derived theme, since that is the immediate common generic concept of those three instances (Delta-X, AlphaBooster, Sigma-P) that are explicitly mentioned in text [T2].
4. An outline of the knowledge sources involved in distributed text structure parsing
Prior to considering formal text grammar specifications and illustrating the operation of TOPIC's distributed text parser with respect to these textlinguistic phenomena, this section deals with the knowledge sources involved in actually parsing a text. Basically (see, for example, Figure 1), these are constituted by the PARSE BULLETIN, an agenda-like data structure which records the single events of the parsing process, the DOMAIN KNOWLEDGE BASE, which makes explicit the domain-specific background knowledge needed for the parse, and various EXPERTs actually driving the parse through their message-passing behavior and the text grammar specifications they incorporate. The PARSE BULLETIN has a flat list structure. It records the sequence of text tokens as they appear in the text and, if they are considered relevant for the parsing process (see below), notes their class identifiers (FRAME item, ADJective, etc.). More important, various operations of the knowledge base and the parser are indicated at several positions (so-called parse points) in the PARSE BULLETIN. The type of operation being performed is indicated by a particular parse descriptor. Some of them are internal to the management of the knowledge base, e.g., DEFACT (default concept activation), while others indicate grammatical relations the currently considered text token is involved in. These entries reflect the recognition capabilities of the parser, such as NounATT (conceptual attribution relations among nouns), AdjATT (conceptual attribution relations among adjectives and nouns), or various cases of anaphora and textual ellipsis resolution. Parse descriptors are recorded together with the individual items affected by such an operation, the latter forming a parse tuple. The parser does not consider every token it receives from the input text at the same level of detail.
Instead, it distinguishes between words which are significant to its performance (conceptually relevant ones, such as nouns or adjectives which denote complex concept units (frames) in the domain knowledge base, or linguistically relevant ones, such as negation particles, some conjunctions, quantifiers), and those that are not (among them many function words and a wide
variety of semantically indifferent nouns, verbs, particles, etc., each of which is assigned the class identifier NIL). The latter are simply discarded from further analysis, while the former are assigned lexicalized grammar specifications. The parser has thus been tuned towards partial parsing in a spirit similar to that advocated by Schank, Lebowitz and Birnbaum (1980). These analytic principles lead to a shallow understanding of texts, primarily on a terminological (taxonomic) level of knowledge representation. The DOMAIN KNOWLEDGE BASE (KB for short) contains frame representation structures. Each frame identifier (in bold face) is assigned a list of slots (enclosed by angular brackets) which are qualified in two ways. An indication of permitted slot fillers is given by an expression enclosed in square brackets characterizing the range of possible slot fillers, while actual slot fillers are enclosed in curly braces. Valid conceptual ranges are characterized by an expression [a-frame name] that implicitly permits slot filling by all those frames which are a subordinate or an instance of frame name. For instance (see Figure 1), the qualification [a-manufacturer] characterizing the permitted slot fillers of the manufacturer slot of the frame Delta-X allows ZetaMachines Inc. to become an actual slot filler, since it is an instance of manufacturer. Actual slot fillers, on the other hand, can be taken as facts either known a priori to the system or acquired continuously from the text as its understanding proceeds during the parse. As usual in frame languages, the self slot indicates that frame's immediate superordinate/generic frame with respect to the specialization hierarchy.¹ In addition, each concept has attached to it an activation weight counter. The values of the weight factors are enclosed by vertical bars attached to each item; if no bars explicitly occur, then a zero weight is assumed.
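The range check on slot fillers described above can be sketched in Python as follows. This is a minimal illustration, not the TOPIC frame model: the class `Frame`, its attribute names and the method names are assumptions, and the specialization hierarchy is reduced to a single parent link per frame.

```python
class Frame:
    """A frame with slots, permitted filler ranges and an activation weight."""

    def __init__(self, name, parent=None, ranges=None):
        self.name = name
        self.parent = parent          # immediate superordinate/generic frame, if any
        self.weight = 0               # zero weight unless stated otherwise
        self.ranges = ranges or {}    # slot name -> frame naming the permitted range
        self.fillers = {}             # slot name -> list of actual slot fillers

    def is_a(self, other):
        """True if this frame is a subordinate or an instance of `other`."""
        node = self.parent
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False

    def fill(self, slot, filler):
        """Add `filler` to `slot` only if it lies within the permitted range."""
        if slot not in self.ranges or not filler.is_a(self.ranges[slot]):
            return False
        self.fillers.setdefault(slot, []).append(filler)
        return True


manufacturer = Frame("manufacturer")
workstation = Frame("workstation")
zeta = Frame("ZetaMachines Inc.", parent=manufacturer)
delta_x = Frame("Delta-X", parent=workstation, ranges={"manufacturer": manufacturer})

print(delta_x.fill("manufacturer", zeta))         # within [a-manufacturer]
print(delta_x.fill("manufacturer", workstation))  # outside the permitted range
```

Keeping the permitted range and the actual fillers in separate tables mirrors the two kinds of slot qualification, square brackets and curly braces, used in the knowledge base notation.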
Activation weights are incremented (starting from zero-level activation) whenever a noun denoting its associated concept occurs in the text, and whenever structure-building operations affect the concept to which it is assigned. The manipulation of activation weights thus serves two distinct purposes. First of all, they are used as a kind of flag to indicate those items in the knowledge base the text has actually referred to. Concepts in the knowledge base with non-zero activation weight are taken as constituting its text knowledge subbase. Secondly, they are heavily used as an indicator of salience of concepts, either for parsing purposes, such as the determination of the current focus of the text (Hahn, 1990), or during the text condensation phase where, finally, text summaries are generated from the text representation structures resulting from the text parse (Reimer & Hahn, 1988). The text grammar is composed of a set of distributed grammar experts (actors), each one responsible for some specific linguistic function (e.g., concept
¹ This slot type has only been introduced to simplify the communication of our results. The actual frame model underlying the TOPIC system does not require an explicit self slot, but computes specialization hierarchies directly from the knowledge structures under consideration (Reimer, 1986).
U. Hahn
PARSE BULLETIN
[000]                                                                              EOP
[001]    The                                                                       NIL
[002]    Delta-X                                                                   FRAME
[002.1]  Delta-X |1|                                                               DEFACT
[003]    from                                                                      NIL
[004]    ZetaMachines Inc.                                                         FRAME
[004.1]  ZetaMachines Inc. |1|                                                     DEFACT
[004.2]  Delta-X |2| < manufacturer |1|: { ZetaMachines Inc. |1| } >               NounATT
[010.3]  Delta-X |4| < usage mode |1|: { multiuser } >                             AdjATT
[010.4]  Delta-X |5| < operating mode |1|: { multitasking } >                      AdjATT
[013.2]  Delta-X |6| < operating system |1|: { Unix V.3 |1| } >                    NounATT
[024.2]  Delta-X |7| < application domain |1|: { business |1| } >                  NounATT
[033.2]  Delta-X |9| < CPU |1|: { 68020 |1| } >                                    NounATT
[033.3]  Delta-X |9| < processors |1|: { 68020 |1| } >                             NounATT
[037.3]  Delta-X |10| < processors |2|: { 68020 |1|, 68000-1 |1|, 68000-2 |1| } >  NounATT
[039.2]  68000-1 |2| < function |1|: { peripheral processing } >                   NounATT
[039.3]  68000-2 |2| < function |1|: { peripheral processing } >                   NounATT
[046.2]  display |0|; display-1 |1| < size |1|: { 12-inch } >                      AdjATT
[046.3]  display-1 |2| < presentation mode |1|: { monochrome } >                   AdjATT
[046.4]  Delta-X |11| < i/o devices |1|: { display-1 |1| } >                       NounATT
[046.5]  Delta-X |11| < peripheral devices |1|: { display-1 |1| } >                NounATT
[050.2]  Delta-X |12| < communication devices |1|: { telephone |1| } >             NounATT
[050.3]  Delta-X |12| < peripheral devices |2|: { display-1 |1|, telephone |1| } > NounATT
[053.2]  Delta-X |13| < communication devices |2|: { telephone |1|, modem |1| } >  NounATT
[053.3]  Delta-X |13| < peripheral devices |3|: { display-1 |1|, ..., modem |1| } > NounATT
[054]                                                                              PUNCT
[055]                                                                              EOP

DOMAIN KNOWLEDGE BASE
Delta-X |13|
  < self: a-workstation >
  < CPU |1|: { 68020 |1| } [a-processor] >
  < processors |2|: { 68020 |1|, 68000-1 |1|, 68000-2 |1| } [a-processor] >
  < manufacturer |1|: { ZetaMachines Inc. |1| } [a-manufacturer] >
  < operating system |1|: { Unix V.3 |1| } [an-operating system] >
  < communication devices |2|: { telephone |1|, modem |1| } [a-communication dev.] >
  < i/o devices |1|: { display-1 |1| } [an-i/o device] >
  < peripheral devices |3|: { display-1 |1|, telephone |1|, modem |1| } [...] >
  < usage mode |1|: { multiuser } [a-usage mode] >
  < operating mode |1|: { multitasking } [an-operating mode] >
  < application domain |1|: { business |1| } [an-application domain] >
  < application software: [an-application software] >

CT_EXPERT
2 & (i) newpos is maximal in the sense that ¬∃ Δpos ∈ [max( prepos, testpos )+1, textpos-1] : Δpos > newpos & conditions (c)-(g) apply, too.
Otherwise, constant-theme( textpos, testpos ) = *
³ References to entries in the PARSE BULLETIN have the format (BulletinPosition, TextToken, ClassIdentifier).
Some comments related to the specification above may be in order:
(a) The parameters supplied to constant-theme span the spatial extension in the PARSE BULLETIN which is searched for a constant theme; textpos always denotes the end of the current paragraph, i.e., the upper bound of the search area, while testpos delimits its lower bound.
(b) The parse point characterized by textpos must contain the end-of-paragraph symbol 0.
(c) Since testpos may be any arbitrary parse point preceding textpos, prepos denotes the specific parse point in the PARSE BULLETIN that contains the end-of-paragraph symbol which occurs right before the one at parse point textpos.
(d) After fixing the search interval in the PARSE BULLETIN for which a constant theme is going to be computed, newpos allows for various choices as to how far a constant theme may actually extend in that interval.
(e) theme may be any frame from the DOMAIN KNOWLEDGE BASE.
(f) A theme is related to its various rhemes according to the following condition: at each bulletin position (k_i) where theme occurs in THEMES within the interval delimited by newpos, its associated slot (single rheme) is assigned to the set RHEMES.
(g) To guarantee that theme is the only topic dealt with in the current text passage, we also require that no altertheme different from theme occur in the chosen interval such that it also forms part of THEMES; condition (iv) accounts for those cases where both altertheme and theme occur at the same bulletin position (cf., e.g., the parse points '046.*' in Figure 1).
(h) To rule out insignificant occurrences of theme, the cardinality of RHEMES must exceed a certain size level.
(i) The maximality criterion for newpos rules out choosing too small values of newpos.
Let us now consider an example of the computation processes involved in actually parsing text coherence patterns (see Figure 1).
In order for the set of coherence experts to run it is necessary that the end of a paragraph (indicated by 0 and the EOP class identifier at bulletin position '055') be reached in the course of text analysis. Various coherence experts start execution upon consumption of the 0 symbol by the administration expert of the parser, but we shall limit our attention to the CT_EXPERT. After its initial activation, i.e., receiving check_CT( EOP, 055, 000 ) as a start-up message, constant-theme is supplied with initial parameters, i.e., textpos = 055, testpos = 000. Obviously, prepos = 000, since the analysis starts with the first paragraph of the text; newpos may now range from '001' to '054'. Let us tentatively consider Delta-X as the theme⁴. The choice for newpos must accommodate the temporary breakdown of the selected theme beginning from position '039', since we have k' = 039 ∈ [001, 054] with altertheme = 68000-1 (or 68000-2) in THEMES and no proper triple ( Delta-X, slot*, 039 ) as required by condition (g)(iv) above. So newpos has to be adjusted properly to the bulletin position '039' where the constant theme pattern for Delta-X eventually terminates for the first time. This produces:
constant-theme( 055, 000 ) = ( Delta-X, { manufacturer, usage mode, operating mode, operating system, application domain, CPU, processors }, 039 )
⁴ This is a proper choice. If improper choices were made, constant-theme would not produce a significant result.
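The first round of this computation can be retraced with a deliberately simplified sketch. It is not the formal definition of constant-theme: the theme is passed in rather than searched for, bulletin positions are truncated to their integer part, the theme is taken to hold on the positions from max(prepos, testpos)+1 up to but excluding newpos, and the significance threshold `min_rhemes = 2` is a hypothetical value. The data are the LC*-type entries of Figure 1.

```python
from collections import defaultdict

# (bulletin position, thematized frame, slot) for the attribute entries of Figure 1
BULLETIN = [
    (4, "Delta-X", "manufacturer"),
    (10, "Delta-X", "usage mode"),
    (10, "Delta-X", "operating mode"),
    (13, "Delta-X", "operating system"),
    (24, "Delta-X", "application domain"),
    (33, "Delta-X", "CPU"),
    (33, "Delta-X", "processors"),
    (37, "Delta-X", "processors"),
    (39, "68000-1", "function"),
    (39, "68000-2", "function"),
    (46, "display-1", "size"),
    (46, "display-1", "presentation mode"),
    (46, "Delta-X", "i/o devices"),
    (46, "Delta-X", "peripheral devices"),
    (50, "Delta-X", "communication devices"),
    (50, "Delta-X", "peripheral devices"),
    (53, "Delta-X", "communication devices"),
    (53, "Delta-X", "peripheral devices"),
]


def constant_theme(textpos, testpos, theme, prepos=0, min_rhemes=2):
    """Return (theme, rhemes, newpos), or None for an insignificant result."""
    start = max(prepos, testpos) + 1
    themes_at = defaultdict(set)
    for pos, frame, _slot in BULLETIN:
        themes_at[pos].add(frame)

    # Push newpos as far right as possible (maximality criterion): stop at the
    # first position where some other theme occurs without `theme` itself.
    newpos = start
    while newpos < textpos - 1:
        frames = themes_at.get(newpos, set())
        if frames and theme not in frames:
            break
        newpos += 1

    rhemes = {slot for pos, frame, slot in BULLETIN
              if start <= pos < newpos and frame == theme}
    if len(rhemes) <= min_rhemes:
        return None
    return theme, rhemes, newpos


first_round = constant_theme(55, 0, "Delta-X")
```

Under these assumptions `first_round` carries the seven rhemes listed above together with newpos = 39. Position 46 would not interrupt the pattern, because Delta-X also occurs there alongside display-1, which is the situation covered by condition (g)(iv).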
As a consequence, the CT_EXPERT issues a CT-group reading to the DOMAIN KNOWLEDGE BASE incorporating the constant theme together with its associated rhemes. Since the PARSE BULLETIN has not been exhaustively investigated with respect to its prospective coherence structures (newpos+1 < textpos), the CT_EXPERT resumes execution⁵, now starting its examination of the PARSE BULLETIN for a constant theme with a second set of parameters: textpos = 055, testpos = 039 (see the second expert placed into the foreground in Figure 1). Again, prepos = 000, but due to the new testpos parameter newpos is now in the interval [040, 054]. The evaluation of constant-theme( 055, 039 ) then starts with a proper choice of newpos = 054; obviously, testpos+1 excludes 68000-1 (and 68000-2) from further consideration. Finally, we get
constant-theme( 055, 039 ) = ( Delta-X, { i/o devices, peripheral devices, communication devices }, 054 )
Note that the occurrence of display-1 at bulletin position '046' does not conflict with criterion (g), since we also have Delta-X (thematically related to i/o devices and peripheral devices) at that point (see criterion (g)(iv)). Since after the second round newpos+1 = textpos (the end of the paragraph is reached), the computation of a constant theme halts. Figure 3 represents the effects of grouping a constant theme and the rhemes referred to in the text passage by the shadowed area of the (frame) box (cf. also [055.1] and [055.2]). One can see that not all properties of the concept specified in the DOMAIN KNOWLEDGE BASE need to be considered within a particular text passage. Grouping in text knowledge bases represents the fact that the grouped items are treated coherently in a text passage. Our definition does not require the pragmatic unit constant theme to coincide with the formal text segment of a paragraph. Two constant themes have been independently identified within one paragraph, although, in this case, they
⁵ We here give a coarse-grained exposition of the structure of the distributed parser to simplify the communication. Actually, there are several low-level experts which account for appropriate parameter supply for the functions to be evaluated (e.g., constant-theme). As this low-level decomposition is not vital to the text parsing model outlined in this paper, we have chosen a more compact description style.
PARSE BULLETIN
[000]                                                                              EOP
[001]    The                                                                       NIL
[002]    Delta-X                                                                   FRAME
[002.1]  Delta-X |1|                                                               DEFACT
[003]    from                                                                      NIL
[004]    ZetaMachines Inc.                                                         FRAME
[004.1]  ZetaMachines Inc. |1|                                                     DEFACT
[004.2]  Delta-X |2| < manufacturer |1|: { ZetaMachines Inc. |1| } >               NounATT
[010.3]  Delta-X |4| < usage mode |1|: { multiuser } >                             AdjATT
[010.4]  Delta-X |5| < operating mode |1|: { multitasking } >                      AdjATT
[013.2]  Delta-X |6| < operating system |1|: { Unix V.3 |1| } >                    NounATT
[024.2]  Delta-X |7| < application domain |1|: { business |1| } >                  NounATT
[033.2]  Delta-X |9| < CPU |1|: { 68020 |1| } >                                    NounATT
[033.3]  Delta-X |9| < processors |1|: { 68020 |1| } >                             NounATT
[037.3]  Delta-X |10| < processors |2|: { 68020 |1|, 68000-1 |1|, 68000-2 |1| } >  NounATT
[039.2]  68000-1 |2| < function |1|: { peripheral processing } >                   NounATT
[039.3]  68000-2 |2| < function |1|: { peripheral processing } >                   NounATT
[046.2]  display |0|; display-1 |1| < size |1|: { 12-inch } >                      AdjATT
[046.3]  display-1 |2| < presentation mode |1|: { monochrome } >                   AdjATT
[046.4]  Delta-X |11| < i/o devices |1|: { display-1 |1| } >                       NounATT
[046.5]  Delta-X |11| < peripheral devices |1|: { display-1 |1| } >                NounATT
[050.2]  Delta-X |12| < communication devices |1|: { telephone |1| } >             NounATT
[050.3]  Delta-X |12| < peripheral devices |2|: { display-1 |1|, telephone |1| } > NounATT
[053.2]  Delta-X |13| < communication devices |2|: { telephone |1|, modem |1| } >  NounATT
[053.3]  Delta-X |13| < peripheral devices |3|: { display-1 |1|, ..., modem |1| } > NounATT
[054]                                                                              PUNCT
[055]                                                                              EOP
[055.1]  Delta-X { manufacturer, usage mode, operating mode, operating system, application domain, CPU, processors }   CT
[055.2]  Delta-X { i/o devices, peripheral devices, communication devices }        CT

DOMAIN KNOWLEDGE BASE
Delta-X |13|
  [...]
  < operating mode |1|: { multitasking } [...] >
  < application domain |1|: { business |1| } [...] >
  < application software: [an-application software] >
  < price: [a-price] >
Figure 3: Postconditions holding with respect to a constant theme pattern
Distributed Text Structure Parsing
denote identical topics. The procedure is robust in the sense that coherence islands are detected and minor thematic distortions are ignored; this is demonstrated by the recapture of the topical thread (which actually broke down at position '039') by the second instance of Delta-X as constant theme.
5.2 Basic text coherence patterns II: continuous thematization of rhemes
Continuous thematization of rhemes is a coherence pattern which departs most significantly from the constant theme schema just outlined (in fact, the two are mutually exclusive) in that it incorporates a continuous shift of the topics being considered. The process starts with a theme and some comment on that theme which we shall call rheme. Now this rheme is taken as the next theme that is elaborated by an associated rheme, etc. Figure 4 illustrates this continuous change of issues considered in a text. The PARSE BULLETIN contains a sequence of local theme-rheme pairs with frame_Ti being the current local theme and slot filler_Ti being the current local rheme. Text coherence is due to the fact that the current local rheme (slot filler_Ti) becomes the next local theme (frame_T(i+1)), with slot filler_Ti and frame_T(i+1) designating the same concept; cf. the double-sided black arrows in the DOMAIN KNOWLEDGE BASE. A single basic theme-rheme connection at the local level is indicated by the one-sided shadowed arrow which goes from the local theme (a frame) to its local rheme (one of its associated slots or slot fillers). A proper sequence of local theme-rheme pairs as depicted in Figure 4 constitutes what is here called continuous thematization of rhemes, i.e. a global theme-rheme cluster. Consider the illustration of continuous thematization of rhemes in Figure 5. The PARSE BULLETIN contains an extract of the parse trace for text fragment [T1.2] from section 3. Again, these entries are the result of local parsing procedures at the phrasal, sentence and text cohesion level of linguistic analysis. As with constant theme, we only need to consider entries annotated by parse descriptors of the LC* type.
Appropriate formal criteria should then identify the following theme-rheme cluster which characterizes the specific pattern of continuous thematization of rhemes in Figure 5 (bold italics stress the emerging global theme-rheme cluster constituted by local theme-rheme pairs):
Delta-X - manufacturer - ZetaMachines Inc.,
ZetaMachines Inc. - product - Gamma-Z,
Gamma-Z - architecture - Connection Machine architecture,
Connection Machine architecture - developer - D. Hillis
Note that (just as with constant theme) minor distortions of the basic pattern have to be accounted for. For instance, at parse point '196' another possible candidate for rheme thematization is available in the same way as it is at parse point '210'. In any case, their consideration must be suppressed in order to prevent defective theme-rheme clusters from being determined.
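The chaining idea, including the suppression of the distracting candidates at parse points '196' and '210', can be sketched as follows. This is a simplification for illustration only, not the formal criteria themselves: each entry is reduced to a hypothetical tuple (position, theme, slot, filler), a pair is accepted only if its filler is thematized at some later position (a crude form of look-ahead), and the minimum cluster size `min_pairs = 2` is an assumed value.

```python
# (bulletin position, theme, slot, slot filler) for the NounATT entries of Figure 5
ENTRIES = [
    (180, "Delta-X", "manufacturer", "ZetaMachines Inc."),
    (196, "ZetaMachines Inc.", "sales manager", "Brian Wilson"),
    (206, "ZetaMachines Inc.", "product", "Gamma-Z"),
    (210, "Gamma-Z", "function", "CAD/CAM application"),
    (215, "Gamma-Z", "architecture", "Connection Machine architecture"),
    (225, "Connection Machine architecture", "developer", "D. Hillis"),
    (229, "D. Hillis", "universities", "M.I.T."),
]


def thematization_chain(entries, min_pairs=2):
    """Collect local theme-rheme pairs whose rheme becomes the next theme."""
    topics = []
    expected_theme = None          # rheme of the last accepted pair
    for i, (_pos, theme, _slot, filler) in enumerate(entries):
        if expected_theme is not None and theme != expected_theme:
            continue               # entry does not continue the current chain
        # Look-ahead: accept the pair only if its rheme is thematized later on.
        if any(later_theme == filler for _, later_theme, _, _ in entries[i + 1:]):
            topics.append((theme, filler))
            expected_theme = filler
    return topics if len(topics) > min_pairs else None


cluster = thematization_chain(ENTRIES)
```

With the Figure 5 data the pairs ending in Brian Wilson, CAD/CAM application and M.I.T. are skipped, since none of these fillers is taken up as a theme later in the passage.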
PARSE BULLETIN
0                                                                                 EOP
frame_T1 < slot_T1: { slot filler_T1 = frame_T2 } >                               LC*
frame_T2 < slot_T2: { slot filler_T2 = frame_T3 } >                               LC*
[...]
frame_Ti < slot_Ti: { slot filler_Ti = frame_T(i+1) } >                           LC*
[...]
frame_T(n-1) < slot_T(n-1): { slot filler_T(n-1) = frame_Tn } >                   LC*
0                                                                                 EOP

Figure 4: Continuous thematization of rhemes configuration pattern in the PARSE BULLETIN and the DOMAIN KNOWLEDGE BASE
PARSE BULLETIN
[153]                                                                                            EOP
[156.2]  Delta-X |2|                                                                             DEFACT
[180.1]  ZetaMachines Inc. |1|                                                                   DEFACT
[180.2]  Delta-X |4| < manufacturer |1|: { ZetaMachines Inc. |1| } >                             NounATT
[196.2]  Brian Wilson |1|                                                                        DEFACT
[196.3]  ZetaMachines Inc. |4| < sales manager |2|: { Brian Wilson |1| } >                       NounATT
[206.1]  Gamma-Z |1|                                                                             DEFACT
[206.2]  ZetaMachines Inc. |5| < product |1|: { Gamma-Z |1| } >                                  NounATT
[210.1]  CAD/CAM |1|                                                                             DEFACT
[210.2]  Gamma-Z |2| < function |1|: { CAD/CAM application |1| } >                               NounATT
[215.1]  Connection Machine architecture |1|                                                     DEFACT
[215.2]  Gamma-Z |3| < architecture |1|: { Connection Machine architecture |1| } >               NounATT
[225.2]  D. Hillis |1|                                                                           DEFACT
[225.3]  Connection Machine architecture |2| < developer |1|: { D. Hillis |1| } >                NounATT
[229.1]  M.I.T. |1|                                                                              DEFACT
[229.2]  D. Hillis |2| < universities |1|: { M.I.T. |1| } >                                      NounATT
[242]                                                                                            EOP

DOMAIN KNOWLEDGE BASE
[...]
D. Hillis |2|
  < universities |1|: { M.I.T. |1| } [...] >

CTR_EXPERT
pre-cond: [...]
post-cond: KB <= CTR group ( ( [Delta-X - ZetaMachines Inc.], [ZetaMachines Inc. - Gamma-Z], [Gamma-Z - Connection Machine architecture], [Connection Machine architecture - D. Hillis] ) )

Figure 5: Preconditions holding with respect to a continuous thematization of rheme pattern
At the technical level, continuous thematization of rhemes requires a careful distinction of three computation layers:
- the function micro-rheme computes local theme-rheme pairs;
- the function macro-rheme computes global theme-rheme clusters based upon the determination of a collection of local theme-rheme pairs provided by micro-rheme and their agreement with rheme-specific connectivity criteria (current rheme gets next theme);
- the top function rhemata encapsulates the whole rhematization process and supplies macro-rheme with appropriate start parameters for the computation of global theme-rheme clusters.

rhemata( textpos, testpos ) = ( THEMES × RHEMES, newpos ) iff
(a) testpos < textpos &
(b) ( textpos, 0, EOP ) is in the PARSE BULLETIN &
(c) ( prepos, 0, EOP ) is also in the PARSE BULLETIN with prepos < textpos and no other triple intervening between prepos and textpos in the PARSE BULLETIN that also has '0' as a text token &
(d) macro-rheme( max( prepos, testpos )+1, textpos, {} ) = ( THEMES × RHEMES, newpos ) &
(e) |THEMES| > 2
Otherwise, rhemata( textpos, testpos ) = *

macro-rheme( startrheme, endrheme, TOPICS ) =
  macro-rheme( π3( micro-rheme( true, startrheme, endrheme ) ), endrheme, TOPICS ∪ { π1,2( micro-rheme( true, startrheme, endrheme ) ) } ),
      if TOPICS = {} & micro-rheme( true, startrheme, endrheme ) ≠ *
  macro-rheme( π3( micro-rheme( false, startrheme, endrheme ) ), endrheme, TOPICS ∪ { π1,2( micro-rheme( false, startrheme, endrheme ) ) } ),
      if TOPICS ≠ {} & micro-rheme( false, startrheme, endrheme ) ≠ *
  ( TOPICS, startrheme ),
      if TOPICS ≠ {} & micro-rheme( false, startrheme, endrheme ) = *
  * else ⁶

⁶ π_i are standard projections upon the i-th component of a cross-product (in this case, the image of the function micro-rheme).
micro-rheme( firstcall, testpos, textpos ) = ( loctheme, locrheme, pos ) iff
(a) testpos < textpos &
(b) pos ∈ [testpos, textpos-1] &
(c) ∃ antepos ∈ [testpos, textpos-1] : antepos < pos & [firstcall true ⇒ antepos ∈ [testpos, textpos-1]] & [firstcall false ⇒ antepos = testpos] &
(d) there are two triples in the PARSE BULLETIN: ( p.point_i, < p_1, ..., p_s >, desc_i ) at bulletin position antepos & ( p.point_j, < p*_1, ... >, desc_j ) at bulletin position pos, with desc_i and desc_j being LC*-type parse descriptors. We then require: ∃ v ∈ [2, s] : p_v equals p*_1 {string identity} & p_1 is loctheme {variable binding} & p*_1 is locrheme {variable binding} &
(e) there is no other triple intervening between those characterized by (d) at some bulletin position bullpos in the PARSE BULLETIN such that
    (i) either antepos < bullpos < pos or testpos < bullpos < pos &
    (ii) that triple's parse descriptor is LC*-type &
    (iii) NOT-EQUAL( micro-rheme( false, bullpos, textpos ), * ) &
(f) pos is minimal in the sense that ¬∃ Δpos ∈ [testpos, textpos-1] : Δpos < pos & conditions (c)-(e) apply, too.
Otherwise, micro-rheme( firstcall, testpos, textpos ) = *

Remarks on these definitions: As already indicated, rhemata serves as the top function for the computation of a theme-rheme cluster assembled in THEMES × RHEMES. Criteria (a)-(c) are basically the same as for constant-theme and supply proper data for the spatial extension of the search for that cluster:
(a) testpos gives an arbitrary test point located before textpos.
(b) textpos is the current position in the PARSE BULLETIN and contains the end-of-paragraph symbol.
(c) prepos indicates the position of the end-of-paragraph symbol immediately preceding the one for textpos.
(d) macro-rheme computes the actual theme-rheme cluster (if any).
(e) To rule out insignificant occurrences of a theme-rheme cluster, its cardinality must exceed a certain size level.

macro-rheme is essentially a recursive function intended to incrementally build up a theme-rheme cluster; macro-rheme calls micro-rheme, a function that determines a local theme-rheme pair (if any) and its location in the PARSE BULLETIN; macro-rheme selects the values returned by micro-rheme in the following way:
- the projection π1,2 determines the currently considered local theme-rheme pair which is immediately added to the set TOPICS;
- the projection π3 provides the bulletin position associated with the current theme-rheme pair in the PARSE BULLETIN.
The values for macro-rheme distinguish four different cases. The first one applies to its initial call, with TOPICS being the empty set and micro-rheme producing the first local theme-rheme pair. The second holds for subsequent calls producing valid theme-rheme pairs and enhancing the set TOPICS, a container for theme-rheme pairs; the third case indicates that no further local theme-rheme pair can be found in the PARSE BULLETIN. At this point, the recursive procedure halts, having generated significant results. Any other combination of conditions, e.g., the lack of any local theme-rheme relation in the text passage considered, requires macro-rheme to terminate with an insignificant result.

micro-rheme computes local theme-rheme pairs. Conditions (a)-(c) delimit the search space for a valid theme-rheme pair:
(a) testpos and textpos as above.
(b) pos indicates the bulletin position of a theme that has been playing the role of a rheme at a preceding bulletin position (viz., antepos).
(c) antepos indicates a bulletin position located right before pos in the interval [testpos, textpos-1] for the first call of micro-rheme, otherwise it is equal to testpos.
(d) The basic criterion for a local theme-rheme pair requires that two different bulletin positions be considered in the PARSE BULLETIN, namely antepos and pos. Two triples with LC*-type parse descriptors have to be determined at these positions in order to check for the required kind of theme-rheme relationship: either a slot or a slot value manipulated at bulletin position antepos becomes the next thematic head at bulletin position pos.
(e) The density condition requires that antepos and pos be chosen such that no triple intervenes at some bulletin position bullpos where an alternative and valid continuation of the theme-rheme cluster is feasible (the latter is tested using the look-ahead facility in (e)(iii)).
(f) On the other hand, pos must be minimal in order not to lose any possible valid local theme-rheme pair.

With these remarks the reader may follow the behavior of the CTR_EXPERT, the coherence expert for Continuous Thematization of Rhemes. Its preconditions state that the function rhemata must produce a significant result in terms of a set of tuples whose components are frames related through rhematization (i.e., current rheme becomes next theme). Since [T1.2] constitutes the final paragraph of [T1] (see section 3), the determination of prepos in (c) no longer produces trivial results. Instead, it indicates that the current paragraph whose end mark is textpos = 242 only extends up to parse point '153' (= prepos). After setting the proper extension of the current paragraph, (d) in rhemata requires the computation of macro-rheme( 154, 242, {} ). Its recursive definition immediately leads to the evaluation of micro-rheme( true, 154, 242 ). At this level of micro computations, antepos is chosen as the parse point where a local rheme is introduced relative to some theme, which is going to become the next theme at parse point pos (therefore, antepos < pos in (c)). Appropriate choices for antepos and pos lead to the triples referred to in condition (d):
for antepos = 180: ( 180.2, < Delta-X, manufacturer, ZetaMachines Inc. >, NounATT )
for pos = 206: ( 206.2, < ZetaMachines Inc., product, Gamma-Z >, NounATT )
According to the constraints expressed in (d), we have Delta-X as loctheme and ZetaMachines Inc. as its associated locrheme. Now, there seems to be an apparent perturbation. One might argue that there is bullpos = 196 with antepos < bullpos < pos and the triple
( 196.3, < ZetaMachines Inc., sales manager, Brian Wilson >, NounATT )
which should be considered as a possible value for the first call of micro-rheme. However, the problem with this choice of bullpos becomes evident when the look-ahead facility (iii) in (e) is taken into consideration, i.e. evaluating micro-rheme( false, 196, 242 ). In this case, we shall find no proper pos > 196 (= testpos/antepos in the subsequent call) such that condition (d) in micro-rheme is also fulfilled, since Brian Wilson does not occur at all as thematized rheme in the selected fragment of the PARSE BULLETIN. With this look-ahead facility of micro-rheme we achieve a high degree of robustness of the recognition procedure with respect to possible distortions of rhematization-based coherence structures in a text, which are likely to occur due to the large number of low-level text propositions collected in the PARSE BULLETIN. On the other hand, considering the valid initial choice of pos we succeed in the recursive calls of macro-rheme, since micro-rheme( false, 206, 242 ) = ( ZetaMachines Inc., Gamma-Z, 215 ) and thus the tuple ( ZetaMachines Inc., Gamma-Z ) can be added to the set TOPICS with the new test point given at position '215'. Exhaustively evaluating macro-rheme, finally, produces the set
TOPICS = { ( Delta-X, ZetaMachines Inc. ), ( ZetaMachines Inc., Gamma-Z ), ( Gamma-Z, Connection Machine architecture ), ( Connection Machine architecture, D. Hillis ) }
The set TOPICS is subject to the CTR grouping operation in the DOMAIN KNOWLEDGE BASE. Given its frame configuration prior to (Figure 5) and after the CTR grouping process (Figure 6), the enhancement of the basic representation structures in the DOMAIN KNOWLEDGE BASE by CTR overlays becomes apparent. Unlike CT grouping, which only moderately modifies the a priori frame representation structures by characterizing a coherent subset of conceptual facets associated with one particular frame, CTR grouping relates different frame units that have had no a priori relationship by text-specific coherence indicators.

5.3 Basic text coherence patterns III: derived theme
The pattern considered next overcomes the paragraph boundary (motivated above) for determining large-scale textual coherence. It further generalizes the results of previous coherence computations at the paragraph level and extends them over various (adjacent) paragraphs and possibly over the whole text. Consider a series of paragraphs, each one dealing exclusively with one special topic (see Figure 7). The first paragraph deals with frame_T1, the second one elaborates on frame_T2, etc. A derived theme may be computed when all these different (sub)topics can be assigned to the most specific general (super)topic; in technical terms, these subtopics are all subordinates/instances of a single supertopic. Figure 7 depicts this relationship by black arrows pointing from each subtopic of a single paragraph to the supertopic of the whole text thematically characterizing the entire sequence of paragraphs at a more general level of conceptualization.
The generalization builds upon the conceptual dependencies at the lower level of subtopics denoted by frame_T1, frame_T2, ..., frame_Tm. Their <self> slot expresses the fact that they are all subordinates of frame_T*1, frame_T*2, ..., frame_T*m, resp., with frame_T being the most specific common general concept related to frame_T*1, ..., frame_T*m. Figure 8 illustrates this phenomenon with respect to the text [T2] from section 3. There are at least three paragraphs (at parse points '055.1', '055.2', '338.1', '998.1') with coherence annotations of the type Constant Theme (CT) relating to Delta-X, AlphaBooster, and Sigma-P; the implied conceptual generalization (cf. their <self> slots) links them to the generic concept class workstation.
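The generalization step can be sketched as a search for the most specific common superordinate in the specialization hierarchy. The sketch assumes a single-inheritance hierarchy stored as a hypothetical `PARENT` table (standing in for the <self> slots of Figure 8); the function names are likewise illustrative and not part of the TOPIC system.

```python
# Immediate superordinate of each frame, as given by the <self> slots in Figure 8
PARENT = {
    "Delta-X": "workstation",
    "AlphaBooster": "workstation",
    "Sigma-P": "workstation",
    "workstation": "computer",
}


def ancestors(frame):
    """Superordinates of `frame`, ordered from most specific to most general."""
    chain = []
    while frame in PARENT:
        frame = PARENT[frame]
        chain.append(frame)
    return chain


def derived_theme(constant_themes):
    """Return (theme, SUBTHEMES), or None if no common superordinate exists."""
    subthemes = list(dict.fromkeys(constant_themes))   # drop repeats, keep order
    if not subthemes:
        return None
    common = ancestors(subthemes[0])
    for frame in subthemes[1:]:
        others = set(ancestors(frame))
        common = [a for a in common if a in others]
    # `common` is still ordered most specific first, so its head is the theme.
    return (common[0], subthemes) if common else None


result = derived_theme(["Delta-X", "Delta-X", "AlphaBooster", "Sigma-P"])
```

Delta-X appears twice in the input because two CT annotations refer to it; the duplicate is collapsed before the hierarchy is consulted.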
PARSE BULLETIN
[153]                                                                                            EOP
[156.2]  Delta-X |2|                                                                             DEFACT
[180.1]  ZetaMachines Inc. |1|                                                                   DEFACT
[180.2]  Delta-X |4| < manufacturer |1|: { ZetaMachines Inc. |1| } >                             NounATT
[196.2]  Brian Wilson |1|                                                                        DEFACT
[196.3]  ZetaMachines Inc. |4| < sales manager |2|: { Brian Wilson |1| } >                       NounATT
[206.1]  Gamma-Z |1|                                                                             DEFACT
[206.2]  ZetaMachines Inc. |5| < product |1|: { Gamma-Z |1| } >                                  NounATT
[210.1]  CAD/CAM |1|                                                                             DEFACT
[210.2]  Gamma-Z |2| < function |1|: { CAD/CAM application |1| } >                               NounATT
[215.1]  Connection Machine architecture |1|                                                     DEFACT
[215.2]  Gamma-Z |3| < architecture |1|: { Connection Machine architecture |1| } >               NounATT
[225.2]  D. Hillis |1|                                                                           DEFACT
[225.3]  Connection Machine architecture |2| < developer |1|: { D. Hillis |1| } >                NounATT
[229.1]  M.I.T. |1|                                                                              DEFACT
[229.2]  D. Hillis |2| < universities |1|: { M.I.T. |1| } >                                      NounATT
[242]                                                                                            EOP
[242.1]  ( [Delta-X - ZetaMachines Inc.], [ZetaMachines Inc. - Gamma-Z], [Gamma-Z - Connection Machine architecture], [Connection Machine architecture - D. Hillis] )   CTR

DOMAIN KNOWLEDGE BASE
Delta-X |4| < manufacturer |1|: { ZetaMachines Inc. [...] } >
ZetaMachines Inc. |5| < product [...]: { Gamma-Z [...] } >
Gamma-Z [...] < architecture |1|: { Connection Machine architecture |1| } >
Connection Machine architecture |2| < [...]: { D. Hillis |1| } >
D. Hillis |2| < universities |1|: { M.I.T. |1| } [...] >

Figure 6: Postconditions holding with respect to a continuous thematization of rheme pattern
Figure 7: Derived theme configuration pattern in the PARSE BULLETIN and the DOMAIN KNOWLEDGE BASE
PARSE BULLETIN
[002]    Delta-X                                                                   FRAME
[004.2]  Delta-X |2| < manufacturer |1|: { ZetaMachines Inc. |1| } >               NounATT
[013.2]  Delta-X |6| < operating system |1|: { Unix V.3 |1| } >                    NounATT
[033.2]  Delta-X |9| < CPU |1|: { 68020 |1| } >                                    NounATT
[055]                                                                              EOP
[055.1]  Delta-X { manufacturer, ..., application domain, CPU, processors }        CT
[055.2]  Delta-X { i/o devices, peripheral devices, communication devices }        CT
[242]    0                                                                         EOP
[242.1]  ( [Delta-X - ZetaMachines Inc.], [ZetaMachines Inc. - Gamma-Z], [Gamma-Z - Connection Machine architecture], [Connection Machine architecture - D. Hillis] )   CTR
[243]    AlphaBooster                                                              FRAME
[246.2]  AlphaBooster |2| < operating system |1|: { Unix |1| } >                   NounATT
[253.3]  AlphaBooster |4| < peripheral devices |2|: { color display-1 |1| } >      NounATT
[265.2]  AlphaBooster |7| < CPU |1|: { 68000 |1| } >                               NounATT
[338]                                                                              EOP
[338.1]  AlphaBooster { operating system, peripheral devices, CPU, ... }           CT
[602]    Sigma-P                                                                   FRAME
[998]                                                                              EOP
[998.1]  Sigma-P { application software, ... }                                     CT
[999]    000                                                                       EOT

DOMAIN KNOWLEDGE BASE
workstation
  < self: a-computer >
  < CPU: [a-processor] >
  < manufacturer: [a-manufacturer] >
  < operating system: [an-operating system] >
  < peripheral devices: [a-peripheral device] >
  < application software: [an-application software] >
Delta-X
  < self: a-workstation >
  < CPU: { 68020 [...]
  [...]
AlphaBooster
  < self: a-workstation >
  < CPU: { 68000 [...]
  [...]
Sigma-P
  < self: a-workstation >
  < CPU: [a-processor] >
  < manufacturer: { MagicTronics } [...] >
  < operating system: [an-operating system] >
  < peripheral devices: [a-peripheral device] >
  < application software: { communications, ... } [...] >
  < price: { $8.000 } [...] >

Figure 8: Preconditions holding with respect to a derived theme pattern [We omit from this and the next figure any activation indication for the relevant frames in the DOMAIN KNOWLEDGE BASE. The main reason for this is the weight cancellation heuristics according to which any concept activation weight (except for the focus frame) of the previous paragraphs is zeroed out once the analysis of a new paragraph is started. In addition, since the Derived Theme pattern is determined when the end of the text has been reached and considers only paragraph-level coherence data, the decision to hide these activation data seems reasonable.]
The statement below gives a technical account of the formal conditions underlying the computation of a derived theme:

derived-theme( textpos ) = ( theme, SUBTHEMES ) iff
(a) ( textpos, 000, EOT ) is in the PARSE BULLETIN &
(b) let P = { eop_pos_1, eop_pos_2, ..., eop_pos_n } denote the set of all bulletin positions for which ( eop_pos_i, 0, EOP ) is in the PARSE BULLETIN (i ∈ [1,n]), i.e., all occurrences of the end-of-paragraph symbol &
(c) determine the subset P* of P considering all those bulletin positions eop_pos_1, eop_pos_2, ..., eop_pos_m (n ≥ m) in the PARSE BULLETIN which contain triples of the form
    ( p.point_1, < p_11, ... >, desc_1 )
    ( p.point_2, < p_21, ... >, desc_2 )
    ...
    ( p.point_q, < p_q1, ... >, desc_q )
    such that desc_i equals the Constant Theme parse descriptor (CT)⁷, for all i ∈ [1,q] (q ≥ m, since multiple occurrences of CT descriptors may occur at the m bulletin positions under consideration) &
(d) considering P* and all triples from the PARSE BULLETIN that satisfy condition (c) above, we stipulate: IF some theme can be determined to be the most specific common superordinate/generic concept class of all p_i1, i ∈ [1,q], with respect to the generalization hierarchy in the DOMAIN KNOWLEDGE BASE THEN p_11, p_21, ..., p_q1 ∈ SUBTHEMES
Otherwise, derived-theme( textpos ) = *

Some comments on the formal expression for derived-theme:
(a) textpos indicates the current end-of-text position in the PARSE BULLETIN.
(b) P contains all bulletin positions where end-of-paragraph symbols occur in the PARSE BULLETIN.
⁷ Only Constant Theme is mentioned here, but other coherence schemata that focus on one particular topic may be considered, too.
Distributed Text Structure Parsing
243
(c) P* is constructed from P such that the former contains only those bulletin positions where the CT parse descriptor - indicating the recognition of a Constant Theme pattern in that paragraph - co-occurs with the EOP class identifier; all other paragraphs are discarded from further analysis.
(d) The criterion for a derived theme requires considering all constant themes from P* and determining their most specific common superordinate or generic concept class (theme). Finally, all these constant themes are assigned to the reference set SUBTHEMES; obviously, the set SUBTHEMES is already maximal.

As an example consider Figure 8. The 'end-of-text' marker (indicated by the occurrence of the 000 symbol in the PARSE BULLETIN and the parse descriptor EOT at position '999') initializes the so-called text wrap-up group of coherence experts, which are only activated at the end of the text. We shall concentrate on the activities of the DT_EXPERT, the coherence expert for the determination of Derived Themes. In order to evaluate the function derived-theme(999), step (b) requires that the bulletin positions be determined where the symbol 0 occurs, indicating the end of a paragraph. With P = { 055, 242, 338, 998 } we have exactly four labels to check. P reduces to P* = { 055, 338, 998 }, since bulletin position '242' has the CTR descriptor only. From these remaining three parse positions a derived theme can be computed. We may now check with respect to (d) at position

055: ( 055.1, < Delta-X { manufacturer, usage mode, operating mode, operating system, application domain, CPU, processors } >, CT )
     ( 055.2, < Delta-X { i/o devices, peripheral devices, communication devices } >, CT )
338: ( 338.1, < AlphaBooster { operating system, peripheral devices, CPU, ... } >, CT )
998: ( 998.1, < Sigma-P { application software, ... } >, CT )

Given the conceptual relations in the DOMAIN KNOWLEDGE BASE (cf. Figure 8) we may conclude that SUBTHEMES = { Delta-X, AlphaBooster, Sigma-P } and derived theme = workstation. The grouping operation issued to the DOMAIN KNOWLEDGE BASE by the DT_EXPERT produces the results shown in Figure 9. The darker shadowed region indicates what the text is basically about (supertopic), while lighter shadowed areas indicate the concrete topics (subtopics) treated in various paragraphs. In addition, the grouping information which is due to Constant Theme recognition during earlier steps of coherence parsing has also been included in this figure. Note that the text need not mention the supertopic workstation explicitly, although it now figures as a common conceptual denominator for the whole text.
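The derived-theme conditions (a)-(d) can be illustrated with a small Python sketch. It is not the TOPIC implementation: the names IS_A, BULLETIN, superordinates and derived_theme are hypothetical, the bulletin triples are abbreviated (slot sets omitted), the link from workstation to computer and the single-parent hierarchy are assumptions, and the "otherwise" case is simply modelled as None.

```python
# Hypothetical toy generalization hierarchy (child -> parent).
# Only the grouping under "workstation" comes from the example;
# the "computer" level and single inheritance are assumptions.
IS_A = {
    "Delta-X": "workstation",
    "AlphaBooster": "workstation",
    "Sigma-P": "workstation",
    "workstation": "computer",
}

# Abbreviated parse bulletin: (position, frame, descriptor) triples.
BULLETIN = [
    ("055", None, "EOP"), ("055.1", "Delta-X", "CT"), ("055.2", "Delta-X", "CT"),
    ("242", None, "EOP"), ("242.1", None, "CTR"),
    ("338", None, "EOP"), ("338.1", "AlphaBooster", "CT"),
    ("998", None, "EOP"), ("998.1", "Sigma-P", "CT"),
    ("999", None, "EOT"),
]


def superordinates(concept):
    """Return the proper superordinates of a concept, most specific first."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def derived_theme(bulletin, textpos):
    """Return (theme, subthemes), or None if no derived theme exists."""
    # (a) textpos must be the end-of-text position.
    if (textpos, None, "EOT") not in bulletin:
        return None
    # (b) P: all end-of-paragraph positions.
    eop_positions = {pos for pos, _, desc in bulletin if desc == "EOP"}
    # (c) CT triples recorded at an end-of-paragraph position (these define P*).
    ct_frames = [
        frame
        for pos, frame, desc in bulletin
        if desc == "CT" and pos.partition(".")[0] in eop_positions
    ]
    if not ct_frames:
        return None
    subthemes = sorted(set(ct_frames))
    # (d) most specific concept that is a superordinate of every subtheme.
    for candidate in superordinates(subthemes[0]):
        if all(candidate in superordinates(s) for s in subthemes[1:]):
            return candidate, subthemes
    return None


print(derived_theme(BULLETIN, "999"))
```

Because the candidates are tried from the most specific superordinate upwards, the first common one found is the most specific common superordinate; under the assumed hierarchy this yields workstation rather than the more general computer.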
[Figure 9 consists of two panels, PARSE BULLETIN and DOMAIN KNOWLEDGE BASE. The PARSE BULLETIN panel lists the FRAME, NounATT, EOP, CT and CTR entries for Delta-X, AlphaBooster and Sigma-P, including the CTR entry at [242.1] { [Delta-X - ZetaMachines Inc.], [ZetaMachines Inc. - Gamma-Z], [Gamma-Z - Connection Machine architecture], [Connection Machine architecture - D. Hillis] }, and ends with the EOT entry at [999] and the new entry [999.1] workstation { Delta-X, AlphaBooster, Sigma-P } with descriptor DT.]
Figure 9: Postconditions holding with respect to a derived theme pattern
5.4 The merits of text macrostructure parsing

There are several advantages to having text coherence phenomena under computational control. First, they are a natural extension of formal text grammars in terms of the coverage of global structural properties of text constitution (cf. Zadrozny & Jensen, 1991, for a related approach). Second, besides linguistic plausibility, there is empirical evidence for the existence of these patterns as cognitive memory clues, e.g., as aids for summarizing texts or retrieving (recalling) specific facts (van Dijk, 1979). Third, inspired by these cognitive arguments, their potential for information retrieval applications is evident. Various speculative suggestions have already been made pointing out their usefulness for sophisticated conceptual organization and management of text knowledge (cf., e.g., the proposals concerning automatic extracting by Janos, 1979). TOPIC, however, provides one of the first implementations dealing with several text structure phenomena and their incorporation in information retrieval dialogs (at the formal level of text partitioning, Hearst, 1994, proposes a statistically-based algorithm which determines the extension of coherent text segments for the purposes of text passage retrieval).

While the TOPIC system was being extended by the interactive graphical retrieval interface TOPOGRAPHIC, which operates on TOPIC's text knowledge bases (Thiel & Hammwöhner, 1987), experimental evidence was gathered in support of our hypothesis that there is a close functional relationship between the selection of particular coherence patterns as retrieval operators and particular search states during the retrieval process:

1) Constant Theme coherently characterizes a variety of facts related to one particular topic. A CT-based search operation enhances the user's knowledge of that topic by presenting facets (or data related to those facets) the user is probably not aware of, although they may be relevant for his or her problem solution.
2) Continuous Thematization of Rhemes links a set of formerly unrelated topics by a coherent line of conceptual dependencies (current rheme becomes next theme) motivated by the text's thematic development. A CTR-based search operation therefore provides the basis for thematical associations and stimulates previously unconsidered lines of reasoning by thematically constrained browsing.
3) Derived Theme groups hierarchically related topics and thus may enhance the knowledge of thematical alternatives compared with the particular topic (and facts related to it) which is under focused attention of the user, e.g., by way of stimulating comparisons, recognizing information gaps, introducing new knowledge.

Text coherence analysis thus not only serves methodological purposes of properly specified text grammars, but also accounts for an enhanced functionality of text information systems by thematically (i.e., semantically) supported and directed search and retrieval processes.
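To make the CTR condition ("current rheme becomes next theme") and the idea of thematically constrained browsing more concrete, here is a minimal Python sketch. It is an illustration only, not TOPOGRAPHIC's retrieval operator: the function names is_ctr_chain and browse are hypothetical, and the theme-rheme pairs are those of the CTR entry recorded for the second paragraph of the running example (bulletin position 242).

```python
# Theme-rheme links from the CTR entry of the running example.
CTR_LINKS = [
    ("Delta-X", "ZetaMachines Inc."),
    ("ZetaMachines Inc.", "Gamma-Z"),
    ("Gamma-Z", "Connection Machine architecture"),
    ("Connection Machine architecture", "D. Hillis"),
]


def is_ctr_chain(links):
    """True if every rheme is taken up as the theme of the following link."""
    return all(prev[1] == nxt[0] for prev, nxt in zip(links, links[1:]))


def browse(links, start):
    """Follow theme -> rheme links from a start topic and return the path."""
    successor = dict(links)
    path = [start]
    # Stop at the end of the chain, or if a link would revisit a topic.
    while path[-1] in successor and successor[path[-1]] not in path:
        path.append(successor[path[-1]])
    return path


print(is_ctr_chain(CTR_LINKS))
print(browse(CTR_LINKS, "ZetaMachines Inc."))
```

Starting the browse at any topic on the chain returns the topics that the text reaches from there, which is one simple way to read the "line of conceptual dependencies" a CTR-based search would offer.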
6. Conclusions
In this paper, we have dealt with basic text coherence patterns building upon the results of prior text cohesion analysis. Technically, the difference between those two levels of text analysis comes from the fact that cohesion phenomena are due to single instantiations of co-occurrence or conceptual relations, while coherence phenomena are characterized by particular patterns of multiple instantiations of these relations:
- constant theme is defined by multiple instantiations of aggregation (or conceptual association) relations for one particular frame unit in the DOMAIN KNOWLEDGE BASE;
- continuous thematization of rhemes is defined by multiple instantiations of aggregation relations for continuously changing, though locally overlapping frame units in the DOMAIN KNOWLEDGE BASE;
- derived theme is defined by multiple instantiations of generalization/classification relations holding among subparts of a frame hierarchy in the DOMAIN KNOWLEDGE BASE.

The internal structure of frames, therefore, provides a representation format which precompiles major text schemata into the knowledge representation structures of the DOMAIN KNOWLEDGE BASE. These precompiled knowledge structures are then instantiated by the topical evolution of a particular text as represented in the PARSE BULLETIN.⁸

Note that our knowledge-based approach is not only capable of determining the extension of coherent text passages, but also recognizes their topical content and thematical organization. This contrasts with alternative methods of computation which are only able to demarcate the spatial extension of coherent text segments, either by lexical frequency and distribution information (cf. Hearst, 1994, for such an algorithm based on statistical similarity scores of co-occurring lexical items) or by lexical cohesion information (based on the exploitation of thesaural relations; Morris & Hirst, 1991).
In summary, this contribution provides an advancement over current methodology in that it gives a thorough treatment of these text phenomena in terms of formal text grammar specifications, which has been lacking so far, and also puts them into the operational environment of an implemented text parser. More research on thematic progression patterns seems justified, since they constitute general, domain-independent regularities of text constitution. Furthermore, they can dynamically be configured at different levels of text organization (within paragraphs, combining single paragraphs or sequences of paragraphs). This way they constitute particularly flexible tools for text macrostructure description. Due to the high degree of modularity achieved in our model (each coherence pattern is modelled by a particular coherence expert, different coherence patterns assigned to a text passage are dynamically combined at suitable levels of text organization) they provide a true alternative to those coherence descriptions underlying story grammars or superstructures. These formal grammar systems describe fairly idealized text structures which often do not account for the variability of structural patterns encountered in authentic documents. In addition, they also need to be specified for each text sort (scientific paper, magazine article, letter, memo, etc.) and domain covered by a text understanding system.

We have limited our discussion of global text patterns to those that are determined by structural linguistic indicators (thematic progression types) and their tight coupling to basic conceptual, taxonomic relations in a frame knowledge base. Alternative coherence patterns, such as functional coherence relations and macropropositions, require deeper propositional modelling in the domain knowledge base than is currently available in the TOPIC system. Nevertheless, there should be a peaceful co-existence between both approaches in that thematic progression patterns might complement the propositionally oriented coherence relation and macropropositional approach by one that is mainly oriented towards terminological knowledge, i.e., focusing on inherent properties of concept taxonomies.

Implementational Remarks. The parser is implemented in C and Prolog and runs in a distributed SPARCStation environment under Unix (SUNOS V4.1.1). The functionality described in this paper is fully operational and embedded within the larger framework of the TOPIC text understanding system. A second-generation prototype, the ParseTalk system, is currently under development. This parser is implemented in Smalltalk, with extensions that allow for coarse-grained parallelism and asynchronous message passing (Hahn, Schacht & Bröker, 1994).

⁸ Although Wilks (1985) is correct when he questions the validity of story grammar based models for text structure analysis, the rigour of his criticism related to frame structures, nevertheless, seems inappropriate. Of course, frame structures, per se, do not reflect any text structure. Instead, we have demonstrated their value as a representation structure which lends itself as a framework for assigning text structures to them - single frames (constant theme) as well as clusters of different frames (continuous thematization of rhemes, derived theme).
References

Alterman, R. 1989 Event concept coherence. In D. L. Waltz (Ed.), Semantic structures. Advances in natural language processing (pp. 57-87). Hillsdale: Erlbaum.
Birnbaum, L., & Selfridge, M. 1981 Conceptual analysis of natural language. In R. C. Schank, & C. K. Riesbeck (Eds.), Inside computer understanding: Five programs plus miniatures (pp. 318-353). Hillsdale: Erlbaum.
Correira, A. 1980 Computing story trees. American Journal of Computational Linguistics, 6, 135-149.
Danes, F. 1974 Functional sentence perspective and the organization of the text. In F. Danes (Ed.), Papers on functional sentence perspective (pp. 106-128). Prague: Academia.
Dijk, T. A. van 1977 Semantic macro-structures and knowledge frames in discourse comprehension. In M. A. Just, & P. A. Carpenter (Eds.), Cognitive processes in comprehension (pp. 3-32). Hillsdale: Erlbaum.
Dijk, T. A. van 1979 Recalling and summarizing complex discourse. In W. Burghardt, & K. Hölker (Eds.), Text processing. Papers in text analysis and text description (pp. 49-118). Berlin: de Gruyter.
Dijk, T. A. van 1980 Macrostructures: An interdisciplinary study of global structures in discourse, interaction and cognition. Hillsdale: Erlbaum.
Fukumoto, J., & Tsujii, J. 1994 Breaking down rhetorical relations for the purpose of analysing discourse structures. In COLING '94: Proceedings of the 15th International Conference on Computational Linguistics (pp. 1177-1183). ACL.
Giora, R. 1983a Segmentation and segment cohesion: On the thematic organization of the text. Text, 3, 155-181.
Giora, R. 1983b Functional paragraph perspective. In J. Petöfi, & E. Sözer (Eds.), Micro and macro connexity of texts (pp. 153-182). Hamburg: Buske.
Grimes, J. E. 1978 Topic levels. In Proceedings of TINLAP-2: Theoretical Issues in Natural Language Processing, 2 (pp. 104-108). New York: ACM.
Grosz, B. J. 1981 Focusing and description in natural language dialogues. In A. K. Joshi, B. L. Webber, & I. A. Sag (Eds.), Elements of discourse understanding (pp. 84-105). Cambridge: Cambridge University Press.
Hahn, U. 1989 Making understanders out of parsers: Semantically driven parsing as a key concept for realistic text understanding applications. International Journal of Intelligent Systems, 4, 345-393.
Hahn, U. 1990 Lexikalisch verteiltes Text-Parsing. Eine objektorientierte Spezifikation eines Wortexpertensystems auf der Grundlage des Aktorenmodells. Berlin: Springer.
Hahn, U., & Reimer, U. 1986 TOPIC Essentials. In COLING '86: Proceedings of the 11th International Conference on Computational Linguistics (pp. 497-503). Bonn: Institut für angewandte Kommunikations- und Sprachforschung.
Hahn, U., Schacht, S., & Bröker, N. 1994 Concurrent, object-oriented dependency parsing: The ParseTalk model. International Journal of Human-Computer Studies, 41, 179-222.
Hearst, M. A. 1994 Multi-paragraph segmentation of expository text. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (pp. 9-16). ACL.
Hinds, J. 1979 Organizational patterns in discourse. In T. Givon (Ed.), Syntax and Semantics. Vol. 12: Discourse and syntax (pp. 135-157). New York: Academic Press.
Hobbs, J. R. 1982 Towards an understanding of coherence in discourse. In W. G. Lehnert, & M. H. Ringle (Eds.), Strategies for natural language processing (pp. 223-243). Hillsdale: Erlbaum.
Janos, J. 1979 Theory of functional sentence perspective and its application for the purposes of automatic extracting. Information Processing and Management, 15, 19-25.
Johnson, M., & Klein, E. 1986 Discourse, anaphora and parsing. In COLING '86: Proceedings of the 11th International Conference on Computational Linguistics (pp. 669-675). Bonn: Institut für angewandte Kommunikations- und Sprachforschung.
Kintsch, W., & Dijk, T. A. van 1978 Toward a model of text comprehension and production. Psychological Review, 85, 363-394.
Kurzon, D. 1984 Themes, hyperthemes and the discourse structure of British legal texts. Text, 4, 31-55.
Lascarides, A., & Asher, N. 1991 Discourse relations and defeasible knowledge. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (pp. 55-62). ACL.
Longacre, R. E. 1979 The paragraph as a grammatical unit. In T. Givon (Ed.), Syntax and Semantics. Vol. 12: Discourse and syntax (pp. 115-134). New York: Academic Press.
Mann, W. C., & Thompson, S. A. 1988 Rhetorical structure theory: Toward a functional theory of text organization. Text, 8, 243-281.
McKeown, K. R. 1985 Discourse strategies for generating natural-language text. Artificial Intelligence, 27, 1-41.
McKevitt, P., Partridge, D., & Wilks, Y. 1992 Approaches to natural language discourse processing. Artificial Intelligence Review, 6, 333-364.
Morris, J., & Hirst, G. 1991 Lexical cohesion computed by thesaural relations as an indicator of the structure of text. Computational Linguistics, 17, 21-48.
Nishida, F., Takamatsu, S., Tani, T., & Kusaka, H. 1986 Text analysis and knowledge extraction. In COLING '86: Proceedings of the 11th International Conference on Computational Linguistics (pp. 241-243). Bonn: Institut für angewandte Kommunikations- und Sprachforschung.
Pustejovsky, J. 1987 An integrated theory of discourse analysis. In S. Nirenburg (Ed.), Machine translation. Theoretical and methodological issues (pp. 168-191). Cambridge: Cambridge University Press.
Reichman, R. 1978 Conversational coherency. Cognitive Science, 2, 283-327.
Reichman-Adar, R. 1984 Extended person-machine interface. Artificial Intelligence, 22, 157-218.
Reimer, U. 1986 A system-controlled multi-type specialization hierarchy. In L. Kerschberg (Ed.), Expert Database Systems. Proceedings from the 1st International Workshop (pp. 173-187). Menlo Park: Benjamin Cummings.
Reimer, U., & Hahn, U. 1988 Text condensation as knowledge base abstraction. In CAIA '88: Proceedings of the 4th Conference on Artificial Intelligence Applications (pp. 338-344). Washington: IEEE.
Reimer, U., & Hahn, U. 1990 An overview of the text understanding system TOPIC. In U. Schmitz, R. Schütz, & A. Kunz (Eds.), Linguistic approaches to artificial intelligence (pp. 305-320). Frankfurt: Lang.
Riesbeck, C. K. 1975 Conceptual analysis. In R. C. Schank (Ed.), Conceptual information processing (pp. 83-156). Amsterdam: North-Holland.
Rumelhart, D. E. 1975 Notes on a schema for stories. In D. G. Bobrow, & A. Collins (Eds.), Representation and understanding. Studies in cognitive science (pp. 211-236). New York: Academic Press.
Scha, R., & Polanyi, L. 1988 An augmented context free grammar for discourse. In COLING '88 Budapest. Proceedings of the 12th International Conference on Computational Linguistics (pp. 573-577). Budapest: John von Neumann Society for Computing Sciences.
Schank, R. C., Lebowitz, M., & Birnbaum, L. 1980 An integrated understander. American Journal of Computational Linguistics, 6, 13-30.
Scinto, L. 1983 Functional connectivity and the communicative structure of text. In J. S. Petöfi, & E. Sözer (Eds.), Micro and macro connexity of texts (pp. 73-115). Hamburg: Buske.
Shen, Y. 1989 The X-bar grammar for stories: Story grammar revisited. Text, 9, 415-467.
Sidner, C. L. 1983 Focusing in the comprehension of definite anaphora. In M. Brady, & R. C. Berwick (Eds.), Computational models of discourse (pp. 267-330). Cambridge: M.I.T. Press.
Thiel, U., & Hammwöhner, R. 1987 Informational zooming: An interaction model for the graphical access to text knowledge bases. In Proceedings of the 10th Annual International ACM SIGIR Conference on Research & Development in Information Retrieval (pp. 45-56). New York: ACM.
Tucker, A. B., Nirenburg, S., & Raskin, V. 1986 Discourse and cohesion in expository text. In COLING '86: Proceedings of the 11th International Conference on Computational Linguistics (pp. 181-183). Bonn: Institut für angewandte Kommunikations- und Sprachforschung.
Wachtel, T. 1986 Pragmatic sensitivity in NL interfaces and the structure of conversation. In COLING '86: Proceedings of the 11th International Conference on Computational Linguistics (pp. 35-41). Bonn: Institut für angewandte Kommunikations- und Sprachforschung.
Webber, B. L. 1979 A formal approach to discourse anaphora. London: Garland.
Wilks, Y. 1985 Text structures and knowledge structures. Quaderni di Semantica, 6, 335-344.
Yonezawa, A., & Hewitt, C. 1977 Modelling distributing systems. In IJCAI'77: Proceedings of the 5th International Joint Conference on Artificial Intelligence (pp. 370-376).
Zadrozny, W., & Jensen, K. 1991 Semantics of paragraphs. Computational Linguistics, 17, 171-209.
ÖSTEN DAHL

Causality in Discourse

In the study of intratextual relations, causality has played an important role. Sometimes causal relations are even seen as the main source of connectivity or coherence in a text, as in Schank and Abelson (1971).¹ More frequently, perhaps, causal concepts are seen to underlie one or more of the relations - whether they are called rhetorical predicates (Grimes, 1975), coherence relations (Hobbs, 1985), rhetorical relations (Mann & Thompson, 1987, p. 88) or whatever - that are supposed to provide the links between sentences or larger chunks of text. The lists of such relations that are found in the literature thus commonly include labels such as 'Cause', 'Reason', 'Effect'.

In a couple of papers (Dahl & Hellman, 1990; Dahl & Hellman, forthcoming), Christina Hellman and I have argued that the rhetorical relations postulated e.g. in the Rhetorical Structure Theory of Mann and Thompson have to be seen as belonging to different levels of discourse structure and that a relation on one level may well be involved in another relation on another level. One of the key notions in our discussion was causality. In this paper, I will elaborate on our argument, making a rather simple claim, namely that the relation between causality and discourse structure is much too complex to be captured in the simple terms that postulating 'Cause' as a rhetorical relation would suggest.

To recapitulate the relevant parts of the previous papers, causation is but one of several closely related relations, which all tend to have a similar role in reasoning and discourse, but which are treated separately by Mann and Thompson and others. The most important of these are listed and exemplified below:

CAUSE:EFFECT
W: If you eat too much chocolate, you'll have a stomach ache.
A: John ate a lot of chocolate.
C: He had a stomach ache.

REASON:ACTION
W: An apple a day keeps the doctor away - If you eat an apple every day, you will preserve your health.
A: John wanted to be healthy.
C: He ate an apple every day.

¹ A terminological apology: In this paper, the terms 'causation', 'causality', 'cause' and 'causal relations' are used side by side. Since they focus on somewhat different aspects of the same phenomenon, it did not seem possible to decide on just one of them.
Ö . Dahl
CLAIM
-
CHALLENGE
-
EVIDENCE
illocutionary level
SUPPORT
ρ