Distributed Language Translation

The goal of this series is to publish texts related to computational linguistics and machine translation in general, and to the DLT (Distributed Language Translation) research project in particular.

Series editor: Toon Witkam, B.S.O./Research, P.O. Box 8348, NL-3503 RH Utrecht, The Netherlands

Other books in this series:
1. B.C. Papegaaij, V. Sadler and A.P.M. Witkam (eds.), Word Expert Semantics
2. Klaus Schubert, Metataxis
Bart Papegaaij and Klaus Schubert
TEXT COHERENCE IN TRANSLATION
1988
FORIS PUBLICATIONS Dordrecht - Holland/Providence RI - U.S.A.
Published by:
Foris Publications Holland, P.O. Box 509, 3300 AM Dordrecht, The Netherlands

Sole distributor for the U.S.A. and Canada:
Foris Publications U.S.A., Inc., P.O. Box 5904, Providence RI 02903, U.S.A.

Sole distributor for Japan:
Toppan Company, Ltd., Sufunotomo Bldg., 1-6, Kanda Surugadai, Chiyoda-ku, Tokyo 101, Japan

CIP-DATA
In co-operation with BSO, Utrecht, The Netherlands
ISBN 90 6765 360 8 (Bound)
ISBN 90 6765 361 6 (Paper)
© 1988 Foris Publications - Dordrecht
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission from the copyright owner. Printed in the Netherlands by ICG Printing, Dordrecht.
Contents

Foreword

Chapter 1. Text coherence in machine translation: An appraisal of the problem and some prerequisites for its solution
  1.1. Text and context
  1.2. Text linguistics at a glance
  1.3. Distributed Language Translation
  1.4. Some terms and concepts of grammar

Chapter 2. Clues and devices of text coherence: A practical analysis
  2.1. English to Esperanto: a sample translation
    2.1.1. The text
    2.1.2. Circumstantial information
    2.1.3. A superstructure of text
    2.1.4. A close-up of some translation steps
    2.1.5. Conceptual proximity
    2.1.6. Tense, mood, aspect and voice
    2.1.7. Adding explicitness
    2.1.8. Theme and rheme, repetition of words and syntagmata
    2.1.9. Pronominal reference
    2.1.10. Names, terms, abbreviations
    2.1.11. Voice and markedness
    2.1.12. Prominence and deep-case structure
    2.1.13. Rhetorical patterns
    2.1.14. Stylistic changes
    2.1.15. Conclusion
  2.2. Deictic reference
    2.2.1. Pronouns and translation
    2.2.2. Pronouns and text coherence
    2.2.3. Syntactic restrictions
    2.2.4. Syntactic redundancy
    2.2.5. Syntactic rules and semantic features
    2.2.6. The inadequacy of syntactic and semantic features
    2.2.7. Focus
    2.2.8. Speech acts
    2.2.9. Conclusion
  2.3. Content word reference
    2.3.1. Lexical variation
    2.3.2. Reference identity and translation
    2.3.3. Indefinite versus definite reference
    2.3.4. Directing the search
    2.3.5. Definiteness without modification
    2.3.6. Identity of sense or reference
    2.3.7. Types of definite reference
    2.3.8. Searching for reference
    2.3.9. Conclusion
  2.4. Communicative functions: theme and rheme
    2.4.1. Thematic progression
    2.4.2. Analyzing thematic progression
    2.4.3. Given and new
    2.4.4. Focusing rules
    2.4.5. Breaking expectations, reintroduction of concepts
    2.4.6. The complete pattern
    2.4.7. Thematic patterns as a summary mechanism
    2.4.8. The complete pattern, continued
    2.4.9. Conclusion
  2.5. Reduction to verbal elements
    2.5.1. Case grammar
    2.5.2. The verb as central element
    2.5.3. Nominalized verbal elements
    2.5.4. Some sample analyses
    2.5.5. Using extended expectations to solve definite reference
    2.5.6. Coherence through 'deep' reference
    2.5.7. Translating metaphorical expressions
    2.5.8. When is a word a verb?
    2.5.9. Finding parallel structures
    2.5.10. Conclusion
  2.6. Rhetorical structures and logical connections
    2.6.1. Rhetorical patterns
    2.6.2. Logical connectives
    2.6.3. Conclusion
  2.7. Towards the translation of text coherence

Chapter 3. Text coherence with translation grammar
  3.1. On the translatability of language units
  3.2. Coherence of entities
    3.2.1. Lexeme choice
    3.2.2. Deixis and reference
  3.3. Coherence of focus
    3.3.1. How 'free' is word order?
    3.3.2. Communicative syntagma ordering
    3.3.3. Grammatical features determined by syntagma ordering
    3.3.4. Excursus on word order in dependency syntax
  3.4. Coherence of events
  3.5. Pragmatics - an escape from text grammar?
    3.5.1. The kind of translation-relevant pragmatic knowledge
    3.5.2. Full understanding?
    3.5.3. Knowledge representation
  3.6. Towards implementation

Index

References
Foreword
The present study was written to share with a broader public in theoretical and computational linguistics an attempt to render the coherence of texts in a machine translation process. It is based on work done for the Distributed Language Translation system, a long-term research and development project of the BSO software house in Utrecht for advanced semi-automatic multilingual machine translation. We wrote this book during the winter of 1987/88, a period of intense implementation work for the first prototype of the DLT system, demonstrated in December 1987, and a second version shown in March 1988. We should like to express our gratitude to all our colleagues in the DLT team for making it possible during these very labour-intensive months for us to work in peace and to think beyond the limits of the first (sentence-based) prototypes.

The practical body of this study, chapter 2, is by Papegaaij, and chapters 1 and 3 are by Schubert, but of course we have discussed manuscripts and contributed to one another's work throughout. A draft version of the second chapter was studied very carefully by Ian Fantom of Newbury, Great Britain (previously connected with the DLT project). Toon Witkam, research director of BSO/Research, also took a critical interest in the manuscript. An inspiring forum was provided by those of our colleagues who took part in our course on text grammar, held in BSO/Research in the winter of 1986/87. We thank them all for many helpful remarks and suggestions. We are also grateful to Dan Maxwell for improving our English and to Ronald Vendelmans, Maiumi Sadler and Hanni Schroten for their contribution to the graphical and typographical form of this book.
Utrecht, April 1988
B. C. Papegaaij
Klaus Schubert
Chapter 1
Text coherence in machine translation
An appraisal of the problem and some prerequisites for its solution
The objects of translation are texts. A translation process starts from a given text, analyses it, and produces as its final product a new text in another language. Machine translation differs from a human translator's work in that the rules and the knowledge that guide the process have to be made explicit and fixed with the degree of precision required for computer programs. The prime source of rules and knowledge in this respect is grammar. Unfortunately, grammatical research has for many decades, and even centuries, focused much more on clauses and sentences than on entire texts. That grammatical investigations with a broader domain are a desideratum was noted even before machine translation entered the scene, and the last sixty years or so have seen a slowly but steadily increasing interest in text linguistics, discourse analysis or text grammar. These fields of study have blossomed and have brought about a sizeable collection of studies, models and theories. However, the available grammatical rules that might be applied to the translation of texts still cover considerably less ground than sentence-grammatical ones, are much less reliable, and are in no way ready to use for machine translation. The heightened awareness of text phenomena among linguists has also led to the insight that the results of the endeavour have remained relatively modest (compared to sentence-level grammar), not so much due to a lack of attention to the problem, but rather because the regularity in text structure is of a considerably less obvious nature. The present study is an attempt to estimate the impact of text phenomena on a machine translation process, to assess the feasibility of a number of suggested models for this particular application, and to propose solutions.
In view of the size and complexity of the problem, we do not aim to resolve the entire set of text-grammatical tasks in machine translation in any exhaustive way, nor can we work out the solutions we outline in elaborate detail. We concentrate on a substantial fraction of the overall text-grammatical phenomena which is particularly pertinent to translation. This fraction
is the coherence of texts. Put concisely, the question we keep in mind throughout this study is:

- What is the difference between a set of unrelated sentences and a text, and how can we avoid destroying a text by translating it into unrelated sentences?
In this chapter we account for the prerequisites on which the present study is built. Section 1.1. gives a preliminary sketch of what text coherence is and what role it plays in machine translation. Section 1.2. reviews some of the main grammatical theories germane to text coherence, which may be taken to be the theoretical sources of our applications. The study is written for a general audience in theoretical and computational linguistics, and we do not proceed very far beyond the point at which it would become necessary to get involved with the technical details of the linguistic design of a machine translation system. This general guideline notwithstanding, our work derives from one particular machine translation project, the Distributed Language Translation system (DLT), and builds on earlier work done for that system. It may therefore be in order to provide a brief sketch of the main characteristics of DLT and also of the grammatical model we refer to when speaking about translation not only at the text level, but also at the lower levels. This is done in sections 1.3. and 1.4., respectively.

Having outlined in chapter 1 the theoretical and implementational setting of our study, we immediately leave it in order to approach text coherence in chapter 2 with as little bias as possible. In view of the explorative and at times seemingly somewhat groping way in which we deal with this highly intricate problem, we have chosen to address text coherence initially with a good deal of deliberate, creative eclecticism. The chapter draws on various grammatical text theories and computational text models, trying to apply appropriate elements of them in combination with our own findings. This is done with an orientation toward practical application, specifically by closely examining a sample translation.
In particular, chapter 2 deals with various grammatical and extragrammatical text coherence phenomena such as pronominal reference, reference identity of content words, thematic progression and rhetorical patterns. In chapter 3 we return to the theoretical model set out in 1.4. and attempt to extend it to cover text coherence, incorporating the insights and the practical devices considered in chapter 2. We outline the position of text coherence in an overall translation-oriented text grammar (3.1.) and describe three different types of coherence and of relevant coherence-translating devices (3.2. to 3.4.). It is well known that the rules and the knowledge that pertain to translation are not only of a grammatical nature. Section 3.5. is accordingly dedicated to extragrammatical, pragmatic approaches and section 3.6., finally, outlines a number of steps towards implementation of coherence-preserving devices in machine translation.
1.1. Text and context
To translate means to express in another language the content of a given text. The elements from which the target language text is composed are words and their grammatical relations (or more precisely, as we argue in 3.1., morphemes and their relations). Words, the most self-evident text elements, are well suited for illustrating the decisive role the context has to play in translation.
Texts and many of their elements are linguistic signs of form and content. The objective of translation is to replace the form and to preserve the content of the text. Translation is thus form manipulation with reference to content. The most outstanding obstacle to accurate translation is ambiguity. As far as translation is concerned, ambiguity in linguistic signs such as words is of at least a twofold nature. Firstly, a word does not have a single meaning, but refers to a number of concepts. This is especially obvious in words such as English ball that refer to quite distinct concepts, but it is no less true for any arbitrary word, even technical terms, since a concept can always be divided up into several subconcepts. There is in principle no end to this process of dissecting meaning into ever more subtle elements. Saying that words are ambiguous in this way is accordingly almost a tautology. It should, however, be borne in mind that translation is not directly concerned with disambiguating words on this monolingual basis. Translation has to do with resolving the second, translation-relevant, type of ambiguity and may in the course of such a disambiguation indirectly revert to monolingual ambiguity of the first type. The second type of ambiguity is bilingual or contrastive ambiguity, which may also be called translation ambiguity. Two words (or other text elements) from different languages that are often adequate translations of each other do not denote exactly the same concept or the same range of concepts. Virtually all bilingual dictionaries bear witness to this fact, giving in many cases not a single translation, but several translation alternatives for a single source language entry.
In spite of this incongruence between source and target language, translation is nevertheless possible, because a word in a given text does not denote the entire range of its possible meanings, but normally only a certain fraction of them, which often may be well covered by a single target language expression. (Those who believe in countable distinct concepts might say that a word in a given text denotes only a single concept out of the set of concepts it can refer to.) The main goal of disambiguation for translation purposes is to find out which fraction of its possible range of meanings a word refers to in the given text. In contrast to monolingual ambiguity, the resolution of translation ambiguity is not an endless process. It has a well-defined
accomplishment condition: the choice of a single translation alternative. Translation-oriented disambiguation may thus be viewed as an analysis of the source language text, steered by the requirements of the target language. Translation ambiguity and translation-oriented disambiguation are language pair-specific.

It is in translation-oriented disambiguation that the role of context in translation is found: the context in which a word (and mutatis mutandis other text elements) is used restricts and determines its meaning on the particular occasion. As Henrik Nikula (1986: 41) puts it, context is co-text plus situation. The indications as to the precise meaning of a word in a text should thus be sought first in the surrounding text and secondly in extratextual information. It cannot be the purpose of a machine translation system to pursue this analysis to infinite depth. Translation, machine or human, is not concerned with resolving the infinite monolingual ambiguity as such, but only inasmuch as this is inevitable for resolving translation ambiguity. Since translation-relevant information may be found both in the text itself and in the extratextual situation, the devices for detecting and consulting it are of a grammatical and of a pragmatic character (see 1.4.).

The surrounding text contributes to a proper understanding of a given word or expression. This means that there is a connection between a given text element and other pieces of the text. It is this connectedness that makes up the coherence of a text. Obviously it is not enough for a good translation just to understand the source text properly and to detect its coherence. Because the target and source texts should be understood the same way by their respective readers, not only the detection of the coherence of the source text, but also the creation of a similarly cohering target text is indispensable. Machine translation must therefore devise rules both for the analysis and for the synthesis of text coherence.
We hope to take a few steps towards this aim in the present study.
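The context-driven choice among translation alternatives described in this section long predates modern software, but its core can be caricatured in a few lines of present-day code. The sketch below is purely illustrative and is not the DLT word-expert system; the mini-lexicon, the Esperanto glosses and the overlap scoring are all invented assumptions.

```python
# Illustrative sketch only: a toy context-scoring disambiguator, not DLT's
# actual word-expert system. The lexicon entries and hint words are invented.

LEXICON = {
    "ball": [
        # (Esperanto translation alternative, co-text words hinting at this sense)
        ("pilko", {"kick", "throw", "game", "player", "bounce"}),
        ("balo", {"dance", "dress", "orchestra", "invitation", "evening"}),
    ],
}

def translate_word(word, cotext):
    """Pick the translation alternative whose hint words overlap most
    with the surrounding co-text; return None if there is no evidence."""
    alternatives = LEXICON[word]
    scored = [(len(hints & set(cotext)), target) for target, hints in alternatives]
    score, target = max(scored)
    # With no contextual evidence at all, a system of DLT's kind would ask
    # the user through its interactive dialogue rather than guess.
    return target if score > 0 else None

sentence = ["the", "player", "tried", "to", "kick", "the", "ball"]
print(translate_word("ball", sentence))  # -> pilko
```

The sketch also illustrates the accomplishment condition discussed above: the search stops as soon as a single target language alternative has been chosen, and the monolingual meaning of the word is dissected no further than that choice requires.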
1.2. Text linguistics at a glance
This book draws on the stock of theoretical insights about text coherence obtained in a number of linguistic schools. Especially in chapter 2 we attempt to apply some of these theories' ideas to specific translation problems. We do not cite, for every single step we take, the specific source from which the idea is taken, nor do we always explicitly identify the ideas we have added ourselves when applying the models. It is therefore appropriate at this point to mention the main models we make use of.

It would be an interesting chapter in the history of science to investigate how the scientific interest in language in the course of the centuries shifted from texts to sentences, to words, to sounds, and back. The interest in texts is not a novelty of this century. On the contrary, the roots of modern linguistics (in the countries of European culture) are found, as far back as we can trace them, in ancient Greek language philosophy, where the text occupied a central position, for example also as the object of rhetoric. In our century, this interest was revived. Elisabeth Gülich and Wolfgang Raible (1977: 60ff.) give a good account of this development. They list four main groups of general linguistic text models (plus a few models for literary texts, which we pass by here):

- The Prague school. The theme-rheme distinction originates from this school, in particular from Vilem Mathesius (1929). From Mathesius's ideas an uninterrupted flow of theoretical and practical work has come, mainly under the heading of aktualni cleneni vety, in English usually called Functional Sentence Perspective.

- The Copenhagen school. The language theory developed by Louis Hjelmslev (1943) and his associates, glossematics, is the first model to choose the text as its central object, approaching other elements, such as sentences, primarily as elements of texts.

- Tagmemics. This school, linked with the name of Kenneth Pike (1967: 287), started a text-grammatical revolution first of all against American Structuralism of Bloomfield's direction.

- A group of models which we should like to subsume under the heading of a German school, since they all are offspring of an innovative approach to linguistics taken by Peter Hartmann in the 1960's (cf. Beaugrande/Dressler 1981: 24, note 8; Hartmann e.g. 1971). The main representatives of this school are, in Gülich and Raible's account, Roland Harweg (1968), Harald Weinrich (1972), Klaus Heger (1976) and Janos Petofi (1969). Among the models of literary texts they also list Teun van Dijk (1978), whose interest in text linguistics starts off in the German school (Gülich/Raible 1977: 250). Although Gülich and Raible in 1977 do not yet give it such a prominent place as the other models, we may now, a decade later, assign also the work by Wolfgang Dressler (1972), later in association with Robert-Alain de Beaugrande (Beaugrande/Dressler 1981), to the German school.

Most of these authors have published enormous quantities of relevant works. We quote here only an initial or central work.
Several of these approaches have found applications in computational linguistics. The model of the Prague school has been developed by scholars such as Eva Hajicova, Petr Sgall (e.g. 1987 for a recent sample from their long list of publications) and others in Prague, but has also found a following at various other places in the world (a computational example: Vasconcellos 1986). The Prague approach as well as the approaches of Hjelmslev, Weinrich and Heger are of special interest to the present study, due to their dependency-minded view on grammar. Both in general and in computational linguistics, many other ideas and solutions have been put forward. We refrain from an attempt to mention them all, but we refer to a few of them in the subsequent chapters where we deal with some of their ideas more thoroughly.
1.3. Distributed Language Translation
Although this study is written with a wider audience in mind, it is part of the research endeavour undertaken within a particular machine translation project, the DLT system. Distributed Language Translation is the name of a long-term research and development project of Buro voor Systeemontwikkeling (BSO/Research), a software house based in Utrecht (the Netherlands). DLT was initiated around 1980 by Toon Witkam. Witkam's subsequent feasibility study (Witkam 1983), carried out with funding from the European Community, provides a detailed account of the linguistic, computational and commercial aspects of DLT. Since the beginning of 1985, the DLT project has been in a seven-year research and development period with the aim of delivering a complete prototype for a single language pair (English to French) by the end of the period (1991). A restricted prototype was demonstrated in December 1987. This period is being jointly funded by the Netherlands Ministry of Economic Affairs and BSO, and has a total budget of 17 million guilders. Marketing of the DLT machine translation system is scheduled for about 1993.

DLT is a system for semi-automatic machine translation with a monolingual interactive dialogue with the user. It is designed for use in personal computers in data communication networks and is therefore set up to work without post-editing. The basic idea, from which the epithet "distributed" derives, is that the translation process is split up into two parts: the text is entered in, say, English and immediately translated into an intermediate language. Problems that cannot be resolved by the system are submitted to the user by means of an interactive dialogue. The dialogue is exclusively in the source language - English, in the example - so that the user need not have command of the target or the intermediate language and need not even know the identity of the target languages.
The intermediate-language form is sent to receivers in the network, and only there is the text translated onward into the final target language(s), without a dialogue with a user in this second half of the process. The system is modular in the sense that the intermediate form of a text lends itself to further translation into any target language and is thus in no way dependent on the source language. DLT is designed as a multilingual machine translation system which is easily extensible to include more languages of whatever kind. It is not restricted to typologically similar languages. These requirements presuppose a fully expressive intermediate language, and the need to translate from the intermediate language into the target language fully automatically presupposes an extremely clear and
translation-friendly intermediate language. DLT therefore uses a slightly modified version of Esperanto for this purpose.

Besides the feasibility study (Witkam 1983), a number of publications describe the DLT system and various of its facets. Overviews are given by Witkam (1985), Hutchins (1986: 287ff.) and Schubert (1986b). At the time of writing the most up-to-date one is by Witkam (forthc.). DLT's model of dependency syntax is suggested by Schubert (1986a) and later worked out in more detail, especially with respect to metataxis, a dependency-based system of translation syntax (Schubert 1987b). The (likewise dependency-minded) semantic-pragmatic model of DLT with its word expert system and its lexical knowledge bank is described in detail by Papegaaij (1986), and in a concise sketch also by Papegaaij, Sadler and Witkam (1986). The ambitious quality test of DLT's semantic modules, which was announced by Alan Melby as an external assessor in 1986, has in the meantime been passed successfully. It was designed to measure the precision of lexical transfer, more precisely of content word choice at the syntagma level, in a first, preliminary version of DLT. (About the set-up of the test cf. Melby 1986, about the outcomes Melby forthc.)

The present study builds on these achievements in an attempt to extend the dependency-syntactic and -semantic devices to the text level. Since it is closely linked to the ongoing development of the DLT machine translation system, the present study focuses in its practical parts (chapter 2) mainly on the language pair that is currently worked with in DLT: English and Esperanto. The theoretical framework is then approached in chapter 3 in a more general way, with the aim of achieving cross-linguistic relevance.
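The two-half, distributed set-up described in this section can be summarised in a short sketch. It is a deliberate caricature: the word-for-word dictionaries below stand in for entire translation engines, and all entries are invented.

```python
# Sketch of the "distributed" two-half process: the sending half produces one
# intermediate-language (IL) form, and each receiver turns that same IL form
# into its own target language with no further user dialogue. All dictionary
# entries here are invented stand-ins, not DLT data.

EN_TO_IL = {"house": "domo"}             # sender half: English -> IL (Esperanto-based)
IL_TO_TARGET = {
    "fr": {"domo": "maison"},            # receiver half: IL -> French
    "de": {"domo": "Haus"},              # receiver half: IL -> German
}

def sender_half(english_word, ask_user=None):
    """Runs at the sender; unresolved problems go to a source-language dialogue."""
    if english_word not in EN_TO_IL and ask_user is not None:
        return ask_user(english_word)    # monolingual dialogue, English only
    return EN_TO_IL[english_word]

def receiver_half(il_word, target_lang):
    """Runs at each receiver, fully automatically: no user, no source language."""
    return IL_TO_TARGET[target_lang][il_word]

il_form = sender_half("house")
print([receiver_half(il_form, lang) for lang in ("fr", "de")])  # -> ['maison', 'Haus']
```

The design point survives the caricature: the sender's half is the only place where a human is consulted, every receiver works from the same intermediate form, and adding a target language means adding a receiver-side module only.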
This is not the place to repeat any of the discussions in the aforementioned publications, but since distinct usages of identical linguistic terms have become abundant during the past several decades, if not longer, it seems expedient to provide the reader of this study with a shortcut introduction to the way in which grammatical notions are conceived and grammatical terms are used throughout this book. This is done in the next section.
1.4. Some terms and concepts of grammar
Although a certain part of this study contains attempts to apply other scholars' findings, which makes it necessary at times to relate their analyses and to use even their terms, we try to describe grammatical and extragrammatical phenomena by means of a consistent set of appropriately termed concepts. How linguistic terms are used is partially a question of definition, and partially also of labelling. We thus do not maintain that the following terms are the best, nor do we claim that they are the only possible ones. But in view of the existing diversity of definitions, we find it necessary to make our own choices clear. Basically we stick to the definitions adopted earlier (Schubert 1987b: 14f.): Language is a system of signs for human communication. The signs consist of form and content (Saussure 1916: 32). Grammar is the theory about the internal system of language. Grammar contains the study of both the formal and the content side of the linguistic sign. The theory about the formal side is syntax, the one about the content is semantics. Language is not a closed system, but is subject to influence from outside factors. The theory about extra-linguistic influence on language is pragmatics (with sociolinguistics as an important subbranch).
The terms syntax, semantics and in particular pragmatics are defined in many different and sometimes mutually contradictory ways by different authors. As for our definitions, it should be pointed out that we take syntax to comprise all levels of language, so that it deals not only with the arrangement of words in sentences, but with morphemes in words, words in syntagmata, syntagmata in clauses, clauses in sentences and sentences in texts. When we speak about syntax, syntactic features etc., these terms cover even morphology and the form-oriented side of word formation. Semantics is also taken to comprise all levels, including, for example, the content side of word formation. Our definition of pragmatics is among the broadest definitions found for the term. It is not confined to the persons participating in the speech act, nor to speakers' intentions, underlying communicative motivations or the like. Though emphasising throughout this and other studies a strict separation of form, content and extralinguistic implicature, we do not maintain that these separations are, as it were, observable facts beyond all discussion. On the contrary, we think that they have been drawn as purposeful, but ultimately arbitrary decisions within grammarians' model design. Accordingly we do not deny that there are semantic motivations for syntactic
17
regularities, borderline cases between pragmatics and semantics and the like (Schubert 1987b: 226). For the design of a theoretical grammar model, as well as for computational implementations of language-related rules, clear-cut separations of syntax, semantics and pragmatics are useful. We keep them in mind throughout, although we find it purposeful in the beginning not to stick too strictly to a preformulated model in the more experimental parts of this study. Indeed this study is experimental in two respects. Firstly, it explores ways of translating texts as coherent units, and secondly, it investigates the applicability to the text level of the principles and insights previously established in the design of rule systems for the lower levels of language structure. It is therefore appropriate to give a very short account of some of these principles.

To translate means to transpose a message from an extremely intricate symbol system, whose internal regularity scholars have not been able to describe in any totally exhaustive way, into another such system. If machine translation is to attempt an automation of this intellectually demanding process, a maximal reduction of the complexity of the symbol systems and the transposition process is of prime importance. We therefore strive to detect the precise relevance of grammatical features to translation, which has much to do with the cross-linguistic validity of grammatical categories and their values. It is our aim to remove as many language-specific features and characteristics as possible from the translation process proper, using them in separate, monolingual processes before (source language) or after (target language) translation proper. In these monolingual processes language-specific features often serve as indicators for a more abstract, translation-relevant grammatical function. It is on the basis of this reasoning that we have chosen to apply dependency grammar (Schubert 1986a: 11ff., 1987b: 17ff.).
Dependency grammar - more so than the alternative, constituency grammar - focuses on the function various phenomena have in the grammatical system, and uses form only to detect or to express such a function. Translation proper, those steps of the overall translation process in which elements of the source language are actually replaced by elements of another language, can then more straightforwardly work on the basis of grammatical function rather than grammatical form. Word order, which seems to be a single phenomenon if viewed as form, but serves several quite diverse functions, is a good example of the usefulness of such an approach in a machine translation attempt. Word order is discussed in more detail in 2.4. and, with explicit reference to the form-function distinction, in 3.3.

The policy of maintaining a strict distinction between syntax and semantics has another consequence which is borne in mind throughout the discussions in this study. We imagine the division of labour between syntax on the one hand and semantics and pragmatics on the other as follows: the syntactic parts of the overall translation process generate all syntactically possible translation alternatives of a given piece of text, after which semantic and pragmatic processes make a choice out of the alternatives available. It is our working hypothesis in this study that this approach of a form-related preselection with a subsequent content-related choice is transferable to the text level as well.
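As a rough illustration of this working hypothesis, the preselection-then-choice division can be sketched as two separate steps. The English fragment, the Esperanto candidates and the cue-counting score below are invented for the example and do not reproduce DLT's actual rule systems.

```python
# Minimal sketch of form-related preselection followed by content-related
# choice. The grammar fragment and the scoring are illustrative assumptions.

def syntactic_alternatives(fragment):
    """Form side: enumerate every syntactically well-formed rendering."""
    # Hypothetical renderings of an English fragment, differing in whether
    # 'old' scopes over both conjoined nouns or over 'men' only:
    alternatives = {
        "the old men and women": [
            "la maljunaj viroj kaj virinoj",     # 'old' modifies both nouns
            "la maljunaj viroj kaj la virinoj",  # 'old' modifies 'men' only
        ],
    }
    return alternatives[fragment]

def semantic_score(candidate, context_cues):
    """Content side: count cues from the surrounding text that the candidate
    echoes (a crude stand-in for semantic and pragmatic choice processes)."""
    return sum(1 for cue in context_cues if cue in candidate)

def translate(fragment, context_cues):
    candidates = syntactic_alternatives(fragment)   # preselection
    return max(candidates, key=lambda c: semantic_score(c, context_cues))

# If the preceding text referred to the women as a separate group
# ('la virinoj'), the second reading wins:
print(translate("the old men and women", ["la virinoj"]))
# -> la maljunaj viroj kaj la virinoj
```

The attraction of this division is that the enumeration step needs no access to meaning at all, while the choice step never constructs a translation, only ranks the forms it is offered; the hypothesis of this study is that the same division scales up from syntagmata and sentences to whole texts.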
To round off these introductory remarks, we should return to the concept of text coherence. Robert-Alain de Beaugrande and Wolfgang Ulrich Dressler (1981: 3ff.) suggest a distinction between Kohäsion and Kohärenz. Abbreviating their model to a minimum, we can say that they use Kohäsion to refer to the syntactic connectedness in a text, whereas Kohärenz describes semantic-pragmatic connectedness. Although such a distinction seemingly fits in well with our approach, we have chosen to investigate the overall connectedness in texts as a unit, but of course with reference to the interplay of syntactic, semantic and pragmatic devices for detecting and rendering it. We use the term coherence for all these aspects.
Chapter 2
Clues and devices of text coherence A practical analysis
Treating a text as nothing but a collection of individual sentences does not do justice to a text's structural complexity. Apart from the fact that sentences taken in isolation from their surrounding text will often be highly ambiguous, and sometimes practically meaningless, much of the message of the text itself can only be understood when the text is seen as a single structure. No simple summing up of the separate sentences' meanings (if at all possible) can capture a significant part of the message which the text was intended to convey. Text understanding involves complex linguistic processes on many levels. The sentences a text is composed of must not only be analysed internally (i.e., as self-contained units), but also as functional elements in the larger unit of the text. Many words in many sentences can only be properly understood when seen as markers of textual functions, or influenced by such functions. Syntactic variation (passive/active, nominalization of verbs, etc.) is not just free variation, but more often than not fulfils a definite textual function. Translating one natural language into another can be seen as a form of analysis: explaining an utterance's meaning by rephrasing it in another language, or as Peter Newmark (1981: 23) puts it, "a translation is the only direct statement of linguistic meaning". Most people seem to agree that a considerable amount of surface processing (syntactic manipulation) is necessary to carry out such a task. There is much less agreement on the depth of understanding (semantic and pragmatic processing) required to translate from one natural language into another. Early machine translation proposals assumed understanding was unnecessary. Later proposals included "a mild form of understanding". And there are some (e.g. Tucker/Nirenburg 1984: 148; Nirenburg/Raskin/Tucker 1986: 627) who claim "full" or "deep understanding" is the only road to machine translation "in an interesting sense".
2.1. English to Esperanto - a sample translation
Before we take a position on the scale from "no understanding" to "full understanding" (see 3.5.2.), it is perhaps better to study in detail a sample translation, in order to see what kind of decisions are involved in translating a coherent piece of text. We will submit the sample text to a kind of "close reading", highlighting interesting points, with special emphasis on the kind of information that is available at each point, how that information is presented, and how it influences the translation decisions that are taken. By going through a piece of text in this way, we hope to be able to extract a few of the more striking aspects of text translation, which will then be discussed in detail in subsequent chapters. Note that the view of translation presented here is of necessity incomplete. Many questions will be left unanswered, and much more work will have to be done to complete our understanding of the complex process of natural language translation. The analysis in this chapter is to a considerable degree biased towards English, which is compared with DLT's intermediate language Esperanto. Considerations of a wider cross-linguistic scope are included here and there, but they should mainly be sought in chapter 3.
2.1.1. The text

The sample text is taken from a well-known periodical, and is part of a regularly appearing column (Computer Recreations by Brian Hayes in Scientific American, Volume 250, 1984, No. 2). It was chosen as an instance of non-specialist informative text, aimed at educated readers not necessarily familiar with the subject matter. The translation is a high quality human translation in Esperanto, made for the purpose of comparison. The translation is certainly not meant as an example of what a machine translation system should be able to achieve, but serves to illustrate the kind of choices a translator must make to rephrase a source language text into a target language text, in this case DLT's intermediate language, Esperanto. Figure 1 shows the original text. We have left out the first three paragraphs and added sentence numbers. The text was translated into Esperanto by one of DLT's linguists. The translation is shown in the right column of the table. In DLT's intermediate language the morpheme boundaries in words are normally indicated with a special token (e.g. komput'il'a'j) which we have omitted here to make the text more
Figure 1: A sample translation Computer Recreations
Komputilaj Ekzercoj
Turning turtle gives one a view of geometry from the inside out
Testudigo permesas rigardi la geometrion elinterne
by Brian Hayes
de Brian Hayes
1. The new way of thinking about geometry has come to be known as "turtle geometry".
1. La nova aliro al geometrio diskonatiĝis sub la nomo "testuda geometrio" (turtle geometry). 2. Ĝi estas intime ligita al la programlingvo Logo, kiu siavice devenas de la Teknologia Instituto de Massachusetts (Massachusetts Institute of Technology). 3. Logon elpensis, en la sesdekaj jaroj, Seymour Papert ĉe M.I.T., unuavice kiel lingvon por eklernigi infanojn pri komputiloj. 4. De tiam, multaj aliaj kontribuis al ĝiaj evoluo kaj aplikadoj, ne nur en edukado sed ankaŭ sur aliaj kampoj. 5a. Inter ili troviĝas Harold Abelson kaj Andrea A. diSessa, de M.I.T. 5b. Ili eksplikis la ideojn, sur kiuj baziĝas la testuda geometrio, en elstara enkonduka verko: Turtle Geometry: The Computer as a Medium for Exploring Mathematics (Testuda geometrio: la komputilo kiel rimedo por esplori la matematikon). 6. Origine, la testudo estis mekanikaĵo, nome radohava veturileto, kies moviĝado povis esti regata per tajpado de instrukcioj ĉe komputila klavaro. 7. La unuan tian estaĵon konstruis la brita neŭrofiziologo W. Grey Walter en la fino de la kvardekaj jaroj. 8. Ĝi havis kupoloforman kovrilon iom similan al kiraso de testudo. 9. Mekanika testudo povas moviĝi antaŭen aŭ malantaŭen kaj povas ŝanĝi sian direkton per simpla pivotado. 10. Sub la testudo eblas munti skribilon, tiel ke ĝi postlasas spuron de sia vojo, kiam oni devigas ĝin travagi folion de papero.
2. It is closely connected with the programming language Logo, which in turn has its roots in the Massachusetts Institute of Technology. 3. Logo was conceived in the 1960's by Seymour Papert of M.I.T., primarily as a language for introducing children to computers. 4. Many others have since contributed to its development and to its applications both in education and in other fields. 5. Among them are Harold Abelson and Andrea A. diSessa of M.I.T., who have set forth the ideas underlying turtle geometry in a remarkable expository work: Turtle Geometry: The Computer as a Medium for Exploring Mathematics. 6. The original turtle was a mechanical device: a small wheeled vehicle whose movements could be controlled by instructions typed on a computer keyboard. 7. The first such creature was built by the British neurophysiologist W. Grey Walter in the late 1940's. 8. It had a dome-shaped cover somewhat like a turtle's shell. 9. A mechanical turtle can move forward or backward and can change direction by pivoting in place. 10. A pen can be mounted on the undercarriage, so that when the turtle is made to wander over a sheet of paper, it leaves a record of its path. 11. Today such "floor turtles" are less common than "screen turtles", which move and draw on the surface of a cathode-ray tube. 12. The turtle itself is represented on the screen by a simple triangular form, which moves in response to commands or programs entered at the keyboard.
11. Nuntempe, tiaj "planktestudoj" estas malpli oftaj, ol "ekrantestudoj", kiuj moviĝas kaj desegnas sur la surfaco de katodradia tubo. 12. La testudon mem reprezentas sur la ekrano simpla triangula formo, kiu moviĝas laŭ instrukcioj aŭ programoj tajpitaj per la klavaro.
readable. However, we use the morpheme tokens in the Esperanto words discussed in this study wherever appropriate.
2.1.2. Circumstantial information

The basic assumption underlying this book's treatment of text translation is that the sentences of a text do not normally function as self-contained entities, but have a function embedded in the superstructure of the text they are part of. This implies that proper understanding, and, often, proper translating, of a sentence must involve reference to other sentences in the same text. Within the present design of the DLT system, "other sentences" can only mean preceding ones, since the system operates on a sentence-by-sentence basis, and has not seen any of the sentences following the one it is currently processing. This means that there is at least one sentence that has no other sentences to refer to: the first one. Translating "on the fly" is one of the hard preconditions for the design of the DLT system. Since this study is written with the particulars of DLT's design only as a background, rather than being directly a part of that design, we do not refrain from questioning the principle and thinking about possible needs to modify or abolish it. It remains our preference, however, not to look ahead in the text, if possible.
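The "on the fly" constraint can be made concrete with a small sketch: each sentence is processed with access only to the sentences already seen, never to those still to come. The function names and the toy translation routine below are invented for illustration; they are not part of the DLT design.

```python
# Minimal sketch of sentence-by-sentence processing: the system may
# consult only preceding sentences, never the text to come.
# Illustrative only; not DLT code.

def translate_on_the_fly(sentences, translate_one):
    context = []           # preceding sentences only; no look-ahead
    output = []
    for sentence in sentences:
        output.append(translate_one(sentence, tuple(context)))
        context.append(sentence)
    return output

# A toy translate_one that merely records how much context it saw.
result = translate_on_the_fly(
    ["S1", "S2", "S3"],
    lambda s, ctx: f"{s} (context: {len(ctx)} sentence(s))",
)
print(result)
```

Note that the first sentence is necessarily translated with an empty context, which is exactly the situation the paragraph above describes.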
In theory, a translator approaches a text in a "neutral" state: apart from her/his general knowledge (of the world, language and the translation process) s/he must process at least the first sentence before beginning to form an idea of what the text is about. In practice, this is almost never quite the case. In most cases, a translator will already have formed a mental model of the text based on information from a variety of sources associated with the text. In the case of our sample text, several pieces of information are available before we even begin to read the text. To name a few:

- the text is published in a well-known magazine, namely Scientific American. From this fact alone we can deduce some information about: a) the public that is aimed at: fairly well-educated, non-specialist (i.e., looking for reliable, accessible information about fields they are not themselves specialists in); b) the range of subject matter: anything related to scientific research and applications; c) the register of the language used: informative, non-idiomatic texts, relatively free of obscure terminology or technical jargon, except where this is explained in the text; d) the approximate length of the text: not more than a few pages (seldom more than 10). Though this kind of information is only circumstantial, and not very precise, it helps to narrow down our expectations about the text, possibly activating beforehand certain areas of our knowledge that we expect to need for the task at hand.
- the text is taken from a regularly appearing column in the magazine, which makes the kind of information listed above even more specific: a) the subject always has something to do with computers, and generally deals with computer applications in a new or unexpected field; b) the column is generally aimed at people who are not specialists in information science, but are interested in computers and their applications; c) the language is informative, but not extremely technical, avoiding excessive use of technical terms. The tone is educational (i.e., it explains concepts and ideas) rather than argumentative (i.e., it does not expound theories and defend them with proofs and arguments); d) the length is well within the 10-page limit.
In addition, a translator who is also a regular reader of the column may have some idea of the author's reliability and accuracy, use of illustrations, punctuation, paragraph-divisions and other meta- or non-textual means to structure and clarify his/her text. All this information can help the translator understand the first few sentences of the text, by providing enough general context to make an educated effort at disambiguating some of the content words.
2.1.3. A superstructure of text

Titles, sub-titles, paragraph-headers etc. are added to the actual text to clarify what Teun van Dijk (1978: 166) calls the "superstruktuur" of a text. They assist the reader in understanding the message of the text. Such "superstructural" elements can be seen as "compensation" for the absence of the direct speaker-to-hearer contact that occurs in regular discourse. For this reason, these meta-elements often contain explicit markers for information that is present in an implicit form only in the text itself. A careful reader can probably do without them, but since they are there for the reader's convenience, it makes more sense to include them in the process of analysis. When translating a text, meta-element markers can be both a help and a problem: though they help elucidate the text's structure and meaning, translating them can be problematic, especially in a translation system which works on an on-the-fly basis, processing sentences in step with the text being entered and thus without access to the text to come after the current position. Most sub-titles, paragraph headers, etc. are not meant to have much content on their own. Rather, they serve to point to and highlight the salient pieces of the text. This means that they derive their proper meaning solely from the text they point to. And since this text nearly always follows its meta-element markers, their complete meaning and intention often becomes clear only after the relevant text has been properly analysed.
Figure 2: A heavily structured piece of text

Journal of Semantics 4: 257-263

DISCUSSION

VALENCES LTD. vs VALENCES ASSOCIATED
Comments on Heringer's Association Experiment as a Basis for Valence Theory*

H. ECKERT
ABSTRACT Heringer claims that the value of existing theories of valence is limited as they have failed to give a clear account of the crucial distinction between complements and supplements. He maintains that associations between verbs and question words can serve as a basis for valence theory. The results of his association experiment, however, do not permit us to infer dependency relations, to distinguish clearly between optional and obligatory elements, to specify quantitative valence, or to distinguish between elements that are grammatically and semantically implied by the verb as opposed to merely contextual elements. I should also like to argue that the questions that are said to impose themselves upon the speakers need not necessarily do so because of the semantic power of the verb, and that the values for certain question words are partly influenced by the test method, where each question word affects the values of the subsequent ones. I feel that while the association experiment yields supporting evidence for valences in a number of cases it cannot claim to function as a basis for valence theory.
In his article on "The Verb and its Semantic Power: Association as a Basis for Valence Theory" Heringer (1985) gives a detailed account of the carefully compiled and computed data of his association experiment that is concerned "with syntagmatic associative relations between verbs and questions." (83) (Page references without author and year refer to Heringer (1985)). I should like to deal here with two aspects of his article: firstly with his account of existing valence theories, whose explanatory power he considers very limited, and secondly, with the validity of his claim that association can function as a basis for valence theory. Heringer concentrates on one particular aspect of valence, i.e. the distinction between complements and supplements, which - in an earlier article - he described as "the basic issue of valence theory, because it concerns its explanatory power in general" (Heringer 1984:35)¹. I would agree with Heringer that "valence theory has not succeeded in justifying syntactically the difference between complements and supplements ..." (80), but I feel that the syntactically based distinction should be rejected for the right reasons. Summing up this theory, Heringer claims, "Complements are those NPs that are obligatory whereas supplements are those that are facultative. This was the leading idea" (80). None of the leading linguists in this field that I happen

* cf. H.J. Heringer: "The Verb and its Semantic Power", this Journal, vol. 4.2: 79-99 (1985).
It may be better, therefore, to use meta-elements as they occur in the text to facilitate analysis, but postpone translating them until at least the section of text they refer to has been completely processed. Only then, on the basis of what is known about the section they belong to, can the meta-elements be further disambiguated, their intended effect evaluated, and a good translation determined. Since machine translation is not fully automatic anyway, one could also take the opposite point of view: a title exists in order to help the reader interpret the subsequent text. The user of a system such as DLT will not be surprised if a title causes an especially large number of questions in the disambiguation dialogue (see 1.3.), and having access to the full text, he or she is able to answer them. Asking detailed questions about titles and subtitles would be an interesting way of incorporating in an interactive knowledge-based system information about the domain a text comes from. In some of today's translator aid and machine translation systems, such information is supposed to be given at the beginning of the session or in a header of the text to be translated, which is only feasible if the assumption holds that the entire text concerns the same domain. Refraining from using such, strictly speaking, extratextual information and instead applying the information explicitly given in the form of titles and the like seems a much more organic way of consulting the human - not only about the domain of a whole text, but much more subtly about appropriate pieces, such as paragraphs, sections and chapters, exactly in those portions into which the author has chosen to subdivide the text.
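The "postpone the title" strategy can be sketched as a small buffering scheme: a title is held back and translated only once the section it heads has been processed. The item format and the toy translation routine below are invented for illustration; this is not the DLT design.

```python
# Sketch of deferred translation of meta-elements: titles are analysed
# when met, but translated only after their section has been processed.
# Illustrative only; data format and names are invented.

def translate_document(items, translate):
    """items: (kind, text) pairs, kind being 'title' or 'sentence'.
    Sentences are translated in order; each title is translated only
    when the next title (or the end of the text) closes its section."""
    pending_title = None
    section = []
    out = []
    for kind, text in items:
        if kind == "title":
            if pending_title is not None:
                out.append(translate(pending_title, section))
            pending_title, section = text, []
        else:
            section.append(text)
            out.append(translate(text, None))
    if pending_title is not None:
        out.append(translate(pending_title, section))  # flush last title
    return out

# Toy translate: sentences get '!', titles report their section size.
toy = lambda text, ctx: f"{text}!" if ctx is None else f"{text} [{len(ctx)} sents]"
result = translate_document(
    [("title", "T1"), ("sentence", "a"), ("sentence", "b"),
     ("title", "T2"), ("sentence", "c")],
    toy,
)
print(result)
```

The output order makes the trade-off visible: each title's translation appears only after its section, which is acceptable for batch output but would need reordering for a faithful rendering of the document layout.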
2.1.4. A close-up of some translation steps

Since a column must be directly identifiable for the reader, it always appears under the same title. To provide more specific information about the current column's contents, a sub-title can be added to this title, usually giving some key words from which the reader can get an impression about the content of the column. Often such a sub-title can be viewed as a very concise summary, giving the reader an idea about the subject matter. It allows the reader to decide whether to read on or not, and provides an appropriate indication of context to start processing the text with. In our sample text the sub-title is: Turning turtle gives one a view of geometry from the inside out. It has a few peculiarities, some of which can only be fully appreciated after reading the text itself. The syntagma turning turtle will at first be interpreted as a slightly informal synonym for 'becoming a turtle'. The inappropriateness of the word turtle in the context of geometry and the general context of computers and science makes one suspect a pun (not uncommon in columns like this, where a catchy title is often sought), or a new sense of the word turtle; a sense that would tie in with the semantic fields of both computers and geometry. The subject of the column is given as a view of geometry from the inside out, where
from the inside out is slightly informal and difficult to interpret correctly, unless one is already familiar with the concept turtle geometry, as explained in the current text. So far, it seems, the sub-title can only be properly understood when one is thoroughly familiar with the subject matter (and recognizes it from the key words in the sub-title). Without such knowledge, one will only begin to grasp the full significance of the words used when reading the column itself. Especially the pun on turning turtle will only then be fully appreciated. Since this column is typically not aimed at specialist readers, but at the interested layperson, we must conclude that the sub-title has been kept vague deliberately, in order to rouse the curiosity of the reader without giving much away. Not surprisingly, such puns are very difficult to translate, if they can be translated at all. Inventive translators are sometimes able to substitute one pun for another, which may perhaps not be an accurate translation, but has a comparable effect. Often, however, no equivalent expression or appropriate alternative can be found, and the translator will have to choose the most literal translation possible, forgetting the word play altogether. Generally speaking, the more "informative" a text is, i.e. the more it is aimed at conveying information rather than entertaining the reader, the less it will contain non-literal usage of words (puns, metaphors, figurative speech). Though some proposals have been made, the use of words in a non-literal meaning will continue to be a problem for machine translation systems for several years to come. Most machine translation systems - DLT included - that aim at informative texts assume that non-literal word usage in such texts remains within reasonable bounds although it will never be completely absent.
And unless puns are literally recorded in the dictionary (which a number of often used phrases may well be) our current approach is to distill the literal meaning, if possible, forgetting the pun. The Esperanto translation of the text's sub-title is: Testudigo permesas rigardi la geometrion elinterne, which can be glossed as follows:

testudigo 'becoming a turtle'
permesas 'allows'
rigardi 'to view'
la geometrion 'geometry'
elinterne 'from the inside out'

We see that testudigo is a literal paraphrase of one of the meanings of turning turtle. The other meanings ('a turtle that turns around' and 'turning round a turtle'), one of which is intended in the pun, cannot be understood until after one has read, and understood, the text. The author, however, who is typically the person involved in a DLT setting, knows this and can answer dialogue questions. This does not change the fact that a pun may remain untranslatable, so that the author is forced to choose one of the meanings, which will probably be the straight one.
The syntagma gives a view of geometry, which has the nominal form of the verb to view, has been translated into a verbal form permesas rigardi la geometrion. As nominalization of verbs is much less frequent in Esperanto than in English, the translator will often prefer the verbal form over the nominalized form. We will see this choice of an unmarked form over a form that is closer to the original but more marked in the target language occur again and again. Indeed, the problem of finding the closest unmarked equivalent of a source language construction will be seen as a central part of translating text coherence.
The body of the text starts - not surprisingly - with sentence 1:

1. The new way of thinking about geometry has come to be known as "turtle geometry".
Though this sentence is not actually the first sentence of the text as it was originally printed (two introductory paragraphs have been omitted), it was presented to the translator as if it was. The structure of the text was such that the chosen fragment could well be regarded as a self-contained piece.
Although this is the first sentence, we saw that quite some knowledge about the text can already have been made available in the form of the peripheral knowledge related to the text's existence as a regular column in a magazine. The sub-title, though rather vague and puzzling, then triggered the lexical knowledge related to its content words and the syntagmata they are part of:

turtle
a view of geometry
from the inside out

Each of these words and syntagmata carries expectations about likely contexts; expectations that can help to disambiguate subsequent sentences. The lexical "clash" between turtle and geometry, and between turtle and the field of computer science as a fixed part of the column's contents, is as yet unresolved, and the reader will expect to find an explanation within the text. "Enigmatic" or deliberately "vague" titles are a form of emphatic communication, rousing the reader's curiosity by forced combinations of words that are not normally found in each other's context. A title of a text is particularly subject to this, as part of the "battle for attention" that has been observed in everyday conversation, where capturing the attention of the audience is of prime importance (cf. the abundance of text-linguistic literature on spoken language, in particular on conversations). In general, getting the attention one wants is more difficult than keeping it, as several rules of social behaviour - and the natural desire to "hear a story out" - work against any tendency to turn away from someone one has started to listen to. Though a writer has no direct control over her/his audience, s/he must still try to capture the reader's attention in order to be read. "Catchy" titles, as well as clearly structured tables of contents and an eye-pleasing lay-out, definitely help to get a reader started.
2.1.5. Conceptual proximity

The sentence begins with a definite noun syntagma: the new way of thinking about geometry, which usually implies a definite reference to something that is already known or has been mentioned earlier. Definite noun syntagmata are often used to refer back to earlier noun syntagmata in the text that refer to the same concept, or are conceptually close to them. A pattern frequently used in English is the introduction of a concept by means of an indefinite noun syntagma, after which all subsequent references to the same concept are made by means of the same (or a largely similar) noun syntagma with the definite article. Figure 3 shows some of the noun syntagmata that follow this pattern in our sample text.
Figure 3
a view of geometry from the inside out a small wheeled vehicle a mechanical turtle a computer keyboard
A simple triangular form represents the turtle itself on the screen.
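The indefinite-then-definite pattern just described lends itself to a very simple sketch: record each concept introduced with "a"/"an", and link later "the"-syntagmata back to it. The matcher below deliberately uses head-noun identity only, and all names and data are invented for illustration; it is not a coreference resolver.

```python
# Sketch of the introduction pattern: a concept enters the text as an
# indefinite noun syntagma ("a ...") and is later taken up by a
# definite one ("the ..."). Naive head-noun matching, for illustration.

def link_definites(noun_syntagmata):
    """Pair each definite syntagma with the indefinite one that
    introduced the same head noun, if any."""
    introduced = {}       # head noun -> introducing indefinite syntagma
    links = []
    for np in noun_syntagmata:
        words = np.split()
        head = words[-1]
        if words[0] in ("a", "an"):
            introduced[head] = np
        elif words[0] == "the" and head in introduced:
            links.append((np, introduced[head]))
    return links

nps = ["a small wheeled vehicle", "a computer keyboard",
       "the vehicle", "the keyboard"]
print(link_definites(nps))
```

Real texts, of course, also refer back with conceptually close but lexically different syntagmata (a small wheeled vehicle ... the turtle), which is exactly why the chapter speaks of conceptual proximity rather than string identity.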
All languages know such variations, and use them to create or support text coherence. But different languages use different devices, and it is not always easy to find a proper translation. The passive, for instance, is far less frequent in Esperanto than it is in English, and a direct translation would be experienced as "unnatural" or "strained" by most readers. On the other hand, a translation of the sentence into a structure having Logo in unmarked position, would greatly reduce the explicitness of the connection of this sentence with sentence 2.
Esperanto has a way out, however. The translation is:

3. Logon elpensis, en la sesdekaj jaroj, Seymour Papert ĉe M.I.T., unuavice kiel lingvon por eklernigi infanojn pri komputiloj.
As can be seen, Logo is still in initial position, but in the Esperanto sentence it is a direct object of the verb elpensis 'conceived'. In Esperanto, as opposed to English, the syntactic function of an object is not marked by position, but with a function morpheme: the accusative morpheme n. Because of devices of this kind, word order is much less fixed than in English. For more details about the interdependence of word order and syntactic-function assignment, see 3.3.1. In this case, a source language device has been replaced by a target language device with very much the same effect. That such a "device-translation" must be done quite often is shown in figure 8, where some of the structural changes in our sample text are given. In many cases, there is no single "correct" translation, only more or less appropriate alternatives. It is always up to the translator to choose the one s/he feels comes closest to the intentions of the writer. As Mildred Larson (1984: 394) puts it: There are many devices which give cohesion to a text. The particular device which is used, and even the ways in which they are used, will vary from language to language. Such cohesion devices as pronouns, substitute words, verb affixes, deictics, pro-verbs, conjunctions, special particles, forms of topicalization, and so forth, if translated one-for-one from the source language into the receptor language, will almost certainly distort the meaning intended by the original author. It is, therefore, very important that a translator be aware of cohesive devices and recognize them as such. He will then look for the appropriate devices of the receptor language for use in the translation.
Figure 8: English marked constructions with Esperanto marked equivalents

PASSIVE > ACTIVE with SUBJECT/OBJECT inversion:

Logo was conceived in the 1960's by Seymour Papert of M.I.T. > Logon elpensis, en la sesdekaj jaroj, Seymour Papert ĉe M.I.T.
The first such creature was built by the British neurophysiologist W. Grey Walter in the late 1940's. > La unuan tian estaĵon konstruis la brita neŭrofiziologo W. Grey Walter.
The turtle itself is represented on the screen by a simple triangular form. > La testudon mem reprezentas sur la ekrano simpla triangula formo.
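The "device translation" in figure 8 can be sketched as a tiny transformation: the English passive's fronted patient keeps its initial position in Esperanto, but is marked as object by the accusative morpheme -n, so no passive is needed. The function and its simplified clause format are invented for illustration; these are not actual DLT transfer rules.

```python
# Sketch of the passive -> active device translation of figure 8:
# the fronted object carries the accusative -n, so the marked word
# order survives without a passive. Toy data; illustrative only.

def passive_to_active(patient, verb_root, agent):
    """English 'PATIENT was VERBed by AGENT' ->
    Esperanto 'PATIENT-n VERB-is AGENT' (object fronted, marked by -n;
    -is is the past-tense morpheme)."""
    return f"{patient}n {verb_root}is {agent}"

# cf. 'Logo was conceived by Seymour Papert'
#  -> 'Logon elpensis Seymour Papert'
print(passive_to_active("Logo", "elpens", "Seymour Papert"))
```

The interesting point is that the source language device (passive voice) and the target language device (accusative marking plus free word order) are different in form but serve the same textual function: keeping the theme in initial position.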
2.1.12. Prominence and deep-case structure

4. Many others have since contributed to its development and to its applications both in education and in other fields.

Sentence four begins with the noun syntagma many others, which is a good example of an explicit link to the preceding sentence(s). Others implies contrast or comparison, and this in turn implies at least two things: something that is compared or contrasted and something to compare or contrast it with. The sentence-initial position of the syntagma indicates that the concept to which others refers must be sought in the preceding sentence(s). The problem is now to identify that concept, in order to give others a specific meaning. There are several ways to tackle this problem. First of all, reference is governed by preference rules. Among all possible candidates a selection can be made of those that are more likely to be intended because of their syntactic function or position relative to the referring element. As the word preference already indicates, such a selection is not absolute, but relative: one can select a best candidate, but not one which is guaranteed to be the correct one. A general rule of preference is that antecedents must not be too far removed from the anaphor. The closer together antecedent and anaphor are in the text, the more easily (and reliably) they can be identified by the reader. It is a good idea, therefore, to begin the search for a referent for others by listing the most recently mentioned concepts in reverse order. Limiting ourselves to sentence 3, they are:

computers
children
a language
M.I.T.
Seymour Papert
Logo

All these candidates must be considered, in the hope that one will be markedly more appropriate than the others.
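The recency preference just described amounts to listing the concepts of the preceding sentence in reverse order of mention, so that the nearest candidate antecedents are tried first. The sketch below uses invented data and is purely illustrative:

```python
# Sketch of the recency preference rule: list the concepts of the
# preceding sentence in reverse order of mention, so the most recent
# antecedent candidates come first. Illustrative data only.

def candidates_by_recency(mentions):
    """mentions: concepts of the preceding sentence in textual order."""
    return list(reversed(mentions))

sentence3 = ["Logo", "Seymour Papert", "M.I.T.",
             "a language", "children", "computers"]
print(candidates_by_recency(sentence3))
```

Recency is only the first filter; as the following paragraphs show, prominence and case-role correspondence can overrule it, which is why the rule yields a ranked candidate list rather than a single answer.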
Figure 9

3. Logo was conceived in the 1960's by Seymour Papert of M.I.T., primarily as a language for introducing children to computers.

4. Many others [...]

3. Logon elpensis, en la sesdekaj jaroj, Seymour Papert ĉe M.I.T., unuavice kiel lingvon por eklernigi infanojn pri komputiloj.

4. De tiam, multaj aliaj [...]
To begin with computers, children and a language, there is at least one reason not to prefer them as referents for others. First of all, they occur within a free adjunct, that is, they are syntactically not as closely linked to the main verb of sentence 3 as are the valency-bound dependents such as subject, object etc. Since they thus are not obligatory with the given verb, they may be taken to convey additional information. Such information, called the tail by Simon Dik, is not central to the statement, but must be read "as an 'afterthought'", "meant to clarify or modify (some constituent contained in) the predication" (Dik 1978: 153; for details about the role of predications, see 2.5.). Not being central to the statement automatically means that the information receives less prominence, and less attention from the reader. In contrast, Seymour Papert, by appearing in a marked, post-verbal, position, is clearly recognized as the most salient information. The statement is about Logo, and its new information is that Seymour Papert conceived it. Why (for what purpose) he conceived it is only given as an "extra" - interesting, but not indispensable information. From this it follows that a reader will expect to hear more about Seymour Papert, rather than about a language, children or computers. For the same reason M.I.T., in the syntagma of M.I.T., receives low prominence. It is presented merely as additional information, useful to better identify Seymour Papert, but not central to the statement. We have now more or less eliminated all candidates except Seymour Papert and Logo. Though Logo is itself in a position of prominence, being the theme of sentence 3 (that which is talked about), there is good reason not to prefer it as referent for others. The reason is its relation to the main verb.
Since it acts as the (syntactic) subject of the passive verb construction was conceived, its semantic or case relation to the active verb conceive is that of patient, the thing the action is performed upon (see 2.5.1. on case grammar). The agent, the one performing the action, is Seymour Papert. In sentence 4, others is agent, not patient. Schematically, sentence 3 can be represented as:
conceive( [AG] Seymour Papert (of M.I.T.) [PAT] Logo ...)
and sentence 4 as:

contribute( [AG] many others [BEN] to its development and applications ...)

What is important here is that conceive and contribute to are conceptually related. There is, in fact, a logical sequence to the two verbs: first someone conceives something, i.e. causes it to exist, then others (or the originator her/himself) contribute to its development, i.e. cause it to develop. This conceptual proximity, coupled with the correspondence in case position between Seymour Papert and others, is sufficient reason not to choose Logo as referent.

The Esperanto translation of many others in our sample text is multaj aliaj, which literally means 'many others'. What is the point, one may ask, of spending so much time finding the intended referent for the word others, when the translation does not show any of the information found? First of all, we cannot know beforehand that the translation will be (un)specific in the same way as the original. Only by first analysing the original, translating it, and then analysing the result can we make sure that the translation is at least as specific as the original and does indeed reflect it. Secondly, finding the referent of words such as others is necessary to properly disambiguate the sentences they occur in. Lexical disambiguation can be done accurately only when the words in the sentence (or, more precisely, the clause) have sufficient semantic content to restrict each other's possible meaning. "Empty" words, such as pronouns, leave gaps in the network of semantic restrictions that makes sentences more than a collection of possible meanings. By "borrowing" their meaning from other parts of the text, semantically underdetermined words can not only actually take part in the disambiguation process, but in doing so allow other sentences to influence the one they are part of, thus enhancing textual coherence (see 3.2.1.).
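The preference argument just given - a referent is favoured when it fills the same case role in a conceptually related predicate - can be sketched in code. This is a toy illustration, not DLT code; the frames and the verb-relatedness set are invented stand-ins for a real lexical knowledge bank.

```python
# Case frames for sentences 3 and 4, in the spirit of the schemas above.
frame3 = {"verb": "conceive",   "AG": "Seymour Papert", "PAT": "Logo"}
frame4 = {"verb": "contribute", "AG": "many others",
          "BEN": "to its development and applications"}

# Assumed toy knowledge: verb pairs judged conceptually proximate.
RELATED_VERBS = {("conceive", "contribute")}

def prefer_referent(anaphor_frame, role, antecedent_frame):
    """Return the antecedent's filler of the same case role, if the two
    verbs are conceptually related; otherwise no preference (None)."""
    pair = (antecedent_frame["verb"], anaphor_frame["verb"])
    if pair in RELATED_VERBS and role in antecedent_frame:
        return antecedent_frame[role]
    return None

# 'others' is agent of contribute, so the agent of conceive is preferred:
print(prefer_referent(frame4, "AG", frame3))  # Seymour Papert, not Logo
```

The point of the sketch is only that the case-role correspondence, not surface position, drives the choice: Logo, being patient of conceive, is never returned for the agent slot.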
2.1.13. Rhetorical patterns

5. Among them are Harold Abelson and Andrea A. diSessa of M.I.T., who have set forth the ideas underlying turtle geometry in a remarkable expository work: Turtle Geometry: The Computer as a Medium for Exploring Mathematics.

The opening syntagma of this sentence, among them (translated as inter ili), provides some difficulty when it comes to solving its reference. The most recently used plural noun is fields, the very last word of sentence 4. Though fields in itself is not in a very prominent position (it is part of the tail, rather than the central part of the information), the fact that fields and them are so close together makes a link between them very likely. Furthermore, not many human readers will be confused by taking fields as the intended referent.

The reason that fields is not the intended referent for them can be found in the link between them and the two names Harold Abelson and Andrea A. diSessa. They will immediately be recognized as person names, which bars any association with fields. Fields and persons are semantically incompatible, so another referent will have to be found. Applications is just as incompatible with persons as fields, which brings us to many others, which was itself linked back in contrast to Seymour Papert. There are two reasons why this is a good match. First of all, others was already linked to Seymour Papert, who is a person, so there is a good semantic match. Second, there is a logical progression: from a single person, to an unspecified group of others, to a specific example from among those others.

The way a text moves from concept to concept has to do with its rhetorical patterning: the writer's method of presenting his/her arguments. Very often the shape of a sentence is directly influenced by the writer's need to "make a point", and such "deviations" from the unmarked form can only be understood when seen in the light of the argument they serve. Rhetorical patterns are characterized by features such as contrast and comparison, thesis/antithesis/synthesis, progression from abstract concept to concrete instances or vice versa. A generally useful account of rhetorical patterning is given by Michael Jordan (1984), which will be discussed in more detail in section 2.6.1.
2.1.14. Stylistic changes

The Esperanto translation of sentence 5 is split into two separate sentences:

5a. Inter ili troviĝas Harold Abelson kaj Andrea A. diSessa, de M.I.T.
5b. Ili eksplikis la ideojn, sur kiuj baziĝas la testuda geometrio, en elstara enkonduka verko: Turtle Geometry: The Computer as a Medium for Exploring Mathematics (Testuda geometrio: la komputilo kiel rimedo por esplori la matematikon).

This split is apparently motivated by stylistic requirements. The verb underlie in underlying turtle geometry is translated into Esperanto as baziĝi, whose syntactic dependency pattern clearly departs from unmarked English-to-Esperanto rules: baziĝi, lit. 'to be based on', is intransitive, whereas underlie is transitive. Moreover, the subject of underlie becomes the argument of a preposition governed by baziĝi, while the object of underlie becomes the subject of baziĝi. (In fact, this is a metataxis rule, cf. Schubert 1987b: 137ff.) Since the dependency pattern of baziĝi so clearly departs from that of underlie, it is impossible to transfer the -ing construction directly. It must therefore be translated by means of a relative clause sur kiuj baziĝas, lit. 'on which are-based'. But the original sentence already uses a relative clause construction, beginning with a comma and who. To have two such relative clauses, both beginning with the relative pronoun kiuj, would be stylistically awkward, and possibly confusing to the reader (there is no person/non-person distinction in Esperanto as in who versus which, but instead a distinction between reference to a noun syntagma with kiu(j) 'who, which' and reference to the entire superordinate clause with kio 'which'). Since the relative clause in the original is non-restrictive rather than restrictive (i.e., it is used to convey additional information about the already uniquely defined Harold Abelson and Andrea A. diSessa, rather than information that is essential to define some imprecisely indicated object), it is perfectly possible to separate the relative clause from its governing syntagma without leaving the antecedent unspecific. In other words, cutting loose the relative clause does not reduce the information content of the correlate, while making the overall structure much clearer. In addition, the importance of the relative clause in the general rhetorical pattern (it is crucial in tying together Logo, to its development and to its applications, and Harold Abelson and Andrea A. diSessa in the context of turtle geometry) as well as its length justify its being used as a separate sentence.
That this last sentence of paragraph 1 is indeed important can be seen in the way turtle geometry, the computer and mathematics are now joined together, returning, as it were, to the concepts activated by the sub-title, but now in a more defined context, that of Logo, programming language, M.I.T. etc. To ensure that this "return" to the concepts of the sub-title will not be lost in the Esperanto version, the translator has chosen to give an Esperanto translation of the book's title. What was already said about proper names above is valid for such titles as well: in some target languages (and within them, in some text types) they are left untranslated (unless there exists a target language version of the work, in which case a translator may decide to refer to that instead), but in cases where the title of the book (or rather its meaning) is relevant to the flow of the argument, the comment construction in parentheses can help to prevent the loss of valuable information. In addition, the Esperanto translation of titles is needed for target languages that require translated names and titles instead of, or in parallel with, the original wordings.
2.1.15. Conclusion A central problem in translating natural language is what is called skewing: the fact that there is no complete and exact overlap between the way the signs of two different languages refer to reality. This makes translation a constant approximation. One can only try to be as accurate as possible, but there will never be only a single correct answer.
The accuracy of a translation very much depends on the amount of information accessible to the translator: information in the form of knowledge — about the world, culture, language - and information obtained from the text. In general one can say that the more information a translator is able to glean from the text, the more accurate the resulting translation can be. Textual information manifests itself in many different ways, some less accessible than others, and there are many ways to analyse a text's structure and translate its internal coherence. In general, however, none of these methods of analysis can work in isolation. Cohesion and (on a global level) coherence occur as the result of the interplay between all kinds of structuring devices, both implicit and explicit. Studying them in isolation, as we will do in the coming sections, can help to understand them better, but eventually we will have to work towards integrating them into a single theory.
A general idea, which we shall pursue in the following sections, has already turned up in the above discussion; we have to work with a number of translation alternatives, out of which the one which best fits in with the context should be chosen. Much of this study is concerned with determining precisely what context is in this regard, and what the criteria are for fitting in with a given context.
2.2. Deictic reference
One of the most obvious of the many problems that a translator faces is that of pronominal reference: the use of small, almost meaningless, words as a substitute for semantically fully specified words. The discussion of pronouns and pronominal reference in this chapter makes no mention of other "pro" forms, such as pro-verbs and pro-adjectives. It cannot be more than introductory, and certain aspects will have to be left untreated, although we take up the topic again in 3.2.2. Since we chose English as the source language, and pronouns are by far the most common pro-form in English, it seemed reasonable to concentrate on pronouns as a typical example of pro-form reference. In particular, we treat personal and relative pronouns. Later work will undoubtedly have to include the other pro-forms as well.
Unfortunately, traditional sentence-based grammars almost totally ignore the phenomenon, presumably because sentence-based grammar in most cases cannot offer a solution. And even though many text grammarians have worked (and are working) on the problem, it seems that no single fail-safe procedure for solving it can be given.

The most prominent group is personal pronouns, which behave syntactically like full content words in that they can take the same positions in a clause as semantically fully specified nouns. In contrast to content words, however, pronouns carry (virtually) no meaning by themselves, but must derive their meaning from other elements inside or outside the sentence. This means that a pronoun can only be understood once one knows what it refers to. Pronominal reference is a subset of the more general problem of deictic reference (see 2.3.; deixis can be split into two types, anaphora and cataphora; in the language pair we discuss most in this section, however, cataphora, i.e. forward-pointing reference, is so rare that we shall mainly speak of anaphora here; cf. Hirst 1981: 5, note 6).

The most important difference between pronouns and other words having anaphoric reference function is the lack of semantic specificness of pronouns. While pronouns have all (or more) of the syntactic features of ordinary words, their semantics are restricted to some (fairly unspecific) semantic features such as (for English) male/female, human/non-human. Since finding referents for words in a text involves matching both syntactic and semantic restrictions (not to mention pragmatic restrictions), the absence of semantic specificness makes it far more difficult to single out one referent from a set of syntactically possible candidates. Besides being more difficult to resolve, pronouns form an obstacle for semantic
disambiguation. Disambiguation is based on the restrictive influence that words have when they combine to form sentences and texts. Though individual words may have a large range of possible meanings, the moment they are used in relation to other words, that range is narrowed down. In general, one can say that the more content words one can relate to a particular word, the less ambiguous that word will be. When trying to find words to relate to some word that has to be disambiguated, pronouns, by taking the place of full content words, cause gaps in the "network" of restricting influences. Such gaps can often cause a word or syntagma to remain ambiguous, unless one can identify the pronoun's intended referent, in which case the gap becomes filled, and can exert its restricting influence (see figure 10).
Figure 10. Demonstration of lack of restriction caused by a pronoun

    It runs smoothly.

    to run = 1. 'to go at a pace faster than a walk'
             2. 'to pass or slide freely'
             3. 'to function'

    the watch runs smoothly = 3
    the dog runs smoothly   = 1
    the rope runs smoothly  = 2
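The restricting effect shown in figure 10 can be made concrete in a few lines of code. This is only a toy illustration of the idea, not part of any translation system; the three-entry lexicon is invented for the example.

```python
# The three senses of 'to run' from figure 10.
SENSES = {
    1: "to go at a pace faster than a walk",
    2: "to pass or slide freely",
    3: "to function",
}

# Assumed toy lexicon: which sense each subject noun selects for 'run'.
SUBJECT_SELECTS = {"dog": 1, "rope": 2, "watch": 3}

def disambiguate(subject):
    """Return the sense number 'runs' takes with this subject, or None
    when the subject (e.g. the pronoun 'it') restricts nothing."""
    return SUBJECT_SELECTS.get(subject)

assert disambiguate("watch") == 3
assert disambiguate("rope") == 2
assert disambiguate("it") is None   # the gap a pronoun leaves open
```

Only once the pronoun's referent is found (say, the watch) can the gap be filled and the sense of the verb fixed; until then all three readings remain live.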
2.2.1. Pronouns and translation

In human communication, a certain lack of specificness, such as that caused by unresolved pronoun reference, can often go unnoticed. In many cases, human language users can ignore local failures in understanding by relying on their global understanding of the text. This recovery procedure is so automatic that often people do not even realise they failed to disambiguate certain text-fragments, unless they are asked to paraphrase or explain that fragment precisely. Geoffrey Sampson (1987: 97) puts it this way: "It is a truism that language is often ambiguous; what I am saying is that ambiguity is far commoner than we usually realise, and that when we think some piece of language is unambiguous it is not because competent language-users have algorithms for resolving ambiguities but because we are good at not noticing ambiguities."

The ability to recover from unresolved or misunderstood references cannot be used, however, when translating the text in question. Translation is a process of substituting
target language tokens for source language tokens in such a way that the resulting meaning is as close as possible to the meaning of the original. Since source language and target language words very often have different ranges of possible meaning, it is not enough to substitute on a word-for-word basis. Only by analyzing the exact ways in which the source language words restrict each other's meaning and then finding a set of target language words that, together, restrict each other's meanings in very much the same way, can an adequate translation be produced. That this leaves very little room for vagueness and lack of understanding is shown by the difficulty most people have when translating texts. Suddenly they are forced to analyze everything in detail, whereas normally they can skip over difficult parts and still get the general message. Though most people are very good at processing and understanding natural language, only relatively few are really good at natural language translation, and then only after intensive training and a lot of experience.

When pronouns cause gaps in the mutual disambiguating influence of the words in a text, this causes vagueness and ambiguity. One could ask whether it is not possible to simply translate the source language pronouns with equivalent target language pronouns and thus arrive at the same vagueness and ambiguity that is found in the original text. In this way, one could avoid total analysis, and leave it to the receiver of the target language text to interpret (or fail to interpret) the gaps in very much the same way as the receiver of the original text would do it. Unfortunately, this is usually not possible, for several reasons. First of all, the set of available pronouns and the distribution of syntactic and semantic features over them differ from language to language.
This means that it is often simply not possible to find a target language pronoun with exactly the same set of syntactic and semantic features as the source language one. Second, even when syntactic and semantic features seem to coincide, the scope of reference of the pronouns can differ because of differences in conventions between languages - a problem which is made more difficult by the inevitable range of exceptions. And third, the specificness of the content words in different languages can differ, so that a gap caused by a pronoun in one language can be more or less ambiguous than the gap caused by the equivalent pronoun in another language. When it is more ambiguous, the translation may be even harder to understand than the original, and (part of) the message may have been lost. And when the resulting gap is less ambiguous - i.e., the meaning of the surrounding words is more restricted - it is possible that some part of the intended meaning of the original has been excluded, again causing part of the message to be lost. Considering the problems outlined above, the procedure which a translator must apply to translate a pronoun from one language into another must take roughly the following form:
Step 1: analyze the scope of reference of the source language pronoun. Which words in the surrounding text are candidate referents and which are not?

Step 2: identify the referent most likely to be intended by the writer.

Step 3: select one (or more) possible target language pronoun(s) that can at least refer to the most likely referent identified in step 2.

Step 4: analyze the target language text-fragment in which the chosen pronoun occurs to find the set of all candidates it can refer to in its present context.

Step 5: check: a) does the target language context contain enough information to make the candidate that was felt to be most likely in the source language also the most likely in the target language?; b) is the set of candidates possible in the target language version a reasonable approximation of the set available in the source language?

Step 6: when step 5, the evaluation of the translation, is felt to be unsatisfactory, discard the chosen pronoun and return to step 3.

Step 2 - finding the referent most likely to be intended by the writer - and step 5 - checking the "effect" of the chosen target language pronoun in its context - are the steps that require the most skill on the part of the translator and involve the most complex decisions. How, if at all, those decisions can be emulated in a machine translation system will be the main concern of the present section.

The problem of resolving deictic reference is complicated by the fact that some of the personal pronouns may lack a referent. Two quite different cases are very common in English as in a number of other languages: pronouns whose possible referent is not made explicit, for instance in an elliptic expression, and pronouns that do not refer to anything, i.e. so-called impersonal pronouns such as English it in it is well-known that...

2.2.2. Pronouns and text coherence

In view of the difficulties that are caused by the use of pronouns in a text, there must be some reason why they are used.
Otherwise, why use them at all, when simply giving the intended referent would avoid so much confusion and ambiguity? One of the reasons that pronouns are used is a general principle in language usage, the principle that one should try to reduce redundancy to a reasonable extent. Saying that something in a specific text is redundant means that it contains more explicit information than is minimally necessary for the receiver to decode the message. There are several reasons why redundancy is a necessary part of language. Given the fact that natural language is first and foremost spoken language, it is easy to see why. Speech relies on sound to carry the message across. But sound is a far from perfect medium for this task. To begin with, sound is mainly sequential in time: a spoken utterance consists of a series of sounds, one following the other.
By the time the last sound has been uttered, all the preceding ones have vanished. This time factor makes it necessary that speech is processed in "real-time", i.e., the sounds that come in must be processed immediately, or new sounds will have arrived and previous ones are no longer available. By providing more information than is strictly necessary, this task can be made easier. Secondly, sound is extremely vulnerable to noise: disturbances from various sources that cause sounds to be changed beyond recognition, or not to arrive at all. A language totally free of redundancy would fail to communicate anything each time such disturbances occur. A possible third reason for the redundancy of language is reflected in the way we recognize objects in the world around us. We can tell one object from another because each object has certain distinctive features that make it uniquely different from the others. In general, only a relatively small number of features will be needed to tell one object from another, i.e., out of the larger set of possible distinctive features, we use only a few to recognize the actual object. But which features are actually used depends on the situation, e.g. the number of objects to be distinguished, the necessity of distinguishing them, etc. Language "mimics" this situation by providing a variety of features that can be used to recognize the message in the utterance. Which features are actually used is not known beforehand, but depends on the situation in which the utterance is made.
A typical example of redundancy is the number agreement between subject and verb in English. In most situations it would suffice to indicate number in either of the two, instead of in both, a fact that is demonstrated by the ease with which most people can correct a sentence with an error in agreement. A kind of redundancy that is very much under the speaker's control is the way words are combined into sentences with reasonably little ambiguity. Within the bounds of grammatical conventions, a speaker is free to phrase her/his message in as many or as few words as s/he thinks necessary to get the message across. Where one speaker may decide that two words sufficiently restrict each other for the message to get across, another speaker may feel this restriction to be too uncertain, and add one or more words to restrict the other words' meaning even more and make the message more explicit.

A principle in language opposite to redundancy is the need for efficiency. Efficiency is a major factor in everything we do, and communication is no exception to this rule. When people communicate with each other, they will often try to keep the form down to essentials, in order to state the message as directly as possible. In speech, this is often accomplished by making use of non-verbal means such as pointing at things, emotive gestures, demonstrative actions etc.

In written language, the balance between redundancy and efficiency is not the same as in spoken language. Among the various sorts of written language, the efficiency argument pertains especially to informative, non-literary texts, the prevalent domain of machine translation today and in the foreseeable future. The written text is much less subject to time, in that a reader, misunderstanding a word or syntagma, can easily reread part of the text to recover from the mistake. And noise is much less as well, especially in printed texts, where the variations of handwriting are completely eliminated.
Both the lack of noise and the reduction of the time factor allow utterances in written texts to be longer, more complexly structured and more dense (i.e., more information in fewer words). On the other hand, the absence of non-verbal communication forces writers to be more explicit, i.e., to use more words to convey
the same message. Where, for example, a speaker can use the word it to refer to an object, by saying it and pointing to the object, a writer will have to mention the object explicitly before s/he can use the word it to refer to it.

Pronouns are among the linguistic devices where efficiency takes precedence over redundancy. Pronouns replace more explicit words, to save time and space. They are often used to avoid repetition: mentioning the same words over and over rapidly reduces the interest people take in a text (spoken or written). Or they are used to strengthen the relation of the utterance to its context. In speech that context is often the external world, in which the pronoun is a "place-holder" for the accompanying gesture, action or demonstration which makes the relation explicit. In written text the context is generally the text itself, and the pronoun can be used because other information in the text makes clear what element the pronoun is intended to replace.

Pronouns both function by means of and help create text coherence. The fact that sentences in a text are related, and that the receiver relies on this being so, allows a pronoun to refer to elements outside the sentence in which it occurs. There are several devices that tie sentences together in such a way that a pronoun will suffice as a pointer to some other element (some of which are the object of study in this book). At the same time, pronouns strengthen the bond between sentences by explicitly signalling coreference. Two sentences in which the same words are used can imply that they refer to the same object, but can also be understood separately; a sentence with a pronoun pointing outside itself simply cannot be understood as a separate, independently functioning, element, but must be related to some other part of the text. In other words, pronouns generally force the receiver to become aware of links that could otherwise go unnoticed.
And, once recognized, such links can then help to increase the redundancy of the text, in spite of the efficiency of the device used. The dual function of redundancy and efficiency makes pronouns very difficult to translate, yet very important to translate correctly. Translating a pronoun into an underspecified target language pronoun means that either the result is so ambiguous as to confuse the receiver, or the wrong referent will be chosen, in both cases resulting in a failure to understand the message. Translating into an overspecified pronoun reduces the text-coherence function of the pronoun, because the receiver does not need so much information to become consciously aware of the links between the sentences. In other words, the receiver will not experience the text as a single structure, but more as a collection of only loosely related sentences. This is at best only annoying (comparable to reading a badly written original text), but in the worst case it can cause a failure to get the message across, because wider inter-sentential connections remain unnoticed. Redundancy is taken up again and examined more from the standpoint of grammar, in section 3.1.
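Before turning to the syntactic side of the problem, the six-step procedure given in 2.2.1 can be summarized as a generate-and-test loop. The sketch below is hypothetical and reproduces no DLT code; the four helper functions are placeholders standing in for the genuinely hard decisions (steps 1, 2 and 5).

```python
def translate_pronoun(src_pronoun, src_context, tgt_context,
                      candidates, most_likely, tgt_options, acceptable):
    """Hypothetical skeleton of the six-step pronoun translation procedure."""
    refs = candidates(src_pronoun, src_context)       # step 1: candidate referents
    intended = most_likely(refs, src_context)         # step 2: writer's likely referent
    for tgt in tgt_options(src_pronoun, intended):    # step 3: try a target pronoun
        tgt_refs = candidates(tgt, tgt_context)       # step 4: its candidates in context
        if acceptable(intended, refs, tgt_refs):      # step 5: evaluate the effect
            return tgt
    return None                                       # step 6 exhausted every option

# Toy demonstration with trivial stand-ins for the four decisions:
chosen = translate_pronoun(
    "it", ["turtle geometry"], ["la testuda geometrio"],
    candidates=lambda p, ctx: ctx,
    most_likely=lambda refs, ctx: refs[0],
    tgt_options=lambda p, intended: ["ĝi"],
    acceptable=lambda intended, refs, tgt_refs: len(tgt_refs) >= 1,
)
assert chosen == "ĝi"
```

The loop structure makes explicit what the prose says: step 6 is not a separate action but the control flow that sends an unsatisfactory choice back to step 3.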
2.2.3. Syntactic restrictions

In this section, we will identify the pronouns that are used in our sample text, and try to analyze:

1. what they (can) refer to;
2. what their most likely referent is;
3. what information is necessary to carry out steps 1 and 2;
4. how the pronouns are translated into Esperanto; and
5. what information is minimally necessary to carry out that translation.
Since pronoun reference and text coherence are so closely related, we cannot avoid making references to sections 2.3. to 2.6., in which various other aspects of text coherence will be discussed. In general, however, we will attempt to make this chapter independently readable, even where this means some repetition of material from the other chapters.
Figure 11 shows all the pronouns in our sample text, their possible equivalents in DLT's version of Esperanto, and the Esperanto translation actually chosen.
Figure 11
Sentence   Pronoun used      Possible Esperanto     Esperanto translation
           in the text       translations           chosen

2          it                ĝi, lo                 ĝi
2          which             kiu, kio               kiu
2          its               ĝia, sia               -
4          others            aliaj                  aliaj
4          its               ĝia, sia               ĝia('j)
4          its               ĝia, sia               -
5          them              ili, iĥi, iŝi          ili
5          who               kiu                    kiu
6          whose             kies                   kies
8          it                ĝi, lo                 ĝi
10         it                ĝi, lo                 ĝi
10         its               ĝia, sia               sia
11         which             kiu, kio               kiu('j)
12         itself            si, mem                mem
12         which             kiu, kio               kiu
In sentence 2, we find the pronoun it, which is translated in the Esperanto version by ĝi. The most likely referent of it in this clause is "turtle geometry", closely followed by the new way of thinking about geometry. In fact, there is little to choose between the two, since sentence 1 explicitly equates the one to the other by giving "turtle geometry" as a name for the new way of thinking about geometry. This means that the only two candidate referents available (if we ignore the sub-title) are perfectly interchangeable without misunderstanding the meaning of the clause.

How do we know that the two noun syntagmata of sentence 1 are indeed possible candidates? The first selection criterion for the referents of a pronoun is the pronoun's syntactic features, such as case, number, gender etc. A pronoun's syntactic features can be a useful pre-selection criterion. A number of features require agreement: any possible referent must have at least the same values for those features. In English and Esperanto, for instance, the feature demanding agreement is number; in French, number and gender. In general, syntactic agreement rules are deterministic, i.e., they require elements to have certain features in common, or else reject the analysis that linked them together. For pronouns this means that we have a relatively easy filter, with which we can at least reject all elements in the pronoun's context that do not have the required set of features.

Though the syntactic agreement rules are deterministic in the sense that they allow a clear decision as to the possible dependency links of the words in question at the level of the syntagma, pronouns need not on all occasions agree with all their candidate referents. Example:

My neighbour bought a Jaguar. They are great cars.

It seems that such disagreements can occur when semantic and pragmatic signals are clear enough to overrule the syntactic rules.
For the moment, however, we will assume that the syntactic rules are binding also for the pronoun-referent relation, since (especially in the informative type of text we are aiming at) this seems to cover the vast majority of cases. Later investigations will have to concern themselves with the more intricate cases as well.
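The agreement filter described above is simple enough to sketch directly. The feature sets and the per-language agreement lists below are illustrative assumptions, not DLT's actual feature inventory.

```python
# Assumed agreement features per language, as stated in the text:
# English and Esperanto demand number agreement, French number and gender.
AGREEMENT = {"english": ["number"],
             "esperanto": ["number"],
             "french": ["number", "gender"]}

def filter_candidates(pronoun_feats, candidates, language):
    """Keep only the candidate referents that share the language's
    agreement features with the pronoun."""
    keys = AGREEMENT[language]
    return [c for c in candidates
            if all(c.get(k) == pronoun_feats.get(k) for k in keys)]

# Plural 'them' cannot refer back to the singular 'Seymour Papert':
cands = [{"word": "Seymour Papert", "number": "sg"},
         {"word": "many others",    "number": "pl"}]
kept = filter_candidates({"number": "pl"}, cands, "english")
assert [c["word"] for c in kept] == ["many others"]
```

As the Jaguar example shows, a real system would have to let strong semantic and pragmatic signals overrule this filter; the sketch implements only the deterministic default assumed for the informative texts discussed here.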
However we decide which elements in a pronoun's context are candidates to be that pronoun's referent, we can always reject the ones that do not agree in syntactic features. There are some (not many) differences between the personal pronoun systems of English and Esperanto, and some more between those of Esperanto and French. The main difference is that French has a syntactic feature gender, whereas English and Esperanto only have a semantic feature sex. As for a personal pronoun as subject of a verb, the differences between English and Esperanto are not in the agreement rules - both require only number agreement - but in the case system and the references to the sex of the referent. English pronouns (e.g. I) show the cases objective (me), genitive (mine), possessive (my) and reflexive (myself). To begin with the reflexive, Esperanto has only one pronoun for the third person, si, which is used for all subjects, irrespective of sex or number. Where English has the genitive or possessive, Esperanto uses the regular adjectival morpheme a: compare
La libr'o est'as verd'a. 'the book is green'
La libr'o est'as mi'a. 'the book is mine'

DLT's version of Esperanto makes a finer distinction in the references to the various sexes of the referent (male, female, unknown or neuter), but this is a semantic feature, which we will discuss later. (Common Esperanto has no third person singular pronoun for unknown but non-neuter sex, and only a single one for the third person plural irrespective of sex. The DLT augmentations were introduced for the purpose of enhancing fully automatic translation from the intermediate language; see 1.3.) The difference between French on the one hand and Esperanto and English on the other is the syntactic feature gender, which loosely has to do with sex, but mainly is a syntactic feature of all French nouns, which can be found in the dictionary. This feature makes French personal pronouns more specific than English and Esperanto ones: whereas all inanimate objects are considered sexless in English and Esperanto (with well-known exceptions in English) and receive the same neutral pronoun, in French a different pronoun is used for masculine nouns than for feminine ones. This causes problems when, for example, translating the clauses:

The men and the women arrived this morning. They were too late.

into French. Not knowing what they refer to, we can still translate into Esperanto:

La viroj kaj la virinoj alvenis ĉi-matene. Ili estis tro malfruaj.

where ili has exactly the same ambiguity, but the French is either:

Les hommes et les femmes arrivaient ce matin. Ils étaient en retard.

or:

Elles étaient en retard.

where elles refers only to femmes, and ils either to hommes, or to les hommes et les femmes.
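The deterministic agreement filter described in this section can be sketched as follows. The data layout, the feature names and the function `filter_candidates` are our own illustration, not part of the DLT implementation; the sketch only shows the shape of the mechanism.

```python
# Sketch of a deterministic syntactic filter for pronoun candidates.
# A candidate referent survives only if it matches the pronoun on every
# feature the language requires to agree (number in English and
# Esperanto; number and gender in French).

AGREEMENT_FEATURES = {
    "english": ["number"],
    "esperanto": ["number"],
    "french": ["number", "gender"],
}

def filter_candidates(pronoun, candidates, language):
    """Reject every candidate that clashes with the pronoun on a
    feature that must agree; the rest remain for semantic ranking."""
    required = AGREEMENT_FEATURES[language]
    return [c for c in candidates
            if all(c.get(f) == pronoun.get(f) for f in required)]

# 'My neighbour bought a Jaguar. They are great cars.' -- under the
# strict rule, the plural pronoun rejects both singular candidates.
pronoun = {"form": "they", "number": "plural"}
candidates = [
    {"form": "neighbour", "number": "singular"},
    {"form": "Jaguar", "number": "singular"},
]
survivors = filter_candidates(pronoun, candidates, "english")
```

Note that for the Jaguar example the strict filter leaves no candidate at all, which is precisely the kind of case where, as observed above, semantic and pragmatic signals must be allowed to overrule the syntactic rules.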
2.2.4. Syntactic redundancy

There are a few instances of pronoun usage in the sample text whose reference seems to be completely syntactically determined at the clause level. The first of these occurs in the sentence fragment in sentence 4: contributed to its development and to its applications. The second its is, by virtue of its clause-syntactic function, determined to be identical to the first one; that is, the second one will be interpreted to have whatever referent the first one is interpreted to have. The reason for this is twofold. To begin with, we can postulate a general rule, which we will use again later on, that says that when two syntactically and semantically identical pronouns (i.e., having the same set of features) are placed next to each other - i.e., with no noun (syntagma) in
between them - they will usually refer to the same referent. As the word usually already indicates, this rule is not always valid. The rule works unless there are semantic reasons not to apply it. Since these semantic reasons cannot be formulated in a "hard" constraint, the rule discussed is a preference rule and not a deterministic one. It would seem that the semantic element included in the rule just postulated makes semantic checking inevitable. There are exceptions, however. Under certain circumstances syntactic checking can make semantic checking unnecessary, giving the rule a deterministic character. In our example, the two identical pronouns occur in a special kind of construction, a coordination of two prepositional syntagmata. This coordination (which might be interpreted as an ellipsis, with the second occurrence of contributed omitted) is syntactically recognizable. Moreover, it causes the dependency type of both prepositional syntagmata to be the same. In other words, the syntactic coordination clearly connects the two syntagmata to the same verb, with equal labels. For the pronouns in the two prepositional syntagmata this means that their left-hand context is in fact identical; they both operate in exactly the same environment. When two identical function words operate in identical environments, one is justified in assuming that their function is identical as well. In the case of our pronouns, this means that their referents are identical. Syntactic parsing will clearly show this (figure 12), thus making semantic checking unnecessary.
Figure 12 Partial dependency tree of sentence 4, showing the exact parallelism of the two pronouns
have
  PAP: contributed
    CIRC-C: to
      PARG: development
        DET: its
    CIRC-C: to
      PARG: applications
        DET: its
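The parallelism shown in figure 12 can be checked mechanically. The following sketch encodes dependency nodes as plain records and compares the two pronouns' environments; the encoding and the function names are our own illustration, not DLT's internal format.

```python
# Sketch of the 'identical environment' rule: two pronouns with the
# same form and the same label path up to the same root are taken to be
# co-referent, so only one of them needs full resolution.

def path_labels(node):
    """Return the dependency labels from a node up to the root,
    together with the root node itself."""
    labels = []
    while node.get("governor") is not None:
        labels.append(node["label"])
        node = node["governor"]
    return labels, node

def same_function(p1, p2):
    """Identical function words in identical environments are assumed
    to have identical referents (a preference, not a hard rule)."""
    labels1, root1 = path_labels(p1)
    labels2, root2 = path_labels(p2)
    return p1["form"] == p2["form"] and labels1 == labels2 and root1 is root2

# 'have contributed to its development and to its applications'
contributed = {"form": "contributed"}
to1 = {"form": "to", "label": "CIRC-C", "governor": contributed}
to2 = {"form": "to", "label": "CIRC-C", "governor": contributed}
development = {"form": "development", "label": "PARG", "governor": to1}
applications = {"form": "applications", "label": "PARG", "governor": to2}
its1 = {"form": "its", "label": "DET", "governor": development}
its2 = {"form": "its", "label": "DET", "governor": applications}

coreferent = same_function(its1, its2)  # the second its copies the first
```

Since both pronouns carry the label path DET, PARG, CIRC-C up to the shared verb, the check succeeds and semantic resolution of the second pronoun can be skipped.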
The order in which the two pronouns are assigned a referent (i.e., which of the two is actually resolved, and which is simply assigned the result of the other) does not really matter, since the end result will, semantically speaking, be the same. Since language, or rather speech, is essentially a sequential mechanism, however, it seems not unlikely that in fact the first occurrence of the pronoun will be resolved in the normal way, after which the second one will simply be analysed as referring to whatever the first one referred to, once its identical environment has been recognized. If we look at the translation, we see that there the first and the second pronoun have been merged, which in turn made the repetition of the preposition superfluous: kontribuis al ĝiaj evoluo kaj aplikadoj. Apparently the translator reasoned that, since the two pronouns automatically refer to the same concept, they could be translated into a single pronoun in Esperanto. This is possible, and in Esperanto natural, since no syntactic ambiguity arises. The plural morpheme j in the pronoun ĝi'a'j does not agree with the singular noun evolu'o, which clearly indicates that ĝi'a'j depends not on evolu'o alone, but on the coordinated construction as a whole.
Figure 13 Partial dependency tree of the Esperanto version of sentence 4. On the rationale of representing common dependents of coordinated syntagmata in this way, see Schubert 1987b: 104ff.
kontribuis
  PREC: al
    PARG: [...]
The pronoun is repeated in English not only because English lacks a syntactic plural marker in personal pronouns. Consider the three sentences below:

A. She talked about her friend and companion.
B. She talked about her friend and her companion.
C. She talked about her friend and about her companion.

The differences between the sentences are slight, but not trivial. A possible interpretation of sentence A is that friend and companion form a single unit, one concept that is described by means of two coordinated nouns (one person). (A special conjunction has been introduced into DLT's version of Esperanto for disambiguating these cases, kaŭ, as opposed to the normal kaj: amiko kaŭ kamarado de ŝi 'a friend and companion of hers [one person]' versus amiko kaj kamarado de ŝi 'a friend and a companion of hers [two persons]'.) In sentence C, the repetition of about her separates the two nouns, and makes it clearer that her friend and her companion are two completely separate concepts. Sentence B lies somewhere in between, in that it separates the two nouns, but not as strongly as sentence C. Obviously these ambiguities occur only when the two words denote compatible concepts. The sentence
She talked about her friends and books

leaves no doubt (if it is not rejected as ungrammatical, as is likely for the singular version of this sentence) that the two nouns refer to distinct concepts. Sentence 4 of our sample text has the same pattern as sentence C, above, which means that its most likely interpretation is that its development is a separate concept from its applications. The choice made by the translator suggests that he thought this question unimportant for a correct interpretation of the sentence. This might give rise to a mistranslation when we consider the tail of the sentence, both in education and in other fields. The English version makes quite clear that this tail belongs solely to its applications: its development is clearly separated by the repetition of to its, which is very unlikely to be "crossed" by the subsequent tail. In the Esperanto version, however, there is no such separation, and it is somewhat more likely that the tail will be interpreted as modifying both development and applications. In other words, the Esperanto version can still be interpreted in the same way as the original, although it is not unlikely that the default interpretation is different. A translation that departs from the original with regard to ambiguity, such as the one discussed here, need not have very serious consequences for the rest of the text, nor is it always avoidable. (And if it is unavoidable, it cannot be called a mistranslation.) The example nevertheless shows that one should not rely exclusively on syntactic properties at the syntagma or clause level (such as the apparent redundancy of the second pronoun), but carefully study the possible syntactic and semantic effects in a wider context before making a choice. This principle is even more important in a machine translation system. A human translator may occasionally deviate from the letter of the original (e.g.
to get a more "natural" target language version), since s/he may be trusted to have enough insight and understanding not to change the actual message of the text. The question as to whether the tail of sentence 4 does or does not belong to development as well as to applications, for example, is not really significant for the rest of the text. But a machine translation system, lacking such global understanding, cannot "know" that. It seems necessary, therefore, to carefully study all possible surface variations, to find out what their semantic functions are, and how to copy these functions as faithfully as possible into the target language text. We do not mean to imply that this strategy will result in perfect translations. The very nature of natural language makes the concept "perfect translation" an unobtainable ideal, and also for the most part a subjective one. But only an intimate knowledge of the effects of surface variations on the (default) interpretation of a text can make it possible to achieve something like "high quality machine translation".
2.2.5. Syntactic rules and semantic features

One important group of pronouns in written texts is that of the relative pronouns, which have the clearly defined syntactic function of explicitly "linking" one clause to another. In English, the words which, who, whom, whose and that are used to connect a clause to a preceding noun syntagma, by making the relative pronoun (syntactically) a direct or indirect dependent of the verb in the subordinate clause, and having the pronoun (semantically) refer to a preceding noun syntagma, which is accordingly called the correlate or antecedent (cf. Schubert 1986a: 76ff.). In most cases the situation is straightforward: in English the relative pronoun - which does not really have many syntactic features of its own - copies the entire contents, including the syntactic features, of its correlate, which is shown in (for example) the fact that the verb of the relative clause shows number agreement with that noun syntagma (singular in sentence 2, plural in sentence 5). All the examples in the table conform to this pattern. There is, however, at least one relative pronoun in the text whose correlate is not simply the preceding noun syntagma. Consider the sentence:

5. Among them are Harold Abelson and Andrea A. diSessa of M.I.T., who have set forth the ideas underlying turtle geometry in a remarkable expository work [...]

If the "preceding noun syntagma" rule were absolute, the referent of who in this sentence would be M.I.T. This is a correct possibility, since M.I.T., referring to a human organization, can belong to the so-called collective nouns: nouns that semantically denote more than a single item and (in British English) may show syntactic plural agreement, but have no explicit plural ending (e.g. the police, the cabinet). This agrees with the plural form of the verb have. Even semantically everything seems to be in order: M.I.T., being a human organization, can be referred to with a collective who.
Yet most people reading the text will prefer Harold Abelson and Andrea A. diSessa as referent of who, over M.I.T. The question is, why? First of all, the rules outlined here do not work on linear sentences, but on syntactic dependency trees. When the correlate of who is sought in the syntagma Harold Abelson and Andrea A. diSessa of M.I.T., we therefore should not consider this syntagma as a string, but as a tree:
Figure 14 Alternative partial dependency trees of sentence 5: M.I.T. depends either on the second noun syntagma or on the coordinated structure as a whole, represented by the conjunction

are
  SUBJ: [...]

Note that the syntactic function of the free adjunct of M.I.T. is ambiguous; it can depend on just Andrea A. diSessa or on the coordinating conjunction. The framework within which this study is situated applies semantic choices to syntactically possible alternatives. Therefore the two syntactic alternatives are treated separately (and in parallel). A correlate-seeking rule that works on the English tree structure (before it enters the translation process proper) thus handles two distinct, but in themselves unambiguous, tree structures. But whichever of the two is treated, the "preceding noun syntagma" in terms of trees is not M.I.T. but the coordination, represented by its internal governor, the conjunction and (on this way of representing coordinated structures, cf. Schubert 1987b: 104ff.), and it does not really matter in which way M.I.T. depends on and. This is true as long as there is no clash in syntactic features between the relative pronoun and its candidate correlate. We could postulate a rule that in such cases the noun syntagma's internal governor or head - i.e., the highest noun within the structure - is the element the relative pronoun refers to, instead of the last noun in the linear word string. The rule that assigns the feature plural to two coordinated nouns makes it syntactically possible for who to refer to Harold Abelson and Andrea A. diSessa, since this then agrees with the plural feature of the verb have on which who depends. We could refine the rule somewhat, saying that in cases where a complex
noun syntagma has an internal governor that does not syntactically match with the verb of the following relative clause, a dependent noun will normally be chosen as the correlate, provided that this dependent noun does agree. An example is the sentence:

A. She talked about the captain of the leading team, who has scored so many goals.

in which we assume that who refers to captain, since it is the internal governor of the noun syntagma, and it agrees syntactically with the verb of the relative clause. If we change the sentence into:

B. She talked about the captains of the leading team, who have scored so many goals.

the reference may still be to the internal governor of the noun syntagma, although the verb of the relative clause now agrees with the leading team as well. Changing the sentence into:

C. She talked about the captain of the leading teams, who have scored so many goals.

however, necessarily places the reference on teams, as this is the only available plural noun. Unfortunately, the English set of relative pronouns does not allow us to identify the correlate with syntactic features only. It would be a possible and by no means uncommon solution to adopt semantic features in addition to the syntactic features. If we consider the three pronouns who, which and that, we see that they show an important semantic feature, which can be used to eliminate some of the syntactically possible correlates found by the syntactic rule postulated above. Who is used to refer to persons (and some animals etc.), which refers to non-personal correlates, while that can refer to either. But the situation is not really this simple. In addition, which can also be used to refer to collective nouns such as the family, the police etc., that are then considered to be non-personal singular. In short, to deal with the relative pronouns of English we might introduce the semantic feature "person/non-person" in our system.
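The refined correlate rule just postulated can be sketched as follows. The record layout and the function `choose_correlate` are our own illustration; a real rule would of course work on full dependency trees rather than a flat head-plus-dependents pair.

```python
# Sketch of the refined correlate rule for relative pronouns: prefer
# the internal governor (head) of the complex noun syntagma; fall back
# to a dependent noun only when the governor clashes in number with the
# verb of the relative clause.

def choose_correlate(head, dependents, verb_number):
    """Return the preferred correlate form, or None if nothing agrees."""
    if head["number"] == verb_number:
        return head["form"]
    for dependent in dependents:
        if dependent["number"] == verb_number:
            return dependent["form"]
    return None

# A: 'the captain of the leading team, who has scored so many goals'
correlate_a = choose_correlate(
    {"form": "captain", "number": "singular"},
    [{"form": "team", "number": "singular"}],
    verb_number="singular")

# C: 'the captain of the leading teams, who have scored so many goals'
correlate_c = choose_correlate(
    {"form": "captain", "number": "singular"},
    [{"form": "teams", "number": "plural"}],
    verb_number="plural")
```

For sentence A this yields captain, and for sentence C teams. Sentence B, where both the head and the dependent can agree, shows why the rule is only a preference: the sketch would return the head, but a reader may still take the dependent as correlate.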
Since the number of pronouns is finite, it should not be too difficult to make an exhaustive list of all the semantic features needed by them. One approach could then be to mark all the nouns in the dictionary with the appropriate set of features, so that a simple matching procedure, comparable to that used for matching syntactic features, could be applied. For English pronouns, the semantic features male/female, person/non-person and collective/non-collective would suffice. For Esperanto and French, we would have to add animate/inanimate. Assuming that all the words in the dictionary are marked with these features, we may be able to solve many cases of pronoun reference deterministically.
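Such a feature-matching procedure can be sketched as follows. The feature inventory, the toy lexicon and its markings are our own illustration, not the DLT dictionary; the borderline status of collective nouns such as M.I.T., noted above, is exactly what such flat feature values fail to capture.

```python
# Sketch of semantic feature matching for pronouns, using a toy lexicon
# marked with a person/non-person feature.

PRONOUN_ALLOWS = {
    "who": {"person"},                  # persons (and some animals)
    "which": {"non-person"},            # non-personal correlates
    "that": {"person", "non-person"},   # either
    "its": {"non-person"},              # non-personal possessive
}

LEXICON = {
    "a language for introducing children to computers": "non-person",
    "M.I.T.": "non-person",             # simplification; also collective
    "Seymour Papert of M.I.T.": "person",
    "Logo": "non-person",
}

def semantic_filter(pronoun, candidates):
    """Keep only the candidates whose feature the pronoun allows."""
    allowed = PRONOUN_ALLOWS[pronoun]
    return [c for c in candidates if LEXICON[c] in allowed]

# The candidates of 'its' in sentence 4: only the person is rejected.
remaining = semantic_filter("its", list(LEXICON))
```

As with the syntactic filter, the procedure narrows the candidate set deterministically but rarely narrows it to a single referent.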
2.2.6. The inadequacy of syntactic and semantic features

The usefulness, but also the inadequacy, of syntactic selection mechanisms, even if augmented with simple semantic features, has been demonstrated by Jerry Hobbs (cited from Sampson 1987: 99ff.). Hobbs constructed an algorithm that was based entirely on syntactic features and structures of the text. This algorithm was capable of resolving around 77% of the pronoun references it encountered. This score, though impressive, clearly indicates that a purely syntactic system will fairly often choose the wrong (read: unintended) referent for a pronoun, something that can completely ruin a translation. Hobbs arrived at the same conclusion, and augmented his system with semantics. Using a relatively simple feature semantics he managed to improve performance to 83%. Since even this score is not perfect, Hobbs concluded that only sophisticated semantic algorithms could hope to reach a near 100% score, and that syntactic means, though useful, were not the solution. An interesting point is made by Sampson, who argues that human readers also score well below 100% - some 90% to be precise. His argument is that we should not attempt to create computers that are better than humans at the task of natural-language processing, an attempt he feels is much too ambitious. Instead we should be grateful for whatever results we can get using the means we have available. His conclusion is that it is better to use relatively straightforward methods that give reliable (to a known degree) results than to throw them away in favour of complicated methods that have not proven their worth and require enormous effort to develop.
The insufficiency of the syntactico-semantic feature approach is clearly demonstrated by several pronouns in our sample text. As in most informative texts, the nouns used are largely inanimate (which in English automatically means non-personal, so that the feature male/female has no function). This uniformity in feature distribution means that in many cases the semantic features are still not sufficient to select a single candidate, if they help at all. A few examples from the text:

2. It is closely connected with the programming language Logo [...]
Candidate correlates:
  The new way of thinking about geometry = inanimate
  "turtle geometry" = inanimate

10. [...] it leaves a record of its path
Candidate correlates:
  a pen = inanimate
  the turtle = inanimate
  a sheet of paper = inanimate
  paper = inanimate
The idea of simple semantic features is tempting and has indeed often been suggested and even been tried out to a certain extent. But although feature assignment may be simple for the relative pronouns, the lexicographic task of determining the corresponding features of all nouns in the dictionary is not only time-consuming but also much less easy. However clear the animate-inanimate distinction, for example, will be for the bulk of entries, there are always doubts and hesitations in borderline cases, in metaphorical use etc. As usual in semantics, there are no "hard" rules and no "hard" boundaries. We therefore opt for a correlate-seeking procedure that

- first identifies syntactically possible candidate antecedents,
- then performs a semantic match between the pronoun and its antecedent, taking into account the immediate clause level context of the two, and ranks the candidates according to their semantic probability in the given context,
- subsequently examines the text level context for semantic and pragmatic matches,
- and finally (in an interactive environment) leaves the decision to the user when none of the candidates is selected as more probable to a reliable extent than all the others.
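The four-step procedure above can be sketched as a pipeline. The scoring functions, the reliability margin and the `resolve` function are stand-ins of our own; a real system would consult the dictionary for the clause-level match and the preceding text for the text-level match.

```python
# Pipeline sketch of the correlate-seeking procedure: syntactic
# filtering, semantic ranking, and an interactive fallback.

RELIABILITY_MARGIN = 0.2  # assumed threshold for a 'reliable' winner

def resolve(pronoun, candidates, syntactic_ok, semantic_score,
            text_score, ask_user):
    # 1. identify syntactically possible candidate antecedents
    pool = [c for c in candidates if syntactic_ok(pronoun, c)]
    # 2./3. rank by clause-level and text-level semantic match
    total = lambda c: semantic_score(pronoun, c) + text_score(pronoun, c)
    ranked = sorted(pool, key=total, reverse=True)
    if not ranked:
        return None
    # 4. leave the decision to the user when no candidate wins reliably
    if len(ranked) > 1 and total(ranked[0]) - total(ranked[1]) < RELIABILITY_MARGIN:
        return ask_user(ranked)
    return ranked[0]

# Toy run for 'it' in sentence 10, with invented scores:
winner = resolve(
    "it",
    ["a pen", "the turtle", "the undercarriage", "a sheet of paper"],
    syntactic_ok=lambda p, c: True,  # all candidates are singular
    semantic_score=lambda p, c: {"a pen": 0.6, "the turtle": 0.9,
                                 "the undercarriage": 0.2,
                                 "a sheet of paper": 0.3}[c],
    text_score=lambda p, c: 0.0,
    ask_user=lambda ranked: ranked[0],
)
```

With these invented scores the turtle wins reliably; if the gap between the two best candidates fell below the margin, the decision would be routed to the user, as the final step of the procedure requires.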
2.2.7. Focus

We have already said that pronoun reference resolution is an important factor in the detection (and reproduction) of text coherence. A person reading a text is confronted with a sequence of statements, each statement involving two or more concepts. It is not very likely that s/he will be capable of remembering every statement verbatim. On the contrary, most people cannot literally repeat any single sentence of a text they have just read. What they can do, however, is give a more or less accurate summary or paraphrase of what they have read, which indicates that they have not just forgotten the text, but, instead, have stored it in some way that does not include literal reconstruction of the original text. The key element in human text processing is memory. Psychologists have for some time used a model of human memory that has (at least) three divisions: a) the immediate memory, necessary to process incoming data; b) short term memory, in which processed data can be kept for a limited time, after which it will rapidly vanish, unless it is stored in c) the long term memory, where processed data can be stored permanently, to be retrieved when necessary (figure 15). It is this division that can be used to explain some essential characteristics of the way humans perceive a coherent text (cf. Schank's "Conceptual Dependency Theory", which also features this division, Schank 1975, and the Translator system, Tucker/Nirenburg 1984: 152).
Figure 15 Three-part division of memory
Immediate Memory -> Short-Term Memory -> Long-Term (Permanent) Memory
The three-part division of human memory would seem to imply the following model of text processing. Using immediate memory, the incoming signals (visual in the case of written text) are processed, which involves recognition of letter shapes and words. Recognition results in the activation of concepts, presumably in some sort of semantic representation, which will be built up in the short term memory. The representation in short term memory is foremost in the reader's attention, since it is immediately available. When a new clause arrives, the concepts recognized therein will in principle replace the ones currently active in short term memory, which will be moved on to long term memory (possibly in some other representation), where they are not directly available, but can be retrieved with some effort, if necessary. Since a text is more than a collection of separate sentences, a reader will attempt to link new concepts to the ones currently available in short term memory, before storing things away in long term memory. When no apparent link can be made between incoming concepts and the contents of short term memory, attempts will be made to find links with elements in long term memory, which will first have to be retrieved, i.e., placed back in short term memory (possibly in reversed chronological order; what arrived last will be retrieved first). It must be noted that there are very many ways to establish links between concepts, many of which involve complex lines of reasoning (inferencing) with active use of the wide body of knowledge already present in long term memory. We assume here that two (opposing) principles guide the processing of incoming clauses: the principle of "maximum understanding" and that of "least necessary effort". The first implies that a reader will try to relate as much as possible of the incoming clause to her/his existing knowledge, to understand the message as completely as possible.
The second assumes that a reader will stop processing the incoming clause as soon as a definite link has been established with already processed information, relying on the possibility of subsequent retrieval and additional processing, if such proves to be necessary for proper understanding. These two principles will maintain a balance between "full understanding", which can take large amounts of time, and "sketchy understanding", which is very fast, but ignores many of the details (explicit and implicit) present in the text. Human readers apparently allow the task at hand to determine the relative strength of the two
principles. In other words, the "purpose" of the text, or its "usefulness" in a given situation, determines the depth of processing considered necessary. For our present purpose, it should be possible to find a balance appropriate for the task of translation, i.e., we should strive to build a system that "understands" enough to make an adequate translation possible, but does not process large amounts of information that will not in any way influence the actual translation output (except in slowing the system down).
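The two-store linking behaviour described above can be sketched as a minimal simulation. The clause and concept encoding, and the function `process_text`, are entirely our own illustration of the model, not a claim about human processing or about the DLT system.

```python
# Minimal sketch of the text-processing model: each new clause
# displaces the short-term memory contents into long-term memory, and
# backward links are sought first in STM, then in LTM in reversed
# chronological order.

def process_text(clauses):
    """clauses: list of clauses; each clause is a list of concept
    dicts, optionally carrying a 'refers_to' id."""
    stm, ltm, links = [], [], []
    for concepts in clauses:
        for concept in concepts:
            target = concept.get("refers_to")
            if target is None:
                continue
            # search STM first, then LTM most-recent-first
            for stored in stm + ltm[::-1]:
                if stored["id"] == target:
                    links.append((concept["id"], stored["id"]))
                    break
        ltm.extend(stm)       # displaced out of immediate attention
        stm = list(concepts)  # the new clause is now foremost
    return links

links = process_text([
    [{"id": "turtle"}, {"id": "pen"}],
    [{"id": "paper"}],
    [{"id": "it", "refers_to": "turtle"}],  # retrieved from LTM
])
```

In the toy run, the reference in the third clause cannot be linked within short term memory (which holds only the second clause) and is therefore resolved by retrieval from long term memory, mirroring the "least necessary effort" search order.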
Because of the large number of possible links a reader could construe, the careful writer of a text will generally attempt to "guide" the reader to the correct (or rather: intended) interpretation, by providing a variety of clues. The effectiveness of those clues is largely based on expectation. The linguistic form of a statement rouses in the reader a set of expectations for what is going to follow. The sense of coherence a text can cause in a reader largely depends on how well (how "considerately") the sequences of clauses rouse and fulfil his/her expectations. Many researchers have concentrated on the (surface-)syntactic clues that can be found in a text, in an attempt to solve the difficult problems of pronominal reference, and reference in general. In English, sentence level syntax relies very much on the sequential ordering of the elements (words, syntagmata ...) in a sentence. This establishes a link to rough theme-rheme or topic-comment distinctions (for a detailed discussion of theme and rheme, see 2.4. and 3.3.2.), in which the rheme (comment) is approximately everything to the right of the finite verb (and focus the central part of it). This makes it a piece of text which is directly identifiable without reference to semantics. Candace Sidner (1983: 273) - who develops a semantic, deep case-based system for the identification of what she calls focus - before entering a more detailed discussion, defines focus a bit vaguely as a piece of text which "speakers center their attention on". Since her definition is meant to cover both written and spoken language, we may paraphrase it, and gear it towards written texts, saying that Sidner's focus is that element that is supposed to be foremost in the reader's attention right after the clause has been processed. 
Sidner's (more or less tacit) presupposition with regard to deixis is this: because the focused element is the one that receives prime attention, the following clause will first of all be searched for links with that focus. This comes down to the assumption that any recognizable instance of backward reference in a clause will by default point to the preceding focus, unless there are reasons (syntactic, semantic or pragmatic) not to interpret the reference in this way. If a system is able to identify the focus, it has access to a powerful preference mechanism, which may not be foolproof (which nothing in semantics and pragmatics seems to be) but which can at least to some extent enhance performance. Let us try to apply this idea of the preceding focus as the prime candidate for a referent to the turtle geometry text.
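The rough word-order rule for identifying the focus can be sketched at token level. The token encoding and the function `rough_focus` are our own simplification: the rheme is taken to be everything to the right of the finite verb, and its last noun syntagma is taken as the focus.

```python
# Crude sketch of the word-order focus rule: the last noun syntagma of
# the rheme (everything right of the finite verb) is the focus, i.e.
# the default referent for a backward reference in the next clause.

def rough_focus(tokens, finite_verb_index):
    """Return the form of the last noun after the finite verb."""
    rheme = tokens[finite_verb_index + 1:]
    nouns = [t["form"] for t in rheme if t["pos"] == "noun"]
    return nouns[-1] if nouns else None

# Sentence 1: 'The new way of thinking about geometry has come to be
# known as "turtle geometry".' (heavily simplified token list)
sentence_1 = [
    {"form": "the new way of thinking about geometry", "pos": "noun"},
    {"form": "has", "pos": "verb"},  # finite verb
    {"form": "come to be known as", "pos": "verb"},
    {"form": "turtle geometry", "pos": "noun"},
]
focus = rough_focus(sentence_1, finite_verb_index=1)
```

This yields "turtle geometry" as the focus of sentence 1, matching the preferred referent of it in sentence 2; the discussion that follows shows why such a rule is far too rough to stand alone and should serve only as a late default.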
Let us see how the recognition of focus can help us find the intended referents of some of the pronouns in our sample text. The first pronoun in the text is in sentence 2: 2. It is closely connected with the programming language Logo [...]
preceded by sentence 1:

1. The new way of thinking about geometry has come to be known as "turtle geometry".

We already showed that the new way of thinking about geometry and "turtle geometry" are conceptually virtually equal through the syntagma has come to be known as. Nevertheless, we can make a choice between the two noun syntagmata if necessary, by means of the focus, which, according to the suggested rule, is on "turtle geometry". Although the choice of actual referent in sentence 2 was not very important for a correct understanding of the clause in question, the pronoun its, in the following fragment of sentence 4:

4. Many others have since contributed to its development [...]

can refer to (at least) the following candidates:

a language for introducing children to computers
M.I.T.
Seymour Papert of M.I.T.
Logo

Once these noun syntagmata have been selected as syntactically possible referents, the syntagma Seymour Papert of M.I.T. can most easily be rejected by semantic rules, because it does not match the non-person requirement of its. But how do we choose between the other three candidates? To begin with, the free adjunct primarily as a language for introducing children to computers is what Dik (1978: 153) calls a tail. A tail does not really belong to the clause, but is added as an "afterthought" or additional information. As such, tail information is never focused, but it can help to enforce primary focus in the clause it is added to. The abbreviation M.I.T. is accessible, since the syntagma it is part of, Seymour Papert of M.I.T., is focused. Logo, finally, is least focused. However, its low "focusedness" is increased by the reference identity of Logo and a language for introducing children to computers of the tail. What we have to do here is a trade-off between the rough word order-based rule that identifies M.I.T. as the focus and the focus-strengthening effect of a tail. For a human, the conclusion is clear: its refers to Logo. We mentioned earlier (2.1.11.)
that the passive construction is much more marked in Esperanto than in English. It is interesting to see, therefore, how the focus, as defined in terms of a left-right ordering, has been maintained, even though the syntactic structure of the sentence has been changed. The translation of sentence 3 is:
3. Logon elpensis, en la sesdekaj jaroj, Seymour Papert ĉe M.I.T. unuavice kiel lingvon por eklernigi infanojn pri komputiloj.
'Logo+OBJ conceived, in the sixties, Seymour Papert at M.I.T. primarily as a-language for beginning-to-teach children about computers'

The word Logo is still in frontal position, but is now the object of an active sentence rather than the subject of a passive one. When considering syntactic functions as indicators of a theme-rheme distinction we must realize that we cannot simply transfer focus rules that were constructed for the English language to any other language. In the case of Esperanto, there are indeed several rules that function in the same way as in English. But other languages may well have completely different patterns to guide focus with. Several oriental languages, for example, use a special focus morpheme to explicitly indicate which element is focused. Before we can incorporate focus rules into our system we must, therefore, carefully study the focus mechanisms of all languages involved. Since Esperanto is the pivot of the DLT system, the differences and similarities between each possible source or target language and Esperanto must be charted, so that for each natural-language focusing device the nearest possible (or equally effective) Esperanto equivalent can be found. A similar case is the sentence:

10. A pen can be mounted on the undercarriage, so that when the turtle is made to wander over a sheet of paper, it leaves a record of its path.

in which we must find the intended referent of it, which will be automatically assigned to its as well. Candidate referents for it are:

a sheet of paper
the turtle
the undercarriage
a pen

If we apply the focus rule as we did above, the focus of the first clause will be on undercarriage. However, so that introduces a second, subordinate, clause, which has its own focus, on a sheet of paper, by virtue of the same rule.
The effect on the third clause, beginning with it, is that there seem to be two focused elements directly available: the undercarriage in the main clause, and a sheet of paper in the clause functioning as adverbial of condition. There seems to be no clear-cut rule for this situation, as was indicated by the fact that most of the people we asked to resolve the reference of it fluctuated between a pen and a turtle, without really being able to choose, but none of them seriously considered the undercarriage or the sheet of paper. This indicates that our focus rule is much too rough and also that it should not be applied prior to semantic-pragmatic attempts to match candidate referents with the text
surrounding the pronoun. What we must try to find is a rule that makes use of the degree to which a syntactically possible referent fits in with the semantic context of the pronoun. When the syntactic identification of candidate referents is accomplished, another syntax-based default rule such as the suggested focus rule should then only function as a last resort, when the semantic selection does not bring about a reliable choice. And even for that purpose the focus rule should be improved. Before we return to this idea, we review a few more approaches.
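As a rough approximation, the focus rule applied above can be read as: the clause-final noun syntagma carries the focus. A minimal sketch of that reading (our interpretation of the rule, with word categories supplied by hand purely for illustration):

```python
def clause_focus(clause):
    """Crude focus rule: the last noun syntagma of a clause is its focus.
    A clause is a list of (syntagma, category) pairs; 'N' marks nouns."""
    nouns = [word for word, cat in clause if cat == "N"]
    return nouns[-1] if nouns else None

# The two clauses of sentence 10, hand-annotated for the example:
main = [("a pen", "N"), ("mounted", "V"), ("the undercarriage", "N")]
sub = [("the turtle", "N"), ("wander", "V"), ("a sheet of paper", "N")]

print(clause_focus(main))  # the undercarriage
print(clause_focus(sub))   # a sheet of paper
```

As the informant test above shows, this rule alone picks referents that human readers reject, which is why it is relegated to a default role.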
2.2.8. Speech acts
In this book we will largely ignore the difficult field of speech acts, mainly because very little progress has been made in the field in relation to machine translation and/or natural-language processing. On the theoretical side, some interesting models for incorporating speech acts into discourse analysis have been proposed, but much work will have to be done before such models can be made useful for a machine translation system.
2.2.9. Conclusion
What we have demonstrated in this chapter is that something as common as pronominal reference requires a complicated system of rules to be handled properly. We have discussed rules from basically three realms. Firstly, at the least difficult level, there are the syntactic features. They have the big advantage that they are deterministic, i.e., they can function as absolute filters, but they are not very specific, and will seldom be able to select a single referent out of a set of possibilities. Rather, rules that rely on syntactic features select candidate referents for the subsequent processes. Secondly, we tried out a similar feature-based approach to the semantic reference conditions of pronouns. Pronouns have some semantic features, which could function in the same way as the syntactic ones. Like the syntactic features, they are not very specific, however, and are clearly inadequate by themselves. Moreover, there is no efficient and reliable way of assigning the corresponding semantic features to the content words that are considered as candidate referents. Even if it could be used in practice, this level would no longer be deterministic. Semantic feature rules belong to the preference rules of language. Thirdly, we have in a sketchy way taken up a very preliminary kind of focus-detecting mechanism. The rule we outlined can, in some cases, point to the most likely referent out of a set of possibilities. As we have seen, however, this choice can be overruled when the referent is semantically awkward. This means that semantics will have to lead to the final decision (if we forget about help provided by the user). The kind of semantics that is needed here functions on the basis of preferences, so the final outcome will never be a clear-cut decision, but a "ranking" of possibilities, in which, usually, one candidate scores notably higher than the others and will, therefore,
be considered to be the intended one. Should for some reason several candidates score equally high, we have a case of true ambiguity, which cannot easily be solved with the mechanisms discussed so far. To conclude this section, we outline a number of steps to be included in an algorithm for combining the absolute selection rules based on syntactic features with the preference rules of semantics and focus. The goal of this algorithm is the selection of the most appropriate referent. How this can be implemented in a working system is an open question, and will have to be investigated in detail. Initially, the set of candidates will only be sought from the clause directly preceding the clause containing the pronoun. When this clause contains no suitable candidates, the one before that will be used, and so on. An interesting trade-off has to be made between nearby, but less probable candidates on the one hand and more probable, but also more distant ones on the other.

Step 1: Select noun and pronoun syntagmata as candidate referents.
Step 2: Check all the syntactic features of all candidates, eliminating those whose features do not match those of the pronoun.
Step 3: For the candidates not yet discarded, check the degree of semantic fit with the context of the pronoun. Arrange the candidates on a scale of semantic probability.
Step 4: Refine the semantic scores by adding pragmatic context knowledge (to be discussed in 3.5.).
Step 5: If the selection procedure so far has not yielded a clear decision for a single candidate, revert to default rules such as focus-based preference choices.
Step 6: If the selection process so far has not yielded a clear decision, invoke the disambiguation dialogue (source language side) or choose the most probable candidate, however slight the difference in score compared to other candidates is (target language side).
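These steps can be pictured as a filtering and ranking pipeline. The following sketch is purely illustrative: the feature dictionaries, scoring functions and score margin are invented placeholders, not the DLT implementation, and step 1 (picking out noun and pronoun syntagmata) is assumed to have been done by the parser.

```python
def select_referent(pronoun, clauses, score, focus_rank, margin=0.2, ask=None):
    """Search preceding clauses, nearest first, for the best referent.
    clauses: oldest first; each clause is a list of candidate syntagmata
    (dicts with at least a 'feat' dictionary of syntactic features)."""
    for clause in reversed(clauses):
        # Step 2: absolute filter on syntactic features (number, gender, ...).
        cands = [c for c in clause
                 if all(c["feat"].get(f) == v
                        for f, v in pronoun["feat"].items())]
        if not cands:
            continue                      # widen the search to an earlier clause
        # Steps 3-4: rank by semantic fit, refined by pragmatic knowledge.
        cands.sort(key=score, reverse=True)
        if len(cands) == 1 or score(cands[0]) - score(cands[1]) >= margin:
            return cands[0]               # the semantics decide
        # Step 5: default preference rule, e.g. a focus-based choice.
        top = [c for c in cands if score(cands[0]) - score(c) < margin]
        if len({focus_rank(c) for c in top}) > 1:
            return min(top, key=focus_rank)
        # Step 6: dialogue (source side) or simply the highest scorer (target side).
        return ask(top) if ask else cands[0]
    return None

# Invented example: resolving a singular pronoun against two preceding clauses.
pronoun = {"feat": {"num": "sg"}}
clauses = [
    [{"word": "turtles", "feat": {"num": "pl"}, "sem": 0.9, "foc": 1}],
    [{"word": "a pen",   "feat": {"num": "sg"}, "sem": 0.8, "foc": 2},
     {"word": "paper",   "feat": {"num": "sg"}, "sem": 0.3, "foc": 1}],
]
best = select_referent(pronoun, clauses,
                       score=lambda c: c["sem"], focus_rank=lambda c: c["foc"])
print(best["word"])  # a pen
```

Note that the syntactic filter is absolute (it eliminates candidates outright), while the semantic and focus rules only order the survivors, mirroring the distinction between filters and preference rules drawn above.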
2.3. Content word reference
The pronouns we treated in the last section are in fact cases of a very frequent phenomenon in texts: the use of different words for the same "thing". We assume that words have meaning because they refer to concepts of events, qualities, items etc. in the outside world. However, words do not have a neatly fixed one-to-one relation to the concepts we, the human language users, perceive. On the contrary, words are shifty and unstable. Rather than precisely pointing to a single concept, a word in isolation can usually refer to a whole range of concepts, not necessarily related to each other. In fact, a single word points to a conceptual cloud rather than to a single well-defined concept. Moreover, it is a question of definition or belief whether there are distinct concepts, whether concepts have cross-linguistic reality and whether they are independent of language. It is only when words are used together, in structured combinations, that their conceptual clouds begin to "thin out": the range of possible references of each word is narrowed down by the other words it is used in combination with. For the purpose of translation, this mutual restriction process need not be pursued to the point that there is only a single concept left, especially since there may not be any "single", or atomic, concepts. What translation needs is not a way of arriving at a single concept for a given word, but a way of singling out a translation alternative in a target language (see the distinction of monolingual and bilingual ambiguity in 1.1.). Words that restrict each other's meaning in this way are said to disambiguate each other by means of mutual meaning restriction, in that they establish a context for each other (figure 16).
Figure 16 Words restrict each other's meaning (cf. 2.2., figure 10)

to run                     >  kuri, gliti, funkcii, funkciigi, ...

the dog runs smoothly      >  kuri
the rope runs smoothly     >  gliti
the watch runs smoothly    >  funkcii
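The restriction process of figure 16 can be pictured as a lookup in which the semantic class of a combining word thins out the verb's conceptual cloud. The semantic classes and their mapping below are invented for this illustration:

```python
# Each English word points to a "cloud" of Esperanto alternatives;
# the words it combines with thin that cloud out.
# Class names and word assignments are illustrative placeholders.
RUN_CLOUD = {"ANIMATE": "kuri", "FLEXIBLE-OBJECT": "gliti", "MECHANISM": "funkcii"}
SEM_CLASS = {"dog": "ANIMATE", "rope": "FLEXIBLE-OBJECT", "watch": "MECHANISM"}

def translate_run(subject):
    """Choose a translation of 'to run' from the subject's semantic class."""
    # Default to "kuri" when the subject's class gives no restriction.
    return RUN_CLOUD.get(SEM_CLASS.get(subject, ""), "kuri")

print(translate_run("dog"))    # kuri
print(translate_run("rope"))   # gliti
print(translate_run("watch"))  # funkcii
```

The point of the sketch is the bilingual one made above: the mechanism need not identify a concept at all; it only needs to single out one target language alternative.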
Not only do words refer to a range of possible referents rather than to a single one, there is also considerable "overlap" between words, and most concepts can be referred to in more than one way. Different words usually have different connotations and associations, stressing different aspects of the same concept. An interesting study by Karen Sparck Jones (1986: 77ff.) shows how words can be related to each other in terms of synonymity: two words are synonymous when they can be used in exactly the same contexts, without changing the meaning (i.e., they are interchangeable). In reality, most such word pairs are at best near-synonymous: they share a certain number of contexts, but also have contexts particular to themselves. The more context they share, i.e., the more easily they can be interchanged, the more synonymous they are. To put it differently: Two words may in a number of contexts be synonyms and may not be so in other contexts. The more contexts they share, the more synonymous they are.
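This graded notion of synonymity, as a proportion of shared contexts, can be sketched as a simple overlap measure. The context sets below are invented examples, not Sparck Jones's data:

```python
def synonymity(contexts_a, contexts_b):
    """Degree of synonymity as the proportion of shared contexts
    (an overlap ratio; 1.0 means interchangeable in every recorded context)."""
    a, b = set(contexts_a), set(contexts_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# Invented context sets for illustration:
big = {"a __ house", "a __ mistake", "a __ surprise"}
large = {"a __ house", "a __ number"}
print(synonymity(big, large))  # shares 1 of 4 contexts -> 0.25
```

On such a measure two words are never simply synonyms or not; they are synonymous to a degree, which is exactly what a preference-based selection mechanism needs.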
2.3.1. Lexical variation
Any time someone wants to refer to some concept, s/he is faced with the task of finding the right words to make that concept clear to the people for whom the message is intended. Usually, there are several words for a given concept, and, using compounds, complex syntagmata, or even complete clauses, an infinite number of ways to refer to that concept. The key to this so-called lexical variation is the synonymity discussed above. Synonymity is never total, and different words, even when referring to the same concept, highlight or imply different aspects of that concept. For many purposes, a single word does not contain enough information to make clear what the sender has in mind. And often single words are not specific enough, leaving too much to the imagination and understanding of the receiver. By using different words for one concept, i.e., paraphrasing, the sender can try to describe the concept in a number of different ways, providing both additional information and - by means of the semantic restrictions words impose on each other - specifying more precisely the concept s/he has in mind. The task of the receiver is first of all to recognize that in the variety of words and syntagmata the reference is to a single concept. This is not an easy task, and unless the sender of the message provides assistance in the form of explicit clues to guide the receiver, misunderstandings may well arise. Recognizing the concept referred to by a variety of words and syntagmata is a primarily semantic process, involving two kinds of conceptual matching. The first matching that must be carried out is to establish the synonymity between the words and syntagmata expected to refer to the same concept. How synonymity can be handled in the DLT system is discussed in detail by Papegaaij (1986: 105ff.), so we won't go into that here. For now we will simply assume that synonymity can be measured in some way, and that human language users are capable of checking
whether two words or syntagmata can indeed refer to the same concept. The second matching process is more complicated and involves both context matching and consistency checks. When two words or syntagmata (we will use the word term to refer to words or syntagmata from here on) are found to be possible synonyms, that is no guarantee in itself that they do indeed refer to the same entity. In a single text, near-synonyms and synonyms may well be used to refer to separate entities (see below). What must be found is evidence that within this text the two terms are indeed meant to refer to the same concept. The distinction between concepts and entities is essential to the nature of the processes discussed here: A concept refers to a specific type of items (events, qualities ...), whereas an entity is a particular instance of this type. The concept is thus a unit of semantics (or, if concepts are believed to be language-independent, of general semiotics), while the entity is taken to have reality in the extralinguistic world and accordingly has to do with the realm of pragmatics.
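The concept/entity distinction drawn here is essentially that between a type and an instance, which a small sketch can make concrete. The grouping of mentions below is invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """A type of items (events, qualities ...): the unit of semantics."""
    name: str

@dataclass
class Entity:
    """A particular instance of a concept: the unit of pragmatics."""
    concept: Concept
    mentions: list = field(default_factory=list)   # terms referring to it

turtle = Concept("turtle")                         # the class of turtles
floor_turtle = Entity(turtle, ["the original turtle", "it"])
screen_turtle = Entity(turtle, ["a mechanical turtle"])

# Two terms may share a concept (identity of sense, mere synonymity)
# without sharing an entity (identity of reference):
print(floor_turtle.concept is screen_turtle.concept)   # True
print(floor_turtle is screen_turtle)                   # False
```

The second matching process described above amounts to deciding whether two synonymous terms point to one Entity object or to two distinct ones.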
2.3.2. Reference identity and translation
Recognizing that two terms refer to the same entity is of crucial importance to the translator of a text. We said that reference identity between different terms is possible because of the way words partially overlap each other in meaning, and have "ranges" of meaning rather than a single fixed one. However, as we already saw with the pronouns, the distribution of grammatical features can differ considerably between languages. In particular, the distribution of words over the available concepts is far from symmetric. A "single" concept could be referred to by several different words in one language, and only one in another, or vice versa. And it is even possible that a word in one language has no equivalent in the other language at all, but that it can be translated by means of a more or less elaborate paraphrase only. An interesting point is how "translatable" human languages are. In particular, can we always find a paraphrase for a particular source language word, even when the concept the word refers to seems to be absolutely alien to the target language (and culture)? Various sources say that such is indeed the case (e.g. Larson 1984: 437).
Should a translator consider only local information when translating words, i.e., translate coherent sentences without examining the broader context (with the obvious exception of the pronouns)? If so, it may well be that two terms that were meant to be co-referential in the original can no longer be so, because the words used have no common element of meaning. A second consideration is that recurring terms, when meant to refer to a single entity, should be translated in a consistent way. A typical pattern (at least in English) is, for example, the introduction of an entity with a long description (e.g., the Massachusetts Institute of Technology), after which the term is gradually reduced in length (e.g., the Massachusetts Institute, the Institute of Technology, the Institute) or drastically
abbreviated (M.I.T.). This pattern involves using the words that make up the complete term in different local contexts. A translator looking only at local contexts may well choose one translation in one situation and a completely different one in another (figure 17). This would totally destroy the coherence within the text, and probably lead to significant misunderstandings on the part of the target language receiver.
Figure 17 Some words with a general and a specialist meaning

community 'group of people'      >  community 'the European Economic Community'

file 'folder for documents',
     'documents in a folder',
     'line of people'            >  file 'data set in a computer'
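One guard against the incoherence of figure 17 is to record, per text, which translation has been chosen for a term referring to a given entity, and to reuse that choice on every later occurrence. A minimal sketch (the bilingual alternatives, including the Esperanto words, are invented dictionary entries for illustration):

```python
# Invented bilingual alternatives for illustration.
ALTERNATIVES = {"file": ["dosiero", "paperujo", "vico"]}

class TermTranslator:
    """Keep recurring references to one entity lexically consistent."""
    def __init__(self):
        self.chosen = {}          # (term, entity id) -> translation

    def translate(self, term, entity_id, pick):
        key = (term, entity_id)
        if key not in self.chosen:                       # first occurrence:
            self.chosen[key] = pick(ALTERNATIVES[term])  # choose from the cloud
        return self.chosen[key]                          # later ones: reuse it

t = TermTranslator()
first = t.translate("file", "e1", pick=lambda alts: alts[0])
later = t.translate("file", "e1", pick=lambda alts: alts[-1])  # pick is ignored now
print(first == later)  # True: the earlier choice is reused
```

The local context may still select the initial translation; the cache only ensures that once an entity has been named in the target text, it keeps that name.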
All in all, it is important that a translator is aware of the referencing function of the words s/he is trying to translate. Only in this way can a coherent translation be achieved. The study of lexical references inside (and outside) a text is, therefore, important for anyone trying to build a machine translation system. Let us trace some of the concepts referred to in our sample text, to see how they are referred to, and what syntactic evidence there is that a certain term is meant to refer to an entity already mentioned instead of to something new.
Figure 18 Reference identity in the turtle text

Item  Terms used                                          Sentence
1     a view of geometry from the inside out              Sub-title
      the new way of thinking about geometry              1
      turtle geometry                                     1
      it                                                  2
      turtle geometry                                     5
2     the programming language Logo                       2
      which                                               2
      Logo                                                3
      a language for introducing children to computers    3
      its                                                 4
      its                                                 4
3     the Massachusetts Institute of Technology           2
      M.I.T.                                              3
      M.I.T.                                              5
4     the original turtle                                 6
      a mechanical device                                 6
      a small wheeled device                              6
      whose                                               6
      the first such creature                             7
      it                                                  8
      a mechanical turtle                                 9
      the turtle                                          10
      it                                                  10
      its                                                 10
      such "floor turtles"                                11
5     the surface of a cathode-ray tube                   11
      the screen                                          12
6     instructions typed on a computer keyboard           6
      commands or programs entered at the keyboard        12
2.3.3. Indefinite versus definite reference
If we look at figure 18, we can roughly divide the terms used to refer to the entities listed into four, syntactically distinct, groups. The first group is formed by the noun syntagmata with a definite article the (the new way of thinking about geometry, the programming language Logo, the Massachusetts Institute of Technology, etc.), the second group are the noun syntagmata with the indefinite article a or no article at all (a view of geometry from the inside out, instructions typed on a computer keyboard, a language for introducing children to computers, such "floor turtles", etc.), the third group are the pronouns, already discussed in the previous section (2.2.), and finally there are some "proper nouns", nouns that are used as names, and are used without the article (turtle geometry, Logo, M.I.T.). It is not trivial to distinguish the latter group from the second one by purely syntactic means. The distinction between definite and indefinite noun syntagmata is not arbitrary, but serves a communicative function, as described by Geoffrey Leech and Jan Svartvik (1975: 52f.). According to them the definite article is used when "we presume that both we and the hearer know what is being talked about". They list the circumstances under which this condition can occur as:

"(A) When identity has been established by an earlier mention [...]"
"(B) When identity is established by the postmodification [...] that follows the noun"
"(C) When the object or group of objects is the only one that exists or has existed"
"(D) When reference is made to an institution shared by the community".
Examples of these uses (with the exception of (D)) in our text are: (A) the keyboard (sentence 12), referring back to a computer keyboard (sentence 6); (B) the programming language Logo (sentence 2), where Logo establishes the identity; (C) the Massachusetts Institute of Technology (sentence 2), where it is understood that there is only one such institute. Furthermore, Leech and Svartvik mention generic use, "referring to what is general or typical for a whole class of objects" (Leech/Svartvik 1975: 53), thus for what we call the concept. An example of this use is found in the title of the book Turtle Geometry: The Computer as a Medium for Exploring Mathematics, where the computer refers to the class of computers (the concept 'computer') rather than to a single identifiable instance (entity). Finally, proper nouns, personal pronouns and demonstratives (this, that, these, etc.) are also said to have definite meaning. From the above description we see that the definiteness or indefiniteness of noun syntagmata has consequences for the assumed known-ness of the concepts or entities referred to by them. This can be a useful clue for a text-understanding system (as it is for humans). It means that whenever a definite noun syntagma is encountered, it must be assumed to refer to a known concept or entity, which means it should be retrievable from the understander's knowledge (or the system's knowledge base). The principle that definite noun syntagmata signal known-ness may be fine as a principle, and humans seem to be able to handle it very well, but just knowing that something is supposed to be known does not tell us where it can be found. Even humans have trouble understanding references when the writer is not sufficiently careful in his phrasing. The sentence:
10. A pen can be mounted on the undercarriage, so that when the turtle is made to wander over a sheet of paper, it leaves a record of its path. from our sample text proves to be ambiguous with respect to the references of it (and its). Most people we showed this sentence to agreed that the reference was either to pen or to turtle, but between those two they found it hard to choose.
If encountering a definite noun syntagma simply means having to search through all of the available knowledge (about the text, about the situation, about the world), without any direction, we have not gained very much. The question is whether we can find some guidelines which can at least give us a preferred order in which to search through our knowledge.
2.3.4. Directing the search
The order in which Leech and Svartvik describe the circumstances under which definite noun syntagmata are used may tell us something about the order in which the available knowledge can best be searched. The first two circumstances they mention (see 2.3.3.) have in common that they both involve a reference within the current text (what is called endophor by Hirst, 1981: 5, note 6) and make no reference to the vast amount of general knowledge not directly concerned with the text itself (called exophoric reference by Hirst). Obviously, if the search could be limited to just the knowledge available about the text, that would make a drastic reduction in processing possible. Let us first consider situation (B), which should be recognizable on the surface, because it involves a postmodification of the noun syntagma. In our sample text, the following noun syntagmata have postmodification:
Figure 19 Definite noun syntagmata with postmodification

Noun syntagma                                     Sentence
the new way [of thinking [about geometry]]        1
the programming language [Logo]                   2
the Massachusetts Institute [of Technology]       2
the ideas [underlying turtle geometry]            5
the surface [of a cathode-ray tube]               11
The question is whether the postmodification is all that gives the noun syntagma its definiteness, or whether other factors play a role as well. Looking at the noun syntagmata out of context, all seem to be plausible definite references, that is, the references of the head nouns are sufficiently restricted by the postmodifications to see them as uniquely identifiable. In context, however, only two of the syntagmata in the table are without an identifiable referent elsewhere in the text. Those two "self-sufficient" syntagmata are the programming language Logo and the surface of a cathode-ray tube. The other three are linked in some way to their context. To begin with the syntagma the new way of thinking about geometry, this will generally be recognized as a reference to the sub-title (see 2.1.3.), because it is fairly close in meaning to a view of geometry from inside out. Apparently, even when a syntagma could function on its own, the preference is to link it to other elements in the text. Since such links help to improve text coherence, this observation supports the "maximum connectivity" rule postulated in section 2.2., which states that human readers prefer to link new statements to the text already processed as tightly as possible, i.e., they will try to find as many links as possible. The syntagma the ideas underlying turtle geometry makes this principle even more clearly visible. The postmodification underlying turtle geometry contains a literal repetition of turtle geometry, which is one of the main topics of the paragraph. Once this is recognized, there is no need to view the whole syntagma as new information. On the contrary, it is easier, and more logical, to see the ideas as an aspect of turtle geometry that was implicitly present, but is now made explicit. 
In a coherent text, this use of definite noun syntagmata, i.e., with an explicit link backward to a previously mentioned entity, is especially useful to talk about different aspects of a single entity, without confusing the reader about the identity of this entity. An interesting case is the noun syntagma the Massachusetts Institute of Technology, since this combines two kinds of definiteness. As we saw in section 2.1., it is first of all a name, which makes it uniquely identifiable, and justifies the definite article. But it is also a description. The general term institute is uniquely defined by the modifications Massachusetts and Technology. The combined effect of these two functions is that a reader familiar with M.I.T. will probably only recognize the name function, ignoring the descriptive part because it is already part of her/his internal knowledge, while a reader who has never heard of M.I.T. will use the descriptive function to have at least some idea what the name stands for. In 2.1.10. we showed that the translation of both name and description is not always possible, which, in this case, led to the somewhat artificial, but far from uncommon construction of a translation of the description, followed by the literal name, to make sure both functions were present in the translated text. As far as translation is concerned, Esperanto and English have a virtually identical relationship between postmodification and the use of the definite article, so all the translator has to do is make sure that the words s/he chooses have more or less the same kind of synonymity (or lack of it) with the words in the context that the words in
the English original refer to. When a noun syntagma is analyzed as "self-sufficient", and does not refer back to a previously mentioned concept or entity, it may be necessary to search for words that are different enough from the words in the context to avoid an accidental synonymity, since there is a serious risk that a reader would interpret that synonymity as a backward reference, even though none was intended. This means that the simple copying of words and structures that seems to have taken place in the Esperanto translation should not be taken for granted, and that a translator must always consider the effect of her/his choices in the larger context, even when on the local level the translation seems to be unproblematic. After all, the way words overlap in meaning is not the same across languages, and two words that have nothing in common in one language may well have an additional common meaning in another.
2.3.5. Definiteness without modification
What we have seen so far is that noun syntagmata with postmodification can function independently of their context, i.e., can be taken as new information with built-in identification. When, however, there is synonymity between the words of the syntagma and some previously used term, an interpretation in which the syntagma refers back to that term is preferred. The rule seems to be: when possible, interpret information as being linked to earlier statements. This means that even when noun syntagmata are postmodified, this does not in itself provide a reason to stop the search for a referent. We still need some way to direct the search for a suitable referent. Let us examine the sample text again to see if we can find any guiding principles behind the reference mechanism there. The concept in figure 18 which is referred to most often is number 4, there listed as being referred to for the first time with the original turtle. That this should be the first mention of a concept is a bit strange, since it is a definite noun syntagma, without any postmodification. The definite article can be explained if we realize that the reference is to the turtle that has been mentioned several times in the preceding paragraph (though it remained unexplained there). The modifier original serves here to set off a particular entity of the generically used concept 'turtle'. In other words, the previous paragraph made mention of the concept 'turtle' in a general, non-specific way. It was the kind of object named turtle that was referred to, not a single turtle in particular.
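The search order suggested by Leech and Svartvik's circumstances - look inside the text (endophorically) before consulting general knowledge - can be sketched as an ordered cascade. The entity records, matching predicate and world-knowledge lookup below are invented placeholders:

```python
def resolve_definite(np, text_entities, postmodifies_uniquely, world_lookup):
    """Resolve a definite noun syntagma, searching the cheapest source first."""
    # (A) endophoric: identity established by an earlier mention.
    for entity in reversed(text_entities):          # most recent first
        if entity["head"] == np["head"]:
            return entity
    # (B) the postmodification identifies the referent by itself.
    if np.get("postmod") and postmodifies_uniquely(np):
        return {"head": np["head"], "new": True}    # a "self-sufficient" syntagma
    # (C)/(D): exophoric; fall back on general world knowledge.
    return world_lookup(np)

# Invented example: entities already mentioned in the text, most recent last.
mentioned = [{"head": "turtle"}, {"head": "keyboard"}]
np = {"head": "keyboard", "postmod": None}
print(resolve_definite(np, mentioned, lambda n: True, lambda n: None)["head"])
# keyboard
```

Note that even case (B) is tried only after the earlier-mention search, reflecting the preference, argued above, for linking a syntagma back to the text whenever possible.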
2.3.6. Identity of sense or reference
The way original separates a particular entity from the generically used turtle indicates that we must recognize at least two different kinds of reference. Hirst calls them "identity of sense" and "identity of reference" (Hirst 1981: 28). We speak of identity of reference or, in our terms, pragmatic reference identity, when two terms refer to the same entity. Identity of sense, i.e. mere semantic reference identity, on the other
hand, occurs when two terms refer to separate instances of the same concept, i.e., two objects of the same class or "of a similar description" (Hirst 1981: 28), see figure 20. It will be important to keep the two kinds of reference apart whenever we can, as they have different consequences for the coherence within the text.
Figure 20

Pragmatic reference identity:
She threw the ball and I caught it.
Here it denotes the same 'ball' (entity) that she threw.

Semantic reference identity:
I spent all last month's salary; it will be more this month.
Here it is another 'salary' (entity) than last month's salary.
The syntagma the original turtle can now be identified as a semantic reference. A sophisticated natural-language understanding system should be able to infer this from the word original, which (among other things) has the function of referring to a particular instance of a concept, the first of its class. The Esperanto translation shows a slightly different interpretation. Instead of instantiating a reference to the single original turtle, it maintains the class reference (which is still a semantic reference, since there is no entity to refer to). The structure:

6. Origine, la testudo estis mekanikaĵo [...]
'originally, the turtle was a-mechanical-device [...]'

indicates two things: a) the construction with esti 'to be' shows that the word testudo should not be taken in its literal interpretation (i.e., not taken to refer to the 'animal' it normally refers to, but to 'a mechanical device'); and b) the class of testudo has a history, and was initially different from what it is now. This interpretation, though not really the same as the English original, does not alter the function of the syntagma very much. Both the original turtle and origine, la testudo make clear that instances of the class turtle may not always have been the same. Naturally, if a translator does not follow the original in introducing a particular instance of a class, this may have serious consequences for the rest of the text. After all, a particular instance is not the same as the class it is an instance of, and it may well have features not normally associated with its class or lack certain typical ones. That the translator can get away
with it in this case is largely caused by a slight inconsistency in the original text, where the same original turtle is explicitly introduced a second time, this time as the first such creature in sentence 7. The two syntagmata of paragraph 2 in the original are a slightly "strained" combination of class and instance reference. First, a particular turtle is introduced as the original, and described as having certain qualities. It was a mechanical device and a small wheeled vehicle whose movements could be controlled by instructions typed on a computer keyboard. Since the sentence opens with an instantiation, the description appears to hold for the single original turtle only. The second sentence, however, opens with the first such creature, where the word such clearly indicates a reference to a class: such creature is equivalent to 'a creature of that type and/or description'. Note that creature normally refers to a +ANIMATE object, which the original turtle is now known not to be (it is a mechanical device). This semantic inconsistency is probably meant as a pun on the (by now) double meaning of the word turtle. A machine translation system may well be confused by this kind of "inconsistent" reference, and fail to establish the link between a mechanical device: a small wheeled vehicle and the first such creature, even though the syntactic evidence (the way such points back to the preceding clause) would make it a likely choice. It may not be easy to solve this kind of "fuzzy" reference.
This situation makes it necessary to re-evaluate the interpretation of the previous sentence. Instead of a single original turtle, the writer apparently meant a subclass of original turtles. The second sentence then truly introduces the actual first instance of the class of original turtles. After this, the following sentence continues with a detailed description of that first turtle, a description which holds for the individual only, and not automatically for the class to which it belongs. What the opening of paragraph 2 shows is first of all that texts need not always be perfectly consistent. Writers are sometimes less careful than they could be in the way they phrase references. The translator of our text has spotted the inconsistency, and "corrected" it in Esperanto, by turning the first instantiation into a more appropriate sub-class introduction, followed by the actual instantiation in the second sentence. This can be done, however, only after the second sentence has been read and its consequences evaluated. In a system that works on a purely sentence-by-sentence basis without look-ahead facilities, the first sentence would have been translated literally (the construction original turtle can be copied faithfully in Esperanto, with the same effect) and the inconsistency would have remained. But this can be dangerous! We cannot know beforehand that such an inconsistency is resolved as easily in the target language as it is, apparently, in the source language here. It could well be that another language does not allow a backward reference to an instance to function as a class reference as well. A translation system must evaluate its own translation (either while translating, or after the "first draft" has been finished) to see if local choices do not lead to global incompatibilities.
That the opening sentence of paragraph 2 was indeed meant to establish a sub-class of turtles, can be seen by the syntagma a mechanical turtle in sentence 9, which clearly refers back both to the original turtle and its description as a mechanical device. The
indefinite article here leaves no doubt that the reference is to the class, not the instance.
2.3.7. Types of definite reference

A somewhat curious definite noun syntagma is the undercarriage in sentence 10. How can this be a definite noun syntagma, when there is no mention of anything even remotely synonymous to undercarriage in the text? Or is undercarriage something to be understood as part of our common knowledge? The answer is twofold: a) yes, we need common knowledge to correctly solve the reference; and b) the definite article does indeed indicate a reference to something inside the text. What most human readers will realize when they read sentence 10 is that undercarriage is a part of the turtle mentioned both in the previous sentence, and in this sentence itself. It appears that a definite reference does not necessarily mean that the referee and its referent are absolutely identical (either semantically or pragmatically), but that there are several relations that make definite reference possible. Hirst lists them as: "part of", "subset of" and "aspect or attribute of" (Hirst 1981: 27). Recognizing such semantic relations is far from obvious, and may involve quite some world knowledge, and a capacity for inferencing that possibly surpasses the capacity of current machine translation systems. The undercarriage example, for instance, requires first of all some knowledge about the general meaning of the word. If we take a dictionary entry to be representative of such knowledge, we see that undercarriage can be defined as: "a supporting framework (e.g. of a motor vehicle)". We also know that the theme (that which is talked about, see 2.4.) of the preceding sentences was turtle, described as a mechanical device: a small wheeled vehicle. Knowing this, we can make the link from a mechanical turtle in the previous sentence to the undercarriage, paraphrased as: the undercarriage = the supporting framework of a mechanical turtle, and the definite reference is explained as a part-of relation.
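To make the part-of mechanism concrete, the lookup just described can be sketched in a few lines of Python. Everything here - the relation table, the semantic class labels and the function name - is an invented illustration under simplifying assumptions, not part of any actual system:

```python
# Toy relation table in the spirit of Hirst's (1981) "part of" links.
# All entries and names are illustrative assumptions.
PART_OF = {
    ("undercarriage", "vehicle"),   # an undercarriage is part of a vehicle
    ("wheel", "vehicle"),
}

# Concepts already mentioned in the text, each with the semantic classes
# it is known to belong to (the turtle was described as a small wheeled
# vehicle and a mechanical device).
mentioned = [
    ("turtle", {"vehicle", "mechanical device"}),
]

def resolve_definite(noun):
    """Return (antecedent, relation) for a definite noun syntagma, or None."""
    for concept, classes in mentioned:
        if noun == concept:                 # direct identity
            return (concept, "identity")
        for cls in classes:
            if (noun, cls) in PART_OF:      # Hirst's part-of relation
                return (concept, "part of")
    return None

assert resolve_definite("undercarriage") == ("turtle", "part of")
assert resolve_definite("moon") is None
```

A real system would of course also need subset-of and attribute-of links, and a far richer lexicon; the point is only that the definite article triggers a search through known concepts and their semantic relations.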
It is interesting to see how the Esperanto translation makes no mention of an undercarriage at all. The clause: 10. Sub la testudo eblas munti skribilon [...] 'under the turtle it-is-possible to-mount a-pen [...]' replaces the part-of reference with an explicit description of position, with the turtle also explicitly given instead of being only implicitly referred to. The effect of this change is a simplification with respect to the original, with some loss of information. Not only does mention of the undercarriage in the original require relatively complex reasoning in order to be properly understood; one can furthermore ask whether the introduction of a new concept, the undercarriage in question, is really necessary. The concept is not referred to elsewhere in the text, and, in fact, seems to serve only to
indicate the position of the pen. That is in any case the interpretation of the translator, who decided to retain only the position function. He could have translated the syntagma literally:
Sub la subekipajo eblas munti skribilon 'under the undercarriage it-is-possible to-mount a-pen' leaving it to the Esperanto receiver to work out the intended reference (and the function of that reference).
The other relations in Hirst's list require similar or even more complicated lines of reasoning, as some of his examples indicate (Hirst 1981: 27). It is this kind of complexity that caused Hirst to state that: "any complete NLU system will need just about the full set of human intellectual abilities to succeed" (Hirst 1981: 49, note 28; NLU = natural-language understanding). Since the "full set of human abilities" seems far beyond our reach at the moment, the question becomes whether we truly need the full set for the translation process, or whether a "subset" can suffice. In other words, do we need "full understanding" to make a translation or can we make do with "partial understanding"? (See also 3.5.2.) Looking at human translators, we find arguments for both positions. Professional translators and translation textbooks seem to agree that the best way to translate a text is first of all to make sure that one fully understands it, not only its message, but also its cultural setting, style and mood. Then comes the task of translation, mostly sentence-by-sentence, but with the local decisions guided by global understanding. And finally the resulting target language text must be thoroughly checked and analyzed to see if the result is close enough to the original; that is, the target language text must be completely understood as well. On the other hand, interpreters, who (as opposed to translators) have to work on a sentence-by-sentence basis without being able to wait for the subsequent text, must make do with partial understanding, local clues and default translations where information is lacking. Though the quality of such an interpreted translation will undoubtedly be less than the quality of a carefully prepared full text translation, it is usually enough to get the message across. Since we cannot, at this moment, if ever, match human understanding with our machine translation systems, we must find ways of coping with a more superficial "understanding".
This may mean a reduction of quality, but that does not mean the resulting output will be useless.
We know that much of the information humans distill from a text is only there implicitly. That is, it is the vast amount of knowledge about the world that enables humans to "reconstruct" those parts of the message the writer felt unnecessary to phrase explicitly. To give a simple example, anybody reading the sentence: After testing the water with his toes, he jumped into the pool.
can reconstruct an almost endless list of "implied" information. For instance: testing the water with his toes implies that the he mentioned in the sentence wanted to know the temperature of the water; the water is most likely in the pool; he jumped, so the temperature must have been all right; he is probably going to swim in the pool, that is, the pool will be a swimming pool, etc., etc. We can think of a complete scenario in which this little scene supposedly takes place, without that scenario ever being given explicitly. For an adequate translation of that sentence, it seems unnecessary to first make explicit all this "surrounding" information, to then find target language words for just the amount of information that the original contains, in other words, throwing away all the extra information again. Is it not more likely that, provided the cultural settings of source language and target language are not too different, a more direct translation is possible; one that concentrates more on the explicit information? The current approach in the DLT system assumes that it is possible to translate from one human language into another without making explicit all implied knowledge (a task that is impossible anyhow). The basis for this assumption is that it seems possible to study two languages very carefully, comparing the way they use syntactic and lexical signals to convey meaning, and construct a contrastive grammar of the two, which not only makes many translation decisions possible on external, explicit grounds, but also clearly indicates where external information is insufficient and must be supported by deeper, semantic and pragmatic information.
If such a contrastive grammar is constructed carefully, it could lead to a translation system that uses implied information in a limited, directed way, and often translates without any real need to simulate "total understanding", leaving the understanding to the human end-users, who are much better equipped for it.
2.3.8. Searching for reference

It is now time to postulate some rules that can give some idea of how reference is achieved by the variety of syntagmata a writer can use throughout a text. The keywords in such rules will be synonymity, recency and focusing. We may assume that a text will contain references to only a limited number of concepts and entities. A minimal requirement for any system that must deal with reference is that it is able to maintain a list of all concepts and all entities that have been brought up in the text. Without any selectional restrictions, this would mean that any new syntagma is compared with all syntagmata that occurred so far, to see if the new syntagma can be reference-identical with any of the old ones. Testing for synonymity, however, is not enough. A distinction must be made between definite reference and indefinite reference, as well as between semantic and pragmatic reference identity. To begin with the definite-indefinite distinction, any definite noun syntagma must either be taken to refer to an element elsewhere in the text, or else refer to some information assumed to be available to the reader as part of her/his common knowledge. In its simplest form, this rule implies searching through all
available concepts in the text for the purpose of finding a match and, when one is not found, searching through the stock of common world knowledge for the same purpose. A flexible natural-language understanding system must, more or less like humans, be capable of handling a definite reference, even if no reference can be found either in the text or in the world knowledge base. Such a reference should probably be marked as "unsolved", and treated as a new concept, the exact nature of which is unknown, but about which information can be gleaned from the text. In this way a truly intelligent system could even learn enough about the unknown concept to add it to its internal knowledge base, by collecting everything that is said about it in the text and storing this as its internal description.
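The search order just described - text concepts first, then common knowledge, then an "unsolved" fallback that is kept as a new, learnable concept - can be summarized in a small sketch. This is a simplification over invented data; a real system would match syntagmata semantically, not by string equality:

```python
# Sketch of the search procedure described above: a definite noun
# syntagma is matched first against concepts already mentioned in the
# text, then against a stock of common world knowledge; if both fail
# it is marked "unsolved" and stored as a new concept about which the
# text itself may supply information. All data here are invented.
text_concepts = ["turtle", "computer", "pen"]
world_knowledge = {"sun", "moon"}

def classify_definite(noun):
    if noun in text_concepts:
        return "textual referent"
    if noun in world_knowledge:
        return "common knowledge"
    text_concepts.append(noun)   # learn the unknown concept
    return "unsolved"

assert classify_definite("turtle") == "textual referent"
assert classify_definite("moon") == "common knowledge"
assert classify_definite("gizmo") == "unsolved"
assert "gizmo" in text_concepts  # later references to it can now be resolved
```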
Indefinite reference either introduces new concepts, not mentioned before in the text, or refers to concepts as opposed to entities. Distinguishing the two is primarily a matter of deep semantic understanding, involving consistency checking and inferencing with respect to the consequences of a certain interpretation. Finally, we must distinguish between semantic and pragmatic reference identity. This distinction only holds for definite references, and is not very easy to detect. It is largely based on the same consistency principles that guide indefinite references, and requires a great amount of understanding to be solved correctly. If we initially ignore the problems of consistency checking and semantic inferencing (it is not at all clear how these functions can be modelled in a machine translation system) we now know what we must look for, but not yet where to look for it. The main factor in searching for reference seems to be the prominence of certain concepts in the text over others. Prominence is a complicated function which can be decomposed into two separate aspects: recency and focusing. In a linear text, given the memory model mentioned in 2.2.7., the most prominent concepts (i.e., the ones most easily retrieved) are those that were mentioned in the previous statement. A reader will therefore first look for references in that preceding statement. If we extend this model, such a search could be extended backward, from statement to statement. Although a text is linear, information structure is never truly linear, but shows clusters of information, centred round certain concepts, usually (but not absolutely) associated with the paragraph. Once a new paragraph has begun, the central concepts of the previous paragraph are more accessible than the other concepts that were mentioned there.
Apparently, to keep the information within manageable bounds, previous information is summarized, abstracted in a way that "hides" the details, leaving only the central elements in view. Searching backward for reference would involve the central concepts only, ignoring any "hidden" details. That this summarizing actually takes place, with information hiding and all, can be seen by the fact that backward references to non-central concepts of previous paragraphs are introduced explicitly, using either special syntactic constructions (the aforementioned X, the X mentioned above) or by means of the indefinite reference constructions also used when a completely new concept is introduced. In this last
case, the reference is kept implicit, and may not even be detected by the reader, without this hindering his understanding of the text as a whole.
To summarize a cluster of information, certain elements must be recognizable as more prominent than others, more central to the "main line" of the text than others. Such prominence can first of all be measured by frequency: the more a certain concept is mentioned (explicitly, or by definite reference) the more likely it is to be central to the line of argument. There are indeed natural-language processing systems that make use of (relatively) simple word counts to derive a rough summary of a text. By keeping track of the relative frequency of concepts a system could maintain a sort of "prominence table" it could use to control the search for reference. Using such a table, the most prominent (= most frequent) concepts would be considered first, working downward through the table as long as no suitable referent has been found. Next to frequency, prominence of certain concepts can be created by explicit signals, syntactic and/or lexical. Much of what we said about pronoun resolution in section 2.2. applies here. The explicit manipulation of prominence is often referred to as focusing: directing the receiver's attention to some aspect of the text. The basic clue for focusing is the use of unusual or marked patterns. The effect of a marked pattern is twofold: it first of all relies on the receiver's ability to recognize the pattern as marked, i.e., the receiver must see that the way the information is phrased is not the most neutral, default way, but has undergone some formal change(s). Once the receiver has become aware of this, s/he has to be able to recognize the reason for the formal change, in other words, s/he has to find the message that is implicitly contained in the marked pattern.
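The frequency-based "prominence table" mentioned above could, in its simplest form, look like the following sketch. The mention counts are invented, and a real system would count concepts in the semantic representation rather than surface words:

```python
# Sketch of a frequency-based "prominence table": concept mentions are
# counted, and candidate referents are tried in order of decreasing
# prominence. The sample mentions are invented for illustration.
from collections import Counter

mentions = ["turtle", "pen", "turtle", "computer", "turtle", "pen"]
prominence = Counter(mentions)

def find_referent(is_suitable):
    """Try candidates from the most to the least prominent concept."""
    for concept, _count in prominence.most_common():
        if is_suitable(concept):
            return concept
    return None

# Example: among the +MECHANICAL concepts, the most prominent one wins.
mechanical = {"turtle", "computer"}
assert find_referent(lambda c: c in mechanical) == "turtle"
```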
For a machine translation system two things will therefore be important: a very detailed grammar of the source language, with a carefully constructed list of patterns, probably scaled from totally unmarked to highly marked; and a contrastive grammar of the source language and target language that relates the effects of the marked pattern in the source language to marked pattern representations in the target language. This is especially important for those cases where the (necessarily incomplete, see above) semantic capabilities of the system fail to "understand" enough to make a fully meaning-based translation possible. In such cases it is important to have "default" source-to-target language patterns available to fall back on. The DLT system, being a truly interlingual system, with a human language as intermediate language, will benefit from this arrangement in several ways. First of all, the contrastive grammars that are needed all have Esperanto as one of the two languages. This means that the patterns and signals of Esperanto, when carefully studied and analyzed, form the central framework, which not only all contrastive grammars have in common, but which can also provide the guidelines for the source language grammars that chart the patterns in each of the source languages. In other words, with the signalling possibilities of Esperanto as a starting point, all the languages to which the system will be applied can be analyzed, providing a common basis, even when there are individual differences.
The second important benefit from this arrangement is that, since all languages interface with Esperanto only, the source language grammars can be written in such a way that the semantic problems, a large part of which are closely connected with the kind of patterning we have discussed in this chapter, can be transferred into Esperanto, to be solved there, which means that a single semantic engine can take care of the semantic problems of all connected languages. To return to the search for referents, in general, the first candidate as referent is either the theme of the previous clause or the part that is in some way marked as focus. As long as we stay within a single paragraph, this search can be extended backwards as long as needed to find an appropriate (that is, synonymous or related via one of Hirst's relations) syntagma, with the preference caused by focusing gradually losing its precedence (see Alshawi 1987). When we cross a paragraph boundary, however, local focusing loses its importance altogether, and prominence is now given to those syntagmata (or concepts in the semantic representation) that are central to the paragraph's argument, as can be deduced from their frequency and position in the thematic pattern (see next chapter). Semantic processing must be used to measure synonymity and/or other relations with the current syntagma, but the most difficult (and perhaps as yet impossible) task will be to measure the consistency of this candidate as a referent. There are two kinds of consistency: internal consistency, which means that the statements in the text do not in some way contradict each other or follow each other in an illogical way, and external consistency, which means that the combined statements about a single entity are consistent with our world knowledge about the concept. Readers often seem to use this kind of reasoning (see 2.2., on the referent of them in sentence 5) but it is not at all clear how this can be measured (in 2.6.
we will propose a partial solution to this problem).
2.3.9. Conclusion

Reference identity - and other reference relations that may be looked upon as derived from identity - among content words (syntagmata) in a text is an essential element of text coherence. Once referential links are established, a good deal of the semantic-pragmatic ambiguity detected in sentences out of context can be resolved, since more than a local context becomes available and more textual and extratextual knowledge can be related to the choices in question. Reference identity is difficult to detect. The main guideline we suggest to follow is similar to the one given for deictic reference in 2.2.: first a syntactic preselection, then a semantic-pragmatic weighting of the candidates and, only in the case of an unsatisfactory outcome of the latter procedure, default rules based on focus and similar phenomena.
It remains to be discussed in the chapters to follow how deep the semantic-pragmatic analysis of context should go. Techniques which can achieve "full understanding" seem to be out of reach for the time being and are very likely impossible, so it appears highly advisable to stick to the form characteristics of the two languages involved as much as possible, in order to make the syntactic preselection step as powerful as possible.
2.4. Communicative functions: theme and rheme
The phenomena discussed so far, though involved in the establishment of text coherence, are in a certain sense of a fragmentary nature. The connections established for instance by reference identity constitute a series of networks of interrelated words, but they need not necessarily link up the whole text. Moreover, they connect mainly nouns and pronouns and have access to other words at best indirectly, via the syntactic dependencies among nouns and the other words in the sentences. A more complete view of texts is found in the communicative approach of the Prague School, called aktuální členění věty or functional sentence perspective, but is perhaps still better known under the key words theme and rheme (see 1.2.). The basic idea of a theme-rheme distinction is mainly linked to the work of Vilém Mathesius (1929, 1939 etc.); the terms Thema and Rhema are taken from related work by Hermann Ammann (1928). These German terms have spread most widely over various languages. They were taken over into English as theme and rheme, but in parallel the terms topic and comment came into use, especially in the Generative School. Elisabeth Gülich and Wolfgang Raible (1977) give a concise account of the dual definition of theme and rheme. This twofold character of the suggested distinction has not been clearly enough understood by all those who tried to apply it, which may be the source of the considerable diversification the original definition has undergone in various authors' works. Gülich and Raible (1977: 62) point out, however, that this duality has been inherent to the definition from the beginning:

Seit Mathesius werden Thema und Rhema unter zwei verschiedenen Aspekten definiert: 1. einem satzbezogenen Aspekt, der die Funktion von Thema und Rhema in einer Äußerung betrifft. 'Thema' ist danach das der Mitteilung Zugrundeliegende, d.h. das, worüber gesprochen wird; 'Rhema' ist der Mitteilungskern, d.h. das, was über das Thema ausgesagt wird [...]; 2. einem kontextbezogenen Aspekt, der die Beziehung der Äußerung zum Kontext betrifft. Danach wird unter 'Thema' die ableitbare Information verstanden, unter 'Rhema' die unbekannte, neue, nicht ableitbare Information [...].

['Since Mathesius, theme and rheme have been defined under two different aspects: 1. a sentence-related aspect, concerning the function of theme and rheme in an utterance. The 'theme' is accordingly that which underlies the message, i.e. that which is talked about; the 'rheme' is the core of the message, i.e. that which is said about the theme [...]; 2. a context-related aspect, concerning the relation of the utterance to the context. Here the 'theme' is understood as the derivable information, the 'rheme' as the unknown, new, non-derivable information [...].']
Gülich and Raible make two interesting points in this respect. Firstly, they say that the sentence-related aspect of the theme-rheme distinction pretty well coincides with the traditional definition of subject and predicate. It might be added that language philosophers such as Plato called the predicate rhema in Classical Greek. Secondly, Gülich and Raible emphasise that since Mathesius's times the context-related aspect
has always been related to both the linguistic and the extralinguistic context (Gülich/Raible 1977: 62f., also note 9). Definitions of theme and rheme occur in many works. But not every author presents the two aspects of the theme-rheme distinction as clearly as Gülich and Raible. In the following quotation from Peter Newmark (1981: 176), for example, both aspects are present but the view on them seems a bit obscured, perhaps also due to the fact that in English the word subject (as well as object, topic etc.) is both a grammatical term and an everyday word: Theme and rheme. Theme states the subject of discourse, which is normally referred to in, or logically consequential upon, the previous utterance (sentence or paragraph). Rheme is the fresh element, the lexical predicate, which offers information about theme. Within the structure of a sentence, these lexical terms are sometimes referred to as topic and comment. "Theme plus rheme" need not be a surface grammar sequence, and its identification will depend on a wider context.
Newmark makes a difference between terms for theme and rheme in general and terms for the sentence-related aspect of the distinction. Another pair of terms is used in this regard: Halliday (1967) used theme and rheme at the sentence level and given versus new for the context-related aspect (Gülich/Raible 1977: 62f., note 10). A somewhat different, but interesting, division is given by Simon Dik. In his Functional Grammar - a model which in its core has a dependency semantics described in terms of predications (Somers 1987: 95; Schubert 1987b: 198) - Dik distinguishes "internal" from "external pragmatic functions" (Dik 1978: 19):

Four pragmatic functions are distinguished, two external and two internal to the predication proper. The external pragmatic functions are:
Theme: The Theme specifies the universe of discourse with respect to which the subsequent predication is presented as relevant.
Tail: The Tail presents, as an 'afterthought' to the predication, information meant to clarify or modify it.

and the internal ones:
Topic:
The Topic presents the entity 'about' which the predication predicates something in the given setting.
Focus: The Focus presents what is relatively the most important or salient information in the given setting.
We thus have the terms:
theme - rheme
topic - comment
given - new
theme - tail
topic - focus
and more variants can be found in the abundant text-linguistic literature. We shall in the present discussion stick to theme and rheme. As for text coherence, especially the second, context-related aspect, as worded by Gülich and Raible, is relevant. However, we also use Dik's term tail where appropriate, to identify "afterthoughts" not strictly belonging to the rheme. Focus, which we applied somewhat loosely already in 2.2.7., will henceforth be used, still a bit intuitively, for that part of the (possibly complex) rheme which receives prime attention. This is often the internal governor in the dependency tree of the rheme.

2.4.1. Thematic progression

All texts show thematic progression: they move from theme to rheme in recognizable patterns. František Daneš identifies at least five different patterns: linear progression, progression with a constant theme, progression with a derived theme, development of a bifurcated rheme and progression with a discontinuous theme (Daneš 1968, 1970; summarised by Gülich and Raible, 1977: 75ff.). Four of them are treated in more detail below. Usually, a text shows a mixture of all five patterns. The patterns can occur on different levels: what is a linear progression on one level may very well be embedded in a constant pattern one level higher.
a) Linear progression A linear progression is recognized by the steady progression from theme to rheme:
Figure 21

T1 ——> R1
        T2 (= R1) ——> R2
                        T3 (= R2) ——> R3
What is rheme in clause 1 is taken over as theme in clause 2, etc. The following fragment is a clear example of linear progression (the sample text is taken from W. F. Clocksin / C. S. Mellish [1981]: Programming in Prolog. Berlin: Springer, p. 24): The third kind of term with which Prolog programs are written is the structure. A structure is a single object which consists of a collection of other objects, called components. The components are grouped together into a single structure for convenience in handling them.
The progression can be charted as:
Figure 22
T1 (The third kind of term with which Prolog programs are written)
  ——> R1 (is the structure)
      T2 (= R1)
        ——> R2 (is a single object which consists of a collection of other objects, called components)
            T3 (= R2)
              ——> R3 (are grouped together into a single structure for convenience in handling them)

b) Progression with a constant theme
[...]
T5 ——> R5 (is useful for writing operating systems)
T6 (it = T5) ——> R6 (has been used equally well to write numerical, text-processing, and data-base programs)
This pattern is especially interesting for its ability to create "composite concepts" from a set of loosely connected statements, by means of which a set of concepts is linked to the central, invariable theme.
c) Progression with derived theme

Progression with a derived theme is very similar to the constant-theme pattern, with the difference that the constant element is not explicitly given. The meta-theme either is given in a preceding statement, or must be inferred from the common element of the sub-themes that are discussed:
Figure 25

T0 (meta-theme)
  T1 ——> R1
  T2 ——> R2
  T3 ——> R3
This pattern is often used to discuss a number of elements that are part of or collectively implied by the meta-theme (sample text from Clocksin/Mellish op. cit. p. 22): Characters are divided into four categories as follows: ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz 0123456789 + -*/\A=" :.?@#$& The first row consists of upper-case letters. The second row consists of lower-case letters. The third row consists of digits. The fourth row consists of symbols.
The thematic progression of this example is:
Figure 26
T0 (meta-theme = four rows)
  T1 (the first row = 1 of 4) ——> R1 (consists of upper-case letters)
  T2 (the second row = 2 of 4) ——> R2 (consists of lower-case letters)
  T3 (the third row = 3 of 4) ——> R3 (consists of digits)
  T4 (the fourth row = 4 of 4) ——> R4 (consists of symbols)
This pattern can often be found in technical descriptions and definitions, especially in cases where items have to be classified. An introductory sentence will give the category or group description, followed by a series of descriptions for the individuals or subclasses.
d) Development of a bifurcated rheme

This pattern occurs when a single theme is used as the introduction of two contrasting rhemes. Each of the rhemes provides the theme for one or more following statements (sample text from Clocksin/Mellish op. cit. p. 43): Lists are manipulated by splitting them up into a head and a tail. The head of a list is the first argument of the "." functor that is used to construct lists. [...] The tail of a list is the second argument of the "." functor.
The thematic progression is:
Figure 27
T1 (Lists)
  ——> R1 (are manipulated by splitting them up into a head and a tail)
      R1' (= head)
        T2 (= head) ——> R2 (is the first argument of the "." functor that is used to construct lists)
      R1'' (= tail)
        T3 (= tail) ——> R3 (is the second argument of the "." functor)
As can be seen, the bifurcated rheme is particularly suited to comparing or contrasting two (or occasionally more) concepts. As such, this pattern will often be part of the kind of rhetorical patterns discussed in section 2.6.
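As a rough illustration of how the patterns above might be recognized mechanically, the following sketch classifies the link between two successive clauses from their (theme, rheme) pairs. It covers only the linear and constant-theme cases; derived themes and bifurcated rhemes require semantic inference and are lumped together as "other". Real texts mix patterns, so this is a per-link classification only, not a full analysis:

```python
# Hedged sketch: classify the thematic-progression link between two
# adjacent clauses, each given as a (theme, rheme) pair of strings.
# Equality stands in for the synonymity matching a real system needs.

def progression(prev, curr):
    """prev and curr are (theme, rheme) pairs of adjacent clauses."""
    prev_theme, prev_rheme = prev
    curr_theme, _curr_rheme = curr
    if curr_theme == prev_rheme:
        return "linear"          # T2 = R1
    if curr_theme == prev_theme:
        return "constant theme"  # T2 = T1
    return "other (derived theme, bifurcated rheme, ...)"

# The Clocksin/Mellish fragment: "... is the structure. A structure is ..."
assert progression(("third kind of term", "structure"),
                   ("structure", "single object")) == "linear"
assert progression(("Prolog", "useful for operating systems"),
                   ("Prolog", "numerical programs")) == "constant theme"
```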
2.4.2. Analyzing thematic progression

Analyzing thematic progression patterns can be of great help in finding pronominal referents and tracing coreferential concepts. It is not easy, however, to lay down in rules how things like theme, rheme, tail and focus can be recognized. There is even a kind of circularity involved, as Hirst (1981: 66) points out: "to determine the focus one must resolve the references, and to resolve the references, one must know the focus". Most likely, thematic patterning is brought about by a variety of linguistic phenomena, none of which can be taken as a foolproof indication of theme or rheme, but which, taken together, serve to place dominance or preference on certain elements while downtoning others. In the rest of this chapter we will go through the sample text again, this time concentrating on thematic progression only. What we will try to do is to find evidence of thematic progression in the sample text, study the mechanisms that make the patterns recognizable, and establish what apparatus is (minimally) needed to automate that recognition process. Special emphasis in this discussion will be given to those
cases where thematic patterning directly influences the translation process, to find out: a) how essential the thematic patterning is for a proper translation; b) how much effort must be spent on analyzing it; and c) whether we can set up a working set of algorithms that are feasible within a sentence-by-sentence incremental translation system.

The sub-title:

Turning turtle gives one a view of geometry from the inside out

Esperanto translation:

Testudigo permesas rigardi la geometrion elinterne
'turning-turtle allows to-view geometry from-the-inside-out'

The proposed analysis of this clause is:

THEME = turning turtle
RHEME = gives one a view of geometry from the inside out
TAIL  = none
FOCUS = a view of geometry from the inside out
The main reason to assign the function of theme to turning turtle is that of default reasoning. The clause looks like a normal active construction, which means there is no apparent change in word order, such as is often used to change the default prominence. Unless there is evidence to the contrary, the theme is in written English usually assigned to the first part of a clause (Dik 1978: 131). Information from outside the current clause can counteract this default assignment, but since we have no "outside", there is no reason not to choose the default theme here. The rheme is in English by default assigned to the finite verb and its dependents, except the one in the theme. Again there are no syntactic indications not to apply the default assignment. The function of tail is (in terms of communicative function) defined as an "afterthought". This implies that the tail is not really a part of the predication proper. On the surface level, "the Tail will characteristically be set off from the predication by means of a break in the intonation pattern, which we symbolise by means of a comma" (Dik 1978: 153; although Dik's wordings are quite general, things like punctuation marks are of course language-specific; indeed he is reasoning about English). In general, the function of tail is restricted to the syntactically optional elements in the clause, which in English are characterised among other things by their directly modifying the main verb and their relative freedom of position. Since there is no such element in this clause (see 2.5. for a detailed analysis of this clause), we can leave the function of tail unfilled here.
As for the focus, in many cases syntactic or lexical marking is used to put extra emphasis on a particular part of the rheme. When such extra emphasis is absent, however, focus falls either on the internal governor (syntactic marking) or on the right-hand side of the rheme (sequential ordering). The latter is the case here. Though more can be said about the function of focus (see below), let us assume for the moment that the focus is assigned to the complete noun syntagma. Though the analysis of the sub-title is not very surprising and may seem uninteresting, it is not without importance. We already saw that titles and sub-titles are often meant to give the reader a (rough) idea of what to expect from the text (see 2.1.). In other words, the title can determine the general lines along which the reader will approach the text. It should be noted that the analysis of the sub-title fits in so well with the general procedure we are applying because it has sentence form, whereas titles often are mere syntagmata, in which case they may belong completely to the rheme. If we assume that the sub-title is characteristic for the overall pattern of the text, we may assume that: a) the text is about something called turning turtle; b) the text will show (i.e., provide information) that turning turtle results in a view of geometry from the inside out. We can translate this into expectations: given the theme turning turtle, we can expect to be told what a view of geometry from inside out is, how turning turtle results in this view, and, possibly, why this is interesting in the general context given as computer recreations. Though these expectation patterns are only tentative, being aware of them can help to explain the flow of the text, in particular the succession of themes and rhemes across paragraphs. In the rest of this section, we analyse sentences 1 to 5 with regard to their theme-rheme structure.
For each clause we give a table with the English original, its Esperanto translation and indications of theme, rheme, focus and tail. In an accompanying figure we show the text up to the clause in question with the deictic and referential links marked.
Sentence 1:
1. The new way of thinking about geometry has come to be known as "turtle geometry".
Esperanto translation:
1. La nova aliro al geometrio diskonatiĝis sub la nomo "testuda geometrio" (turtle geometry).
'the new approach to geometry has-come-to-be-known under the name "turtle geometry"'
THEME 1 = the new way of thinking about geometry
RHEME 1 = has come to be known as "turtle geometry"
FOCUS 1 = "turtle geometry"
TAIL 1  = none
Reference links:
Figure 28

COMPUTER RECREATIONS
Turning turtle gives one a view of geometry from the inside out
by Brian Hayes
1. The new way of thinking about geometry || has come to be known as "turtle geometry".
(reference-link arrows not reproduced)
2.4.3. Given and new

The first clause of the text has a very regular thematic pattern: it is an unmarked active sentence with the subject as theme, the verb with its non-thematic dependents as rheme, no tail, and the focus on the right-hand noun syntagma. What is more interesting in this first clause of the actual text is the way it uses information that is given in the sub-title to form the opening statement of the text proper.
Most of the elements in the clause come directly from the sub-title. The syntagma way of thinking about geometry is virtually synonymous with a view of geometry. The modification new more or less follows from gives, and signals the same result function that was present in the sub-title. That the theme of clause 1 is firmly established as given information is signalled by its use of the definite article the. The default use of the definite article in English is to indicate unique identification. This implies that the object so indicated is either generally known (i.e., part of the assumed common knowledge) or has been explicitly introduced in the text. The definite article, as it were, "forces" the reader to search for the unique referent that is implied, in the preceding text and in her/his available knowledge. We will later discuss this mechanism in more detail, but let us assume for the moment that there is a preference mechanism that first searches backward through the preceding text, and when it fails to find a suitable referent, assumes the reference to be to common knowledge. A "suitable" referent is one that is either identical to the current object, or semantically related in some way. Figure 29 gives a number of semantic relations that can link a definite noun to its referent. In this case, we saw that a view of geometry is a near synonym of a way of thinking about geometry, and we can safely assume that we have found the appropriate reference (and see below for more reasons for this choice).
Figure 29 Some relations that allow definite reference

PART OF
ATTRIBUTE OF
SUBSET OF
NEAR SYNONYM OF
IDENTICAL TO
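The backward-searching preference mechanism just described can be sketched procedurally. The function and lexicon below are illustrative stand-ins (no claim is made about the actual DLT implementation); the relation labels are those of Figure 29:

```python
# Sketch of the backward-searching preference mechanism for definite
# reference.  All names are illustrative; the 'lexicon' is a toy
# stand-in for real semantic knowledge.

# The Figure 29 relations that allow a definite noun to link to an
# earlier referent:
RELATIONS = {"PART OF", "ATTRIBUTE OF", "SUBSET OF",
             "NEAR SYNONYM OF", "IDENTICAL TO"}

def related(noun, candidate, lexicon):
    """True if some Figure 29 relation links noun to candidate."""
    return any(rel in RELATIONS
               for rel in lexicon.get((noun, candidate), []))

def find_referent(definite_noun, preceding_nouns, lexicon):
    """Search backward through the preceding text; when no suitable
    referent is found, assume a reference to common knowledge."""
    for candidate in reversed(preceding_nouns):
        if related(definite_noun, candidate, lexicon):
            return candidate
    return "COMMON KNOWLEDGE"

# In the sample text, 'way of thinking about geometry' finds its
# referent in the near-synonymous 'view of geometry' of the sub-title.
lexicon = {("way of thinking about geometry", "view of geometry"):
           ["NEAR SYNONYM OF"]}
print(find_referent("way of thinking about geometry",
                    ["turning turtle", "view of geometry"], lexicon))
```

The fallback to common knowledge mirrors the preference order described in the text: the preceding text is always tried first.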
Though most of the words in sentence 1 are synonymous with or identical to the words in the sub-title, there is also new information (otherwise this clause would be nothing but a re-statement of the sub-title, which, however, would not be an entirely unexpected case, since careful writers often do not include titles as necessary elements in their text). The new information is two-fold. The first part is the verb construction has come to be known as. The theme is said to have a name. The second part is that name, "turtle geometry". Though both words are borrowed directly from the sub-title, they provide new information in the way they are combined. In particular, the general term geometry is now said to be modified by the word turtle.
An interesting aspect of this combination is that it combines two words that have no known relationship with each other. That is, the standard meanings of the two words come from completely different semantic fields. Though the noun cluster indicates there must be some relation between the two words, no explicit relation is given. This lack of explanation strengthens the focus function of turtle geometry, bringing it foremost in the reader's attention as the most likely continuation point for the text. The thematic progression from sub-title to clause 1 is that of linear progression: the focused element of the sub-title's rheme becomes - through the use of a near-synonymous syntagma - the theme of clause 1. The focus placement in the current clause, together with the heightened expectation caused by an underspecified and unexplained combination of words, as discussed above, makes it very likely that the next clause will take over the current focus as its theme; in other words, we will expect the linear progression to continue.
Sentence 2:
2. It is closely connected with the programming language Logo [...]
Esperanto translation:
2. Ĝi estas intime ligita al la programlingvo Logo, [...]
'it is closely connected with the programming-language Logo [...]'
THEME 2 = it
RHEME 2 = is closely connected with the programming language Logo
TAIL 2  = none
FOCUS 2 = the programming language Logo
Reference links:
Figure 30

COMPUTER RECREATIONS
Turning turtle gives one a view of geometry from the inside out
by Brian Hayes
1. The new way of thinking about geometry || has come to be known as "turtle geometry".
2. It || is closely connected with the programming language Logo,
(reference-link arrows not reproduced)
The clause starts with the pronoun it, which, to be properly translated and to contribute in an appropriate way to the contextual meaning restriction of other words, must first be linked to a referent from the preceding clauses (or be identified as referent-less, as placeholders are). Two things strongly direct a search for the intended referent: a) the focus of the preceding clause is a preferred referent when no other cues are found; and b) we noted that clause 1 "expects" a linear progression, i.e., that the focus of clause 1 is likely to be the theme of clause 2. Since it does indeed appear in theme position, both a) and b) seem to make the choice an obvious one. Since the verb construction is closely connected with provides no counter-arguments against the choice (one can connect almost anything with anything else, especially when connect is used in an abstract sense), the choice is very strong. The use of it in this clause is a good example of how thematic progression and expectations set up in earlier clauses can influence the choice of words, thus strengthening the coherence within the text. The expectations of clause 1 for the theme of clause 2 result in a preferred default pattern. Unless the writer decides not to fulfil this default pattern, there is no need to use an explicit theme in clause 2, since a simple pronoun will automatically be interpreted correctly. In fact, when such strong defaults exist, using a pronoun is preferable to a more explicit phrasing. Apart from avoiding repetition, by the very fact that it relies on the default pattern for its interpretation, it confirms that default pattern, thus strengthening the relation between clauses.
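The default reading of it can be put schematically as follows (an illustrative sketch; the clause representation is ours, not taken from any implementation):

```python
def default_pronoun_reading(prev_clause, semantically_compatible):
    """Default referent for a theme-position pronoun: the focus of the
    preceding clause, i.e. the expected linear progression."""
    candidate = prev_clause["focus"]
    if semantically_compatible(candidate):
        return candidate
    return None   # other mechanisms must take over

# Sentence 2: 'it' takes over the focus of clause 1; 'is closely
# connected with' accepts almost any referent, so the default holds.
clause1 = {"focus": '"turtle geometry"'}
print(default_pronoun_reading(clause1, lambda referent: True))
```

The semantic-compatibility test is deliberately a parameter: as the text notes, the verb construction can veto the default, but here it does not.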
2.4.4. Focusing rules

Which element in a clause receives the focus has been the object of study of many scholars. One of them is Candace Sidner (1983), who gives a somewhat detailed mechanism for preselecting syntagmata as candidate focuses in English. She uses it to solve anaphoric reference. Her candidate selection is based on a set of rules which at the same time identify candidates in a probability order. The rules are based on syntactic functions and semantic deep cases, to be assigned to the syntagmata prior to the focus search. Sidner (1983: 287) gives these rules:

Choose an expected focus as:
1. The subject of the sentence if the sentence is an is-a or there-insertion sentence. [...]
2. The first member of the default expected focus list (DEF list), computed from the thematic relations of the verb, as follows: Order the set of phrases in the sentence using the following preference schema:
- theme unless the theme is a verb complement in which case the theme from the complement is used
- all other thematic positions with the agent last
- the verb phrase [...]
Note that in this quotation theme is not what we call theme in this study, but a "thematic position", i.e. a semantic deep case, approximately what is traditionally called patient. These rules are just preference rules for English, and there are several ways to signal focus on normally un-preferred elements by moving them to preferred positions. Sidner (1983: 284) lists the following constructions:

The one who ate the rutabagas was Henrietta. (pseudo-cleft agent)
What Henrietta ate was the rutabagas. (pseudo-cleft object)
It was Henrietta who ate the rutabagas. (cleft agent)
It was the rutabagas that Henrietta ate. (cleft object)
There once was a prince who was changed into a frog. (agent)
There was a tree which Sanchez had planted. (object)
Sidner uses agent and object (1983: 284, note 1) as terms for semantic deep cases. It is important to note that focus is part of a preference mechanism, which means that a) there is no single "correct" focus, only more or less likely ones; and b) mechanisms making use of focus (e.g. for pronoun resolution) cannot do so deterministically; i.e., they must use other mechanisms to confirm or negate the preferences following from the chosen focus.
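Sidner's preference schema, as quoted above, can be read as an ordering procedure. The sketch below is our simplified Python rendering, not Sidner's own formulation; the clause representation is illustrative, and the verb-complement case of her rule 2 is omitted:

```python
def expected_focus_candidates(clause):
    """Candidate focuses in preference order, after Sidner (1983: 287).
    Here 'patient' stands for Sidner's 'theme' (the semantic deep case,
    not the theme of this study); the complement case of her rule 2 is
    left out of this sketch."""
    # Rule 1: is-a and there-insertion sentences focus on the subject.
    if clause.get("is_a_or_there_insertion"):
        return [clause["subject"]]
    # Rule 2: the DEF list, ordered by deep case.
    cases = dict(clause["deep_cases"])      # deep case -> syntagma
    ordered = []
    if "patient" in cases:
        ordered.append(cases.pop("patient"))
    agent = cases.pop("agent", None)
    ordered.extend(cases.values())          # other thematic positions
    if agent is not None:
        ordered.append(agent)               # ... with the agent last
    ordered.append(clause["verb_phrase"])   # the verb phrase
    return ordered

# 'Henrietta ate the rutabagas': the patient outranks the agent.
clause = {"deep_cases": {"agent": "Henrietta",
                         "patient": "the rutabagas"},
          "verb_phrase": "ate"}
print(expected_focus_candidates(clause))
```

The output is an ordered candidate list, not a single focus, which matches the point made below: focus is part of a preference mechanism, and other checks must confirm or reject each candidate.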
Sentence 2: 2. [...] which in turn has its roots in the Massachusetts Institute of Technology. Esperanto translation: 2. [...] kiu siavice devenas de la Teknologia Instituto de Massachusetts. '[...] which in-turn stems from the Massachusetts Institute of Technology'
THEME 3 = which in turn
RHEME 3 = has its roots in the Massachusetts Institute of Technology
TAIL 3  = none
FOCUS 3 = the Massachusetts Institute of Technology
Reference links:
Figure 31

COMPUTER RECREATIONS
Turning turtle gives one a view of geometry from the inside out
by Brian Hayes
1. The new way of thinking about geometry || has come to be known as "turtle geometry".
2. It || is closely connected with the programming language Logo, which in turn || has its roots in the Massachusetts Institute of Technology.
(reference-link arrows not reproduced)
The clause which in turn has its roots in the Massachusetts Institute of Technology in sentence 2 depends on the first clause of the sentence as a continuation of the programming language Logo. That it is not just a tail to clause 2, but must be given equal attention, is explicitly signalled by the syntagma in turn. The thematic pattern we
see here is again that of a regular linear progression (the focus in rheme 2 becomes theme 3). It is not so easy to find the focus in this clause. According to the sketchy rules experimented with in connection with Sidner's definition of focus (2.2.7.), its roots, being the dependent that in the linear string occurs immediately after the finite verb has, would be the prime candidate. The expression to have one's roots in, however, is used so often that it is virtually an idiomatic expression, operating as a single construction. When has its roots in is the verb construction, the only candidate for the focus is the Massachusetts Institute of Technology. The Esperanto translation shows that the translator interpreted this construction as an idiomatic expression, by translating it with a single verb devenas 'stems from'. In the Esperanto version, therefore, there is no ambiguity in focus assignment.

Sentence 3:
3. Logo was conceived in the 1960's by Seymour Papert of M.I.T., primarily as a language for introducing children to computers.
Esperanto translation:
3. Logon elpensis, en la sesdekaj jaroj, Seymour Papert ĉe M.I.T., unuavice kiel lingvon por eklernigi infanojn pri komputiloj.
'Logo(+OBJ) conceived, in the sixties, Seymour Papert of M.I.T., primarily as a-language for beginning-to-teach children about computers'
THEME 4 = Logo
RHEME 4 = was conceived in the 1960's by Seymour Papert of M.I.T.
TAIL 4  = primarily as a language for introducing children to computers
FOCUS 4 = Seymour Papert of M.I.T.
Reference links:
Figure 32

COMPUTER RECREATIONS
Turning turtle gives one a view of geometry from the inside out
by Brian Hayes
1. The new way of thinking about geometry || has come to be known as "turtle geometry".
2. It || is closely connected with the programming language Logo, which in turn || has its roots in the Massachusetts Institute of Technology.
3. Logo || was conceived in the 1960's by Seymour Papert of M.I.T., primarily as a language for introducing children to computers.
(reference-link arrows not reproduced)
2.4.5. Breaking expectations, reintroduction of concepts

The clause Logo was conceived ... has a passive construction, which is a marked (though not uncommon) construction in English. As opposed to the active voice, the passive voice is marked, that is, marked in a sentence-level comparison. At the text level, we are arguing, the passive voice may be the normal, unmarked form in a given context. Being marked if considered out of context means that there must be a (text level) reason for the form to be used. In this case, the reason can be found in the clause's thematic pattern, in relation to the preceding clauses. In other words, in a passive shape the sentence fits in with its context to convey the intended meaning, where its active counterpart would not bring about the same effect.
The active version of sentence 3 would be:

Seymour Papert of M.I.T. conceived Logo in the 1960's, primarily as a language for introducing children to computers.

with Seymour Papert of M.I.T. in theme position, conceived Logo in the 1960's as rheme, and the focus on Logo, causing the new information (Seymour Papert, previously unmentioned) to be in the theme, and the given information (Logo, which was focus in sentence 2 and theme in the subordinate clause of that sentence) to be part of the rheme. This construction - though unusual - would not be totally impossible, as it could indicate a linear progression from the preceding focus (the Massachusetts Institute of Technology). The fact that Seymour Papert is modified by of M.I.T. does provide a link with the preceding focus. Changing the clause back into a passive construction causes Logo to be the theme, and was conceived in the 1960's by Seymour Papert of M.I.T. the rheme, with focus on Seymour Papert of M.I.T. Apart from resulting in a more natural and appropriate distribution of given and new information, the clause now clearly signals that it is not the focus of the previous clause that is continued, but its theme. We have, in other words, a constant progression. This is a break in the expectation pattern, and such breaks are usually signalled. The passive construction in sentence 3, which allows Logo to be theme, is such a signal, the more so since one of its other functions - to avoid mentioning the agent of the verb - is overruled by the by syntagma. But an even stronger signal is the literal repetition of the word Logo, instead of, for instance, the pronoun it. In general, literal repetition of words and syntagmata within a short distance of each other will be avoided, except where the repetition is needed to maintain the proper reference, as is the case in sentence 3. Using it instead of Logo would activate the pronoun reference mechanism.
As we discussed in section 2.2., this mechanism will select its candidates first and foremost from elements in a preferred position in the preceding text. If no other cues are found, a certain preference is given to those elements that were in focus position, even more so when those focussed elements have not yet been taken up in a subsequent clause; in other words, when their expected thematic progression has not yet been realised. In our case, the nearest "open" focus is the Massachusetts Institute of Technology of sentence 2. Since there is no semantic evidence to the contrary (the Massachusetts Institute of Technology is a man-made object, and as such can be conceived by someone), the reader would assume the Institute to be intended. To avoid such (mis)interpretation, the writer has to reintroduce - or refocus - the concept, either by repeating the word or syntagma used earlier, or by using an unambiguous reference. Figure 33 shows some instances of such concept reintroduction in our sample text.
Figure 33 Reintroduction of concepts in the sample text

the programming language Logo > Logo
the original turtle > the first such creature > a mechanical turtle > the turtle > such "floor turtles" ... "screen turtles" > the turtle itself
instructions typed on a computer keyboard > commands or programs entered at the keyboard
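The writer's choice between a pronoun and literal repetition can be sketched as a test against the default reading (names and representation are illustrative, not taken from the DLT system):

```python
def default_reading(open_foci, semantically_possible):
    """The nearest 'open' focus that the verb's semantics allow is the
    reading a reader will assume for a pronoun."""
    for candidate in open_foci:          # ordered nearest first
        if semantically_possible(candidate):
            return candidate
    return None

def needs_reintroduction(intended, open_foci, semantically_possible):
    """Repeat the word literally whenever a pronoun's default reading
    would pick out the wrong referent."""
    return default_reading(open_foci, semantically_possible) != intended

# Sentence 3: the nearest open focus is the M.I.T. syntagma, and an
# institute, being man-made, can be 'conceived'; 'it' would therefore
# be misread, hence the literal repetition of 'Logo'.
can_be_conceived = lambda c: True
print(needs_reintroduction(
    "Logo",
    ["the Massachusetts Institute of Technology",
     "the programming language Logo"],
    can_be_conceived))
```

Seen this way, reintroduction is simply the writer compensating for a reference mechanism that would otherwise misfire.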
As we can see, the Esperanto translation has the same order of elements as the English original. Its thematic patterning is identical too. This order, however, is achieved differently: instead of a passive clause, the Esperanto version has an active clause with the object moved to the front. This is possible since in Esperanto an object is marked morphologically and not, as in English, positionally. Through the Esperanto accusative marker -n the object of a sentence is recognizable, irrespective of its position. As was discussed in 2.1.11., the passive is much less common in Esperanto than in English. To translate the English passive into an Esperanto passive would create an unnatural Esperanto clause in this text. By analyzing the function or purpose of the English passive - the thematic fronting of Logo - the translator can evaluate whether or not such an unnatural passive can be avoided in the Esperanto version. Since Esperanto has a perfectly natural way of thematic fronting, that mechanism is used instead (see 3.3.).
Sentence 4:
4. Many others have since contributed to its development and to its applications both in education and in other fields.
Esperanto translation:
4. De tiam, multaj aliaj kontribuis al ĝiaj evoluo kaj aplikadoj, ne nur en edukado sed ankaŭ sur aliaj kampoj.
'since then, many others have-contributed to its(+PLURAL) development and applications, not only in education but also in other fields'
THEME 5 = none
RHEME 5 = many others have since contributed to its development and to its applications
TAIL 5  = both in education and in other fields
FOCUS 5 = its development and ... its applications
Reference links:
Figure 34

COMPUTER RECREATIONS
Turning turtle gives one a view of geometry from the inside out
by Brian Hayes
1. The new way of thinking about geometry || has come to be known as "turtle geometry".
2. It || is closely connected with the programming language Logo, which in turn || has its roots in the Massachusetts Institute of Technology.
3. Logo || was conceived in the 1960's by Seymour Papert of M.I.T., primarily as a language for introducing children to computers.
4. || Many others have since contributed to its development and to its applications both in education and in other fields.
(reference-link arrows not reproduced)
The syntagma many others is a good example of the way thematic patterns help to disambiguate potentially ambiguous references. The word others implies a contrast between two concepts: a given one, mentioned earlier in the text, and a new one. The new concept is identified by its contrast with the given one. The problem is first to correctly identify the intended referent. For any unresolved referent, the most recent focus is by default a candidate, unless there are syntactic and semantic counter-indications. The last focus was on Seymour Papert of M.I.T., so that would be the first candidate, the more so since the expectation pattern of sentence 3 is still open. As there is no semantic reason not to have Seymour Papert as agent of contributed, this referent is most likely the intended one.
We must be careful not to fill in Seymour Papert as the concept referred to by many others. As mentioned previously, others defines a concept by means of a contrast. In fact, others defines a group which does not include Seymour Papert, but is in some way defined by what we know about the given concept. It seems that the concept introduced by others is a class description. Its reference is to the class to which Seymour Papert belongs and describes an undefined number of members of that class, excluding Seymour Papert. There is more than one way to fill in this definition: we could take the class to stand for 'researcher', or 'employee of M.I.T.' The most obvious class description, however, is probably that of 'person'.
Sentence 5:
5. Among them are Harold Abelson and Andrea A. diSessa of M.I.T., [...]
Esperanto translation:
5a. Inter ili troviĝas Harold Abelson kaj Andrea A. diSessa, de M.I.T.
'among them are-found Harold Abelson and Andrea A. diSessa, of M.I.T.'
THEME 6 = none
RHEME 6 = among them are Harold Abelson and Andrea A. diSessa of M.I.T.
TAIL 6  = none
FOCUS 6 = Harold Abelson and Andrea A. diSessa
Reference links:
Figure 35

COMPUTER RECREATIONS
Turning turtle gives one a view of geometry from the inside out
by Brian Hayes
1. The new way of thinking about geometry || has come to be known as "turtle geometry".
2. It || is closely connected with the programming language Logo, which in turn || has its roots in the Massachusetts Institute of Technology.
3. Logo || was conceived in the 1960's by Seymour Papert of M.I.T., primarily as a language for introducing children to computers.
4. || Many others have since contributed to its development and to its applications both in education and in other fields.
5. || Among them are Harold Abelson and Andrea A. diSessa of M.I.T.,
(reference-link arrows not reproduced)
The reference of them in the syntagma among them is somewhat problematic, and requires a semantic decision based on knowledge of the world and familiarity with the conventions of written text. The focus of sentence 4 is on to its development and to its applications, which satisfies the plural restriction posed by them (we considered both in education and in other fields to be tail information, so the otherwise appropriate candidate fields receives low preference here). However, the resulting statement:
Among [its development and its applications] are Harold Abelson and Andrea A. diSessa.

will immediately be rejected by a human reader. The point is that Harold Abelson and Andrea A. diSessa are obviously names of persons, and persons do not match with development and applications. Having recognized the semantic incompatibility of the thematically preferred referent, another referent must be found. In Sidner's (1983) approach the (semantic) focus-based referent search is prior to other semantic compatibility checks, but also in such a procedure, when the preferred referent is incorrect, other possible referents can be found in what she calls an "alternative focus list". This list contains the elements of the preceding clauses that either did not receive focus, or had the focus before being replaced by a new focus in a subsequent syntagma. To begin with the latter, they are:

"turtle geometry"
the programming language Logo
the Massachusetts Institute of Technology
Seymour Papert of M.I.T.

A simple check on syntactic features shows that none of these is compatible with the plural pronoun them, so all of them can be dropped from consideration. Of the elements that did not receive focus, the theme of the preceding clause is a preferred referent (it can fulfil the constant-progression pattern). The theme of sentence 4 is many others, which took its semantic content from the person Seymour Papert. This clearly fulfils both the syntactic plural and the semantic person restriction, and we may assume it to be the intended referent. The above demonstration of how a referent of a pronoun can be found illustrates the following principles:

- though thematic patterning can indicate preferred referents, syntactic and semantic features must be checked before a choice is made;
- thematic patterning can provide a ranking of preference, which can tell which elements to try first. As long as there is no syntactic and semantic agreement, however, no choice can be made; and
- correct semantic checking can depend on such seemingly obvious, but very difficult to define, pieces of knowledge as the writing conventions for names, or the use of punctuation.
Any automatic system will have to have an extensive rule base to cope with such things.
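The principles just listed can be combined into a single candidate-testing loop. The sketch below is ours, not Sidner's implementation; the feature values are illustrative stand-ins for a real lexicon and rule base:

```python
def resolve_referent(pron, focus, alt_focus_list, prev_theme, features):
    """Try candidates in preference order: the current focus, then the
    alternative focus list, then the theme of the preceding clause
    (which can fulfil a constant progression).  A candidate must agree
    both in number and in semantic class before it may be chosen."""
    def agrees(candidate):
        f = features[candidate]
        return (f["number"] == pron["number"]
                and f["class"] == pron["class"])
    for candidate in [focus] + alt_focus_list + [prev_theme]:
        if agrees(candidate):
            return candidate
    return None   # no referent found: fall back on other mechanisms

# The search for the referent of 'them' in sentence 5.
features = {
    "its development and its applications":
        {"number": "plural", "class": "abstract"},
    '"turtle geometry"':
        {"number": "singular", "class": "abstract"},
    "the programming language Logo":
        {"number": "singular", "class": "abstract"},
    "the Massachusetts Institute of Technology":
        {"number": "singular", "class": "institution"},
    "Seymour Papert of M.I.T.":
        {"number": "singular", "class": "person"},
    "many others":
        {"number": "plural", "class": "person"},
}
them = {"number": "plural", "class": "person"}
print(resolve_referent(them,
                       "its development and its applications",
                       ['"turtle geometry"',
                        "the programming language Logo",
                        "the Massachusetts Institute of Technology",
                        "Seymour Papert of M.I.T."],
                       "many others", features))
```

Note that the ranking only orders the candidates; the agreement checks make the actual choice, exactly as the second principle requires.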
Sentence 5:
5. [...] who have set forth the ideas underlying turtle geometry in a remarkable expository work: "Turtle Geometry: The Computer as a Medium for Exploring Mathematics".
Esperanto translation:
5b. Ili eksplikis la ideojn, sur kiuj baziĝas la testuda geometrio, en elstara enkonduka verko: "Turtle Geometry: The Computer as a Medium for Exploring Mathematics" (Testuda geometrio: la komputilo kiel rimedo por esplori la matematikon).
'they explained the ideas, on which is-based the turtle geometry, in a-remarkable introductory work: ...'
THEME 7 = who
RHEME 7 = have set forth the ideas underlying turtle geometry in a remarkable expository work: ...
TAIL 7  = "Turtle Geometry: The Computer as a Medium for Exploring Mathematics"
FOCUS 7 = the ideas underlying turtle geometry
Reference links:
Figure 36

COMPUTER RECREATIONS
Turning turtle gives one a view of geometry from the inside out
by Brian Hayes
1. The new way of thinking about geometry || has come to be known as "turtle geometry".
2. It || is closely connected with the programming language Logo, which in turn || has its roots in the Massachusetts Institute of Technology.
3. Logo || was conceived in the 1960's by Seymour Papert of M.I.T., primarily as a language for introducing children to computers.
4. || Many others have since contributed to its development and to its applications both in education and in other fields.
5. || Among them are Harold Abelson and Andrea A. diSessa of M.I.T., who || have set forth the ideas underlying turtle geometry in a remarkable expository work: Turtle Geometry: The Computer as a Medium for Exploring Mathematics.
(reference-link arrows not reproduced)
This last clause of the first paragraph offers no further problems concerning thematic progression. It is a dependent clause, continuing the focus of the main clause it depends on. It is interesting to see how the element that receives focus, the ideas underlying turtle geometry, is presented in a form that indicates it is given rather than new information. Looking back through the text, we find no suitable reference for the syntagma, so we must assume that the writer took the ideas underlying turtle geometry to be uniquely identifiable (hence the definite article) by reference to our common knowledge. Knowing that geometry is a human artefact, an abstract mental construction, does indeed imply that it is based on ideas. This concluding clause of the paragraph jumps back nicely to the theme set by the sub-title, but abandoned in favour of a "side-line" concerning Logo and the history of that language. This has two consequences for our perception of the thematic patterning in the text. First of all, the link between the many others who have ... contributed to Logo and Harold Abelson and Andrea A. diSessa who, surprisingly, have written about turtle geometry instead of Logo, more or less "equates" Logo with turtle geometry. This means that all the preceding sentences from 2 onward, though never once mentioning it, were talking about the origins and development of turtle geometry as much as they were talking about Logo. Secondly, the return to an element that was focussed earlier, but not taken up in subsequent clauses, effectively "closes up" the "side-line" about Logo, returning the focus of attention to turtle geometry again. This "return" is modelled by Sidner (1983) as a "pop" of the stack on which previously focussed elements are kept. The effect of such a pop is that the elements focussed between the original focus and the return of the focus to it are no longer directly accessible for backward referencing. 
That is, any subsequent reference to these elements will have to involve an explicit reintroduction of the concept in question, necessary to replace the concept on the stack. A nice touch in the return to turtle geometry is that the still unexplained link between computer, turtle and geometry is strengthened (but not explained) by the citation of the book's title Turtle Geometry: The Computer as a Medium for Exploring Mathematics. We have already been told that turtle geometry is closely linked to a programming language, so this was the link with computer, but the turtle in turtle geometry remains a mystery. And now, through the book's title, turtle, computer and geometry are back together. The effect of this is that though the clause's actual focus is on the ideas underlying turtle geometry, the concepts turtle geometry and computer are also prominent in the reader's mind, and s/he will certainly expect to be told more about them in the subsequent clauses.
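Sidner's stack of previously focussed elements, and the "pop" that closes off a side-line, can be sketched as follows (an illustrative rendering, not her original implementation):

```python
class FocusStack:
    """Previously focussed elements, after Sidner's stack model.
    Returning to an earlier focus pops everything focussed since;
    the popped elements are no longer directly accessible for
    backward reference and must be explicitly reintroduced."""

    def __init__(self):
        self.stack = []

    def push(self, focus):
        self.stack.append(focus)

    def refocus(self, element):
        """Pop back to an earlier focus; return the closed-off
        elements of the side-line, most recent first."""
        if element not in self.stack:
            self.push(element)
            return []
        popped = []
        while self.stack[-1] != element:
            popped.append(self.stack.pop())
        return popped

# Paragraph 1: the side-line about Logo is closed when the final
# clause returns to turtle geometry.
focuses = FocusStack()
for f in ['"turtle geometry"', "Logo",
          "the Massachusetts Institute of Technology",
          "Seymour Papert of M.I.T."]:
    focuses.push(f)
side_line = focuses.refocus('"turtle geometry"')
print(side_line)   # the elements that now need reintroduction
```

The returned list is exactly the material that, after the pop, can only be referred to again through explicit reintroduction.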
2.4.6. The complete pattern

The way the printed text is presented on the page clearly shows that sentence 5 was the last sentence of the paragraph. Though such explicit paragraph markings are not absolutely reliable (they are not always used consistently in a text-grammatically interpretable way), they usually indicate that a particular line of argument has been closed, and a new one will be taken up. As we already noticed, the closing clause of paragraph 1 nicely returns to the theme set in the sub-title and opening clause, which strengthens the impression that this paragraph is indeed a self-contained unit. We can now draw a diagram of the thematic progression of paragraph 1, to give us an overview of its patterning and the way concepts are introduced, expanded, and/or dropped again.
Figure 37 Thematic progression in paragraph 1
(diagram not fully reproducible; legible labels include "the new way of thinking about geometry", "previous passage", "Many others ... contributed", "H. Abelson & A. A. diSessa ...", R6, T7 "a remarkable expository work", and R7)
2.4.7. Thematic patterns as a summary mechanism

We already saw how an analysis of thematic progression influences the expectation patterns of the reader, and supports the inter-clausal reference mechanism. Another useful aspect of a thematic-pattern analysis is that it can be used to derive a "summary" of a length of text (provided it is coherent enough to produce a single structure). Looking at the analysis of paragraph 1, above, we can summarize as follows: turtle geometry is "linked" to Logo, of which a short history is given in terms of place (the Massachusetts Institute of Technology), time (the 1960's), and inventor (Seymour Papert). Logo's development is then connected with two specific names: Harold Abelson and Andrea A. diSessa, whose contribution was a book on turtle geometry, which brings us back to the original theme. This summary shows that there are four main themes: turtle geometry, Logo, many others and Harold Abelson and Andrea A. diSessa. We could draw a kind of semantic representation of the way they are connected:
Figure 38 [semantic network: turtle geometry -> connected to -> Logo; many others (members of ...) -> developed and applied -> Logo; Harold Abelson and Andrea A. diSessa -> wrote about -> turtle geometry]
The fact that this little network starts and ends in turtle geometry suggests that turtle geometry is the central theme, with all other information depending on it. An extreme abstraction of paragraph 1 could therefore be that it "is about turtle geometry", thus hiding all other information from sight (cf. Geist's "key word" method; Geist 1987: 742). Such an extreme abstraction might be useful in an automatic analysis system that has the capability of storing the analysed information in a database for later use, for instance in information retrieval systems. Storing that information in a hierarchical format reflecting the structure of the text would mean that a) for any indexed theme, only the information depending on it would be available; and b) a whole paragraph can be indexed and accessed under a single theme, which can then be used as a single constituent in a higher level of textual patterning (see below).
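Such a hierarchical theme index can be sketched, in modern terms, as a nested mapping from themes to the information depending on them. The structure and the `lookup` function below are purely illustrative (the DLT literature does not specify a storage format); the theme names are taken from the sample analysis.

```python
# A minimal sketch of hierarchical theme indexing (hypothetical; not a
# DLT data structure).  Each theme maps to the information depending on
# it; the whole paragraph is reachable under its single top theme.
paragraph_index = {
    "turtle geometry": {
        "info": ["linked to Logo", "subject of a remarkable expository work"],
        "subthemes": {
            "Logo": {
                "info": ["developed at MIT in the 1960's by Seymour Papert"],
                "subthemes": {},
            },
        },
    },
}

def lookup(index, theme):
    """Return the sub-index stored under `theme`, searching depth-first."""
    if theme in index:
        return index[theme]
    for entry in index.values():
        found = lookup(entry["subthemes"], theme)
        if found is not None:
            return found
    return None
```

Indexing the paragraph under the single theme "turtle geometry" then exposes the dependent theme "Logo" only through that entry, as point b) above requires.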
2.4.8. The complete pattern, continued We will not go through paragraph 2 in as much detail as paragraph 1. Instead, we will first give the diagram of thematic patterning, which we will use to discover some of the interesting and problematic aspects of thematic patterning, both in relation to our sample text and in general.
Figure 39 Analysis of paragraph 2
The first clause of paragraph 2 picks up the turtle of turtle geometry in the preceding paragraph. The syntagma the original turtle tells us several things:
- there is a single original turtle or a class of original turtles that is smaller than the overall set of turtles spoken about here;
- original implies that there have since been others; and
- this way of starting the paragraph rouses the expectation that the explanation of turtle not given in paragraph 1 will be given here.
The explanation of the concept 'turtle' starts off with a bifurcated rheme. Two descriptions of the same object are given, a general classification (a mechanical device) and a more detailed description, very much like a dictionary definition of a word (a small wheeled vehicle). Of this bifurcated rheme, the second part is continued first. In a pattern of constant progression, headed by the relative pronoun whose, we are given information about its operation (movements could be controlled by instructions typed on a computer keyboard), origin (the first such creature), and appearance (a dome-shaped cover). The name turtle is explained in a tail construction: somewhat like a turtle's shell, which hinges on the reader's understanding that the turtle of the preceding passage was not the animal, but a mechanical device resembling the animal. In other words, any mental image the reader may have had of the turtle so far will have to be revised to incorporate the new information. The "side-line" is then abandoned to return to the first rheme of sentence 1, through the repetition of mechanical and turtle. Its name and origin having been explained, we are now given a detailed description of its operation. A "break" in the thematic progression of the paragraph occurs in sentence 11 (Today such "floor turtles" are less common than "screen turtles"). The word floor in "floor turtles" is rather strange, as there has been no previous mention of floor at all. Though most readers will, with some imagination, be able to construct a mental image of a mechanical turtle moving around on a piece of paper lying on the floor, this is a far from obvious inference. In our opinion, the use of the unintroduced word floor here is an example of "inconsiderate" reference (Hirst 1981: 60): an apparent reference for which no actual referent exists. Such words are particularly hard to disambiguate and translate, as no semantic context is available to place them in (see also 2.3.).
The word such points back to the complete description given so far, which was an elaboration of the very first theme, the original turtle. The "break", or "return", is used to move to something that was already implicit in the opening clause: the fact that there are other than original turtles. The other turtles are introduced by means of an explicit contrast, in the syntagma less common than. The contrast is contained in the relative clause which move and draw on the surface of a cathode ray tube, but requires a non-trivial inferencing step to be understood. Finding the intended contrast comes down to realizing that move and draw are properties of "floor turtles", already mentioned explicitly (move forward, and backward) and implicitly (it leaves a record of its path). The contrast must then be in on the surface of a cathode-ray tube. The solution seems to be an implication of 'surface' in over a sheet of paper in the preceding sentence. Solving such obscure references requires sophisticated semantics, which may well be beyond an automatic analyzer. The question is now whether it is absolutely necessary to understand the reference completely in order to produce an adequate translation. A system that is capable of detecting thematic patterns and their signals might make up for a lack of understanding by taking care to translate the signals as faithfully as possible, relying on the human reader of the translation to resolve the reference. In other words, in the case of the "break" in the thematic progression brought about by an explicit contrast marker such as less common than, signalling the break and translating the marker may be sufficient to make the reference identifiable, even though a lack of understanding on the machine's part causes the referent to be a less than perfect translation of the original. It would certainly be worth investigating the translation-supporting task of thematic patterning further, especially in relation to the (in our opinion inevitable) lack of depth of the semantic module(s) we can expect in the coming years. In the DLT system, thematic patterns can certainly be used to guide the interactive dialogue, which is used to ask the human user to solve certain problems the computer cannot properly solve itself. Copying thematic signals and explicit markers into the questions can help to direct the reader to the proper answer, even if the computer fails to find a solution. After all, to ensure that the answers to the dialogue are the best possible, the questions must be as clear and "directed" as possible.
A system that "understands" thematic patterns has a much better chance of asking the right questions than one that operates on a sentence-by-sentence basis. Though paragraph 2 is much less straightforward than paragraph 1, the diagram does allow a summary to be constructed, which clearly reflects its underlying structure. From the first paragraph's turtle the original turtle is introduced. This is simultaneously described as a mechanical device and a small wheeled vehicle, to which some details of origin, inventor, and shape are added. Returning to the mechanical turtle, details of its operation are given. This description is then used to introduce the new concept screen turtle by means of contrast. The paragraph ends with a contrastive description of the new concept. In a semantic network, the paragraph can be represented as:
Figure 40 [semantic network: the original turtle -> IS-A -> mechanical device; -> IS-A -> small wheeled vehicle; built by W. Grey Walker, late 1940's; shaped like turtle's shell; moves forward and backward and can change direction; leaves a record; -> CONTRAST -> screen turtle]
2.4.9. Conclusion The communicative bifurcation of clauses as expressed in theme-rheme structures was presupposed in the previous sections and used as an instrument for suggesting default rules for semantic-pragmatic referent choice. In this section we have displayed a sample theme-rheme analysis of a short paragraph as an illustration of the more general considerations of text structure described in terms of thematic progression. We have already pointed out the need to countercheck the text structure brought about in the target language, to ensure that it is understood as much as possible in the same way as the source text. As for the theme-rheme division, this need has two consequences. First, the theme-rheme distinction is applied as one of the instruments for resolving deixis and reference, and it should be used for this purpose in the target language as well. Second, the theme-rheme distinction is not without reason called communicative, since it makes a contribution of its own to shaping the content of the text into a coherent whole. How to translate this function is taken up in 3.3.
2.5. Reduction to verbal elements
Several times in the previous chapters we have talked about expectations that supposedly are roused in the reader by certain words and constructions. It is not always easy to identify exactly what causes those expectations. It is commonly accepted that a large part of a reader's expectations is caused by her/his knowledge of the world, which allows her/him to create extensive "scenarios" on the basis of minimal textual information. A considerable part, however, of what a reader expects seems to be tied to the words of her/his language, or more precisely, to the reader's lexical knowledge. As is extensively argued by Papegaaij, Sadler and Witkam (Papegaaij 1986: 95ff.; Papegaaij/Sadler/Witkam 1986), words carry with them constraints on the contexts they are likely to occur in. A language user's internal dictionary apparently contains information about the other words a certain lexical item typically interacts with to form clauses. In the DLT system, lexical knowledge is stored in a manner based quite straightforwardly on these ideas, as can be seen in the sample entries from DLT's lexical knowledge bank shown in 2.5.2.
In this section we will examine the way words constrain their context and set up certain expectation patterns to be fulfilled by other words, both within clauses and between them. Where it seems relevant we will consider the problems associated with translating those patterns; the possibilities and impossibilities, and possible solutions.
2.5.1. Case grammar Many grammarians are concerned with the concept of grammaticality (see 3.5.1.). It was often addressed by more or less purely syntactic means, but scholars soon reached a point where they felt purely form-related criteria could not achieve what they wanted them to do: describing all and nothing but the "grammatical" sentences of a language. A well-known attempt to augment the set of syntactic instruments with semantic regularities is Charles Fillmore's case grammar (1968). This dependency-semantic model was taken up as a fruitful idea by very many researchers. In theoretical linguistics the high tide of interest in case grammar now seems to have passed, but, as Harold Somers (1987: viii) points out, in computational linguistics it is still "an extremely popular linguistic theory".
The basic ideas behind case grammar are:
- next to syntactic classification there is semantic classification, usually by means of semantic features. Combining syntactic and semantic classification results in more precisely defined groups of words, which makes more accurate predictions about their behaviour possible;
- the syntactic functions in a clause are complemented by semantic functions. While syntactic functions describe the formal relations between words, semantic functions describe the basic meaning relations between them.
These two ideas combined into case grammar provide a semantically oriented description of clauses that can help to describe more accurately how words constrain their context. One of the most prominent functions of case grammar is that it made explicit the tertium comparationis on which the ideas about meaning-preserving transformations in the generative theories of that time could be based. The classical example is the active-passive distinction, in which the case analysis is the same for both clauses, even though the syntactic analysis differs considerably. In this way, case grammar reflects the intuition of native speakers that the passive counterpart of an active clause basically means the same. It has cost scholars considerable effort to explain what exactly the word basically means in argumentations of this kind (cf. also Schubert 1987a). In view of our present interest in text coherence, we may say that active and passive sentences are - under appropriate conditions - interchangeable as long as no text-level context is taken into account. This function as a tertium comparationis is possible since case patterns are more stable than syntactic patterns, i.e., they are less subject to surface variations, being closer to the meaning of the clause. By describing the expectation patterns of words in terms of deep cases (as they are often called in order to avoid confusion with morphologically marked "surface" cases) and semantic features, it becomes possible to predict possible combinations of words on a more meaning-oriented level (figure 41), which is often more accurate and in any case adds an extra dimension of discrimination to the grammar.
Figure 41 A word with its semantic expectations in terms of deep cases

to saw:
AGENT = HUMAN, MACHINE
PATIENT = INANIMATE
INSTRUMENT = ARTIFACT, TOOL (saw, knife)
2.5.2. The verb as central element As in other dependency-oriented models, the central unit in case grammar descriptions is the verb. Apparently, it is the verb around which other words cluster, and it is the verb that dictates which combinations of deep cases are possible within a clause. In other words, the expectations about combination patterns of words are described in case grammar in a largely verb-oriented way. There are possible extensions, in which other syntactic word classes can be related to each other as well, but the basic idea is that the verb determines the possible deep cases in a clause. The direction of dependency relations, thus among other things the question whether the verb should be a governor or a dependent of a given word or word group, is something to be decided upon by the grammarian in a purposeful, but ultimately arbitrary way. We have earlier discussed this fact with respect to syntax (Schubert 1987b: 40), but it pertains to other dependency relations as well. But since we have opted for the finite verb as the main internal governor of a clause at the syntactic level (which is the usual choice in dependency syntax), it is expedient not to depart from this decision in semantic dependency patterns.
If we translate this purposeful view of verbs into expectations, the main subject of this chapter, it means that verbs generally have a far more detailed expectation pattern for their context than words of other word classes. This also has an intuitive dimension. For most verbs, most people find it possible to list numerous examples of words that are typically used in combination with them (figure 42). For other word classes this is often much less so.
Figure 42 Some verbs with expected deep cases

to drink:
AGENT = ANIMATE: man, woman, dog, bird
PATIENT = LIQUID: water, milk, beer

to shoot:
AGENT = HUMAN: man, woman, boy, girl
PATIENT = ANIMATE: deer, man, duck, fowl
INSTRUMENT = TOOL: gun, revolver, pistol, arrow
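The expectation patterns of figure 42 can be expressed directly as data, together with a predicate that checks whether a noun satisfies a verb's deep-case expectation. The following sketch is ours, not DLT's internal representation; the feature labels and example nouns follow the figure.

```python
# Deep-case expectations of figure 42 as a small dictionary (illustrative
# encoding; the labels and fillers follow the figure, the code is ours).
EXPECTATIONS = {
    "drink": {"AGENT": "ANIMATE", "PATIENT": "LIQUID"},
    "shoot": {"AGENT": "HUMAN", "PATIENT": "ANIMATE", "INSTRUMENT": "TOOL"},
}

FEATURES = {  # semantic features of a few nouns (invented for the example)
    "man":  {"ANIMATE", "HUMAN"},
    "dog":  {"ANIMATE"},
    "beer": {"LIQUID", "INANIMATE"},
    "gun":  {"TOOL", "INANIMATE"},
}

def fits(verb, case, noun):
    """Does `noun` satisfy the deep-case expectation of `verb` for `case`?"""
    wanted = EXPECTATIONS[verb].get(case)
    return wanted in FEATURES.get(noun, set())
```

The extra dimension of discrimination mentioned above shows up immediately: a dog may drink (its AGENT feature ANIMATE fits), but may not shoot, since shoot expects a HUMAN agent.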
The SWESIL system of DLT (Papegaaij 1986: 75ff.), which basically contains nothing but lists of expectations for the immediate context of Esperanto words, clearly shows the greater detail of verbal expectation patterns compared with those of other words. Figures 43 to 45 show example entries for a verb, a noun and an adjective from DLT's lexical knowledge bank (LKB).
Figure 43: Example LKB entry for the verb protekt'i 'to protect'; the English glosses have been added for the reader's comfort here.

protekt'i e: efik'e, oficial'e ('effectively, officially protect')
protekt'i en: facil'aj'o ('protect in facility')
protekt'i per: favor'o, influ'o, karton'o, potenc'o, trance'o ('protect by-means-of favour, influence, cardboard, power, trench')
protekt'i as: amik'o, anim'o, instituci'o, karton'o, lamen'o, mont'o, patr'o, patron'o, potenc'o, reg'o ('friend, spirit, institution, cardboard, metal sheet, mountain, father, patron, power, king protects')
protekt'i de: stat'o ('protect from state')
protekt'i n: al'o, amas'o, animal'o, best'o, font'o, hejm'o, individu'o, infan'o, kongres'o, labor'o, person'o, popol'o, scienc'o, viv'o ('protect wing, mass, animal, source, home, individual, child, congress, work, person, people, science, life')
protekt'i kontrau: best'o, danger'o, korod'i, korod'o, minac'o, split'aj'o ('protect against animal, danger, corroding, corrosion, threat, splinter')
protekt'i kaj: protekt'i ('protect and protect')
por protekt'i: akciz'o, gard'i, garn'i, impost'o, kas'i, medicin'o, separ'o, sin'o, ten'i, voc'o ('excise, to guard, to fit out, tax, to hide, medicine, separation, bosom, to hold, vote to protect')
per protekt'i: asekur'i, sirm'i, zorg'i ('to insure, to shield, to take care of by protecting')
pri protekt'i: unu'o ('unit for protecting')
i protekt'i: dev'i ('to have to protect')
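An LKB entry of this kind amounts to a mapping from a word and a relator to its typical fillers, which can be queried during disambiguation. The fragment below re-encodes a few lines of the protekt'i entry of figure 43; the dictionary layout and the function name `typical_fillers` are our own illustration, not DLT's storage format.

```python
# A fragment of the protekt'i entry from figure 43, re-expressed as a
# mapping relator -> typical fillers (our encoding, not DLT's format;
# apostrophes mark morpheme boundaries as in the LKB).
LKB = {
    "protekt'i": {
        "n":       ["infan'o", "person'o", "popol'o", "viv'o"],  # protected objects
        "kontrau": ["danger'o", "korod'o", "minac'o"],           # protected against
        "per":     ["influ'o", "potenc'o", "trance'o"],          # means of protection
    },
}

def typical_fillers(word, relator):
    """Typical fillers the LKB lists for `word` under `relator` (or [])."""
    return LKB.get(word, {}).get(relator, [])
```

A semantic module can then, for instance, score a candidate reading of protekti kontraŭ X by checking whether X appears among (or is semantically close to) the fillers listed under the relator kontrau.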
Figure 44 Example LKB entry for the noun bulten'o 'bulletin'

bulten'o pri: uson'o ('bulletin about the USA')
bulten'o de: asoci'o, institut'o ('bulletin of association, institute')
bulten'o a: ekonomi'a, kun'a, lok'a, mult'a, region'a, semajn'a, tiu ('economic, accompanying, local, many, regional, weekly, that bulletin')
bulten'o pe: inform'o ('information bulletin')
bulten'o as: anstatau'i, aper'i, ekzist'i ('bulletin replaces, appears, exists')
de bulten'o: atend'i, el'don'o, foto'kopi'o, nom'o ('to expect, edition, photocopy, name of bulletin')
per bulten'o: pov'i ('to be able by-means-of bulletin')
bulten'o n: el'don'i, pres'i, redakt'i ('to publish, print, edit bulletin')
bulten'o kaj: bulten'o, revu'o ('bulletin, journal and bulletin')
Figure 45 Example LKB entry for the adjective blank'a 'white'

blank'a en: lum'o ('white in light')
blank'a kiel: ebur'o, lakt'o, neg'o ('white as ivory, milk, snow')
blank'a pro: pur'ec'o, sap'o, tim'o ('white because-of cleanliness, soap, fear')
blank'a kaj: blank'a, rug'a ('white and white, red')
est'i blank'a: farb'i, lav'i, pur'ig'i ('to paint, wash, clean white')
blank'a a: alumini'o, cerv'o, dom'o, farb'o, flor'o, har'ar'o, har'o, haut'o, kart'o, karton'o, klav'o, kolor'o, kostum'o, makul'o, mur'o, nub'o, or'o, paper'o, part'o, plafon'o, plast'o, plat'o, plum'o, pulvor'o, sabl'o, sap'o, stri'o, saum'o, stof'o, televid'o, tol'o, tuk'o, vin'o, vizag'o ('white aluminium, deer, house, paint, flower, hair, skin, card, cardboard, key, colour, suit, spot, wall, cloud, gold, paper, part, ceiling, plastic, plate, feather, powder, sand, soap, strip, foam, material, television, linen, cloth, wine, face')
kaj blank'a: blank'a, flav'a, nigr'a, pal'a, rug'a ('white, yellow, black, pale, red and white')
2.5.3. Nominalized verbal elements In principle, we can say that verbs determine the expectation patterns for the complete clause, while other words have only local expectations. This is evident from the central role of the verb in case grammar, and is also clearly illustrated in Simon Dik's Functional Grammar, where language utterances are analyzed in terms of predications: "the application of a predicate to an appropriate number of terms functioning as arguments of that predicate" (Dik 1978: 15). Basically, predicates are given as words in the lexicon, though they can also be derived with predicate formation rules. Though Dik recognizes nominal and adjectival predicates as well as verbal ones, from the analyses he gives it is clear that the most powerful predicates (i.e., those governing the largest number of terms) are the verbs in the dictionary. In Dik's model, the dictionary lists only "basic" predicates, but it is possible to derive new ones by means of formation rules. This means that basic predicates may be used in ways that are not given in their dictionary definition; in other words, predicates can change their function in a clause. The most striking example in English is the ease with which verbs can be nominalized (and possibly the ease with which nouns can be "verbed"). Nominalization means using a verbal predicate "as if" it were a nominal one, i.e., it can function as a term to another verbal predicate in positions that are normally reserved for nouns. When we look at it from the point of view of expectation patterns, nominalization is an interesting process. We have already said that verbs have by far the most extensive expectation patterns (in Dik's terms, they govern predications with the largest number of terms). Where normally the terms in a predication do no more than constrain their immediate context (e.g. nouns constrain their adjectives), nominalized verbs can, in principle, constrain much wider contexts. In this chapter we will try to find out whether those unfulfilled expectations are truly unfulfilled, or whether they play a role in a text's coherence by implying links between otherwise unrelated sentences. The question will be: does knowing the underlying basic predicates help to find implied relations between clauses in a text?
2.5.4. Some sample analyses Let us first see what an analysis in basic predicates looks like when applied to some of the sentences in our sample text. To begin with the sub-title:
Turning turtle gives one a view of geometry from the inside out

or, in Esperanto:

Testudigo permesas rigardi la geometrion elinterne
An analysis of the English clause could be:
Figure 46 [dependency diagram: give with AGENT turn (AGENT ???, GOAL(?) turtle), BENEFICIARY one, PATIENT view (AGENT ???, PATIENT geometry, MANNER from the inside out)]

or, in predicate notation:

give( [AG] turn( [AG] X, [GOAL] turtle),
      [BEN] one,
      [PAT] view( [AG] Y, [PAT] geometry, [MAN] from the inside out))
For the Esperanto clause the analysis would be:
Figure 47 [dependency diagram: permesas 'allows' with AGENT igi 'become' (AGENT ???, GOAL(?) testudo 'turtle'), BENEFICIARY ???, PATIENT rigardi (AGENT ???, PATIENT geometrio 'geometry', MANNER elinterne 'from the inside out')]
which is in predicate notation:

permesi( [AG] igi( [AG] X, [GOAL] testudo),
         [BEN] Y,
         [PAT] rigardi( [AG] Z, [PAT] geometrio, [MAN] elinterne))

Two things are interesting in this analysis. First of all there is the obvious similarity between the underlying structures of the English and the Esperanto version. In fact, many researchers in the machine translation field claim that it is precisely the underlying deep-case structure (also called case frame) that makes translation possible. They maintain that translation is a process of finding the underlying deep-case structure of the original and then "filling in" the target-language realization of that deep-case structure. However, this presupposes that deep cases are language-independent, an assumption that is far from proven (see Tsujii 1986: 656; Schubert 1987a). But even if deep cases are language-specific, analyses such as the above may help to find a translation as close as possible to the original, provided we are aware of the differences in deep-case systems between the two languages involved. In the DLT system, in fact, semantic processing of all source language sentences is done by translating them into Esperanto-specific "deep case-like" structures (Papegaaij 1986: 108ff.). No source language-specific deep-case notation is used.
The second thing the analyses reveal is the existence of implied agents in the English sentence and of an implied beneficiary in the Esperanto sentence. (It could be that a more elaborate analysis brings even more implied terms to the surface, but for the moment we will stop here.) These are unfilled slots in the expectation patterns of the underlying verbs. They are unfilled because the verbs that would govern them have been nominalized, which blocks the syntactic possibilities for their realization. Semantically it seems likely that at least the two implied agents are supposed to be the same "hypothetical" entity which, in the English original, is given as the beneficiary of give in the word one. Since, as the word one indicates, there is no definite entity intended, however, it seems that the actual "unification" of the agents in the clause (i.e., declaring them to be the same, though unknown, entity) is not necessary. The lack of necessity for an identification of the entity given as one is even stronger in the Esperanto version, where the beneficiary of permesas is simply left out completely, although it would syntactically be perfectly possible to translate it (as al oni). The second English sentence can be analysed as:

has( [AG] think about( [MAN] the new way, [AG] X, [PAT] geometry),
     [???] come( [AG] Y,
                 [RES] be( [AG] Y,
                           [ATTR] know( [AG] X, [PAT] Y, [MAN] as "turtle geometry"))))
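Finding such implied terms can itself be mechanized: represent a predication as a nested structure and collect every deep-case slot whose filler is still a variable. The structure below encodes the sub-title's predication in the text's own notation; the traversal function `implied_slots` is our own sketch.

```python
# The sub-title's predication as a nested structure.  Uninstantiated
# slots are marked by the variables X, Y (the notation mirrors the
# predicate notation in the text; the traversal is our illustration).
predication = {
    "pred": "give",
    "args": {
        "AG":  {"pred": "turn", "args": {"AG": "X", "GOAL": "turtle"}},
        "BEN": "one",
        "PAT": {"pred": "view",
                "args": {"AG": "Y", "PAT": "geometry",
                         "MAN": "from the inside out"}},
    },
}

def implied_slots(node, path=""):
    """Collect (path, variable) pairs for every unfilled deep-case slot."""
    found = []
    for case, filler in node["args"].items():
        where = f"{path}{node['pred']}.{case}"
        if isinstance(filler, dict):
            found += implied_slots(filler, where + ":")
        elif filler in {"X", "Y", "Z"}:
            found.append((where, filler))
    return found
```

Applied to the sub-title, the walk surfaces exactly the two implied agents discussed above: the agent of the nominalized turn and the agent of the nominalized view.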
or for the Esperanto version:

igi( [AG] la aliro( [ATTR] nova, [AG] X, [GOAL] al geometrio),
     [RES] koni( [AG] X, [PAT] Y, [MAN] sub la nomo "testuda geometrio", [ATTR] dis))

In 2.1. we already showed how the complex verb construction has come to be known as is translated into the single Esperanto verb diskonatigis. It is interesting that the manner slot, indicated in the English version by the adverbial as, is explicitly paraphrased in the Esperanto by means of the preposition syntagma sub la nomo. The underlying deep case, however, remains the same. The surface realization of a deep-case relation can range from an extensive paraphrase (under the name of, in the manner of) to a single function word (as, like), or the relation may be left totally implicit, without any surface marking. As we see here, the extent to which a deep-case relation is marked on the surface may very well differ from one language to the next. The prefix dis has been added by the translator in accordance with current Esperanto practice. It makes more explicit the implied "generality" of the statement. In both English and Esperanto the agent of the main predicate is the patient of the know / koni predicate. An interesting point is the change in deep-case slot between think about and aliri al. The English patient has become a goal in Esperanto (though deep-case analysis remains dubious and controversial when it comes to other than agent/patient labels). This kind of deep-case transformation is not uncommon, and shows that
- deep cases are not language-independent; and
- a translator must be aware that choosing a particular translation may require reworking of the underlying semantic structures as well.
The kind of deep-case transformation we have here is word-specific. Instead of the original, the Esperanto syntagma chosen by the translator determines the deep-case structure. It is therefore most likely that such transformations are stored in the lexicon, as part of the word-specific metataxis rules (metataxis is described by Schubert, 1987b: 130ff.).
2.5.5. Using extended expectations to solve definite reference Now that we have demonstrated a few aspects of how reduction to verbal elements works, let us show how the larger expectation patterns of the verbs can help to solve a typical text coherence problem, namely definite reference. As we saw in 2.3.3., definite reference requires that the referent can either be found in the text, or is uniquely identifiable in the receiver's knowledge of the world. Since no good example occurs in our sample text, we will for once have to revert to the (questionable) practice of making up an example. Consider the sentences: He went to the bank to get a loan. At home he counted the money and put it safely away. There are two definite references: the bank and the money. One could argue that bank, loan and money are so obviously connected that the definite references rely just on the reader's common knowledge to be identifiable. This may be true, but finding the underlying verbal elements can help explain (and then eventually formalise for a computer system) how this knowledge is retrieved. An analysis of the first sentence in verbal elements looks like:

go( [AG] he,
    [PLACE] to the bank,
    [GOAL] to get( [AG] he,
                   [PAT] loan( [AG] the bank, [PAT] X, [BEN] he)))
This reveals that loan, as a verb, has an agent (the bank) and a beneficiary (he), and an (as yet) unknown patient. The patient slot of loan, however, has a strong preference for money, as most dictionaries will confirm. In other words, loan expects money, an expectation that is not fulfilled in the current sentence. The second sentence, however, contains the definite noun syntagma the money, as if it has already been mentioned elsewhere. This definite reference is now easily resolved. The concept money is so strongly anticipated as patient of loan that any reader will almost automatically expect money to be intended. When the second sentence then mentions money explicitly, the concept money is fresh enough in the reader's mind to allow a definite reference to it.
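This resolution step can be sketched as a toy mechanism: keep the verbs underlying the preceding clause, and resolve a definite noun by checking whether it fills the strong, still unfulfilled patient preference of one of them. The lexical preferences below are invented for the example; a real system would draw them from the semantic dictionary.

```python
# Resolving "the money" via the unfulfilled PATIENT expectation of the
# underlying verb "loan" (toy mechanism; preferences invented here).
PATIENT_PREFERENCE = {"loan": "money", "drink": "liquid"}

def resolve_definite(noun, prior_verbs):
    """Return a prior verb whose expected patient matches `noun`, if any."""
    for verb in prior_verbs:
        if PATIENT_PREFERENCE.get(verb) == noun:
            return verb
    return None
```

For the made-up example, the underlying verbs of the first sentence are go, get and loan; the definite syntagma the money resolves to the unfilled patient slot of loan.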
2.5.6. Coherence through "deep" reference Our sample text contains an interesting example of how underlying verb patterns can strengthen text coherence, even when this is not strictly necessary for a correct understanding of the sentences involved. Consider the syntagma in sentence 4: 4. [...] both in education and in other fields which is a bit "marked" in that the indefinite noun education is set apart from other fields in a way that implies it has already received some attention. The word education, however, is here used for the first time. How then can we explain the marking? We may get some indication when we look at the previous sentence, which ends in the syntagma: 3. [...] as a language for introducing children to computers The predication in that syntagma is: introduce( [AG] X, [PAT] children, [GOAL] to computers) The predication in the syntagma containing education is: educate( [AG] X, [PAT] Y) In the semantic dictionary, educate will have several expectations, both for the agent (parent, teacher) and for the patient (child, student). Through the patient expectation of educate for child and the occurrence of children as patient of introduce, it is now possible to recognize a semantic similarity between the two syntagmata. The concept education was already implied by the syntagma introducing children to computers, which is the reason for setting education apart from other fields. Note that the recognition of the semantic "echo" found here is not necessary for understanding the passage. Both sentences are perfectly understandable on their own. Recognizing the echo, however, helps to give a feeling of coherence, a "linked-ness", that can make the difference between a collection of separate sentences and a text. The Esperanto translation, interestingly enough, has an even stronger echo. The translation of introducing children to computers is:
eklernigi infanojn pri komputiloj 'cause-begin-to-learn children about computers' Semantically, cause (begin) to learn is virtually synonymous with educate. This synonymy, coupled with the patient similarity that was present in the English version as well, makes the coherence between the two sentences extra strong.
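The echo detection described here amounts to matching one predicate's patient expectation against another predicate's actual patient. The following sketch is ours (the expectation sets and the crude plural stripping are invented for the illustration); it recognizes the educate/children link of the example.

```python
# Detecting the semantic "echo" between educate and "introducing
# children": a predication echoes another if its expected patient
# matches the other's actual patient (illustrative lexicon, not DLT's).
EXPECTED_PATIENTS = {
    "educate": {"child", "student"},
    "introduce": set(),           # no strong patient preference assumed
}

def is_echo(expecting_verb, mentioned_patient):
    """Does `mentioned_patient` fulfil `expecting_verb`'s patient expectation?"""
    stem = mentioned_patient.rstrip("ren")   # crude: "children" -> "child"
    expected = EXPECTED_PATIENTS.get(expecting_verb, set())
    return mentioned_patient in expected or stem in expected
```

For the sample sentences, educate expects child, introduce's actual patient is children, and the echo is found; no such link exists in the other direction.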
2.5.7. Translating metaphorical expressions The following interesting clause is the subordinate clause in sentence 2: 2. [...] which in turn has its roots in the Massachusetts Institute of Technology. Two analyses seem possible:

has( [AG] which, [PAT] its roots, [PLACE] in "M.I.T.")

where has is the main verb with an agent and a patient, and a preposition syntagma functioning as a place descriptor. This analysis ignores the metaphorical interpretation of the syntagma to have one's roots (in), taking its roots as a literal patient. An alternative analysis is:

has roots( [AG] which, [ORIG] in "M.I.T.")

where the syntagma to have one's roots (in) is taken as a metaphorical, almost idiomatic, expression. In the metaphorical interpretation, the whole syntagma represents a single meaning, which can be paraphrased as to stem from. It is up to the translator to recognize such metaphorical usage; the clue is in this case the very low compatibility between a programming language and (literal) roots. The Esperanto translation replaces the metaphorical expression with a single, non-metaphoric verb:

deveni( [AG] kiu, [ORIG] de "M.I.T.")

The result is a different surface structure (there is no object in the Esperanto clause corresponding to the object its roots in the original), but rather an exact match in underlying deep-case structure.
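The "low compatibility" clue can be mechanized as a threshold test: if the literal reading scores poorly against the verb's expectations and the syntagma is listed as a known metaphor, prefer the idiomatic paraphrase. The compatibility scores, the threshold and the tiny metaphor table below are all invented for the illustration.

```python
# A sketch of metaphor detection via low literal compatibility
# (scores, threshold and metaphor table invented for the example).
COMPATIBILITY = {
    ("has", "roots", "programming language"): 0.05,  # roots of a language?
    ("has", "roots", "tree"): 0.95,                  # literal reading fine
}
METAPHORS = {("has", "roots"): "stem from"}

def interpret(verb, obj, subject_class, threshold=0.3):
    """Choose the idiomatic paraphrase when the literal reading is implausible."""
    score = COMPATIBILITY.get((verb, obj, subject_class), 0.5)
    if score < threshold and (verb, obj) in METAPHORS:
        return METAPHORS[(verb, obj)]    # idiomatic paraphrase
    return f"{verb} {obj}"               # literal reading
```

For the clause above, the subject class programming language makes the literal roots implausible, so the paraphrase to stem from is chosen, which is exactly the reading the Esperanto deveni realizes.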
2.5.8. When is a word a verb?

So far we have assumed that finding the verbal elements in clauses is a straightforward process, and that it is always possible to identify not only the words that function as verbs in the given clause, but also those that are deverbal nouns and thus have some basic verbal traits. In reality, however, this is not always the case. In English, in particular, it is often so easy to use words both as nouns and as verbs that even native speakers are unable to tell, out of context, which actually is the basic syntactic function, the verb or the noun. In English there will, therefore, be many cases in which it is not clear whether the reduction to verbal elements truly reflects the underlying basic meaning. In Esperanto, the situation for words out of context is markedly different. Even within complex multi-morpheme words, each content morpheme can be identified as belonging to one of the three semantic morpheme classes (event, quality, item; see 2.1.5.). If the governing morpheme of a word is an event morpheme, the word is one of the "verbal elements" we are looking for - regardless of its (syntactic) word class. We demonstrated this in 2.1.5. with the word al'ir'o 'approach'. Its internal governor in a word-grammatical dependency analysis of morphemes is ir 'go', which in Esperanto dictionaries can be found to be an event morpheme (lexicographers often use another term such as "verb stem" or do not mention this explicitly, but it can be inferred since the verb ir'i is the main entry). The strictly agglutinative character of Esperanto words and the concept of semantic morpheme classes are of great help in the process of semantic analysis and disambiguation. Reduction to verbal elements as shown in this section can be done automatically, using the transformation rules mentioned above, which are part of an extensive word grammar (cf. also Schubert forthc. a).
In the semantic dictionary, we then need, in principle, only the semantic expectations for the stems, not for the words derived from them. Not only can this help to reduce the size of the semantic dictionary, it also helps to achieve a degree of generalization, by finding the underlying patterns of expectation rather than the single pattern of the complex surface word. Many complex words describing processes or events can be reduced step by step, thus revealing the simpler processes that constitute them (figure 48).
Figure 48 An example of iterative pair reduction

This example shows how an expression such as 'drinking water contamination' can be reduced, thanks to the regular derivational system of Esperanto, to the more basic combination of concepts 'clean water'. Obviously the transformations are not meant to be meaning-preserving.

Input word pair: mal'pur'ig'it'ec'o de trink'akv'o 'un-pur-if-ied-ness of drinking-water'
First transformation: → trink'akv'o a mal'pur'ig'it'a 'drinking-water un-pur-if-ied'
Second transformation: → mal'pur'ig'i n trink'akv'o 'un-pur-if-y drinking-water'
Third transformation: → trink'akv'o a mal'pur'a 'drinking-water un-pure'
Fourth transformation: → trink'akv'o a pur'a 'drinking-water pure'
Fifth transformation: → akv'o a pur'a 'water pure'
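The reduction chain of figure 48 can be imitated mechanically. The sketch below simply hard-codes the five stages for this one example as rewrite rules; it is not a general Esperanto word grammar, merely a demonstration that the transformations can be applied as a deterministic chain (and, as noted above, they are not meaning-preserving).

```python
# Toy re-creation of figure 48: iterative pair reduction as rewrite rules.
# The rule table is hand-written for this single example and is NOT a
# general Esperanto word grammar.

RULES = [
    ("mal'pur'ig'it'ec'o de trink'akv'o", "trink'akv'o a mal'pur'ig'it'a"),
    ("trink'akv'o a mal'pur'ig'it'a",     "mal'pur'ig'i n trink'akv'o"),
    ("mal'pur'ig'i n trink'akv'o",        "trink'akv'o a mal'pur'a"),
    ("trink'akv'o a mal'pur'a",           "trink'akv'o a pur'a"),
    ("trink'akv'o a pur'a",               "akv'o a pur'a"),
]

def reduce_pair(expr):
    """Apply the rewrite rules in order, collecting every stage."""
    stages = [expr]
    for old, new in RULES:
        if stages[-1] == old:
            stages.append(new)
    return stages

for stage in reduce_pair("mal'pur'ig'it'ec'o de trink'akv'o"):
    print(stage)
```

Running the chain yields the six stages of the figure, ending in the basic combination akv'o a pur'a.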
Many semanticists have argued that a similar simplification process could be carried on much further, that is, until only the most basic concepts remain, the so-called semantic primitives. In practice, however, total reduction of words turns out to be very difficult, and several decades of research have not resulted in any consensus over what the ultimate set of semantic primitives should look like.
Esperanto, in contrast, offers us a ready-made alternative. Its vocabulary is large, but a comparison of the number of different morphemes in, say, 100,000 words of running text in parallel English and Esperanto translations would probably yield a significantly lower number for Esperanto than for English. It is in this sense that the vocabulary of Esperanto is sometimes said to be much smaller than that of a language such as English, where most derived words must be listed separately. The result in Esperanto is a more manageable dictionary, with considerable powers of analysis, but with a set of stems that is extensive and realistic enough not to get lost in the tangles and pitfalls of "complete" semantic reduction. This "compromise" between extensiveness and generality, moreover, has been in use for over a hundred years now in a living language, which proves that it can function as a fully operational human (read: natural) language, which is a prerequisite for translation (as argued by Schubert forthc. c).
2.5.9. Finding parallel structures

Apart from resulting in more semantic information in the form of wider expectation patterns, the reduction to verbal stems can also help to bring to light certain regularities in patterns that are not immediately visible on the surface. Such regularities can account for part of the coherence (or feeling of "like-ness") that readers sense between two sentences, even when their surface representations seem to be very different. A good example of this kind of underlying regularity can be found in our sample text. The sentence:

3. Logo was conceived by Seymour Papert of M.I.T., primarily as a language for introducing children to computers.

can be analyzed as:

conceive( [AG] Seymour Papert of M.I.T., [PAT] Logo, [MAN] as a language( [GOAL] for introduce( [AG] X, [PAT] children, [GOAL] to computers)))

where X can be either a general, unidentified agent, or language (= Logo), as in Logo introduces children to computers. The next sentence is:

4. Many others have since contributed to its development and to its applications both in education and in other fields.
In the analysis of this complex sentence several predications are embedded:

have( [AG] many others, [???] contribute( [AG] X, "and"( [BEN] its develop( [AG] Y, [PAT] Z), [BEN] its apply( [AG] Y, [PAT] Z)), "both-and"( [MAN] in educate( [AG] A, [PAT] B), [MAN] in other fields)))

where we know that X = many others and Z = its = Logo. What is important about this analysis is that it shows Logo as the patient of three verbs, two of which were not given as verbs in the clause, but as deverbal nouns:

conceive( [AG] Seymour Papert, [PAT] Logo)
develop( [AG] Y, [PAT] Logo)
apply( [AG] Y, [PAT] Logo)

It is now easy to see that the three verbs form a logical sequence. Semantically, conceive, develop and apply are closely related words, which ties the two sentences together semantically as well as syntactically (in 2.1. we already showed that others is a typical linking word). And we can go even further. The agent of contribute is many others, whereas the agent of develop and apply remained unspecified. It does not seem illogical to "carry over" many others to the agent positions of develop and apply as well, since the agent of a verb like contribute is semantically strongly tied to any agents of any verbs it governs (cf. He helped push → He pushed). We now have:
conceive( [AG] Seymour Papert, [PAT] Logo)
develop( [AG] many others, [PAT] Logo)
apply( [AG] many others, [PAT] Logo)

which helps to resolve the problematic pronoun others semantically. If syntactic and thematic clues are not enough to make others point to Seymour Papert as the source of its semantic content, the undeniable regularity of the underlying verb patterns in the two sentences leaves no doubt whatsoever.
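The "carry-over" step just described is easy to state procedurally. In the hypothetical representation below, a predication is a verb plus a dictionary of deep-case fillers, and an unfilled agent slot of a governed predication inherits the agent of the governing verb. The representation is our own simplification, not the actual DLT data structure.

```python
# Sketch (invented representation): a predication is a (verb, {role: filler})
# pair. When a verb like "contribute" governs deverbal predications whose
# agent slot is unfilled, the governing agent is carried over.

def carry_over_agent(governing, governed):
    """Copy the governing predication's agent into agent-less dependents."""
    _verb, roles = governing
    agent = roles.get("AG")
    resolved = []
    for v, r in governed:
        r = dict(r)  # copy, so the input predications stay untouched
        if agent is not None and r.get("AG") is None:
            r["AG"] = agent
        resolved.append((v, r))
    return resolved

contribute = ("contribute", {"AG": "many others"})
dependents = [("develop", {"AG": None, "PAT": "Logo"}),
              ("apply",   {"AG": None, "PAT": "Logo"})]
print(carry_over_agent(contribute, dependents))
```

Applied to the two sentences above, this yields exactly the resolved pattern in which many others is agent of develop and apply as well.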
2.5.10. Conclusion

Verbs are the words with the most structure-building properties in clauses and sentences. Especially when a dependency-oriented approach to grammar is used, as in dependency syntax or case grammar, a good description of the characteristics of verbs is essential. As has been noticed by many grammarians, the structure-building features of verbs also come to light, to a certain extent, in deverbal words such as nominalizations, participles etc. The semantic classification of basic morphemes into events, qualities and items is certainly not universal, but as experience shows, it can quite often be transferred from one language into another with reasonable reliability. In the DLT setting, semantic processing is preferably carried out in the Esperanto steps of the overall translation process, and it is thus expedient to this approach that morpheme classes are uniquely identifiable in Esperanto. Procedures based on the identification of event (or verbal) morphemes can profitably be used also in the analysis of other languages, as has been demonstrated in this section for English.
2.6. Rhetorical structures and logical connections
The aspects of text coherence we have discussed so far were all closely linked to the syntax and semantics of the sentence. We tried to improve our interpretation of individual sentences using possible links with the other sentences in the text. The closest we came to a "text understanding" approach is the thematic progression discussed in section 2.4., but that was still to some extent sentence-oriented. In this chapter we will try to "move away" from the individual sentences, to see if there is a more global way of looking at texts that may help us better understand the way texts convey their message. We will see that there are indeed general rules, largely pragmatic in nature, for the way statements in a text follow each other. Interestingly, these global patterns are very often "traceable" through the text because of single lexical items that have no other function than to signal such patterns. And often content words that function normally within the sentence may carry "overtones" that help set and/or recognize global patterns as well.
2.6.1. Rhetorical patterns

An interesting body of related work on English comes from a group of grammarians linked to Eugene Winter. Their approach is remarkable for two things: their concern with the surface of the language, and their concentration on discourse as the main object of study. With regard to their interest in surface-level processes rather than deep, semantic, analysis, one of these scholars, Michael Hoey (1983: 10f.), relates an interesting experiment which transformed a coherent text into a "garbled" one by consistently replacing each content word by a nonsense word (the same for all instances of the same original word), but leaving their syntactic endings and all function words, including personal pronouns, intact. Students were then asked to divide the text (which was now semantically practically empty) into paragraphs. There turned out to be a surprisingly high consensus among the students about the division into paragraphs. Moreover, of all the available variations (in theory, any sentence could be the start of a new paragraph) only a few were ever chosen; in other words, there was considerable agreement even in the number of variations possible. The conclusion of this little experiment is that absence of understanding, in the sense
of being able to connect the textual statements to existing knowledge of the world, does not completely remove a sense of text coherence and the ability to recognize structures beyond and above the sentence boundaries. It should be possible, therefore, to construct a "contextual grammar" based on surface phenomena, rather than on the vaguer and more problematic semantic and pragmatic aspects of a text. If this is really possible, the result might be a text syntax. Seemingly on the other end of the spectrum (but, as we will see later, intimately related) is the analysis of a text in terms of rhetorical functions. Assuming that the main function of a text is to communicate a message as efficiently as possible, one could study the statements in a text to see how they contribute to the efficiency and effectiveness of the text. The study of rhetorical functions originates in ancient Greek language philosophy, thus from the same source as what we today call linguistics. The main purpose of classical rhetoric was to analyse a speaker's message and intentions and provide rules to determine the most effective linguistic form in which to utter the message and realize those intentions. The emphasis was on "persuasion": trying to convince an audience of one's own point of view, which is illustrated by the predominant interest of classical rhetoric in legal and political oratory (a concise history of rhetoric is given by Dixon, 1971). Through the ages, the word rhetorical has also acquired rather negative connotations, and with the rise of modern linguistics, the study of rhetoric seemingly disappeared from the scene (though it held a place of its own in literary criticism). Lately, however, renewed interest in the processes underlying human communication seems to have led to a renewed interest in matters related to rhetoric, of which the grammarians of Winter's group are an example.
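Hoey's garbling experiment, described at the beginning of this section, is easy to reproduce mechanically. The sketch below replaces every content word by a consistently chosen nonsense word while leaving function words intact; unlike the original experiment it does not preserve syntactic endings, and the function-word list is a small invented sample.

```python
# Rough re-creation of the "garbling" procedure Hoey describes: each
# content word is consistently replaced by a nonsense word, function
# words stay intact. Simplifications: tiny function-word list, no
# preservation of syntactic endings.

import itertools
import re

FUNCTION_WORDS = {"the", "a", "an", "of", "in", "to", "and", "it", "its",
                  "is", "was", "has", "have", "by", "which", "that"}

def garble(text):
    counter = itertools.count(1)
    mapping = {}  # same original word -> same nonsense word
    def repl(match):
        word = match.group(0)
        if word.lower() in FUNCTION_WORDS:
            return word
        if word.lower() not in mapping:
            mapping[word.lower()] = f"blonk{next(counter)}"
        return mapping[word.lower()]
    return re.sub(r"[A-Za-z]+", repl, text)

print(garble("The turtle draws a line and the turtle turns."))
```

Even in the garbled output, the repetition of the same nonsense word and the intact function words still give a reader something to segment the text by, which is precisely Hoey's point.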
In the view of Michael Hoey (1983) and Michael Jordan (1984), the need to communicate a message efficiently and effectively dictates much of the form of the individual sentences. In particular, there are several regularly occurring steps in the communication process, and these steps are evident in the surface structure and lexical realization of the statements in the text. By studying the signals on the surface - lexical and syntactic - it is possible to reconstruct the functional structure of the text, which leads to a better understanding of the text in two ways: it can show the logic behind the sequence of statements (in other words: it can help explain why certain statements follow each other in that particular order); and it helps to capture in rules surface variations that, from a local (sentence-level) perspective, appear to be random deviations from the default pattern. For our purposes, a rhetorical analysis, in so far as it helps to explain stylistic variation in the source language, can help to find more closely matching forms in the target language. It might be discussed whether rhetorical patterns are largely language-independent (do people communicate along the same basic lines, whatever their language?), while their surface realisation (i.e. choice of words and structures) is
language-dependent. But in view of the current state of the art in machine translation, we are far removed from possible translation rules for language-specific rhetorical patterns. So, as a first approximation, we aim at preserving the patterns encountered in the source language. By analyzing a source language text in terms of rhetorical patterning - i.e., using rhetoric to explain the surface realisation of the statements in the text - and then mapping those rhetorical patterns onto the target language statements - i.e., using rhetoric to shape the surface realisation of the statements - it may be possible to translate not only the statements, but also the "persuasive content", the means to convince the audience. In his discussion of discourse structures, Hoey (1983: 31ff.) identifies a number of basic patterns and relations that can be found in a text; they are summed up in figure 49.
Figure 49 Basic discourse patterns and relations

INSTRUMENT - ACHIEVEMENT
CAUSE - CONSEQUENCE
GENERAL - PARTICULAR
SITUATION - PROBLEM - RESPONSE - EVALUATION
These basic patterns occur in many different combinations and on various levels in the text, ranging from a single sentence to a whole text. Very often within a global pattern local patterns are embedded, e.g. to present in detail some piece of information that is regarded as a single item one level up. Michael Jordan (1984: 17) concentrates on one of the patterns listed above, what he calls the "Metastructure 'Situation - Problem - Solution - Evaluation'" (Jordan's solution = Hoey's response). Jordan assumes that this metastructure provides the basic pattern for all discourse, and that texts are formed using any appropriate combination of those four elements. Let us turn once more to our sample text to see what can be found there in terms of the rhetorical patterns mentioned above. The two paragraphs we have used so far are only a fragment of the whole text, as a consequence of which we do not find the complete situation-problem-response-evaluation pattern. In fact, the two paragraphs seem to be primarily a description of the concepts turtle and Logo. Nevertheless, some of the basic relations in figure 49 do seem to be present.
To begin with sentence 1:

1. The new way of thinking about geometry has come to be known as "turtle geometry"

It is basically a situation statement, but the word new implies a contrast with an "old" way of thinking. Such contrasts are typically used to indicate a switch from one aspect to another, and new here might have just that function, even though it occurs in the first sentence of the text. More technically speaking, contrasting words like new can function as indications of pragmatic nonidentity between semantically synonymous expressions (see 3.5.1.). If, in another text, a new way of thinking contrasts with a previously mentioned way of thinking, the attribute new tells us that there is no referential identity link between the two, although the semantic content of the words would allow for such a link.
When looking for basic patterns, it is important to keep in mind that a writer may well decide to take one or more aspects of the pattern for granted, i.e., s/he may, for instance, decide to skip the situation part, because s/he assumes the situation to be common knowledge among her/his audience. The same holds for the other aspects of discourse patterning: parts assumed to be part of the common understanding of the audience may be left out, or only marginally indicated. The word new in sentence 1, for instance, implies there was (or is) an "old" way of thinking about geometry, different from the one that has come to be known as "turtle geometry". Even without knowing anything about geometry, many readers, when seeing the word new used thus, will assume that the "old" way was somehow insufficient, outdated or just old-fashioned and was therefore replaced by something new. In other words, the word new implies change. Change contrasts with the situation in two possible ways: either the situation contained some problem and the change is the response to that; or the change is the problem. When a problem is described, this is usually indicated by the choice of lexical items. As Jordan says, there are typical "problem" words: "not just the word problem itself, but its near-synonyms difficulty, dilemma, drawback, danger, snag, hazard, and so on" (Jordan 1984: 5). In other words, the choice of words can clearly indicate the function of the statement within the discourse pattern. Similarly, there are typical response words ("avoid, counteract, reduce, prevent or overcome ...", Jordan 1984: 5) and evaluation words ("excellent, important, quick, unique and failure", Jordan 1984: 5). The presence of each of these words, or a near-synonym, clearly indicates problem, response or evaluation. Since sentence 1 contains none of the words listed above, it denotes basically a situation. 
However, the change indicated by new implies that the current situation is the result of a response to an "old" situation, presumably because there was some problem. So the little word new can imply as much as a whole underlying, but unmentioned pattern.
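The signal-word test applied to sentence 1 above can be sketched as a simple lookup. The word sets below are limited to the examples quoted from Jordan; a realistic version would need near-synonym expansion and morphological matching. The fallback to "situation" mirrors the reasoning used above: a statement with no problem, response or evaluation signal is read as situation by default.

```python
# Hedged sketch: labelling a statement's discourse function by the signal
# words Jordan lists. The word sets are taken from the quotations in the
# text; real coverage would require much larger sets.

SIGNALS = {
    "problem":    {"problem", "difficulty", "dilemma", "drawback",
                   "danger", "snag", "hazard", "awkward"},
    "response":   {"avoid", "counteract", "reduce", "prevent", "overcome"},
    "evaluation": {"excellent", "important", "quick", "unique", "failure"},
}

def discourse_function(sentence):
    words = {w.strip('.,;:"').lower() for w in sentence.split()}
    for label, signal_words in SIGNALS.items():
        if words & signal_words:
            return label
    return "situation"  # default when no signal word is present

print(discourse_function("One idea that can be awkward to formulate ..."))
```

On the curvature sentence quoted later in this section, the word awkward triggers the label problem, exactly as in the analysis above.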
One could ask whether the analysis proposed above is not somewhat farfetched, going on the scarce evidence provided by a single word. It may be, but we should not forget that readers do expect such underlying patterns to be present, and often need only minimal cueing to perceive them. The effect of sentence 1, for instance, is that the situation it describes will not be seen as containing a problem (which is a very common pattern: the situation we begin with contains some problem to which a response is required), but as the response (or the result of the response) to a problem. Most readers will expect to be told what the problem was somewhere later in the text. And this happens to be the case. In later paragraphs, the text introduces methods of drawing with a computer, of which the turtle approach is said to be one. The problem to which turtle geometry is the response is given in a single sentence (taken from a paragraph of the sample text on turtle geometry, which was not included in the previous analysis; emphasis added):

One idea that can be awkward to formulate in a coordinate system but that comes forth clearly in turtle geometry is that of curvature.

The word awkward unambiguously indicates problem, and turtle geometry is contrasted with that. A pattern such as the one described above need not always stretch over sentences and/or paragraphs. Very often part of the basic pattern is also the basis for individual sentences. Consider the sentence:

3. Logo was conceived in the 1960's by Seymour Papert of M.I.T., primarily as a language for introducing children to computers.

The key to this sentence is the instrument-achievement relation, given in as a language for introducing children to computers. Logo is the instrument, introducing children to computers is the achievement. The instrument-achievement relation runs parallel to the problem-response pattern: the problem is how to obtain the achievement, the instrument is the response to that.
On the basis of the as an X for Y construction, which is a clear marker of the instrument-achievement relation, we can analyse the whole sentence as:

PROBLEM → introducing children to computers
RESPONSE → Logo was conceived in the 1960's by Seymour Papert of M.I.T.

On a global level, the problem of introducing children to computers plays no significant role (though it is mentioned again later in the text). Locally, however, it is used to give a reason for the existence of the programming language Logo, which does play a significant role in the rest of the text.
Hoey recognizes two relations between text elements that are especially frequent in informative texts. Under the general name "General-Particular relations" he distinguishes "Generalisation-Example" and "Preview-Detail" relations, which he defines as, respectively: "[The Generalisation-Example relation] occurs whenever a passage can be projected into dialogue in such a way as to include the reader's broad request 'Give me an example or examples'" (Hoey 1983: 137) and "A test of the existence of a Preview-Detail relation is whether or not the passage can be projected into dialogue using the broad request 'Give me some details of x' or 'Tell me about x in greater detail'" (Hoey 1983: 138). The technique of "projecting" a passage of text into dialogue and analyzing it by finding out which (fictional) reader's question(s) the passage seems to answer was worked out by Winter (1982) and is used extensively by both Hoey and Jordan.
In informative texts, we will frequently find both relations, and not necessarily in the given (general-particular) order. The move from particular to general is also possible, though it will occur less often. In our sample text we find several examples of both types of relation. The second sentence, for instance, introduces the programming language Logo as the preview, immediately followed by the detail: which in turn has its roots in the Massachusetts Institute of Technology. More details about Logo are given in sentence 3, one of them being that Seymour Papert of M.I.T. was the one to conceive it. While sentence 4 gives more detail about Logo, there is also a reverse relation, from particular to generalisation, between many others and Seymour Papert. This demonstrates that such patterns can occur on various levels, and even, as is the case here, "cross" each other: what is particular on one level is general on another. A good example of the generalisation-example relation is found between sentences 4 and 5. Many others is a generalisation, of which Harold Abelson and Andrea A. diSessa of M.I.T. are introduced as examples. The general-particular relations are often recognizable because of the occurrence of lexical indicators that typically function as signals of such relations. In the examples shown above, a typical indicator of the particular-general relation is the word others, which causes a single instance to be taken as "model" for the class of that instance (see also 2.5.). Exactly the opposite function is performed by among them, a phrase which selects one or more instances from a given class. In a machine translation system, where the "projected dialogue" way of analysis is not available, it is precisely such lexical clues that aid the analysis. Moreover, it is important not only to analyse them correctly, but to translate them correctly as well, since they form indispensable "coherence links" between the sentences of a text.
2.6.2. Logical connectives

To conclude our survey of text coherence patterns and devices, let us look briefly at a group of lexical items, the primary function of which is to link sentences and/or sentence fragments together. Jordan provides lists of examples of such words in English, which he divides into three separate groups. Some of the words in Jordan's lists are the following (this is not a literal quotation; we cite from Jordan's index, 1984: 145ff., only some of the words, leaving out also the explanatory text and the page references):

Words of Coherence: also, as well as, but, compare, as ... as, either ... or, however, in addition, nevertheless, not only ... but, only, such

Signals of Logic: accordingly, as a result, backed by, by ... ing, cause, dictated, effects, hence, in turn, lead to, logical, (by) means of, mean(s), reason, result(s), showed, so, stemming from, thereby, therefore, thus, yet

Subordinators: apart from, although, as, because, despite, due to, if, (in order) to, in spite of, provided, since, though, unless, whatever, whereas, whether, while (logic)
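A first step towards exploiting such lists computationally is simply to scan a sentence for the listed items, taking care to match multi-word phrases such as in turn or as well as before their single-word substrings. The following sketch uses a small, abridged selection from the three lists; items like as ... as, which are discontinuous, are deliberately left out.

```python
# Illustrative connective scanner. The dictionary is an abridged selection
# from Jordan's three lists; multi-word phrases are matched with word
# boundaries so that e.g. "as well as" is not confused with a bare "as".

import re

CONNECTIVES = {
    "in turn": "logic", "as a result": "logic", "therefore": "logic",
    "thus": "logic", "hence": "logic",
    "as well as": "coherence", "however": "coherence",
    "in addition": "coherence", "nevertheless": "coherence",
    "although": "subordinator", "because": "subordinator",
    "whereas": "subordinator",
}

def find_connectives(sentence):
    s = sentence.lower()
    found = []
    # longest phrases first, so multi-word items are reported before parts
    for phrase in sorted(CONNECTIVES, key=len, reverse=True):
        if re.search(r"\b" + re.escape(phrase) + r"\b", s):
            found.append((phrase, CONNECTIVES[phrase]))
    return found

print(find_connectives(
    "which in turn has its roots in the Massachusetts Institute of Technology"))
```

On sentence 2 of the sample text this correctly picks out in turn as a signal of logic, which is exactly the item discussed next.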
The lists Jordan gives are far from complete, but they can serve to illustrate some of the principles underlying the use of such words. More detailed study is necessary, but falls outside the scope of this book. To find examples in the text, let us first look for the occurrence of the words (or similar words) in Jordan's lists in our sample text. The first lexical item we find that is also in Jordan's list is in turn in sentence 2:

2. [...] which in turn has its roots in the Massachusetts Institute of Technology.

The phrase in turn signals a logical sequence of statements. The combining logic behind the sequence in this example is that of "connectedness": "turtle geometry" ... is closely connected with ... Logo, which ... has its roots in ... [M.I.T.]. Figure 50 shows how these statements can be connected in a graph. To recognize the logic behind the use of in turn here, it is necessary to recognize the near-synonymity of connected and has its roots. In other words, in turn signals that there is a logical link, but it is up to the receiver to establish the precise nature of that link.
Figure 50 From "turtle geometry" to M.I.T.:

"turtle geometry" --connected with--> Logo --connected with--> M.I.T.
The next lexical item of interest is since in:

4. Many others have since contributed to its development [...]

The word is not in Jordan's list, but it is an unmistakable connective, clearly linking the current sentence to the previous one by signalling a sequence in time. Jordan (1984: 150) states that "overt indicators of time are often important signals of the structure of the text", and there are several examples of such indications in our text. The point about since is that it refers back to the "real time" (the time in the external world) mentioned (or implied) in the previous sentence. In this case this time is explicitly mentioned: in the 1960's, but in many cases time is only implied or indicated indirectly by mentioning some action or event as a marker of time, e.g.: He buried the money in the garden. Since then things have been going badly for him. The logical implication of since is that of an ordering in sequence, which can be important, especially in cases where the "real time" ordering of events is not the same as the order in which the events are mentioned in the text (see 3.4.). A word often used to indicate a generalisation-example relation, and which as such contributes to the coherence of a text, is such, which we find in sentence 7:

7. The first such creature was built by the British neurophysiologist W. Grey Walter in the late 1940's.

Such usually means 'of the type just mentioned', and it is up to the receiver to find out what that previously mentioned type is. The reference of such is to the description immediately preceding sentence 7:

6. [...] a small wheeled vehicle whose movements could be controlled by instructions typed on a computer keyboard.

Note that this description could be about a single instance (as seems to be indicated by the original turtle) or about a class or type.
In the first case (an individual) it is up to the receiver to construct a class notion on the basis of the given instance; in the second case the class is readily available. The word such then implies an instance of that class.
A somewhat more difficult instance of the use of such is found in sentence 11:

11. Today such "floor turtles" are less common than "screen turtles" [...]

The word such implies that "floor turtles" are instances of a class of turtles mentioned previously. The preceding sentences do indeed contain a description of a mechanical turtle, so one will assume that such refers to that, but a problematic word is floor in "floor turtles". Where does it come from? Nowhere in the preceding text is the word floor mentioned, or even implied. Most readers will be able to make a logical "jump", somewhere along the lines of: "if it can wander over a sheet of paper, it can probably be used on the floor as well", though this is never made clear in the text. What we have here is an instance of "inconsiderate" backward reference, probably overlooked by the writer as a result of his intimate knowledge of the subject. It is a fact that the mechanical turtles of Grey Walter were used primarily on the floor (mainly because of their size), and as such it is logical to call them floor turtles in contrast to screen turtles, which can be seen on the computer screen. If one does not know how the original turtles were used, however, the indication floor in floor turtles is rather arbitrary. Not many readers will find the use of floor in floor turtles problematic. They will be able to "reconstruct" the reason for its being used (see above) or, not unlikely, ignore it altogether, since it is not absolutely necessary for the correct understanding of the text. For a translator, however, it is a problem. Still focusing on a particular target language, we note that the English word floor can have at least two meanings: the surface one walks on, and a level in a building. If a referent for floor cannot be established, it may not be possible to choose between the two.
In translation, the choice to be made is steered by the number of translation alternatives given for a word like floor in the bilingual dictionary. A translation system, lacking the power of humans to broadly generalize over given information and, for instance, choose an alternative that (without being able to assert its correctness) at least seems to give a minimum of semantic clashes, may find it impossible to choose at all. If this is so, in the DLT system, that is the moment to initiate a disambiguation dialogue, to ask the writer of the text (the one most likely to know the intended meaning) what was meant.
2.6.3. Conclusion

There are all kinds of patterns in texts, both within and beyond the sentence. Perceiving them helps to see the text as a single unit rather than a series of statements. As for translation, though it may be possible to produce an essentially correct translation based on local decisions only, an awareness of the larger patterns and the overall flow of the argument in the text greatly helps to produce in the translation the same sense of unity the original has.
Recognizing global patterns is often difficult, requiring considerable power of inference and large amounts of knowledge. There are, on the other hand, numerous small indicators that can help find the patterns, if not label them. Especially lexical items such as the ones listed in this chapter are of great help, the more so since they often have direct equivalents in other languages, which can help the translator preserve the source language patterns in her/his target language text.
2.7. Towards the translation of text coherence
In this chapter we tried to apply various analysis mechanisms to text coherence, not yet trying to arrange them in an overall system and not yet even deciding which ones to incorporate into such a system. The devices we reviewed and applied to our sample text contribute to its coherence in quite diverse ways. With the insights won in this experimental chapter, we shall therefore in the following chapter view text coherence from the standpoint of translation grammar. This analysis gives rise to a more systematic account of coherence devices and yields a model of three distinct types of coherence relations. The following chapter will also focus more explicitly on the question of how to translate text coherence in a machine translation system.
Chapter 3
Text coherence within translation grammar
Chapter 2 shows a series of different indications of text coherence and describes various methods for recognizing and rendering them. These indications have to do with that specific interrelatedness which makes up the difference between a coherent text and a bunch of unrelated sentences. Most of the regularities belong to the realm of text grammar. Since the majority of them are formulated in an explorative and to a certain degree tentative way, their links to the more established, sentence-level parts of grammar are not obvious in any trivial way. Indeed, it is difficult to draw a sharp line between sentence grammar and text grammar, although this borderline may be found desirable, especially with the modularity in mind that is required for a computational implementation. But since such a line is not readily at hand, the role of translation rules for text coherence within the larger framework of text grammar and of the entire grammatical and extragrammatical process of translating should be examined. This is done in the present chapter. The grammatical process of translating is here viewed as a whole, and those subprocesses that contribute to text coherence are assigned a role in the overall account. It is hoped that this way of looking at the problems discussed here will yield a good basis for transposing text coherence from one language to another in a consistent and integrated translation grammar. In section 3.1., the question of translation-relevant language elements is addressed with the aim of linking translation grammar at the text level to the lower levels. The text coherence devices discussed in chapter 2 are summed up in sections 3.2. to 3.4. under three different headings: coherence of entities, coherence of focus and coherence of events. In section 3.5., the impact of pragmatic knowledge on translation is treated. In section 3.6., we look ahead to ways of implementing the insights won in this study.
3.1. On the translatability of language units
Before dealing with translation grammar, and with text coherence in particular, a few definitional words about grammar in general may be in order (see also 1.4.). A grammar describes the intrinsic system of a language. Like any other system, a language system can be accounted for in terms of its elements and the relations among them. As for a grammar, the elements of the system are the linguistic signs. Grammar comprises rules about both the form (syntax) and the content (semantics) of the linguistic signs, and about their relations. The smallest linguistic signs are morphemes. What translation grammar is about is thus transforming morphemes and their relations from one language into another (Schubert 1987b: 130). Morphemes are by definition the smallest linguistic signs, but they are not the only ones. Higher-level signs can be composed from interrelated morphemes, such as words, syntagmata, clauses, sentences and texts. Which of these levels is the one on which to translate? The answer seems fuzzy at first sight: the link from one language to another cannot be established on a single level. The question can accordingly be reformulated as follows: Which parts of the overall translation process should be done at which level, and in particular, which ones have to do with text coherence? It is obvious to everyone who is at least superficially acquainted with the experience gathered in this field so far that the answer to this question can only be found in a rather complicated interplay of rules and units on various levels. It sharpens the view of this intricate situation if one addresses the problem by asking why things are so complicated. Having established the morpheme as the basic linguistic sign, why not translate morpheme by morpheme? The answer is very well known: There are no one-to-one correspondences either between the morphemes of different languages, or between the relations among them.
It is therefore necessary to take into account more context than just a single morpheme in order to choose the right target language morphemes and to arrange them in the right relations. The ultimate goal, the translated text, is an arrangement of appropriately related target language morphemes that together represent the same content as the source text. But even if a wider context has a bearing on the translation decisions, couldn't the morpheme nevertheless be the unit of translation? In other words, if morphemes cannot be translated in a context-free way, why not try to translate them context-sensitively,
but still one by one? Unfortunately this is impossible, because morpheme-by-morpheme translation, even if context-sensitive, would assume that all higher-level structures - words, syntagmata, clauses, sentences and texts - are made up of morphemes combined in regular ways only. This is not so in language. It would require texts to be totally decomposable down to the morpheme level. Not only formal decomposability is needed (which in the most obvious way could exist in an ideal agglutinative language), but also full decomposability on the content side. A complex word, a syntagma etc., and ultimately a text would have to mean nothing but the sum of the morphemes in it. The "adding-up" of morphemes could obey very complicated rules, but it could not escape the need of being totally regular. Such extreme decomposability is not found in human languages, and one might indeed suggest that it is impossible. It would imply that the possible semantic relations between morphemes in a language be a limited list, which is in clear contradiction to the infinite expressive capacity of human language (Schubert forthc. c). As a consequence there are morpheme combinations whose content cannot be reduced to a regular function of the content of their elements. This seemingly very abstract diagnosis is reflected quite concretely in current dictionaries and grammars. Dictionaries normally have entries for words, but they never contain all the words of a language. This is partly because languages constantly accept new words, but the main reason is that there is so much redundancy in words. In most languages, sets of words which are partially or completely different are defined as being forms of a common stem or root, and only one of them is entered into the dictionary (e.g. mother's, mothers, mothers' → mother; am, are, is, was, were, been, being → be). Which one is taken as the basic form is often traditionally determined; usually nominatives, singulars, infinitives etc. are chosen.
This is morphological redundancy. The appropriate forms of a root can be derived by means of grammatical rules, thus by regular morpheme combination (agglutination, inflection etc.). In the case of a bilingual dictionary, derived forms can also be translated by applying rules to the basic forms given in the entry for both languages. Another form of redundancy in dictionaries is due to productive word formation. When morphemes or words can be combined to form new, complex words, this is normally a productive potential which makes it impossible to list them all. Dictionaries therefore tend to take up only those compounds that cannot be derived regularly (and, normally, a selection of frequent forms, although derivable), and leave the rest to the grammar. The appropriate grammatical rules work on the words and morphemes found in the dictionary and analyse or synthesise new compounds. This is possible, because dictionaries contain not only words used in isolation, but also words that function as morphemes in complex words. Often they also contain morphemes that cannot function as self-sufficient words, e.g. frequent affixes like Engl. grand-, pre-, hyper- etc. While dictionaries thus often contain not only words, but also mere morphemes, they usually take up higher-level units as well, such as syntagmata and even entire clauses or sentences. This is necessary when the meaning of the unit in question cannot be inferred from the meaning of its words and morphemes. English examples for syntagmata are fall in, as soon as, and a large number of noun pairs and clusters where the exact semantic relation between the elements cannot be read from any rules,
e.g. cabin air pressure. Even sentences may occur in a dictionary, such as Here you are! A sufficiently exhaustive, but still workable grammar, and in particular a translation grammar, cannot be written for the units and rules of a single level of language structure alone, but it has to account for an interaction of different levels. After the discussion in the above paragraphs, this can now be said more precisely. If lower-level elements such as words and sentences are not translatable in a context-free way, this may be taken to imply that these elements are not really independent linguistic signs. The linguistic sign is - if this argument is continued - nothing smaller than an entire text (cf. Hartmann 1971: 10). One might try to imagine a text-by-text translation method: a big dictionary of texts and no grammar at all. A given source language text is either contained literally in the dictionary - or untranslatable. This mental experiment shows that with an extremely large dictionary, no grammar would be needed for translation. Since this method obviously does not work, a grammar has to be introduced in the form of redundancy rules over the hypothetical text dictionary. The dictionary entries are modified accordingly, by decomposing complex entries into smaller units. This process is sufficient to the extent that the complex entries can be derived from their elements by means of the grammar. The lower the level on which the remaining entries stand, the more complex the grammar becomes. In order to limit the size of both the grammar and the dictionary, one normally chooses the word level as the basis of the dictionary, but allows for units of lower (morphemes) and higher levels (syntagmata, clauses ...) to occur among the entries when needed.
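The division of labour between dictionary and redundancy rules can be illustrated with a deliberately tiny Python sketch (not part of the DLT system): the dictionary lists only basic forms, and invented reduction rules map inflected forms onto them before lookup. The word lists, suffix rules and Esperanto glosses are illustrative assumptions.

```python
# A toy illustration of morphological redundancy: the dictionary lists
# only basic forms; grammatical rules reduce inflected forms to them.
DICTIONARY = {"mother": "patrino", "be": "esti"}   # Esperanto glosses

# Irregular forms that no suffix rule can derive must be listed.
IRREGULAR = {"am": "be", "are": "be", "is": "be",
             "was": "be", "were": "be", "been": "be", "being": "be"}

def lemmatize(word):
    """Reduce an English form to its dictionary form (very incomplete)."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in ("'s", "s'", "s"):          # mother's, mothers', mothers
        if word.endswith(suffix) and word[:-len(suffix)] in DICTIONARY:
            return word[:-len(suffix)]
    return word

print(lemmatize("mothers'"))  # mother
print(lemmatize("were"))      # be
```

The point of the sketch is only that everything derivable by rule stays out of the dictionary; a real morphological component would of course need far richer rules and an account of productive compounding.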
To sum up, the text is in principle indeed the linguistic sign that has to be translated, but everything that is redundant on a high level is preferably expressed in redundancy rules over units of a lower level, which are normally smaller and fewer. This insight about levels and redundancy perhaps does not appear too surprising, since it is tacitly applied by many. But having made it explicit, it is possible to reword the question asked at the beginning of this section: The question is now no longer which is the level of language structure at which one can translate, but rather which morphemes and which relations can be translated at which level. What is translated are essentially morphemes and their relations. A model of translation grammar for the syntactic relations among words in sentences, called metataxis, has been described earlier (Schubert 1987b). Therefore the focus should now be mainly on the morphemes. Which ones can be translated at which level? It is common to divide words into content words and function words, although different authors' opinions vary as to the precise delimitation of the two sets. In a similar way, morphemes can be divided into content morphemes and function morphemes. Roughly speaking, content morphemes are the stems of verbs, adjectives, nouns and other words, whereas function morphemes are those that denote the features of words, such as tense, aspect, mood, voice, person, number, case, gender, etc.
Grammatical features play an important role in translation. Function morphemes should be considered in connection with grammatical features more generally, since not all features are expressed by morphemes directly. This may need an illustration. Take the features number and gender that by virtually all grammarians are used in explaining the properties of nouns in languages like German, Dutch, French and many others. In these languages the plural of nouns is normally expressed by adding one or other morpheme to the singular stem, whereas the singular is unmarked. So, if a noun's number feature carries the value "plural", this can in general be read from a morpheme in the noun in question. The gender of nouns, on the contrary, has no formal expression of its own. It can be read from the choice of form alternatives of other function morphemes in the noun itself (case and number endings etc.) or in syntactically related words (case, number, definiteness of attributive adjectives, articles etc.). Gender can in these languages be said to be a feature that has no morphemes of its own, but is expressed in form choices in other function morphemes. As on many other occasions in grammatical system design, this description is a deliberate choice of the grammarian. Alternatively one could say that every feature has its morpheme and some morphemes represent several features simultaneously. In such a model, all the case, number and definiteness markers could be considered portmanteau morphemes, since they represent gender as well. In this particular case, it is obviously much easier to say that gender is a property of nouns, even if they do not have a special morpheme for it in all their forms. Whatever view on morphemes and features is chosen, it is important to take into account all the features of words that may be translation-relevant.
Reviewing the features and function morphemes of a language, one should first distinguish those that are linguistic signs from those that are not. The question is whether or not a certain unit (feature or morpheme) carries content. If so, it is a sign; otherwise it is not. As an example, the number and gender of nouns can again be examined. The number feature (in languages like the three mentioned) is clearly a linguistic sign. It carries content. Roughly speaking, singular refers to a single instance denoted by the stem and plural to several ones, although there are a series of subtle augmentations and exceptions to this definition, such as for generic and non-actual use, pluralia tantum etc. These refinements notwithstanding, the number feature of nouns can normally be translated quite straightforwardly, provided that the target language has the same distinction as the source language, which often is the case. The gender feature, on the other hand, is not a sign. It does not carry meaning. A strong indication of this is the fact that nouns have no gender paradigm. A single noun cannot be inflected for gender; it just belongs to a certain gender, and no meaning-bearing gender alternation is possible in any grammaticalised way. However, the existence of a paradigm for a certain feature does not seem to be a necessary, but only a sufficient condition for the status of a linguistic sign. The gender of nouns is not a sign, since there is no gender paradigm, but what about the gender of adjectives? In German, for instance, adjectives are inflected for gender, so there is a gender paradigm for adjectives (masculine: großer [Baum], feminine: große [Straße], neuter: großes [Haus]). But nevertheless gender is not a linguistic sign in adjectives either, since it does not carry content. The gender feature of an attributive German adjective is by
means of form determination redundantly triggered by the governing noun. Adjective gender could only be a sign if it had a non-redundant paradigm, independent from the noun (which it has in some languages due to pragmatic implicature, e.g. for expressing degrees of politeness). In conclusion, a feature is a linguistic sign if its value in a given word is subject to a free semantic or pragmatic choice of the speaker or writer, that is, if it only depends on what the speaker wants to say and not on any grammatical rules triggered by choices made for other words. When a feature is not a linguistic sign, is it superfluous? If this were so, such features would not be likely to have survived millennia of language development. But can we nevertheless conclude that such features are superfluous for machine translation? The answer is, not surprisingly, no, but the question nevertheless gives rise to a useful distinction. Machine translation is a complicated process which consists of many subprocesses. Indeed non-sign features are not relevant to the translation step proper (i.e. when actually source language forms and structures are replaced by target language ones), but they certainly do play an important role in other parts of the overall machine translation process. Before translation proper begins, a text to be translated is parsed or otherwise analysed, and for the parsing process all information about formal government and agreement in any features is very helpful. A feature like adjective gender can thus serve as an identifier of a syntactic dependency relation. Given the rule (still taking German as an example) that a noun and a dependent attributive adjective have to agree in gender (and in other features), candidate dependents of nouns can be sorted as to their gender. Possible dependents can be identified in this way, others discarded. Once this identification has taken place and has been rendered in one or other explicit form, e.g.
in a dependency tree structure of the sentence in question, the feature may have fulfilled its role in machine translation. When all the features in question have been used in this way, it is often appropriate to remove them, converting the word into its basic form, which can be found in a dictionary and translated. It is in this sense that features can be classed as translation-relevant or -irrelevant (Schubert 1987b: 153). Translation-irrelevant features do not carry meaning themselves, but they may well serve to make distinctions of meaning. German noun gender, for example, on some occasions happens to distinguish homonyms. Compare See [masculine] 'lake', See [feminine] 'sea'. This example shows that it is advisable to be careful about the stage in the translation process at which a feature really can be removed. If noun gender were removed after the monolingual process of parsing, it would no longer be available to distinguish between the two See entries in the bilingual dictionary during translation proper. Since in the present study the focus is on text grammar, the lower levels need not be discussed in too much detail. Nevertheless a comprehensive view on grammar, which is inevitable for practical implementations, makes it necessary to have these levels in mind as well when working at the text level.
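As a minimal illustration of the two points just made - agreement features as parsing aids, and non-sign features that still distinguish homonyms in the bilingual dictionary - consider the following Python sketch. The data structures and glosses are invented for the example and do not reflect DLT's internal representations.

```python
# Agreement features as parsing aids: a German noun and its attributive
# adjective must agree in gender, so gender filters candidate dependents.
words = [
    {"form": "großer", "cat": "adj",  "gender": "m"},
    {"form": "große",  "cat": "adj",  "gender": "f"},
    {"form": "Baum",   "cat": "noun", "gender": "m"},
]

def candidate_dependents(noun, words):
    """Keep only adjectives whose gender agrees with the governing noun."""
    return [w["form"] for w in words
            if w["cat"] == "adj" and w["gender"] == noun["gender"]]

baum = words[2]
print(candidate_dependents(baum, words))   # ['großer']

# But gender must not be discarded too early: in the bilingual
# dictionary it may still distinguish homonyms (glosses are Esperanto).
bilingual = {("See", "m"): "lago ('lake')", ("See", "f"): "maro ('sea')"}
print(bilingual[("See", "f")])             # maro ('sea')
```

The second half of the sketch is the cautionary point of this paragraph: if gender were stripped immediately after parsing, the two See entries would collapse into one lookup key.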
A few lines, however, may illustrate how the details of the complex translation process interact in the present morphemes-and-relations approach. A feature which is often translatable at the word level is the number of nouns, pronouns etc. In most cases, it is enough to know that a certain noun is a plural noun in order to be sure that also its translation should be given a plural form. No context above word level is needed. There are various exceptions to this general picture: In Russian, for instance, the situation is normally as described here, but when a single noun has several coordinated attributive singular adjectives or pronouns, these exercise a form-determining influence on the noun, rather than the other way round (staraja [sg.] i novaja [sg.] mašiny [pl.], lit. '[an] old and [a] new cars', i.e. 'an old and a new car'; Schubert 1987b: 154). There are pluralia tantum, like Engl. scissors, which syntactically behaves like a plural noun, but is (semantically) translated as a singular. These have to be catered for in the bilingual dictionary. The number feature may in some languages be distributed in a complex construction seemingly in violation of agreement rules due to a pragmatic (polite) meaning or connotation. What does this fuzzy situation imply for an approach to translation making use of levels? It shows that there are exceptions to the word-level translatability of number whenever there are several independent non-redundant number features in the same syntagma or construction. These phenomena are no longer exceptions if one is careful to state that the number of a word can be translated whenever it is a non-redundant linguistic sign. Where several signs interact in the determination of a single feature, the translation step has to be taken at a level sufficiently high to cover all these influences.
This idea sounds quite abstract, but it can indeed be taken as a guideline for either resolving a given translation problem at the present level or transferring it to a higher level. At the text level, of course, everything must be resolved, since it is the highest level. This makes it impossible to streamline a text grammar in the same way as lower-level grammars. This is one reason text grammar is so complex. Case assignment by a preposition (or postposition) to its argument (e.g. the dative in German mit dem Angestellten 'with the official') is redundant form determination. The case feature in this example is not a linguistic sign and is unnecessary for translation purposes. But as soon as there is a free semantic choice among several possible cases in a preposition argument, this is a feature that can be translated at the syntagma level: German in einem Haus [dative] 'in a house' vs. in ein Haus [accusative] 'into a house'. The clause level is not so different from the syntagma level, since a clause in dependency syntax is nothing but a special syntagma, namely a syntagma with a finite verb as its internal governor (Schubert 1987b: 104ff.). If in the source language a subject is not obligatory, but the finite verb carries a person feature (as in Italian, Spanish, Russian etc.), this feature is normally translatable at the clause level.
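The word-level translatability of number, together with the dictionary treatment of pluralia tantum like scissors, can be sketched in a few lines. The entry format and the Esperanto glosses are illustrative assumptions, not the actual DLT dictionary format.

```python
# Number is usually translatable at word level; the bilingual
# dictionary overrides it for pluralia tantum such as Engl. 'scissors',
# which is syntactically plural but semantically a singular.
BILINGUAL = {
    # lemma -> (target lemma, forced target number or None)
    "scissors": ("tondilo", "sg"),   # plurale tantum: force singular
    "book":     ("libro",   None),   # ordinary noun: copy the feature
}

def translate_noun(lemma, number):
    """Translate a noun lemma, carrying over or overriding its number."""
    target, forced = BILINGUAL[lemma]
    return target, (forced if forced is not None else number)

print(translate_noun("book", "pl"))      # ('libro', 'pl')
print(translate_noun("scissors", "pl"))  # ('tondilo', 'sg')
```

Cases where several interacting signs determine one feature (as in the Russian coordination example) would have to be escalated to the syntagma level; the sketch only covers the regular word-level path.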
Before entering the text level for good, the suggestions of this section can be summed up:

- Translation grammar is concerned with meaning-bearing elements (morphemes) and their mutual relations.

- Those grammatical features of words etc. are translation-relevant that are linguistic signs. A feature is a linguistic sign if it forms a paradigm in the current word and if the particular instantiation on the given occasion is non-redundant and due to a free semantic choice of the speaker/writer. Features that are not signs may be indications of translatable signs at other levels, and they may distinguish meaning (but do not bear it).

- It is desirable to build up a rule system for translation in such a way that as much as possible is done by the grammar and as little as possible remains to be explicitly entered in a bilingual dictionary. For this purpose, all phenomena that are redundant at a given level (such as the word, syntagma, clause, sentence or text level) should be formulated in grammatical redundancy rules making use of elements at a lower level.
3.2. Coherence of entities
Chapter 2 shows phenomena that cannot be handled other than at the text level. The problems they bring about for machine translation may be roughly summarised as word choice and word order. In addition, there are a number of grammatical features which are mainly beyond the scope of this book, such as tense, aspect, mood etc. This section and the following two take up the findings of the previous chapters in an attempt to place them in the framework of the morphemes-and-relations approach outlined in 3.1. In this way translation grammar is made to comprise means for rendering text coherence. In this section the problem, roughly characterised as word choice, is addressed from two points of view: lexeme choice proper (3.2.1.) and deixis and reference (3.2.2.).
3.2.1. Lexeme choice

The present subsection deals with word choice in translation. This plain statement does not really signal the full weight of the problem lurking in the term "word choice". Indeed word choice is the main semantic problem of translation. It looks quite simple: If for a given source language expression your bilingual dictionary entry suggests not only one, but several alternative target language translations, which one should you choose? How does word choice qualify as a matter of translating morphemes and relations? In the light of what was said in 3.1., the problem should be delimited more precisely: In languages where words can consist of several morphemes, the question at issue is not really the choice of entire words, but of their content morphemes. The function morphemes are in the majority of occasions catered for by other, sometimes lower-level, regularities. Stripped of its function morphemes, a word may still contain more than one content morpheme. Thus, what this section really is about is lexeme choice. Lexeme choice fits in with the ideas about translation grammar outlined in 3.1. It is obvious that the main content morphemes of words in a text are subject to the writer's free semantic choices. They are undoubtedly linguistic signs. In the translation context, where the source text is, as it were, given and unchangeable, even the idea of a paradigm of morphemes turns up, as was discussed in 3.1. in connection with the question of what is translatable at which level: Indeed, the target language alternatives
from the bilingual dictionary entry form a paradigm. Syntactically, they all fit in the slot opened by the source language expression to be translated. (The term "slot" should not be taken too literally, since translation syntax may require all kinds of rearrangements before or while words are translated, so that in the structure of the target text, there will not be exactly one "slot" for each word or syntagma of the source language; cf. Schubert 1987b: 136.)
There are many approaches to the lexeme choice problem in machine translation. For the DLT system, a solution has been implemented in the form of a semantic word expert system on a lexical knowledge bank. It is described in great detail by Papegaaij (1986: 75ff.). The 1986 version of that system uses a sentence level context to choose among translation alternatives for words, but also among semantically significant syntactic alternatives (e.g. preposition dependencies). The present study is part of an attempt to expand that system to text level. This is not the place to plunge into system details of DLT's particular solution, but it may be useful to study what basic assumptions and premisses this and similar approaches rely on. The next step should then be establishing a link to text grammar, especially with text coherence in mind. The fundamental, and often tacit, premiss of context-reliant approaches to lexeme choice in translation is the assumption that the source text conveys the writer's message to the readers in an appropriate and purposeful way. It does so by means of well-chosen and properly interrelated words. Especially in the case of written texts - for the time being the prevalent domain of machine translation - the instruments for conveying a message are words and their explicit or implicit grammatical relations. Communication is ultimately made possible by writer and readers sharing a certain amount of knowledge. This so-called world knowledge is also addressed in 3.5. The present discussion should focus on the concept of "context". The assumption that the text to be translated consists of well-chosen words implies that a given word fits in well with its context. Context is, at least in a first approach, understood as co-text (see 1.1.), that is, the way in which the word fits in semantically with the surrounding text, with the meaning of the neighbouring words.
For each word where the bilingual dictionary requires a lexeme choice, this can be done by selecting the alternative that semantically is best in concord with the given context. Such a procedure for lexeme choice presupposes a method of measuring semantic proximity in order to establish to what extent different words harmonize with the context. We cannot discuss here how this task is carried out, but refer to Papegaaij (1986: 75ff.). Obviously, in a context-reliant procedure it is easier to fit in the last piece of the puzzle than to begin. These questions have also been dealt with earlier (cf. the "island-driven method", Papegaaij 1986: 187). A problem that should be mentioned here, however, is the question of how to represent context in a lexeme choice procedure. Should it be rendered simply in its source language form, must it already be translated into the target language, or is there a third solution? Many system developers rely on a third solution, namely on some kind of formal or artificial meaning representation. There is an abundance of literature and
ongoing discussion about meaning representation for various types of natural-language processing, and a critical review of these approaches and viewpoints is far beyond the scope of the present study (cf. Schubert forthc. c). Nor shall the DLT solution be described in any detail here. However, it may be mentioned that in the DLT system both the first and the second solution are used. This is because DLT has an intermediate language (a slightly modified version of Esperanto). Accordingly, a text is translated twice, for instance from English into Esperanto and from Esperanto into French. These two processes are kept clearly apart, also in time and place (hence the name "Distributed Language Translation"). The main strategy of DLT system design has been to concentrate in the Esperanto kernel of the system as many parts of the overall translation procedure as is reasonably possible, in particular those that are not dependent on a certain source or target language and can thus be used for arbitrary new language modules to be added to the multilingual system. The whole semantic-pragmatic process of lexeme choice has been found to belong to this category and is thus carried out entirely in the Esperanto kernel. The meaning representation in this process is nothing but Esperanto morphemes purposefully arranged in labelled tree structures. So, in the first half of the DLT translation procedure the meaning representation is the target language (of that half!), Esperanto. In the second half, the meaning representation is the source language of that half, which is, again, Esperanto. DLT's semantic-pragmatic lexeme choice process works on semantic tree structures. They are made up of Esperanto content words on the nodes and Esperanto function words or morphemes as labels for semantic relations among them. These trees are built up by a procedure that looks for semantic relations at first hand along the lines of syntactic dependencies.
As opposed to the syntactic tree structures, the semantic "trees" need not be true trees in the mathematical sense, but may be more general directed graphs. That is, cross-links between nodes are allowed in the semantic "trees"; in other words, a single node may have more than one governor. What is important for the present discussion about text coherence, and more generally, about a link from sentence level grammar to text grammar, is the fact that at sentence level semantic relations are detected in close connection with the sentence's syntactic structure. Although the web of semantic relations is obviously not a strict copy of the syntactic structure, it is nevertheless established along very well defined lines and rests on a firm foundation. One of the phenomena that make text grammar so difficult is the virtually complete absence of a text-syntactic structure. On the form side of grammar, which in DLT is syntax (see 1.4.), a number of regular relationships at the text level may be established, but the structures received are certainly not sufficient to bind together a whole text. As a consequence, text-syntactic structures cannot serve as a basis for establishing the text-semantic links needed for describing a text level context. What one can do in such a situation is try to establish the required relations on purely semantic grounds. Due to an inherent property of semantics, however, this implies that the result will - much more so than in sentence semantics - be based on probability, rather than on certainty. Out of a large number of possibilities, syntactic rules may discard many and approve others, whereas semantic rules can only rank them all as
more or less likely. The syntactic instruments are often more powerful, but unfortunately they appear not to be available for many of the text level problems.
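The contrast between hard syntactic filtering and soft semantic ranking can be pictured in a few lines of code. This is an illustrative sketch only, not part of the DLT system; the candidate names and scores are invented.

```python
# Syntactic rules discard candidates outright; semantic rules can only
# rank the survivors as more or less likely. (Illustrative sketch.)

def choose(candidates, syntactic_ok, semantic_score):
    """Discard candidates that violate a syntactic rule, then rank the
    survivors by a semantic likelihood score (higher = more plausible)."""
    survivors = [c for c in candidates if syntactic_ok(c)]
    return sorted(survivors, key=semantic_score, reverse=True)

# Toy example: candidate attachments for a prepositional syntagma.
candidates = ["attach-to-verb", "attach-to-noun", "attach-to-article"]
syntactic_ok = lambda c: c != "attach-to-article"   # hard rule: discard
semantic_score = {"attach-to-verb": 0.7,            # soft rule: rank only
                  "attach-to-noun": 0.3}.get

ranked = choose(candidates, syntactic_ok, semantic_score)
print(ranked)   # most plausible first; nothing is ruled out, only ranked
```

The point of the sketch is only the division of labour: the syntactic predicate eliminates, the semantic score orders what remains.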
3.2.2. Deixis and reference

What kinds of links at text level are being sought? Within the realm of the present study, the answer is: Links are wanted of the sort that helps to convey coherence from a source language text into a target text being generated in machine translation. One kind of such links has to do with the fact that in a text the same extralinguistic entity (item, quality, event ...) is often mentioned not only once, but a number of times. These referential coincidences form an important element of the text's message, and rendering them appropriately is a substantial contribution to the establishment of text coherence in translation. We call this portion of the overall desired result coherence of entities. As the discussion in 2.2. and 2.3. shows, at least two classes of such links can be distinguished: deixis and reference. Deixis and reference have to do with the relation of words (syntagmata, clauses, sentences ...) to extralinguistic entities, so that regularities about them mainly fall under the realm of pragmatics. The difference between the two, however, is of a semantic nature: Both phenomena describe, roughly speaking, situations in which a given word (or another unit) denotes an entity denoted also by other words in the same text. In the case of reference, each of the words in question has the entity within the scope of its semantic meaning. In the case of deixis, however, the extralinguistic entity is not in the meaning of the word, but the word is more or less void of any specific meaning of its own. Rather than a meaning it has a function, namely the function of pointing to something else, which can be another (non-deictic) word in the text or something assumed to be known.
This is an attempt to word in a general way what is better known from concrete examples: Pronouns like they or it, but also other words, do not denote any concrete entities by themselves, but can in a text (or utterance) indicate something denoted by a noun or the like. Adverbs like here, there or then have similar functions. The distinction between deictic and non-deictic words is not sharp. The term pro-noun suggests that pronouns should replace (or point to) nouns. Maybe there are "pro-verbs" (do), "pro-adverbs" (so) and "pro-adjectives" (such)? And isn't thing a "pro-noun" as well? Although there may be hesitations about specific instances, the distinction of deictic and non-deictic words is generally expedient to semantic context handling. Among other reasons, practical motives support it. Text grammar is, despite all the available literature, still a new departure for machine translation, and it seems safe to imagine that the first solutions attempted will not be perfect. So, what happens if co-referent words are not identified or linked? Then lexeme choice will still work, albeit at the sentence level, for non-deictic words, since they have a meaning of their own and can therefore be measured for their semantic proximity to immediately surrounding words.
Deictic words, however, are virtually meaningless or have a very vague and general meaning, so that they can hardly be handled in any reasonable way unless their deictic link is traced. A failure to perform lexeme choice when deictic words are involved has a twofold negative influence on the semantic quality of the target text. On the one hand a lexeme choice may be necessary for the word itself (Engl. it -> French il or elle, or possibly cela, ceci ...?). But on the other hand, even if the deictic word itself happens to be straightforwardly translatable, such a word does not only have a context, but it is of course at the same time part of the context for other words. Uncertainty as to the semantic content pointed to by a pronoun may thus spread to other, syntactically directly or indirectly related, words in the sentence and make lexeme choice for them a good deal less reliable than it would be with access to the deixis information. If, for instance, it is the subject of a verb and there is a lexeme choice in the bilingual dictionary, determined by different semantic types of subject, then a failure to trace the deictic link of it results in a substantial lack of context information needed for translating the verb. The verb will be translated on the basis of insufficient information, and, in the worst case, at random. The missing piece of information is in this case exactly the bit that makes up text coherence. A failure to trace referential links results at first sight in less obvious mistakes than in the case of deixis. A lexeme choice may still be achieved on the basis of the sentence level context. However, the translation achieved in this way can only be as good as its context: The word in question will be translated with the lexeme alternative that fits in best with the other words of the sentence. The larger context of the text, however, would on many occasions require another choice. In a manual, the sentence Then mail the file to the secretary may occur.
If we take into account only this sentence and do not make any referential links to the previous text, the most likely translation might take file to denote documents in a folder, which in Dutch, for example, becomes dossier or map. This is not the right translation in all contexts. A file may also be a data set in a computer, denoted by Dutch file. If file in the sentence could be linked to earlier sentences in which it obviously occurs in a computer-related context, the information needed for triggering the appropriate lexeme choice would be available. The establishment of deictic and referential links is one of the means for extending the contextual scope of lexeme choice to text level. What is attained in this way may be described in two key words: Lexeme choice is based on all the accessible context information and in this way text coherence is enhanced. Another mental experiment may help to define more thoroughly what is meant by coherence. Couldn't it be possible that a short text about files remains ambiguous throughout as to the choice between a computer setting and a traditional office setting? If the word file has two different translations, say map and file, and occurs five times in that text (and if no other ambiguities occur, which is rather unlikely), then 2x2x2x2x2 = 32 different translations of the text are possible. But only two of the 32 alternatives are coherent in the sense that the word in question in all its occurrences in the text is taken to be part of the same setting. How can one be sure that Engl. file
does not refer to a data set in three instances and to a document folder in the other two? If the words have been detected as co-referent, this is impossible. Rules for text coherence thus play an essential role in the selection of reasonable target texts out of a series of translation alternatives. In so doing, they contribute substantially to the efforts to avoid combinatorial explosion in machine translation.
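The arithmetic of this mental experiment is easy to verify; the following sketch simply enumerates the translation alternatives of the five-occurrence text.

```python
from itertools import product

# Each of the five occurrences of the ambiguous word is independently
# rendered as Dutch 'map' (document folder) or 'file' (data set).
alternatives = list(product(["map", "file"], repeat=5))
print(len(alternatives))      # 2**5 = 32 possible translations

# Only the translations that keep every occurrence in the same setting
# are coherent in the sense discussed above.
coherent = [a for a in alternatives if len(set(a)) == 1]
print(len(coherent))          # 2: all-'map' or all-'file'
```

Detected coreference thus cuts the 32 combinatorial alternatives down to the two coherent ones.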
3.3. Coherence of focus
In one way or another, a (machine) translation process always comprises the analysis of a source text and the synthesis of a corresponding target text. In 3.1., target language synthesis is described as generating morphemes and their relations. In the intermediate stages of a machine translation process the relations among the morphemes may be made explicit by means of syntactic labels, semantic relators and the like, but in the final target text such non-linguistic symbols are of course out of place. A normal written text contains besides words only a single type of relation: the sequential order of words. The grammatical relations originally given explicitly must therefore at an appropriate moment be transformed into either morphemes (function words, inflectional endings etc.) or a specific ordering relation of words. The final product is nothing but ordered words. The overall translation process consists of a number of consecutive and interleaved subprocesses. In one of them content morphemes (lexemes) are chosen from a bilingual dictionary. The previous section describes how lexeme choice is supported by an attempt to find text level links which could provide for coherence of entities. When the required lexeme choices have been made, grammatical rules add possible function morphemes to the lexemes, producing complete words as the result. Syntactic rules at the syntagma, clause and sentence level will assign the words syntactic functions. Some of these functions may require more function morphemes as markers, others mark syntactic functions by arranging the words in a specific sequential order. In this way syntagmata and sentences come about, and ultimately a text is produced. For most target languages, the subprocesses of translation may actually interact approximately in the way sketched here. However, the interaction of these rule systems is not sufficient for generating exactly and unequivocally one target text.
In most cases, they will allow for several alternative solutions. The reason for this is that a number of decisions on the basis of these rules still cannot be made except as a process of random selection. Which are the unresolved options, are random decisions appropriate here, are more rules and more knowledge needed, and where can they be found? If we assume that, wherever necessary, lexeme choices have been successfully performed and syntactic-function assignment has been accomplished, two types of decision still remain to be taken. These are decisions about more grammatical features and about word order. Indeed, features and ordering have already been dealt with in
the process so far, but it is perfectly possible that the subprocesses mentioned do not assign all the necessary features, nor decide all the ordering options. The previous section deals with lexeme choice, this section takes up word order and grammatical features that are related to ordering, and the following section deals with a set of essential features at the text level. Thus, 3.2. is on content morphemes, this section on word order (and some function morphemes) and 3.4. on function morphemes. Together these three sections describe the text level functions in translation in a morphemes-and-relations approach.
3.3.1. How "free" is word order?

Why should word order be dealt with at the text level? Isn't it catered for by relatively low-level syntactic rules? Indeed word order does not have only a single function in language structure, but rather fulfils a variety of functions at different levels. And, as it is perhaps still more important to realise when dealing with translation, word order plays different roles in the systems of different languages. This fact is often referred to by saying that there are languages with bound and others with free word order, which is a rather sketchy description of a much more complicated situation. It would be more appropriate to say that languages may be arranged on a scale between the two poles fixed word order and free word order. But for an investigation of translation grammar even the idea of a scale of freedom is insufficient, since a monodimensional scale suggests that word order has only a single function in grammar. For the purpose of this study it is more expedient to ask which are the functions of word order in which language and at which level they are effective. What is meant by a labelling like "bound word order" is normally that word order is used for marking syntactic functions at a low level. This is at first hand the clause level, but the syntagma and the sentence level may be involved as well. Generally speaking, syntactic functions may be marked by either morphemes (inflection markers, affixes, prepositions etc.) or word order or both. English is an example of a language in which syntactic functions at relatively low levels quite frequently are marked by word order, or sometimes by word order plus some function morphemes. At the syntagma level there are rules like: The article is left of the noun (a book). The preposition argument is right of the preposition (about a book). At the clause level English has word order rules like: The subject is left of the finite verb (they walk; with notorious exceptions). The finite verb is left of the infinite verbs (would have been coming).
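The English ordering rules just listed can be pictured as a toy linearizer over dependency structures: each rule states on which side of its governor a dependent with a given syntactic function is placed. The data format and rule table are invented for illustration; they are a sketch, not the DLT metataxis machinery.

```python
# Each syntactic function is placed 'left' or 'right' of its governor,
# following the English rules given in the text. (Illustrative only.)
ORDER = {
    "article":  "left",    # a book
    "subject":  "left",    # they walk
    "prep-arg": "right",   # about a book
}

def linearize(word, dependents=()):
    """Recursively place dependents on the rule-given side of their
    governor; dependents are (function, (word, sub-dependents)) pairs."""
    left, right = [], []
    for func, dep in dependents:
        (left if ORDER[func] == "left" else right).append(linearize(*dep))
    return " ".join(left + [word] + right)

clause = linearize("walk", [("subject", ("they", ()))])
print(clause)       # they walk

syntagma = linearize("about",
                     [("prep-arg", ("book", [("article", ("a", ()))]))])
print(syntagma)     # about a book
```

Because each rule is locked to a syntactic function, a linearizer of this kind marks functions by position; it leaves no room for free, communicatively motivated ordering.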
Whenever ordering rules like these are not subject to the writer's free semantic choice, the corresponding relationships are not linguistic signs, but mere indications of syntactic functions. In this case the ordering relation itself cannot be translated, but marks a function which in turn may be relevant to translation proper (see 3.1.). These rules are specific for English; other languages have different rules. Among the languages of the world the weight of word order in the clause level syntax of English is "very exotic", as Igor Mel'cuk (1988: 4) puts it. The so-called free word order languages have more ordering alternatives to choose among especially for the verb and its dependents, that is, they have freer rules at the clause level. Indeed the labellings "fixed" or "free" most often refer to the clause level rules, especially to the position of the finite verb with respect to its main dependents such as subject and object. Language typologists base an entire classification system of languages on the ordering rules for these three elements (cf. VSO, SVO, OSV languages etc.). At the clause level, ordering rules normally do not describe the position of single words, but of syntagmata, so that the phenomenon often might more properly be labelled syntagma order. At the sentence level there are rules for clause order. One might suppose that text grammar is looking for rules on sentence order at the text level, but of course this is normally not so. It is usually taken for granted that sentence order remains more or less unchanged in translation. The phenomenon that involves text grammar in the study of ordering rules may be found by asking what it means to say that syntagma order is "free" in a language like, say, Russian or Finnish. Does it mean that there are two or five or ten different Russian translations of an English sentence, each one as good as the others? 
It has been argued earlier (Schubert 1987b: 180) that the word order of the languages in question is not "free" in that sense of a random interchangeability of sentence alternatives with differently ordered syntagmata. Rather, a word order which is "free" at the clause level is free to fulfil other functions. The text-grammatical approach to word order is concerned with these higher-level functions. The following subsections show that the rules sought may well make ordering decisions at levels below the text level, i.e. at the clause or the sentence level, but that they nevertheless belong to the realm of text grammar. This is so because the phenomena that trigger these ordering rules are normally found beyond the sentence boundary. That is, word order may be "free" within a clause or sentence, but there are text phenomena that lead to choices among the allegedly free alternatives. These phenomena have to do with text coherence in a way similar to lexeme choice: from the ordering alternatives offered, the one should be identified that fits in best with the context established by the surrounding text and with relevant extratextual background knowledge.
3.3.2. Communicative syntagma ordering

Section 2.4. describes methods for tracing the communicative function of certain parts of clauses by introducing a theme-rheme distinction. In a first approach, this may be said to be a binary distinction which comes down to finding a single borderline in the clause. Everything in front of the borderline is attributed to the theme, everything behind it belongs to the rheme. (Things become more complicated as soon as sentences with more than one clause are analysed, which may become necessary; see 3.3.3.) If this is feasible as a cross-linguistically valid analysis, the theme-rheme division is one of the text level phenomena that may trigger certain ordering rules at lower levels. If we assume for a moment that this division is something that should be preserved in the target language counterpart of a source clause, this would give rise to a clause level ordering rule. A clause is a syntagma with a finite verb as its internal governor. The ordering rule would establish a theme-rheme borderline and place those syntagmata that translate elements of what is the theme in the source clause before the borderline, and the rheme syntagmata after it. How syntagmata can be located at particular places is taken up in the next subsection (3.3.3.). The question that should be addressed here is different: Is there evidence for cross-linguistic validity of the theme-rheme division? And if not, are there other translation-relevant ordering correspondences between languages? Precise answers to these questions can only be given for individual pairs of a source and a target language. Yet, it is possible in this study to point out two opposing regularities to which communicative syntagma ordering is subject. The first regularity is indeed a communicative ordering in which the theme (if any) in a clause occurs before the rheme.
As long as one speaks about languages in general, this ordering cannot be taken to be more than a tendency, with which many other regularities interfere. Leaving aside all the intricate possibilities in spoken language to mark theme and rheme for instance by intonation rather than by ordering, in written language the main counter-tendency seems to derive from word order being locked by functions in syntactic marking. The second regularity is thus the degree of "freeness" in word order. The closer a language is to the boundness pole of the scale, the more this tendency is contrary to the first one, the communicative ordering. This does not mean, however, that communicative ordering is possible only in languages with a free word order. The term "free word order" usually refers to the possibility of arranging the words of a clause in another sequence without changing their form. But the form of words is not untouchable. The next subsection (3.3.3.) describes form changes whose main function is to bring about another syntagma order. Before approaching that question, a few more words should be said about the communicative order aimed at in target language generation. Even if the assumption that there is a binary theme-rheme division in many languages may be a reasonable working hypothesis, this does not always bring about an unequivocal choice among the ordering alternatives that may be possible in a given clause or sentence of a specific target language. If alternative orderings are possible at
all, that is, if the possible positions are not totally exhausted by the needs of syntactic-function marking, it is still in many languages the case that a certain order is normal, whereas others are less common. Indeed the typological classification into VSO, SVO, OSV languages etc. (see 3.3.1.) captures part of these ordering preferences (it also captures the syntactically bound orderings). For a given language pair the basic rule for rendering the communicative division of sentences therefore need not be simply the preservation of the source language order. And even if the theme-rheme division can be preserved through the translation process in a straightforward manner, long sentences often require more precise ordering decisions within the theme or the rheme respectively. The observation that there is something like a "normal" syntagma order which is specific for a certain language is not new. Indeed, it is appropriate for the machine translation purpose to revive an old idea from the beginnings of theme-rheme theory: Vilém Mathesius (1929; see 2.4.) spoke of marked and unmarked word order. Mathesius's idea seems to be a useful starting point for contrastive studies of word order. What one has to do is identify corresponding clauses or sentences in the two languages compared and figure out which orderings can be considered to be unmarked ("normal") and which ones are marked. Correspondence relations between these orderings in the two languages should be established. Certainly this sounds much simpler than it is, in particular if an exhaustive account is aimed at. However, analyses such as those sketched in 2.4. for English and Esperanto, augmented with appropriate default choices, will provide a reasonably fail-safe rule system for the syntagma-ordering decisions required in a machine translation system. The close-up analysis given in 2.1.8. and 2.4. also shows what the role of communicative syntagma ordering is in an overall approach to text coherence.
The writer's "message" is normally more than what is said in a clause or a sentence. It is a whole story in which different items are introduced either as shared knowledge or as new information. The communicative functions link up the clauses of a story. By focusing the reader's attention upon the items in these distinct ways, the writer (unconsciously) interrelates the predications made and establishes a coherent text. A certain syntagma's function as either theme or rheme is thus indeed a communicative function: it is part of the content that piece of text conveys. In a machine translation process it is thus of utmost importance to convey this coherence of focus in the target text.
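The clause level ordering rule discussed in this subsection - syntagmata translating the source theme before the borderline, rheme syntagmata after it - could look as follows in outline. The tuple format and the Esperanto example are invented for illustration.

```python
# Sketch of communicative syntagma ordering: each target syntagma is
# tagged 'theme' or 'rheme' according to the source clause analysis.

def order_communicatively(syntagmata):
    """Place all theme material before the theme-rheme borderline,
    keeping the relative order within theme and within rheme."""
    theme = [s for s, role in syntagmata if role == "theme"]
    rheme = [s for s, role in syntagmata if role == "rheme"]
    return " ".join(theme + rheme)

# 'La gasto' (the guest) is shared knowledge (theme); 'venis' (came)
# carries the new information (rheme).
clause = [("venis", "rheme"), ("la gasto", "theme")]
print(order_communicatively(clause))   # la gasto venis
```

A real system would of course operate on the dependency tree and interact with the syntactic-function marking rules; the sketch isolates only the theme-before-rheme principle.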
3.3.3. Grammatical features determined by syntagma ordering

When the syntax of a language does not allow for the sequential order of syntagmata which would be desirable from the point of view of communicative ordering, the influence of theme-rheme structure is often strong enough to enforce grammatical changes in the words involved in order to bring about the needed order. The expressions ordering and change in this description imply a procedural view of
grammar which may not apply to language as such, but which harmonizes well with the functioning of a machine translation process. Indeed, talking about a change suggests that there first is a "normal" translation of a clause which is then changed. The "normal" translation need not really be generated at any intermediate stage of the machine translation process, but for the discussion here it is a useful picture of something that in a real implementation may remain implicit in one or other half-way step between metataxis rules (i.e. syntactic transfer rules; cf. Schubert 1987b: 130ff.). This virtual version of the clause is rearranged in accordance with the additional requirements that derive from the need to fit the preliminary translation into its context. In other words, the rules discussed here directly provide for a part of text coherence.
In languages with a relatively bound syntagma order, such as English, the main verb dependents - such as subject, direct and indirect object, predicative etc. where appropriate - have certain prescribed positions with respect to the verb and to each other. In a dependency-based syntactic transfer process such as metataxis, the target language syntagmata in question are assigned their syntactic functions in accordance with the contrastive syntactic rules of the given language pair. It is a characteristic of dependency syntax that at that stage of the process word order is not made explicit (see, however, 3.3.4.). If it were, the virtual translation might appear in the form of a clause or sentence in a "normal" word order. The communicative ordering may then override the normal order. How can such a change be performed? Three units are involved: A syntagma, a syntactic function and a position in the clause. In the discussion so far, the situation has in varying formulations been described as a sort of mismatch between the syntactic function of a syntagma and its position ("The subject should occur after the verb and after the object."). For understanding how different re-ordering mechanisms work, it is useful to look at this situation from another standpoint: There is a mismatch between the syntagma (in its position assigned by communicative ordering) and its syntactic function (which must not occur in that position). The difference is subtle: In the first description the syntagma seems to be locked to its syntactic function (and is referred to in terms of that function: "the subject"), in the second it is linked to its position. If we assume that the communicative ordering overrides the normal syntactic order, the latter view may be more to the point.
If communicative ordering has this prevalence, the problematic situations are like this: The ordering requirements assign to a certain syntagma a position in the clause, but the metataxis rules prescribe a syntactic function which cannot occur in that position. This mismatch may be resolved in various ways. At least three different ones are quite common: 1. Change the syntactic form of the governing verb so that it assigns appropriate syntactic functions.
2. Take a different governing verb which assigns appropriate syntactic functions.
3. Transfer the syntagma in question into another clause at a convenient position in the sentence.

Changes of this and similar types obviously must not be carried out in an unrestricted way. A strong restriction applies to them all: None of these changes may alter the basic meaning of the preliminary translation, since the final result is still meant to be a translation equivalent of the source clause (or sentence). What is a change in the syntactic form of the governing verb that alters the verb's pattern of dependents, giving different syntactic functions to dependent syntagmata? This is actually a rather technical dependency-syntactic paraphrase of the grammatical effect of a well-known phenomenon: voice change. In the languages discussed in these chapters, voice change mainly means shifting from active to passive and vice versa. It entails shifts in the syntactic functions the verb assigns to its main complements. In English, for instance, a syntagma that depends as a direct object on an active verb becomes the subject of the verb's passive form. There is a parallel shift from the subject of the active form to a prepositional adjunct with by of the passive verb. The consequence that is interesting for communicative ordering is that in a language of the "fixed" type syntactic functions are bound to certain positions, which means that the same syntagmata that in the preliminary translation depend on an active verb depend on the passive form in another order. A change in syntagma order is thus made possible by a shift in the governor's syntactic form. Many scholars have paid attention to the function shifts involved in the active-passive relation. One of the more systematic and cross-linguistically valid accounts comes, characteristically enough, from a school of language typology, namely the Leningrad group around Cholodovič, also connected with Mel'čuk.
The Leningrad authors distinguish categories of form and of content and take voice (Russian: zalog) to be a (morpho)syntactic category. They define another category, diathesis (Russian: diateza), which allows for cross-linguistic comparisons. In their definition (Cholodovič 1970: 6ff.; Mel'čuk/Cholodovič 1970) diathesis is a correspondence pattern of syntactic and semantic units. The semantic units which they use are semantic deep cases like those discussed in 2.5.1. It has been argued that the basic function of the passive voice can best be described in terms of diatheses: The passive voice removes the agent from the subject function. All the other effects of an active-passive shift can be said to be consequences or corollaries of this diathesis shift. However elegant this explanation sounds, it only works when both the syntactic and the semantic units of a diathesis pattern are well-defined in a language-independent way. But the difficulties in exactly defining semantic roles like agent are notorious. It can be shown that the basic function of the passive voice can be described as a non-correspondence of subject and agent only if the semantic role agent is tailored for this purpose, that is, if an agent is defined in a
rather syntax-oriented way, more or less as "what is the subject of the active verb" (cf. Schubert 1982: 55ff.). Such a definition is too form-oriented for a cross-linguistically valid, purely semantic unit. In the present approach to machine translation we therefore completely refrain from making semantic roles explicit. For the needs of translation, it has turned out to be sufficient that semantics is referred to as a tertium comparationis, but not spelled out in the form of semantic roles or other extralinguistic means. This insight is part of the implicitness principle in machine translation (Schubert 1987a, forthc. b).
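The English active-passive function shift described above - the direct object becoming the subject, the subject becoming a by-adjunct - can be written down as a simple mapping over syntactic functions. This is a schematic sketch, not the actual metataxis rule format; the function labels are invented.

```python
# Re-assignment of syntactic functions when the governing verb shifts
# from active to passive voice; a real rule would also change the verb
# form itself (e.g. bought -> was bought). Illustrative sketch only.

ACTIVE_TO_PASSIVE = {
    "direct-object": "subject",      # an old car -> subject of the passive
    "subject":       "by-adjunct",   # John -> 'by John'
}

def passivize(dependents):
    """Map each dependent syntagma's function to its passive-voice
    counterpart; functions not affected by the shift are kept."""
    return {syntagma: ACTIVE_TO_PASSIVE.get(func, func)
            for syntagma, func in dependents.items()}

active = {"John": "subject", "an old car": "direct-object"}
print(passivize(active))
# {'John': 'by-adjunct', 'an old car': 'subject'}
```

Since in a fixed-order language each function is bound to a position, re-assigning the functions in this way is precisely what makes the syntagmata come out in another order.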
Voice change, the first method of ordering syntagmata in a language with bound syntagma order, is a syntactic form change in the verb that governs the syntagmata to be re-ordered. The second method is more than that: on some occasions the governing verb can be replaced by another verb which denotes the same situation but assigns different syntactic (and semantic) functions to dependent syntagmata. Often-cited examples are pairs like German gefallen - mögen, French plaire - aimer or English give - get. It is sometimes argued - although not in these terms - that the difference in meaning between verbs like give and get is a difference of diathesis. This would mean that the two events give and get have the same semantic roles (termed for instance agent, beneficiary and patient) and the difference is in the correspondence of these semantic roles to syntactic functions, i.e. in diathesis. The pattern of give would then be:

He gives her a book. / He gives a book to her.

    agent       -  subject
    beneficiary -  indirect object or to complement
    patient     -  direct object

The diathesis of get would accordingly look like this:

She gets a book from him.

    agent       -  from complement
    beneficiary -  subject
    patient     -  direct object
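The two diathesis patterns can be set side by side programmatically; under the (contested) assumption of shared semantic roles, the role inventory is identical and only the role-to-function correspondence differs. The dictionaries merely restate the tables above.

```python
# The give/get diathesis patterns as role -> syntactic-function maps.
GIVE = {"agent": "subject",
        "beneficiary": "indirect object / to complement",
        "patient": "direct object"}

GET = {"agent": "from complement",
       "beneficiary": "subject",
       "patient": "direct object"}

# Same role inventory ...
print(GIVE.keys() == GET.keys())                         # True
# ... but two of the three roles are assigned different functions:
print(sorted(r for r in GIVE if GIVE[r] != GET[r]))      # ['agent', 'beneficiary']
```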
Such an analysis presupposes a distinction of basic and derived events (on the term event see 3.4.). In the example give is tacitly taken to be basic since the semantic roles are chosen to fit that event, whereas the description of get is derived from it. If get were taken as the basic event, she as the performer of the event might more naturally be taken as the agent. Reasonings of this kind are an attempt to derive from a coincidence of (extralinguistic) situations a semantic congruence of the (linguistically expressed) predications about the situation. Indeed, some authors do not seem to make the distinction between the situation and predications about it. In our view there are no convincing reasons for this
direct derivation of semantic equivalence from situation coincidence, nor for the tacit assumption that there are basic and derived events. These verbs do not mean the same but denote different events within the same extralinguistic situation (Schubert 1987a). Or to put it differently: They express different predications about the same situation. As a consequence, the replacement of a verb with such a counterpart is not meaning-preserving in a strict sense. This is not a shortcoming, but a functional feature of the replacement operation: It is the (slight) change in meaning that makes the shift in syntactic-function assignment possible. Accordingly, the prevailing requirements of communicative syntagma ordering may make a replacement of this kind desirable in view of the ultimate goal of an adequate translation. All this discussion about verb replacement for communicative syntagma ordering is not at all concerned only with some subtle refinements which in real machine translation implementations may be postponed until high-flown dreams of sophisticated style perfection can be realised. Verb correspondences like give - get are admittedly not too frequent in word lists (although they occur relatively more often in running text), but similar correspondences are considerably more frequent between two languages, thus in bilingual dictionaries. They are especially indispensable in many cases where certain voice paradigms are defective. For example, a translation equivalent of English own is German besitzen. However, the English passive form, is owned, cannot be rendered as a passive of besitzen, since there is no such form in German. The way out is another verb with - in our view - a different meaning: gehören. Thus The house is owned by an elderly gentleman [passive] becomes Das Haus gehört einem älteren Herrn [active].
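A bilingual dictionary entry of the kind needed for such defective paradigms might be sketched as follows. The entry format and the helper translate_verb are invented for illustration; only the own / besitzen / gehören data comes from the text.

```python
# Entry: (default translation, has a passive form?, replacement verb to
# use when the passive is required). Illustrative format, not DLT's.
VERBS = {"own": ("besitzen", False, "gehören")}

def translate_verb(english, voice):
    """Fall back on a replacement verb (used in the active voice) when
    the default translation's voice paradigm is defective."""
    default, has_passive, replacement = VERBS[english]
    if voice == "passive" and not has_passive:
        # gehören denotes a different event in the same situation,
        # so the slight meaning change is accepted deliberately.
        return replacement, "active"
    return default, voice

print(translate_verb("own", "active"))    # ('besitzen', 'active')
print(translate_verb("own", "passive"))   # ('gehören', 'active')
```

Note that the replacement entails the same kind of function shift as voice change: the dependents of the passive source verb are re-ordered around the active replacement verb.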
The third method of syntagma ordering has still wider scope than the first two. The first method affects a feature of a verb and the second method changes the stem of the verb itself. Both operations may introduce or delete an auxiliary verb, thus locally changing the tree structure they work on. The third method does more: It has an effect on the position of the entire clause in the sentence. The third method is also a notorious topic of discussion in syntactic theory and has many names: extraposition, unbounded dependency, cleft sentences etc. Seen from a standpoint of communicative ordering, extraposition is a step outside the clause in which syntactic functions are bound to positions. What in the "normal", preliminary translation is a dependent of a certain verb, may through a clefting operation become a dependent of another verb. In English, this new verb with the clause made up by its dependents often precedes and governs the rest of the preliminary translation. The newly introduced verb has an auxiliary syntactic function, as is also emphasized by the fact that it is a semantically very general verb, more or less void of specific meaning such as was in the following example (taken from Leech/Svartvik 1975: 215): John bought an old car last week. It was an old car that John bought last week.
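The clefting step can be sketched as an operation on a flat clause representation. The data format and the fixed English cleft frame with was and that are simplifying assumptions made for illustration only; a real system would operate on the dependency tree itself.

```python
# Toy clefting operation: the chosen rheme is extraposed under a newly
# introduced, semantically near-empty verb ("was"); the remaining clause
# follows as a that-clause. The representation is a simplifying assumption.
def cleft(clause, rheme_index):
    """clause: list of (form, function) pairs in surface order."""
    rheme_form, _ = clause[rheme_index]
    rest = " ".join(form for i, (form, _) in enumerate(clause)
                    if i != rheme_index)
    return f"It was {rheme_form} that {rest}"

clause = [("John", "subject"), ("bought", "verb"),
          ("an old car", "object"), ("last week", "adjunct")]
# cleft(clause, 2) yields "It was an old car that John bought last week"
```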
Extraposition is a phenomenon that has intrigued the grammarians of quite a number of languages, and many different facets of an explanation for its basic function, its side effects and its limits have been suggested. The brief description here, concentrated on a single function, is not meant in any way to dismiss all these valuable investigations. One may indeed discuss whether extraposition in a given language has a basic function, since this would probably yield the most elegant explanation. One may also discuss whether its basic function, if any, is today necessarily the same function that in language history brought about the phenomenon. All we are suggesting here is that communicative syntagma ordering is at least one of the basic functions of extraposition in languages such as English or the Scandinavian languages. If word and syntagma order is relatively bound at the clause level, extraposition provides an escape to a higher and, in this particular respect, freer level. Since extraposition is an operation that may move syntagmata outside the clause they belong to in the preliminary translation, it is in those cases necessary to describe a theme-rheme division not only at the clause, but also at the sentence level.
3.3.4. Excursus on word order in dependency syntax

Before proceeding to the third kind of coherence (3.4.), it is worth dwelling for a while on the problem of word order in the framework of dependency syntax. As was pointed out in 1.3., this entire study has been written with dependency grammar in mind. In particular, reference is made to the machine translation version of dependency syntax (Schubert 1986a, 1987b: 28ff.) and dependency semantics (Papegaaij 1986: 95ff.) devised for the DLT system. Dependency syntax is sometimes criticised, or even rejected entirely without much reflection, "because it does not account for word order". More careful investigation easily leads to the conclusion that this judgement is mistaken. Dependency syntax does account for word order, although not always in the way some readers may expect from their acquaintance with syntactic descriptions of English in other grammatical models. What has been said in the preceding parts of this section (3.3.1.-3.3.3.) completes, albeit at times in a sketchy way, the picture drawn in earlier studies (Schubert 1986a: 11ff., 1987b: 28ff.), adding text level regularities for word order. It is therefore now appropriate to give an overall view of the treatment of word order in dependency syntax.
The alternative to dependency syntax is constituency syntax (Schubert 1987b: 17ff.), a group which comprises a wide range of different models. From a translation-oriented viewpoint, one of the main differences between the two types of syntax, brought about by the fundamental difference in approach, is the fact that dependency syntax primarily describes syntactic function, for this purpose making use of syntactic form where appropriate, whereas constituency syntax primarily describes syntactic form, making syntactic function inferable. In this special sense dependency syntax can be characterised as the more function-oriented model of the two. The discussion so far has shown that word order (and, at a higher level, syntagma order) in many languages
has not got just a single function, but several, which it is expedient to distinguish. Dependency syntax, accordingly, does not directly account for word order, which is an element of the syntactic form of a text, but it accounts for various syntactic functions, some of which in a given language may be identified by means of word order. Let us take German as an example. On the bound-free scale German is situated somewhat closer to the "free" pole than English, but much farther from it than for instance Russian. At the syntagma level word order identifies some of the syntactic dependency relations. A preposition always precedes its argument, an article its noun, an attributive adjective or participle its noun etc. At the clause level German syntagma order has a certain freedom which is limited by the well-known verb-second rule and other ordering restrictions. Inasmuch as word order is needed for identifying the syntactic functions of certain words or syntagmata, it is "bound", that is, it cannot fulfil other functions simultaneously. But where it is "free", it expresses communicative functions such as a theme-rheme division in clauses and sentences. When a dependency-syntactic analysis renders a sentence in the form of a tree structure with labels for syntactic functions, these functions are explicit and the features that identify them in the linear sentence need no longer be taken into account. This is why a dependency tree, properly labelled, seemingly does not contain information about word order: In a labelled dependency tree word order is redundant and inferable, and may thus be kept implicit. But syntactic-function identification is not the only function of word order! Thus, it is more correct to say that in a dependency tree those elements of word order are redundant that only identify syntactic dependency relations. A communicative syntagma order should indeed be rendered in a dependency tree, however, either by means of labels such as "theme", "rheme" etc.
(explicitly) or just in the order of main branches under a governing node (implicitly). If the latter solution is preferred, it is not even true to say that dependency trees are unordered. In conclusion, it can be said that dependency syntax does not deal with word order as such, but rather with the grammatical functions it fulfils. The latter are thoroughly accounted for in either explicit or implicit ways.
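The "implicit" solution can be made concrete with a small data structure: syntactic functions are explicit labels, while communicative order is simply the order of the dependents list, with the governor's own position recorded. Class and field names here are our illustrative assumptions, not the DLT representation.

```python
from dataclasses import dataclass, field

@dataclass
class DepNode:
    """Dependency node: function labels are explicit; word order is kept
    implicitly in the order of `dependents` (illustrative sketch only)."""
    form: str
    function: str                 # e.g. "subject", "object", "main"
    dependents: list = field(default_factory=list)
    gov_pos: int = 0              # governor's slot among its dependents

    def linearise(self) -> str:
        """Recover a linear word order from the ordered tree."""
        parts = [d.linearise() for d in self.dependents]
        parts.insert(self.gov_pos, self.form)
        return " ".join(parts)

obj = DepNode("car", "object",
              [DepNode("an", "determiner"), DepNode("old", "attribute")],
              gov_pos=2)
sentence = DepNode("bought", "main",
                   [DepNode("John", "subject"), obj], gov_pos=1)
# sentence.linearise() yields "John bought an old car"
```

Such a tree is ordered, so word order is inferable even though no position labels appear anywhere in it.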
3.4. Coherence of events
Two kinds of coherence have been discussed in the above sections, namely coherence of entities (3.2.) and coherence of focus (3.3.). In a way, the morphemes-and-relations approach (3.1.) in these reasonings functions as a safeguard that should ensure complete coverage of the translation rules. This is necessary, because at the text level, where all remaining decisions for the ultimate goal of an adequate translation have to be made, nothing can be left out. But has the safeguard already accomplished its task? In other words, are there still unresolved questions when the target text being generated is coherent with regard to both entities and focus? The discussion in 3.1. proceeds mainly from the source to the target language and is concerned with how to recognize the morphemes and the morpheme relations that have the nature of a linguistic sign, that is, how to recognize the translation-relevant elements. All this is done in the source text, and the idea is that all its translation-relevant elements should be transferred into an appropriate form in the target text, in order to achieve an adequate translation. The linguistic signs are the meaning-bearing elements of the source text; so they are transposed into their target language counterparts and are arranged in an appropriate syntactic form. This may be a fruitful approach, but when phenomena not previously covered are looked for to achieve complete coverage, it turns out that it is insufficient to be concerned with the source text only. This sketch of the basic idea is over-simplified in one essential point: it is built on the tacit assumption that the target language-specific requirements that should be met after the content of the source text has been transferred are purely syntactic. This would mean, in other words, that the semantic elements of the target text are determined exclusively by the source text, while only the syntactic configuration of the target text may be subject to target language rules.
This is not so, however. Languages, thus also target languages, often require certain semantic elements to be contained in a sentence or a text. Before saying more specifically what these obligatory elements are, let us further pursue the general abstract reasoning in order to understand this phenomenon in a manner that has a cross-linguistic bearing. Isn't an obligatory semantic element redundant and couldn't it be inserted in its place by a simple redundancy rule? Unfortunately, the elements in question are not redundant. Normally the obligatoriness is not valid for a single linguistic sign, but for a whole category, and the problem in target language generation is to select one out of a paradigm of possible fillers and insert it in the obligatory slot. Finally, what are these mysterious obligatory elements that should be in a target text,
though lacking a source language counterpart? The obligatory presence of a certain element is a grammatical regularity, and we are accordingly concerned here with grammaticalised expressions of meaning. In 3.3., it turned out that there are grammatical features to be decided upon at the text level, e.g. the voice feature in verbs. In a close relationship with voice, one finds a number of features still not chosen. These are the central verb features: tense, mood, aspect etc. They are grammaticalised meaning-bearing features. How is it possible that these essential features of verbs are not translated at a lower level? Anyone acquainted with only a fraction of the abundant literature on these grammatical phenomena can easily see the reason: there are hardly any one-to-one correspondences between the units of these categories in any given pair of a source and a target language. An English perfect tense, for example, may translate into German as a perfect tense, a preterite tense, a present tense and in special cases various other forms. This is one of the more subtle examples; there are also more obvious contrasts, such as the absence of entire categories in one of the languages. Russian verbs, for instance, have an obligatory aspect category which is absent from German verbs. When translating from German into Russian, the aspect information must be inferred somehow. So, some translation rules may work at lower levels, translating or at least preselecting some of these features, but the bulk of the decisions cannot be made unless the entire (preceding) text is accessible. The situation is obscured by the fact that these features are determined by different simultaneously effective regularities at various levels. The criteria for making decisions about tense, mood and aspect for a target language verb may be of either a syntactic or a semantic nature (or both).
As far as they are syntactic, they often have to do with form government or agreement between the verbs in different clauses or sentences, which means that they are redundant in all instances but one. In these cases the decisions in question concern redundant features - thus features not relevant to translation proper in the sense of what was said in 3.1. - but it is nevertheless an effect involving form determination beyond the sentence boundary, which justifies the place of these decisions in text grammar. But there are more reasons for dealing with tense, mood and aspect at this level. Syntactically redundant features, even if these are tense, mood and aspect units, can in many cases be identified relatively straightforwardly. Where there is a free semantic choice, however, the feature in question is a linguistic sign of its own and should be carefully taken into account in translation (see 3.1.). The differences among languages in this respect derive in large part from the fact that different semantic choices have become grammaticalized in different languages. Thus a bit of meaning expressed in a tense, mood or aspect unit may be obligatory in one language while it is optional in another and may not even have a standard expression. An example is the aspect category in German and Russian, respectively. When filling in the tense, mood and aspect slots in a target text, three types of situation may occur:
1. There is a direct form correspondence between the categories and their units in the two languages.
2. The information needed for deciding on the target language form has no grammaticalized expression in the source text, but may be inferred.
3. The required information is not contained in the source text.

There is a fourth possibility:

4. There is grammaticalized meaning in the source text which has no directly corresponding form in the target language.

In the fourth case, the question is whether the information should be rendered in a free formulation, how often this should be repeated if the feature occurs more than once, and whether and when it can be omitted. These questions are not at issue here. The first situation is the least complicated one. It may in some cases even be resolved at the clause level. The second situation requires the most complex mechanisms, because the decisions to be made depend on the results of a structured semantic search of well-defined places throughout the previous text, especially in the sentences immediately preceding the current position. It is far beyond the scope of this study to illustrate in detail for various language pairs what kinds of regularities may be found and applied. A single, simple example may suffice. In an unmarked case, a German present tense translates into English as a present tense: Er wohnt in den Niederlanden. He lives in the Netherlands. There may, however, be indications that bring about another tense and aspect choice in English: Er wohnt schon zwanzig Jahre in den Niederlanden. He has been living in the Netherlands for twenty years. In the second pair of sentences, the indication is ready to hand: the free adjunct with schon, which brings about an inclusive perfect in English. In various language pairs there are many regularities of this kind, and only a few of them are as clear-cut as this. The third situation seems awkward. How can one use information that can be neither found nor inferred?
This is not, as it might appear, one of the cases where machine translation reveals one of the insufficiencies that would make it inherently inferior to human translation, but is, on the contrary, one of the cases in which machine
translation faces the same imponderabilities as a translator. A machine translation system should in such cases perhaps try to imitate a translator's way of working. If it is clear that the information needed cannot be found by any means, then a translator too must choose a default solution. For machine translation, it is expedient to study what these defaults are.
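The schon example given above can be written as a toy decision rule of the second type: the English tense/aspect choice is inferred from material present elsewhere in the German clause. The feature names, the single rule and the fallback are hypothetical simplifications of ours, not rules from any actual system.

```python
def choose_english_tense(german_tense, adjuncts):
    """Toy rule of situation 2: infer an English (tense, aspect) pair from
    a German tense plus its free adjuncts. Hypothetical simplification."""
    has_durative_schon = any("schon" in adj for adj in adjuncts)
    if german_tense == "present" and has_durative_schon:
        # Er wohnt schon zwanzig Jahre ... -> He has been living ...
        return ("present perfect", "progressive")
    if german_tense == "present":
        # Er wohnt in den Niederlanden. -> He lives in the Netherlands.
        return ("present", "simple")
    return (german_tense, "simple")   # placeholder for further rules
```

A real rule system would of course search beyond the clause, through the preceding sentences, as described above.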
The most interesting of the three situations seems to be the second one. In this study we shall not attempt to add anything to the abundance of monolingual and contrastive studies in the field of tense, mood and aspect grammar. Our concern here is the relevance of these regularities to text coherence. As far as tense, mood and aspect are subject to the writer's free semantic choices, they are linguistic signs of the translation-relevant type discussed here (see 3.1.). Their relevance to text coherence can be seen from the content they convey. The question of what this content is, however, leads deep into the intricate semantic and pragmatic phenomena investigated in the linguistic literature. In view of the voluminous evidence as well as of the diverse interpretations given by different authors and schools, we cannot in the present study account for the nature of tense, mood and aspect in much detail. But in spite of the complexity of the problem, it is necessary to furnish the discussion of text coherence with at least a sketchy outline of the essence of the content expressed in these categories, from a translation-oriented point of view. Tense, mood and aspect are syntactic categories, thus form categories, and as such their units seldom express only a single meaning but are influenced by several regularities of both a semantic and a syntactic nature. Moreover, there may in some cases be portmanteau morphemes or function word constructions with various combinations of meanings and functions rather than distinct morphemes for distinct functions. Inasmuch as tense, mood and aspect units are used in free semantic choices, the essence of their meaning may be described very roughly in the following way. Tense, mood and aspect are typically verb features, and as such they modify the predication expressed by the verb about extralinguistic events. (We take the term event in a very broad sense, including acts, occurrences and states.)
A unit of the tense category contains a predication about the location in time of the event. It may either assign the event a time location or mark it as generic or nonactual. Time locations are defined as temporal relations to a certain point of reference which may be either the point of utterance or another event. A unit of the mood category contains a predication about the truth value of the predication expressed by the main verb. The main predication may be marked as certain, uncertain, possible, irrealistic, reported, reported from hearsay etc. Sometimes the communicative status of declarative, imperative or interrogative utterances is also rendered in mood units.
A unit of the aspect category contains a predication about the completeness of the event denoted by the main verb. It has quite distinct effects on verbs that denote events of telic or atelic aktionsart, respectively. Aktionsart is a category of a semantic classification of events, expressed prevailingly in verbs. An event is called telic (other terms with a slightly different meaning are bounded, terminative etc.) if it has an inherent boundary which, once attained, does not allow for continuing (but at best for repeating) the event. A telic event is aspectually complete as soon as the boundary is attained, an atelic event as soon as it has begun. The latter idea shows that there is a difference between aspectual and temporal completeness. Various aspect systems mark either the incomplete (English) or the complete (Russian) events and leave the others unmarked.
In such an extremely short, oversimplified, and by no means exhaustive form, these descriptions are certainly controversial. They are, however, only meant to illustrate the impact of tense, mood and aspect on text coherence. When the tense category, among other things, relates predications to each other in time, marking a given event as occurring before, at the same time as or after another event, or as generally valid, it not only contributes to text coherence, but indeed provides the backbone of the whole message. The same is true for aspect. The completeness or incompleteness of events at a given point in time is essential to temporal ordering. And what a text tells us about order in time often has relevance to other logical or semantic relations, the most important among which may be the causal relation. An event can be said to be the cause of another event only if it occurs before or at least simultaneously with the alleged consequence. Implicit meaning of this kind is often lost in translation unless the aspectual and temporal information is painstakingly conveyed in the target text. Mood is even more directly linked to the exact content of the predications made in a text. What we call coherence of events is thus the structure in which the events that are spoken of in a text are arranged. This structure is a structure in time, completeness, truth value and other elements of meaning. Taking into account the coherence of events in a source text and its target counterpart does indeed make it possible to render part of the explicit and the equally important implicit meaning contained in the text.
3.5. Pragmatics - an escape from text grammar?
At the lower levels of text analysis it is always possible to leave one or other problem unresolved until the analysis can "escape" to a higher level. At the text level, the highest level, this is no longer possible. Everything not dealt with before has to be resolved there or nowhere. But nevertheless text grammar is not really the last and most powerful instrument of machine translation, because there is more in text analysis than just grammar. There is no higher level to escape to, but still an escape is possible: outside the language system, from grammar to pragmatics. Indeed in machine translation, and in natural-language processing in general, many extragrammatical sources of knowledge are used, or at least efforts are being made to use them. Text analysis may be supported by a variety of rule systems. Key words are knowledge of the world, special-domain knowledge, scripts, scenes, scenarios, logic, etc. It would be imprecise to say that pragmatics comprises all these fields. Yet the aim of the exercise is translating, which implies a text analysis according to the laws of the source language and a corresponding synthesis in the system of the target language. Accordingly, world knowledge, logic etc. are not needed as such for this particular purpose, but only insofar as they help make decisions on lexeme choice, syntagma order, grammatical features and the like, thus only insofar as they have an effect on language. The influence of extralinguistic phenomena on language is the subject of pragmatics, as it is defined in this study (see 1.4.).
As for the role of extralinguistic knowledge in machine translation, three basic questions should be answered:

1. What kind of knowledge is needed for translation?
2. How deep should the analysis of various kinds of implicit knowledge go?
3. How can pertinent extralinguistic knowledge be found, represented, stored, retrieved and consulted?

These three questions are addressed in the subsequent subsections (3.5.1.-3.5.3.).
3.5.1. The kind of translation-relevant pragmatic knowledge

The first question may be answered by reviewing the devices discussed in chapter 2 in the light of what was said in the above sections about the three types of text coherence. The question is which of the three coherences can be better, more easily or more reliably detected by means of extralinguistic knowledge.

- Coherence of focus seems to be prevailingly detectable by grammatical regularities. Translation rules in this realm draw heavily on the marked-unmarked distinction at all levels of syntactic form, from morpheme to text, including word and syntagma order. Where hard rules do not exist or have not been discovered, statistical data and similar results of corpus analysis are a good help. The appropriate translation rules may be devised with conclusions about the author's extratextual and extragrammatical communicative intentions as a tertium comparationis, without necessarily making them explicit in any way. (In 3.5.3. we return to another facet of theme-rheme structures which has to do with pragmatic knowledge.) However, detecting the coherence of focus is a process within the text. The recognition of coherence of focus is apparently not much enhanced by pragmatic knowledge.

- Coherence of events is to a large extent grammar-based, but information as to the temporal location and duration of events, as to aspectual completeness, truth value etc. can also be supplied by pragmatic considerations where it remains unexpressed in the text itself. The recognition of coherence of events can to some extent profit from pragmatic knowledge.

- Coherence of entities relies much more than the other two coherences on pragmatic knowledge. A purely grammatical, thus syntactic and semantic, approach to lexeme choice may lead to choices even without pragmatic knowledge, but - unless excessively many hardly interesting questions are answered by the user in a disambiguation dialogue - the outcome will be a pretty bad translation. Along with semantic knowledge, these selection processes need most of the pragmatic knowledge available. The recognition of coherence of entities profits much from pragmatic knowledge.
Perhaps the most fruitful text coherence devices for enhancing lexeme choice are those which establish coherence of entities, that is, those which link up the content words in a text to a series of networks of coreferential words or syntagmata. In addition to coreference, other kinds of relations may be used (part-of relations, antonym relations etc.; see the end of this subsection), but reference identity seems to be the most straightforward and most directly applicable one.
The steps required for establishing such networks of reference identity display in a neat way the interplay of syntax, semantics and pragmatics. To establish such networks, syntactic conditions preselect candidate coreferents, semantic conditions assess the degree of coincidence in meaning between candidate coreferents, and pragmatic conditions then finally suggest whether possible coreferents denote not only the same type of item (which is ensured by semantic coincidence or synonymy), but also the same instance of an item, the same entity (which is coreference, as opposed to mere synonymy). As for the coherence of events, another network should be established, notably that of a temporal order of the events denoted by appropriate pieces of text (i.e. mainly verbs and their dependents). Pragmatic conditions may help reconstruct and use both implicit events and unexpressed properties of events, such as those features that may be obligatorily grammaticalized in a given target language, but are lacking in the source text, as discussed in 3.3.3. Coherence of entities is in very many languages mainly linked to nouns, noun syntagmata and pronouns, while coherence of events is connected with verbs and deverbal words (nominalised verbs, participles etc.). Interestingly enough, there does not seem to be a similar connectedness among adjectives, the third main group of content words. This observation gives rise to semantic and pragmatic considerations as to the differences between events, items and qualities. These thoughts, however, are of a wide scope, far beyond this application-oriented study.
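The three-step interplay just described can be sketched as a filtering pipeline over candidate mention pairs. The three condition functions are deliberately left as parameters: in a real system they would consult the grammar, a semantic lexicon and a knowledge base, respectively. Everything here, including the function name and threshold, is an illustrative assumption.

```python
def coreference_network(mentions, syntactically_ok, semantic_score,
                        same_instance, threshold=0.5):
    """Collect coreferent pairs: syntax preselects candidates, semantics
    checks for the same type of item, pragmatics confirms the same
    instance (illustrative sketch; the predicates are stubs)."""
    pairs = []
    for i, a in enumerate(mentions):
        for b in mentions[i + 1:]:
            if not syntactically_ok(a, b):        # syntactic preselection
                continue
            if semantic_score(a, b) < threshold:  # same type of item?
                continue
            if same_instance(a, b):               # same entity, not synonymy
                pairs.append((a, b))
    return pairs
```

The resulting pairs form the links of one network of reference identity; the networks for coherence of events would be built analogously over verbs and their dependents.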
All these pragmatic considerations and rule applications have to do with probability. A network of events, ordered in a temporal sequence, enhances lexeme choice, in that the available syntactically possible translation alternatives for events can be enriched by pragmatic knowledge about probable sequences of events in such a way that the degree of coincidence with the probable event sequence is computed and used to rank translation alternatives as more or less probable. This would mean transferring the idea of ist and soll structures from syntagma level semantics to pragmatic event sequences (for the syntagma level application cf. Papegaaij 1986: 95ff.). This approach implies a comparison of alternative event sequences from the text (ist sequences) with a dictionary or knowledge base with typical event sequences (soll sequences). "Soll event sequences" of this kind are widely discussed in connection with machine translation, and more generally with natural-language processing. They are usually called scripts, scenes, scenarios, etc. In addition to the knowledge about typical sequences of events that may be stored in the form of scenarios or the like, rules about relations among events may also be formulated. Such rules are first of all found in logic. Logical rules are more powerful than scenarios. They may contain hard conditions about events that are necessary prerequisites for or inevitable consequences of other events. The content of such rules, like that of scenarios, need not be very sophisticated for a human observer; they may contain information of the kind "If something is opened, it must, at the moment when the opening event began, have been in a closed position".
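The ist-soll comparison can be sketched as a ranking of alternative event sequences by overlap with a stored scenario. Longest-common-subsequence length stands in here, as a crude assumption of ours, for a real proximity measure; note that low-scoring alternatives are only ranked down, never discarded.

```python
def lcs_length(a, b):
    """Longest common subsequence length of two event-label sequences."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            table[i + 1][j + 1] = (table[i][j] + 1 if x == y
                                   else max(table[i][j + 1], table[i + 1][j]))
    return table[len(a)][len(b)]

def rank_ist_sequences(ist_sequences, soll_sequence):
    """Rank alternative event sequences ("ist") by their proximity to a
    scenario ("soll") - an illustrative stand-in for a pragmatic score."""
    return sorted(ist_sequences,
                  key=lambda seq: lcs_length(seq, soll_sequence),
                  reverse=True)

# A hypothetical "soll" scenario of the script/scene/scenario kind:
restaurant = ["enter", "order", "eat", "pay", "leave"]
```

An ist sequence respecting the scenario's temporal order would thus be preferred over one contradicting it, without the latter ever being excluded.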
How pragmatic knowledge is used in specific decisions in machine translation is
illustrated at various places in chapter 2. Here, however, the pertinence of both scenarios and logic to translation should be addressed critically. Indeed, a word of warning may be in order. Assigning probability scores to translation alternatives on the basis of a proximity match with scenarios implies a presupposition which need not necessarily hold. Pragmatic scores of the type discussed here will rank those translation alternatives (ist) higher that are more similar to typical event sequences (soll). There is, however, no a priori guarantee that the content of a text to be translated consists of typical situations. In a similar way, the application of logical rules may allot a low score to a translation alternative, because the event sequence expressed in it is illogical or impossible and thus far removed from the scenarios of the knowledge base. Again, however, there is no guarantee whatsoever that the source text is logical or is intended to be so, nor that it describes only events that can possibly happen in reality. We feel that in part of the literature about applications of scenarios and logic in machine translation this fact is overlooked. And not only there: much of the discussion in theoretical grammar about degrees of grammaticality has to do with the considerations involved here. There are various classifications of grammaticality, but most of them roughly distinguish something like syntactic, semantic and pragmatic grammaticality. According to such accounts, in syntactically deviant utterances the form of words and syntagmata violates the rules; in semantically deviant utterances the formal arrangement is correct, but the combination of meaning-bearing elements runs counter to selection rules; and in pragmatically deviant texts semantically correct utterances do not fit into any imaginable situation.
In our view, a precisely describable fraction of these rules and rule violations are not at all of a grammatical nature and, therefore, do not describe grammaticality. We maintain that only syntactically deviant utterances qualify as ungrammatical. (Although we do not use grammar as a synonym of syntax, as some authors do.) There is nothing like "semantic ungrammaticality" (although semantics is in this study taken to be a branch of grammar). As we have argued earlier (Schubert forthc. d), every syntactically correct structure (of words that exist in the language in question) has a meaning. This meaning may or may not make sense, it may or may not be true, logical or likely to happen. But there is always meaning in a syntactically correct structure. There may also be meaning in syntactically incorrect utterances, or, alternatively, one may redefine one's notion of syntactic correctness to include all meaningful utterances of spoken language, learners' language, foreigner language etc. Yet, all these are not at issue here. The point made here is merely that a syntactically correct sentence (utterance, text...) always has a meaning. This claim is not as bold as it might seem at first sight. Indeed, it is not even really surprising that this should be so, since syntactically correct utterances have a meaning, as it were, by definition. We cannot motivate this in any detail here; let it thus at least be mentioned as a claim that in our view this is the semantic anchorage of syntax.
Pragmatics is in this study not considered to be a part of grammar. But the argument made for semantics holds in much the same way for pragmatics as well: There is
nothing like a syntactically (and semantically) correct, but pragmatically deviant utterance. An utterance is not ungrammatical if it does not fit into any situation in reality. Language can express more than the real world. It is perfectly possible to be mistaken, to lie or to make false, illogical, senseless or fictional predications. As far as grammar, the internal system of language, is concerned, all these utterances may be perfectly grammatical, understandable and interpretable. And if this is so, they must also be translatable.
It is in view of this argument that the possible contribution of pragmatics, that is, of world knowledge, logic etc., to machine translation should be assessed. And it is from this argument that our "warning" derives: However "hard" the rules of logic are, they apply to the world, not directly in the same way to texts about the world. (We shall not enter a philosophical discussion about whether or not and in which respect texts are part of the world.) As a consequence, pragmatic knowledge of whatever kind can never discard a translation alternative. It may be used for assigning probability scores, but such an application is only feasible if indications of less expectable readings are identified and understood properly. No alternative, even the most unlikely one, is impossible. Moreover, it cannot be the task of a translation system to gear the content of translated texts towards some kind of typical situations. On the contrary, the ultimate goal of an adequate translation entails also the need to translate pragmatically "deviant" predications with the same degree of precision as predications about typical situations. In the framework in which the present study is a mosaic piece, these considerations have led to a certain prevalence of syntax, in that semantic and pragmatic choices are only made from the alternatives that are offered as syntactically possible. Let us emphasize this point with a somewhat exaggerated example: When the author of a text wants to say that the grass eats a cow, semantic-pragmatic choices may be required as to the translations of the involved words, but no calculation of pragmatic likelihood may bring about a translation in which the cow eats the grass.
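The scoring regime argued for here (rank translation alternatives by proximity to typical event sequences, but never eliminate any of them) can be sketched in a few lines of Python. The scenario entries, the names and the crude matching measure are illustrative inventions, not part of any actual DLT component:

```python
# Hypothetical "soll" knowledge: typical event sequences of the kind a
# scenario knowledge base might contain.  Entries are illustrative only.
SCENARIOS = [
    ("cow", "eat", "grass"),
    ("farmer", "milk", "cow"),
]

def proximity(alternative, scenario):
    """Crude proximity measure: the share of positions on which the
    translation alternative ("ist") matches the typical sequence."""
    hits = sum(1 for a, s in zip(alternative, scenario) if a == s)
    return hits / max(len(alternative), len(scenario))

def rank_alternatives(alternatives):
    """Order the alternatives by their best scenario match.  Nothing is
    eliminated: even a zero-scoring alternative stays in the list, since
    the source text need not describe typical situations."""
    scored = [(max(proximity(alt, sc) for sc in SCENARIOS), alt)
              for alt in alternatives]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

ranked = rank_alternatives([
    ("cow", "eat", "grass"),   # the typical reading
    ("grass", "eat", "cow"),   # atypical, but it must survive the ranking
])
```

The essential design point is that `rank_alternatives` returns the complete ordered list: the atypical "grass eats cow" reading scores low but is never discarded.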
This example is so extreme that one might feel that it is irrelevant to the kind of texts processed in machine translation systems, but one would be greatly mistaken to assume that pragmatically unexpected predications are absent from any particular text sort, whether technical writings, maintenance manuals, economic news, administrative conference minutes or whatever. Practical experience shows that a text from any text sort whatsoever may contain elements of any other text sort.
Returning to the question of what kind of pragmatic knowledge is translation-relevant, we can now approach an answer. In a way the answer is already implicitly contained in the initial considerations of this subsection, where the three types of coherence are discussed separately. Typical events and event sequences, stored in the form of scenarios or the like, can enhance translation, inasmuch as they contribute to the establishment of the networks mentioned: Networks of reference identity and networks of temporal event sequences.
In this characterisation, reference identity and temporal sequence are used as collective terms under each of which an entire bundle of phenomena is subsumed. Reference identity can be used as a common denominator for various relationships such as part-of relations, antonym relations, and others in which a subset or a superset of the semantic content of a word or syntagma is repeated in another expression at a different place in the text. Temporal sequence is taken as a basic relation among events. It may be conceptually linked to, or derived from, other relationships, such as causal and implicative links, etc.
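As an illustration of how such a network of reference identity might be assembled, the following sketch links expressions occurring at different places in a text through a small, hand-made relation table. All entries, names and relation labels are hypothetical:

```python
# Hypothetical relation table: pairs of expressions between which a
# reference-identity link may hold (synonymy, part-of relations etc.).
RELATED = {
    ("engine", "aircraft"): "part-of",
    ("motor", "engine"): "synonym",
}

def reference_links(text_words):
    """Collect reference-identity links between expressions occurring
    at different places in a text; the result is the raw material for
    a network of reference identity.  Purely illustrative."""
    found = []
    for i, first in enumerate(text_words):
        for second in text_words[i + 1:]:
            for pair, relation in RELATED.items():
                if {first, second} == set(pair):
                    found.append((first, second, relation))
    return found

net = reference_links(["engine", "runs", "aircraft", "motor"])
```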
3.5.2. Full understanding?

The second question asked at the beginning of this section (3.5.) leads to a discussion alluded to earlier in this study (see the introduction to chapter 2): the discussion about "no understanding" versus "full understanding" in machine translation. Let us first consider how a translator works. A translator is able to reformulate the content of a given text in another language for two reasons: first, because he or she understands the source text, and second, by virtue of pertinent background knowledge. Background knowledge comprises not only specialised information about the domain or field from which the text is taken and about the objects and actions described, but also more general information about the social, national or cultural setting in which the text was written, among many other things. The two conditions are not independent of each other. Background knowledge is in part a prerequisite for understanding; the distinction is actually a bit artificial, since understanding may be explained in terms of a more general, but not essentially different, type of background knowledge. Understanding implies an acquaintance with the extralinguistic events, phenomena and qualities which words and texts refer to. This is true for everyday domains, but it becomes still more obvious in specialised fields: Even if one knows the source and the target language very well, it is still very difficult, if not impossible, to translate the maintenance manual for an aircraft engine correctly if one has never seen an aircraft engine. There are hundreds of examples of this kind, and their message is valid however precisely the text is worded. Without reference to at least some background knowledge as well, the most carefully written text may remain incomprehensible. Together with many other indications, this suggests that there is no absolute precision in semantics, in the same way as there is no absolute unambiguity in meaning.
Only if exhaustiveness were possible in this respect would translation by means of linguistic knowledge only, without reference to the underlying extralinguistic events, perhaps be feasible. Where there is no absolute degree of precision and unambiguity, however, there cannot be anything like full understanding. In view of this insight, we can now take a more precise position in the discussion already mentioned of "no understanding" versus "full understanding" in machine translation: Since full understanding does not exist and since "no understanding" is generally thought to be insufficient, the question cannot be whether or not it is
required for machine translation, but only to what depth understanding should go to allow for a good translation. This question applies in principle to all kinds of translation, but it may of course be the case that different answers have to be given for human and machine translation, respectively. In the framework of this study the answer appears to be clear: what is needed is exactly as much information as is sufficient for translating all morphemes and all relations (see 3.1.). But that answer is only seemingly clear, unless there is a good answer to the subsequent question of what "sufficient" in this regard is. If we consider only one of the decisions to be taken at the text level, namely lexeme choice, it immediately becomes obvious that there cannot be a hard, grammatically or otherwise theoretically motivated, threshold above which the amount of knowledge at hand and the depth of understanding automatically are "sufficient". As was shown earlier (Papegaaij 1986: 86), lexeme choice - already at the syntagma level and not less so at higher levels - is not a process in which a single "correct" solution is sought and found from among lots of "incorrect" ones, but rather a process in which a (possibly quite sizeable) set of syntactically correct alternatives is brought into an order of semantic likelihood. When this is done, either the most likely solution may be taken as "the" translation, or a number of most likely solutions may in a disambiguation dialogue be submitted for a final choice to the user of the machine translation system. When evaluations of long-standing user interaction can be condensed into improvements of the translation system, the frequency of dialogue questions may decrease or the kind of alternatives submitted may become more subtle, but there will never be a single case in which an alternative can be automatically discarded with absolute certainty.
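The choice step just described, taking the most likely solution or submitting the near-best alternatives to a disambiguation dialogue, might be sketched as follows. The threshold, the function names and the alternative labels are all hypothetical:

```python
def choose_translation(ranked, threshold=0.2, ask_user=None):
    """Sketch of the choice step (names and threshold hypothetical).
    `ranked` is a list of (score, alternative) pairs, best first.
    If one alternative clearly outscores the rest, take it; otherwise
    submit the near-best ones to the user for a final choice.  Note
    that nothing is ever discarded with absolute certainty: the full
    ranking stays available behind the dialogue."""
    best_score, best = ranked[0]
    close = [alt for score, alt in ranked if best_score - score <= threshold]
    if len(close) == 1 or ask_user is None:
        return best                 # take the most likely solution
    return ask_user(close)          # disambiguation dialogue

ranked = [(0.9, "alternative-1"), (0.8, "alternative-2"), (0.2, "alternative-3")]
picked = choose_translation(ranked, ask_user=lambda options: options[0])
```

As user feedback is condensed into system improvements, one would expect the `threshold` to shrink, so that the dialogue is triggered less often; but the structure of the choice step stays the same.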
As for lexeme choice, this entire study presupposes that the system should have access to disambiguating information even at the text level before bothering the user. Due to the nature of the problem, however, at this level too no absolute limit for the amount of required information can be given. So it is impossible to state in general how much information is sufficient for the task. That "sufficient" cannot be sharply defined is true at least if it is understood in an absolute sense, that is, if the available information is not accepted as sufficient unless it brings about a clear choice of a single one from among the translation alternatives for a given word, syntagma, sentence or text. In this sense there is indeed no such thing as sufficient information. This is the reason why the quest for total information or full understanding seems inexpedient and fruitless. For machine translation the consequence should be a trade-off that leads to a good enough translation. In other words, the high-quality translation considered as the ultimate goal of machine translation efforts cannot be anything but a quality defined by the system designers. What Geoffrey Sampson (1987: 97) says about analysis holds in fact for the entire machine translation process: "There does seem to be a real danger that people may require MT systems to be better at the job of analysis than we humans are ourselves." For a practical implementation, the need of defining the desired quality comes down to trying to determine how much deep understanding of the text and how much background knowledge are required in order to produce a translation of a reasonable quality. Chapter 2 gives indications, mainly for translation from English into
Esperanto, but a universally valid, language-independent rule cannot be derived from a single language pair.
3.5.3. Knowledge representation

The third question asked at the beginning of 3.5. is concerned with knowledge representation, storage, retrieval etc. As it is worded there, it is a very far-reaching question which obviously cannot be treated exhaustively in this study in any substantial fraction of its linguistic and computational aspects. We therefore concentrate in this subsection on a concise sketch of the instruments applied in the framework of the DLT machine translation system, focusing especially on the linguistic characteristics of this solution. In 3.5.1. we discussed the interplay of grammar and pragmatics. Pragmatic knowledge can be made applicable for achieving text coherence, in that syntactically possible translation alternatives are compared with knowledge about extralinguistic phenomena which, from a pragmatic point of view, is taken to be probable or typical. The question now is how this knowledge can be represented in a way that allows for straightforward comparisons with translation alternatives. Within the architecture of the DLT system, semantic and pragmatic comparisons of this kind are at the syntagmatic level carried out between translation alternatives and entries in a lexical knowledge bank that are both written in the same formalism: Semantic dependency trees in DLT's intermediate language Esperanto. The trees (or rather: networks) are of a special nature, in that they are semantically implicit. Whereas syntactic dependency trees explicate the syntactic relations between words by means of (extratextual or metalinguistic) labels, DLT's semantic networks do not contain anything but Esperanto morphemes. The nodes of these trees carry content words; the relators between the content words are function words (such as prepositions) or function morphemes (such as subject and object markers). This confinement to linguistic devices for meaning representation is based on a number of theoretical insights.
Firstly, language is the best meaning representation. Maybe this is controversial. What we need here is in fact no more than the claim that language is the best representation at least for rendering meaning that is initially expressed by means of language, a claim which sounds less controversial. So the claim is that there is no better metalanguage than language. Secondly, the difficulties in defining, delimiting and subdividing cross-linguistically valid, explicit semantic relations such as deep cases or semantic roles are notorious. Indeed, it may be taken for granted that there is no end to subdivisions of semantic concepts such as the relations in question. It has been argued that the best way out for machine translation is not to make semantic relations explicit at all, but to keep them as an implicit tertium comparationis. This is part of what we have previously called the implicitness principle (Schubert 1987a: 116, 1987b: 204, forthc. b).
At this stage of development, it seems safe to say that larger pieces of text can also be represented in semantic networks of the kind described. We accordingly suggest extending the sentence networks of earlier DLT implementations to the text level quite straightforwardly and also representing pragmatic knowledge in the form of the same type of networks. This approach implies that scenarios need not be entered in an enormous bulk of "scenariographic" work, but that they are implicitly contained to a sufficient extent in event descriptions in an appropriate syntagma level knowledge bank.
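A semantically implicit dependency tree of the kind described in this subsection can be modelled as a minimal data structure in which edges carry nothing but Esperanto function words and function morphemes. The example sentence, the relator strings and the class layout are our own illustration, not the DLT-internal format:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node of a semantically implicit dependency tree: nodes carry
    Esperanto content words; edges carry nothing but function words or
    function morphemes, never metalinguistic labels."""
    word: str
    deps: list = field(default_factory=list)   # (relator, Node) pairs

    def add(self, relator, child):
        self.deps.append((relator, child))
        return self

# "La knabo legas libron en la cxambro" (the boy reads a book in the room).
# The object relation is expressed only by the accusative marker "-n",
# the place relation only by the preposition "en"; the subject relator
# used below is a mere placeholder, not DLT's actual marker.
tree = Node("legi")
tree.add("subj", Node("knabo"))
tree.add("-n", Node("libro"))
tree.add("en", Node("ĉambro"))
```

A knowledge-bank entry written in the same formalism can then be compared with such a tree node by node and relator by relator, which is exactly the kind of straightforward comparison the architecture calls for.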
3.6. Towards implementation
At the end of a study on text coherence in machine translation, it is in order to ask what has been achieved. Maybe the reader is somewhat unsatisfied and has a feeling of having been offered just another series of sample analyses and just another, sketchy, model of text structure. This may appear to be so, at least to a casual reader. However, a closer look at our endeavour in this book should, we hope, also make clear what our main guideline was: in text coherence we confront a phenomenon which, on the one hand, we inevitably need to master if we want to achieve high-quality machine translation of texts, but which, on the other hand, is so intricate that very many scholars' efforts have not yet led to a readily applicable theoretical account. We have therefore tried in this study to review some of the solution mechanisms that have been proposed, and to devise a model of translation grammar that allows us to build a machine translation system which delivers translations as good as the present state of our understanding of text coherence allows, and which at the same time is open to improvements in step with further progress in translation-oriented text grammar and text pragmatics, both inside our research team and elsewhere. It has, in other words, been our aim to make some steps towards (preliminary) implementation possible right now, although we are aware of the fact that much more research of both a fundamental and an application-oriented nature remains to be done. It is with this aim in mind that we now outline four lines along which a preliminary implementation should be undertaken for specific source and target languages.

1. Feature lists. It is important to compile exhaustive lists of all translation-relevant grammatical features of a given source or target language, expressed in function morphemes or by other means, and to contrast this list with the corresponding list of the language from or into which it will be translated.
(In the DLT setting this is always the intermediate language Esperanto.) It should be made clear for a given language pair and a given translation direction which features are translatable at which level of grammatical analysis, and whether or not text-level knowledge is needed (or helpful) for translating them. This measure provides for a complete coverage of the contrastive translation rules.

2. Theme-rheme regularities. It is important to devise reliable rules for detecting and rendering the communicative functions of syntagmata in clauses and sentences, as expressed in the theme-rheme distinction. The regularities derived from these rules resolve many questions which are
unresolvable at the sentence level. They also enhance the mechanisms mentioned under 3, below.

3. Reference and deixis. Referential and deictic links are the backbone of text coherence as far as word choice and its semantic-pragmatic impact is concerned. For a given language pair, precise mechanisms should be devised in order to
- detect candidate referents (correlates ...) of both content words and of pronouns etc. by syntactic means,
- measure the semantic synonymy between a given word and a candidate referent, and
- assess the degree of pragmatic coreferentiality between the two.
Especially the latter mechanism - the decisive one, although it heavily relies on the other two - can profit a lot from possibilities for finding typical event sequences in a semantic-pragmatic knowledge bank.

4. Target language checks. Translation rules for text coherence should not only perform the choices needed for producing a good translation, but - to assess (and, if necessary, improve) the quality of the suggested translation - they should also be equipped with a mechanism which counterchecks the effect of the chosen coherence devices in the target text. Due to the uncertainties of the field, text-grammatical rules should, as it were, have built-in quality checks.

The most crucial mechanism of this list is number 3, for reference and deixis. In our view it is necessary to base solutions on the implicitness principle in machine translation.
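The three mechanisms listed under point 3 could be combined, very roughly, as a weighted score over syntactically detected candidates. The weights, the toy measures and all names below are hypothetical:

```python
def score_referent(word, candidate, synonymy, coreference):
    """Weighted combination of the semantic synonymy measure and the
    pragmatic coreferentiality assessment (weights are hypothetical).
    Candidates are assumed to have been detected by syntactic means."""
    return 0.4 * synonymy(word, candidate) + 0.6 * coreference(word, candidate)

def rank_referents(word, candidates, synonymy, coreference):
    """Rank the syntactically detected candidate referents; none is
    discarded, so a later target-language check can still revise the
    choice of coherence devices."""
    return sorted(candidates,
                  key=lambda c: score_referent(word, c, synonymy, coreference),
                  reverse=True)

# Toy measures, for illustration only:
syn = lambda w, c: 1.0 if (w, c) == ("vessel", "ship") else 0.0
cor = lambda w, c: 0.8 if c == "ship" else 0.1
order = rank_referents("vessel", ["harbour", "ship"], syn, cor)
```

Giving the pragmatic coreferentiality assessment the larger weight mirrors the observation that it is the decisive mechanism, even though it relies on the other two.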
Index
act: 30, 187
adequate translation: 193
agglutinativity: 30, 161
agreement: 57, 185
aktionsart: 34, 187
aktuální členění věty: 13, 92
ambiguity: 11 → contrastive ambiguity → disambiguation → disambiguation dialogue → monolingual ambiguity
anaphora: 50 → deixis
antecedent → correlate
aspect: 32, 187 → syntactic feature
bifurcated theme → development of a bifurcated theme
bilingual ambiguity → contrastive ambiguity
candidate referent: 37, 72, 199
case (morphosyntactic) → syntactic feature
case (semantic) → deep case
cataphora: 50 → deixis
circumstantial information: 23
class - instance → type - instance
clause order: 175 → syntagma order → word order
cleft sentence: 181
coherence - cohesion: 19
coherence of entities: 167, 190
coherence of events: 184, 190
coherence of focus: 173, 177, 190
comment → topic - comment
conceptual proximity: 29
content → form - content → linguistic sign
content morpheme: 162 → morpheme
context: 11, 12, 74, 171, 178
contrastive ambiguity: 11, 74
contrastive grammar: 87 → grammar
Copenhagen school: 13
corpus analysis: 190
correctness: 192
correlate: 63, 67
deep case: 41, 44, 128, 135ff.
definite reference: 29, 78, 85, 87 → indefinite reference → reference
deixis: 50, 170 → pronominal reference
dependency grammar: 18
dependency semantics: 93, 128
dependency syntax: 16
development of a bifurcated theme: 94, 98 → thematic progression
dialogue → disambiguation dialogue
diathesis: 179
disambiguation: 51 → ambiguity
disambiguation dialogue: 15, 34, 126, 190
discourse analysis → text grammar
Distributed Language Translation: 10, 15, 169
DLT → Distributed Language Translation
Dutch: 163
endophor: 80 → reference
English: 15, throughout chapter 2, 175, 180
entity: 76 → item
Esperanto: 16, throughout chapter 2, 169, 177
event: 30, 31, 143, 181, 187
event sequence: 191
exophor: 80 → reference
extraposition: 181
figurative speech: 27
Finnish: 175
fixed word order → word order
focus: 36, 72, 87, 90, 93, 94, 106, 119 → theme - rheme
form - content: 11, 17, 160
form - function: 18, 111, 182
form government: 185
free choice: 164, 167, 187 → linguistic sign
free word order → word order
French: 15, 58, 163, 180
function → form - function
function morpheme: 162
functional sentence perspective → aktuální členění věty
full understanding → understanding
fully automatic machine translation: 15, 26
German: 31, 163, 180, 183
German school in text linguistics: 13
given - new: 36, 88, 93, 102 → theme - rheme
glossematics: 13
grammar: 9, 17
grammaticality: 192
head of a syntagma → internal governor
header: 24
idiomatic expression: 108
implicitness principle: 180, 196, 199
indefinite reference: 78, 87
instance → type - instance
interactive dialogue → disambiguation dialogue
intermediate language: 15, 169
internal governor: 64
island-driven method: 168
ist-soll method: 191
Italian: 165
item: 31, 143 → entity
Japanese: 38
known information: 29, 32, 36, 79, 87
knowledge representation: 196
language: 17
lexeme choice: 167
lexical knowledge bank: 16, 168
lexical variation: 75
linear progression: 94, 105 → thematic progression
linguistic sign: 163, 164, 166, 167
logic: 191
logical connection: 148, 154
markedness: 41, 109, 177
memory: 67
mentioned information: 29
metaphor: 27, 142
metataxis: 16, 139
monolingual ambiguity: 11, 74
mood: 32, 187 → syntactic feature
morpheme: 30, 160
morpheme class: 31, 143 → word class
morphemes and relations: 160ff.
name: 35, 40, 79, 81
new information → given - new
nominalization: 20, 134 → verbal element
occurrence: 30, 187
paragraph header: 24
pragmatic reference identity: 82
pragmatics: 16, 17, 189, 191
Prague school: 13
predication: 93, 134, 181, 187
progression with a constant theme: 94, 95 → thematic progression
progression with a derived theme: 94, 97 → thematic progression
progression with a discontinuous theme: 94 → thematic progression
prominence: 36, 44
pronominal reference: 37, 50 → deixis
pun: 27
quality: 31, 143
redundancy: 53, 58, 161
reference: 74, 141, 170, 199
reference identity: 76, 87
reintroduction of concepts: 109
rheme → theme - rheme
rhetorical pattern: 46, 148
Russian: 165, 175, 185
scenario: 191, 197
scene: 191
script: 191
semantic feature: 58, 65, 66, 72, 129, 162, 177, 198
semantic function: 129
semantic primitive: 144
semantic reference identity: 82
semantics: 16, 17, 160, 191
sociolinguistics: 17
Spanish: 165
spoken language: 53
state: 30, 187
style: 47
subtitle: 24
summarizing: 88, 122
superstructure of text: 24
synonymy: 29, 75, 87, 199
syntactic feature: 57, 65, 66, 72, 162, 177, 198
syntactic function: 129
syntagma order: 175, 177 → clause order → word order
syntax: 17, 160, 191
tagmemics: 13
tail: 45, 62, 93
tense: 32, 187 → syntactic feature
term: 35, 40
tertium comparationis: 129, 180, 190, 196
text grammar: 9
text linguistics: 9, 13
thematic pattern: 90
thematic progression: 94ff. → development of a bifurcated theme → linear progression → progression with a constant theme → progression with a derived theme → progression with a discontinuous theme → theme - rheme
theme - rheme: 13, 35, 90, 92ff., 176, 183, 198
title: 24
topic - comment: 92, 93 → theme - rheme
translatability: 160
translation alternatives: 11, 18, 195
translation ambiguity → contrastive ambiguity
translation grammar: 159
translation on the fly: 23
translation-relevant: 164, 166, 190
type - instance: 76, 83, 148
unbounded dependency: 181
understanding: 20, 87, 148, 194
unique identification: 40
verbal element: 128, 130 → morpheme class
voice: 20, 32, 41, 179
word class: 31 → morpheme class
word expert system: 16, 168
word grammar: 30, 161
word order: 174, 182 → clause order → syntagma order
References
Alshawi, Hiyan (1987): Memory and context for language interpretation. Cambridge etc.: Cambridge University Press
Ammann, Hermann (1928): Die menschliche Rede. Sprachphilosophische Untersuchungen. Part II: Der Satz. Lahr: Schauenburg [4th ed. Darmstadt 1974]
Beaugrande, Robert-Alain de / Wolfgang Ulrich Dressler (1981): Einführung in die Textlinguistik. Tübingen: Niemeyer
Cholodovič, A. A. (1970): Zalog. In: Kategorija zaloga. Leningrad: AN SSSR, Institut jazykoznanija, pp. 2-26
Daneš, František (1968): Typy tematických posloupností v textu (na materiále českého textu odborného). In: Slovo a Slovesnost 29, pp. 125-141
Daneš, František (1970): Zur linguistischen Analyse der Textstruktur. In: Folia linguistica 4, pp. 72-78
Dijk, Teun A. van (1978): Tekstwetenschap - een interdisciplinaire inleiding. Utrecht/Antwerpen: Spectrum
Dik, Simon C. (1978): Functional grammar. Amsterdam/New York/Oxford: North-Holland
Dixon, Peter (1971): Rhetoric. London/New York: Methuen
Dressler, Wolfgang Ulrich (1972): Einführung in die Textlinguistik. Tübingen
Fillmore, Charles J. (1968): The case for case. In: Universals in linguistic theory. Emmon Bach / Robert T. Harms (eds.). New York: Holt, Rinehart & Winston, pp. 1-88
Geist, Uwe (1987): The three levels of connectivity in a text. In: Journal of Pragmatics 11, pp. 737-749
Gülich, Elisabeth / Wolfgang Raible (1977): Linguistische Textmodelle. München: Fink
Hajičová, Eva / Petr Sgall (1987): The ordering principle. In: Journal of Pragmatics 11, pp. 435-454
Hajičová, Eva / Petr Sgall (1988): Topic and focus of a sentence and the patterning of a text. In: Text and discourse constitution. János S. Petőfi (ed.). Berlin/New York: de Gruyter, pp. 70-96
Halliday, M. A. K. (1967): Notes on transitivity and theme in English. In: Journal of Linguistics 3, pp. 37-81, 199-244
Hartmann, Peter (1971): Texte als linguistisches Objekt. In: Beiträge zur Textlinguistik. Wolf-Dieter Stempel (ed.). München, pp. 9-29
Harweg, Roland (1968): Pronomina und Textkonstitution. München
Heger, Klaus (1976): Monem, Wort, Satz und Text. Tübingen: Niemeyer, 2nd rev. ed.
Hirst, Graeme (1981): Anaphora in natural language understanding: a survey. Berlin/Heidelberg/New York: Springer
Hjelmslev, Louis (1943): Omkring sprogteoriens grundlæggelse. = Festskrift udgivet af Københavns Universitet i Anledning af Universitets Aarsfest. København: Universitet [new ed. s.l.: Akademisk forlag 1966]
Hoey, Michael (1983): On the surface of discourse. London/Boston/Sydney: Allen & Unwin
Hutchins, William John (1986): Machine translation: past, present, future. Chichester: Horwood / New York etc.: Wiley
Jordan, Michael P. (1984): Rhetoric of everyday English texts. London/Boston/Sydney: Allen & Unwin
Kalocsay, K. / G. Waringhien (1980): Plena analiza gramatiko de Esperanto. Rotterdam: Universala Esperanto-Asocio, 4th ed.
Koivulehto, Jorma (1985): Zur Frage der Kategorie "Wortart" beim Grundmorphem. In: Grammatik im Unterricht. Kurt Nyholm (ed.). = Meddelanden från Stiftelsens för Åbo Akademi forskningsinstitut 103. Åbo: Åbo Akademi, pp. 133-141
Larson, Mildred L. (1984): Meaning-based translation. Lanham/New York/London: University Press of America
Leech, Geoffrey N. (1971): Meaning and the English verb. London: Longman, 8th print. 1979
Leech, Geoffrey / Jan Svartvik (1975): A communicative grammar of English. Harlow: Longman, 9th print. 1984
Mathesius, Vilém (1929): Zur Satzperspektive im modernen Englisch. In: Archiv für das Studium der neueren Sprachen und Literaturen 84, pp. 202-210
Mathesius, Vilém (1939): O tak zvaném aktuálním členění věty. In: Slovo a Slovesnost 3, pp. 171-174
Melby, Alan K. (1986): Lexical transfer: a missing element in linguistic theories. In: 11th International Conference on Computational Linguistics, Proceedings of Coling '86. Bonn: Institut für angewandte Kommunikations- und Sprachforschung, pp. 104-106
Melby, Alan K. (forthc.): [...] In: [12th International Conference on Computational Linguistics, Proceedings of Coling '88 (Budapest)]
Mel'čuk, Igor A. (1988): Dependency syntax: theory and practice. Albany: State University of New York Press
Mel'čuk, I. A. / A. A. Cholodovič (1970): K teorii grammatičeskogo zaloga. In: Narody Azii i Afriki [4], pp. 111-124
Newmark, Peter (1981): Approaches to translation. Oxford etc.: Pergamon
Nikula, Henrik (1986): Dependensgrammatik. Malmö: Liber
Nirenburg, Sergei / Victor Raskin / Allen Tucker (1986): On knowledge-based machine translation. In: 11th International Conference on Computational Linguistics, Proceedings of Coling '86. Bonn: Institut für angewandte Kommunikations- und Sprachforschung, pp. 627-632
Papegaaij, B. C. (1986): Word expert semantics. An interlingual knowledge-based approach. Dordrecht/Riverton: Foris
Papegaaij, B. C. / V. Sadler / A. P. M. Witkam (1986): Experiments with an MT-directed lexical knowledge bank. In: 11th International Conference on Computational Linguistics, Proceedings of Coling '86. Bonn: Institut für angewandte Kommunikations- und Sprachforschung, pp. 432-434
Petőfi, János Sándor (1969): On the problems of co-textual analysis of texts. In: International Conference on Computational Linguistics (Coling) Sånga-Säby 1969. Preprints 50. Stockholm: KVAL
Pike, Kenneth Lee (1967): Language in relation to a unified theory of the structure of human behavior. The Hague/Paris: Mouton, 2nd rev. ed.
Sampson, Geoffrey (1987): MT: a nonconformist's view of the state of the art. In: Machine translation today: the state of the art. Margaret King (ed.). Edinburgh: Edinburgh University Press, pp. 91-108
Saussure, Ferdinand de (1916): Cours de linguistique générale. Paris: Payot, new ed. 1969
Schank, Roger C. (1975): Conceptual information processing. Amsterdam: North-Holland
Schubert, Klaus (1982): Aktiv und Passiv im Deutschen und Schwedischen. = SAIS Arbeitsberichte aus dem Seminar für Allgemeine und Indogermanische Sprachwissenschaft 5 [Kiel]
Schubert, Klaus (1986a): Syntactic tree structures in DLT. Utrecht: BSO/Research
Schubert, Klaus (1986b): Linguistic and extra-linguistic knowledge. A catalogue of language-related rules and their computational application in machine translation. In: Computers and Translation 1, pp. 125-152
Schubert, Klaus (1987a): Wann bedeuten zwei Wörter dasselbe? Über Tiefenkasus als Tertium comparationis. In: Linguistik in Deutschland (Akten des 21. Linguistischen Kolloquiums, Groningen 1986). Werner Abraham / Ritva Århammar (eds.). Tübingen: Niemeyer, pp. 109-117
Schubert, Klaus (1987b): Metataxis. Contrastive dependency syntax for machine translation. Dordrecht/Providence: Foris
Schubert, Klaus (forthc. a): Interlingual terminologies and compounds in the DLT project. In: [Proceedings of the International Conference in Machine and Machine-Aided Translation (Birmingham 1986)]
Schubert, Klaus (forthc. b): The implicitness principle in machine translation. In: [12th International Conference on Computational Linguistics, Proceedings of Coling '88 (Budapest)]
Schubert, Klaus (forthc. c): Ausdruckskraft und Regelmäßigkeit. Was Esperanto für automatische Übersetzung geeignet macht. In: Language Problems and Language Planning 12 [1988]
Schubert, Klaus (forthc. d): Zu einer sprachübergreifenden Definition der Valenz. In: [Linguistische Studien. Gerhard Helbig (ed.)]
Sidner, Candace L. (1983): Focusing in the comprehension of definite anaphora. In: Computational models of discourse. Michael Brady / Robert C. Berwick (eds.). Cambridge [USA]/London: MIT Press, pp. 267-330
Somers, H. L. (1987): Valency and case in computational linguistics. Edinburgh: Edinburgh University Press
Sparck Jones, Karen (1986): Synonymy and semantic classification. Edinburgh: Edinburgh University Press
Tsujii, Jun-ichi (1986): Future directions of machine translation. In: 11th International Conference on Computational Linguistics, Proceedings of Coling '86. Bonn: Institut für angewandte Kommunikations- und Sprachforschung, pp. 655-668
Tucker, Allen B. / Sergei Nirenburg (1984): Machine translation: a contemporary view. In: Annual Review of Information Science and Technology 19, pp. 129-160
Vasconcellos, Muriel (1986): Functional considerations in the postediting of machine-translated output. In: Computers and Translation 1, pp. 21-38
Weinrich, Harald (1972): Die Textpartitur als heuristische Methode. In: Der Deutschunterricht 24 [4], pp. 43-60
Wells, John C. (1978): Lingvistikaj aspektoj de Esperanto. Rotterdam: Universala Esperanto-Asocio
Winter, Eugene A. (1982): Towards a contextual grammar of English. London etc.: Allen & Unwin
Witkam, A. P. M. (1983): Distributed Language Translation. Feasibility study of a multilingual facility for videotex information networks. Utrecht: BSO
Witkam, A. P. M. (1985): Distribuita lingvo-tradukado. In: Perkomputila tekstoprilaboro. Ilona Koutny (ed.). Budapest: Scienca Eldona Centro, pp. 207-228
Witkam, A. P. M. (forthc.): DLT - an industrial R&D project for multilingual MT. In: [12th International Conference on Computational Linguistics, Proceedings of Coling '88 (Budapest)]