English Pages [258] Year 1993
Linguistic Issues in Machine Translation
Linguistics: Bloomsbury Academic Collections
This Collection, composed of 19 reissued titles from The Athlone Press, Cassell and Frances Pinter, offers a distinguished selection of titles that showcase the breadth of linguistic study. The collection is available both in e-book and print versions. Titles in Linguistics are available in the following subsets: Linguistics: Communication in Artificial Intelligence Linguistics: Open Linguistics Linguistics: Language Studies
Other titles available in Linguistics: Communication in Artificial Intelligence include: New Concepts in Natural Language Generation: Planning, Realization and Systems, Ed. by Helmut Horacek and Michael Zock Evolutionary Language Understanding, Geoffrey Sampson Text Knowledge and Object Knowledge, Annely Rothkegel User Modelling in Text Generation, Cécile L. Paris Expressibility and the Problem of Efficient Text Planning, Marie Meteer
Linguistic Issues in Machine Translation Edited by Frank Van Eynde
Linguistics: Communication in Artificial Intelligence BLOOMSBURY ACADEMIC COLLECTIONS
Bloomsbury Academic An imprint of Bloomsbury Publishing Plc 50 Bedford Square London WC1B 3DP UK
1385 Broadway New York NY 10018 USA
www.bloomsbury.com BLOOMSBURY and the Diana logo are trademarks of Bloomsbury Publishing Plc First published in 1993 by Pinter Publishers Limited This edition published by Bloomsbury Academic 2015 © Frank Van Eynde and Contributors 2015 Frank Van Eynde has asserted his right under the Copyright, Designs and Patents Act, 1988, to be identified as Editor of this work. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage or retrieval system, without prior permission in writing from the publishers. No responsibility for loss caused to any individual or organization acting on or refraining from action as a result of the material in this publication can be accepted by Bloomsbury or the author. British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library. ISBN: HB: 978-1-4742-4654-5 ePDF: 978-1-4742-4655-2 Set: 978-1-4742-4692-7 Library of Congress Cataloging-in-Publication Data A catalog record for this book is available from the Library of Congress Series: Bloomsbury Academic Collections, ISSN 2051-0012
Printed and bound in Great Britain
Pinter Publishers, London and New York Distributed in the United States and Canada by St Martin's Press
Pinter Publishers 25 Floral Street, Covent Garden, London WC2E 9DS, United Kingdom First published in 1993 © Frank Van Eynde and contributors, 1993 Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may not be reproduced, stored or transmitted, in any form or by any means, or process without the prior permission in writing of the copyright holders or their agents. Except for reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency, photocopying of whole or part of this publication without the prior written permission of the copyright holders or their agents in single or multiple copies whether for gain or not is illegal and expressly forbidden. Please direct all enquiries concerning copyright to the Publishers at the address above. Distributed exclusively in the USA and Canada by St. Martin's Press, Inc., Room 400, 175 Fifth Avenue, New York, NY 10010, USA British Library Cataloguing in Publication Data A CIP catalogue record for this book is available from the British Library ISBN 185567 024 0 Library of Congress Cataloging-in-Publication Data A CIP catalog record for this book is available from the Library of Congress
Typeset by Joshua Associates Ltd, Oxford Printed and bound in Great Britain by Biddles Ltd., Guildford and King's Lynn
Contents

Preface
1. Machine translation and linguistic motivation, Frank Van Eynde
2. Co-description and translation, Louisa Sadler
3. The interaction of syntax and morphology in machine translation, Paul Bennett
4. Dependency and machine translation, Toni Badia
5. On the translation of prepositions in multilingual MT, Jacques Durand
6. Translation fills the gaps: a new approach to unbounded dependencies, Valerio Allegranza
Index
Preface
There are not many book-length publications on machine translation, and most of them focus either on the state of the art in the field or on presentations of particular systems. This book is different in that respect: it does not concentrate on systems but on issues and on choices which any MT researcher or developer faces. Existing MT systems all offer one way or another of dealing with the issues raised, and those ways will be discussed when relevant. Indeed, the book abounds with references to systems like Ceta, Eurotra, Metal, Taum-Meteo, Rosetta, DLT, LFG-based MT, and some of the Japanese systems. However, the function of those references is not to present any of these systems in detail, but to provide concrete illustrations of how the general issues can be dealt with. Our aim, then, is not to convince the reader that one particular system is superior to the others but to contribute to the development of a theory of MT.

The perspective from which the issues are raised and discussed is primarily a linguistic one. This is due to the background of the authors, all of whom are linguists, but also to our conviction that most of the hard problems in machine translation are of a linguistic nature. The choice of hardware and programming language(s), the definition of a formalism and a user language, the incorporation of world knowledge and statistical data, the maintenance of large dictionaries and terminology collections are all interesting and difficult problems in their own right, but whatever choices or solutions one proposes in these areas, they will have to be integrated in a system which, by necessity, deals with linguistic data, both monolingual and bilingual.

Which, then, are the issues which will be discussed in this book? There are two kinds of them: the first three chapters concentrate on issues of a general nature. 'Machine translation and linguistic motivation' concentrates on the interaction of monolingual and bilingual knowledge in MT systems.
The issue is traditionally known as the transfer vs. interlingua debate, and will be assessed in this chapter from the point of view of linguistic motivation. More specifically, it will be investigated how far one can go in simplifying transfer
while sticking to the requirement that the monolingual modules be linguistically motivated. At the same time, it will be investigated how far one should go in simplifying transfer if one wants the monolingual modules to be translationally relevant. The confrontation of these - potentially conflicting - requirements results in a new view on interlinguality, and in a plea for the use of representation languages with a model-theoretic interpretation.

'Co-description and translation' compares and evaluates two different control strategies for MT. One of them is the structure-based approach, which processes texts by means of the recursive application of rules to representations. The other is the constraint-based approach, which makes use of constraints on translational equivalence, with no commitment to sequential recursive processing. The latter, which was first developed in LFG, has a number of advantages, both conceptual and computational, but it does not fare well with a number of difficult cases for MT, such as modifier incorporation and head switching. The author shows how these cases can be dealt with by proposing some changes to the original LFG formalism.

'The interaction of syntax and morphology in machine translation' explores the possibility of translating subparts of words along the same compositional lines as the constituents of sentences. Given the fact that words sometimes have to be translated as syntactic constructs and vice versa, the examination of compositional translation cannot be confined to syntax, but must consider word structure as well, especially in the case of compounds. A subsidiary aim of the chapter is to assess a number of proposals in theoretical morphology with respect to their usefulness for MT. Special attention will be paid to the so-called Lexicalist Hypothesis.

Next to these three contributions which address issues of a general nature, there are three chapters which concentrate on particular problems and phenomena.
They provide case studies, but with an open eye and special attention for the implications of the resulting treatment for the general framework. 'Dependency and machine translation' concentrates on the treatment of syntagmatic relations in translation. The author proposes a semantically oriented version of dependency grammar, in which the grammatical functions and relations are represented in terms of governors (heads), arguments and modifiers. Special provisions are made for cases of coordination and sentential operators. 'On the translation of prepositions in multilingual MT' is framed within the same version of dependency grammar. What makes the translation of prepositions particularly interesting is that they concern both lexical and grammatical structure, that they turn out to be multiply ambiguous, and that the phrases which are headed by them display attachment ambiguities. The issues raised go beyond prepositions as a grammatical category and encompass morphological
cases and postpositions. More generally, one of the major questions is that of the semantic characterisation of modifiers.

'Translation fills the gaps: a new approach to unbounded dependencies' is another case study. Unbounded dependency phenomena are notoriously difficult to treat in a simple and satisfactory way in natural language processing. Various methods and techniques have been developed to cope with them, but most of them are computationally 'expensive' and few of them take into account the further difficulties which arise when one integrates the treatment in a system for multilingual MT. The author therefore proposes a new treatment, which pays full attention to both the monolingual and the bilingual aspects of unbounded dependencies.

All of the chapters go beyond the discussion of alternatives and possibilities; they actually arrive at specific proposals and conclusions. Whether those conclusions are tenable in the long run remains to be seen. Experimentation and further discussion will decide on that. In the meantime, we hope that this book will contribute to the emergence of a new way of writing about MT, one in which issues are more important than projects and in which particular solutions are evaluated with respect to their significance for a general theory of MT, rather than with respect to the particularities of their implementation in some given system.

Frank Van Eynde
Leuven, March 1992
1 Machine translation and linguistic motivation Frank Van Eynde
A good translation is one which conforms to the rules and idiom of the target language, while at the same time preserving - as much as possible - the meaning of the original. The production of such translations requires a sound knowledge of the source and target language (monolingual knowledge), of the relation between both (bilingual knowledge), and of the subject matter (extra-linguistic knowledge). How these three types of knowledge (ought to) interact in the automatic production of translations is one of the central issues in current MT research. The position taken in this chapter is that the linguistic knowledge constitutes the basis for MT and that the way to have it interact with extra-linguistic knowledge is via a level of linguistic representation which is assigned a model-theoretic interpretation, i.e. an interpretation in terms of extra-linguistic elements.

The main topic of this chapter, however, is not the interaction between linguistic and extra-linguistic knowledge, but rather the interaction between monolingual and bilingual knowledge. The division of labour between monolingual and bilingual modules in MT systems is a bone of contention in the transfer vs. interlingua debate, and will be assessed in this chapter from the point of view of linguistic motivation. More specifically, it will be investigated how far one can go in simplifying the bilingual modules while sticking to the requirement that the monolingual modules be linguistically motivated.

Starting from the distinction between bilingual and monolingual knowledge (section 1) I will discuss four methods for simplifying bilingual modules. They will be applied in a few problem areas (word order, articles and lexical ambiguity), and assessed with respect to their general applicability (section 2) and with respect to their transfer reducing potential (section 3). The assessment results in a new view on the interaction of monolingual and bilingual knowledge in MT (section 4).
1. Bilingual and monolingual information

Consider the translation pair

(1) EN. I think that herons eat fish
(2) NL. ik denk dat reigers vis eten

In order to relate the English sentence to its equivalent in Dutch an MT system needs bilingual lexical information:

(3) I → ik          think → denk    herons → reigers
    that → dat      eat → eten      fish → vis
This information is not sufficient, though. When applied without modification it yields the ungrammatical

(2') NL. *ik denk dat reigers eten vis

In order to get a grammatical result the system should change the order of the words in the subclause. In this case a permutation of the last two words may do the trick, but this is not a general solution. Indeed, applying the same operation when translating

(4) EN. I think that herons eat much fish

would yield the ungrammatical

(5) NL. *ik denk dat reigers eten vis veel

instead of the grammatical

(5') NL. ik denk dat reigers veel vis eten

What is needed, apparently, is not a permutation of the last two words, but of the last two constituents: [eat] [much fish] → [veel vis] [eten]. Such a permutation can only be applied if the constituents of the source language sentence have been identified and this is a matter of monolingual syntactic analysis.¹ The result of this analysis could be represented as in
(6)
s
  np
    pron  I
  v  think
  s
    comp  that
    np
      n  herons
    v  eat
    np
      q  much
      n  fish

In order to get a grammatical Dutch sentence the system can now make use of a bilingual syntactic rule which moves the verb to the end of the subclause:

(7) [comp np v X] → [comp np X v]
      1   2  3 4      1   2  4 3

Since X stands for any sequence of major category symbols, this rule is not only applicable when translating (4), where X stands for NP, but also when translating a sentence like

(8) EN. I think that John gave her a rose
    NL. ik denk dat Jan haar een roos gaf

where X stands for the sequence of two NPs: 'her' and 'a rose'.

Apart from making translation possible the addition of monolingual operations can also make translation more efficient. As an example, let us have another look at

(1) EN. I think that herons eat fish

Instead of using rules for the translation of words, as in (3), we could use rules which translate stems, as in

(9) heron → reiger    eat → eten    think → denken

supplemented with a set of rules for the mapping of number, tense, and other features

(10) singular → enkelvoud    present → tegenwoordig
     plural → meervoud       past → verleden
The net result is a considerable gain in efficiency: if the number of nouns is n, for instance, one needs n × 2 rules in a word-based system - one for the singular and one for the plural of each noun - whereas in a stem-based system one will need n + 2 rules. This economy can only be achieved if every English noun is automatically reduced to its stem and a specification of its number

(11) herons → heron + plural
     heron → heron + singular

and if the Dutch nouns are automatically derived from combinations of a stem and number value

(12) reiger + meervoud → reigers
     reiger + enkelvoud → reiger

What is needed, in other words, is a set of rules for monolingual morphological analysis and generation.
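The stem-based scheme of (9)-(12) can be sketched in a few lines of Python. This is a minimal illustration, not an account of any existing system: the function names and the tiny tables are assumptions made for the example, and the analysis step only knows the single regular noun that the text uses.

```python
# Toy stem-based noun translation after examples (9)-(12).
# All names and table contents are illustrative assumptions.

STEM_TRANSFER = {"heron": "reiger"}                                 # as in (9)
NUMBER_TRANSFER = {"singular": "enkelvoud", "plural": "meervoud"}   # as in (10)
DUTCH_FORMS = {                                                     # as in (12)
    ("reiger", "enkelvoud"): "reiger",
    ("reiger", "meervoud"): "reigers",
}


def analyse(word):
    """English morphological analysis, as in (11): word -> (stem, number)."""
    # Assumption: a plural is a known stem followed by -s.
    if word.endswith("s") and word[:-1] in STEM_TRANSFER:
        return word[:-1], "plural"
    return word, "singular"


def transfer(stem, number):
    """Bilingual step: one rule per stem plus one rule per feature value."""
    return STEM_TRANSFER[stem], NUMBER_TRANSFER[number]


def generate(stem, number):
    """Dutch morphological generation, as in (12)."""
    return DUTCH_FORMS[(stem, number)]


def translate_noun(word):
    return generate(*transfer(*analyse(word)))


def word_based_rules(n):
    """Transfer rules needed for n nouns when whole words are translated."""
    return n * 2


def stem_based_rules(n):
    """Transfer rules needed for n nouns when stems and number are translated."""
    return n + 2


print(translate_noun("herons"))
print(word_based_rules(1000), stem_based_rules(1000))
```

The saving is in the bilingual rules only: the analysis and generation tables still have to exist, but they belong to the monolingual modules.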
As is clear from the given examples, translations result from the interaction between bilingual and monolingual information. This interaction can take various forms, but in mainstream MT it has become standard to draw rather strict boundaries between both types of information. In procedural terms, the translation process is divided into three sequentially ordered steps. The first is called 'analysis'; it concerns the application of monolingual rules to the source language input. These rules are based on monolingual lexical and morpho-syntactic information, and result in representations like (6). The second is called 'transfer'; it concerns the application of bilingual rules to the representations which result from analysis. These rules are largely based on lexical information, as in (9), but also on syntactic and morphological information, as in (7) and (10). The third step is called 'synthesis' or 'generation'; it concerns the application of monolingual rules to the representations which result from transfer. In order to guarantee semantic equivalence of source and target sentences all operations, whether monolingual or bilingual, should be meaning preserving. Given this scheme, a bilingual MT system has two monolingual morpho-syntactic modules and a mediating transfer module:²
(1) EN. I think that herons eat fish

↓ English analysis

s
  np sing
    pron sing  I
  v sing pres  think
  s
    comp  that
    np plur
      n plur  heron
    v plur pres  eat
    np sing
      n sing  fish

↓ English-to-Dutch transfer

s
  np enkel
    pron enkel  ik
  v enkel tegenw  denken
  s
    comp  dat
    np meer
      n meer  reiger
    np enkel
      n enkel  vis
    v meer tegenw  eten

↓ Dutch synthesis

(2) NL. ik denk dat reigers vis eten

[enkel = enkelvoud; meer = meervoud; tegenw = tegenwoordige tijd]
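The transfer and synthesis steps of this scheme can be sketched as follows. The sketch starts from a hand-written analysis result rather than from a parser, and everything in it apart from the stems, the feature names and rule (7) is an assumption made for illustration: the tuple encoding of nodes, the function names, and in particular the small table of Dutch word forms, which stands in for a real synthesis module.

```python
# A node is (category, features, body); body is a stem (leaf) or a list of nodes.
ANALYSIS = ("s", (), [
    ("np", ("sing",), [("pron", ("sing",), "I")]),
    ("v", ("sing", "pres"), "think"),
    ("s", (), [
        ("comp", (), "that"),
        ("np", ("plur",), [("n", ("plur",), "heron")]),
        ("v", ("plur", "pres"), "eat"),
        ("np", ("sing",), [("n", ("sing",), "fish")]),
    ]),
])

STEMS = {"I": "ik", "think": "denken", "that": "dat",
         "heron": "reiger", "eat": "eten", "fish": "vis"}
FEATURES = {"sing": "enkel", "plur": "meer", "pres": "tegenw"}

# Hypothetical stand-in for Dutch morphological generation.
DUTCH_FORMS = {
    ("ik", ("enkel",)): "ik",
    ("denken", ("enkel", "tegenw")): "denk",
    ("dat", ()): "dat",
    ("reiger", ("meer",)): "reigers",
    ("vis", ("enkel",)): "vis",
    ("eten", ("meer", "tegenw")): "eten",
}


def transfer(node):
    """Bilingual step: map stems and features, then apply rule (7)."""
    category, features, body = node
    features = tuple(FEATURES[f] for f in features)
    if isinstance(body, str):
        return (category, features, STEMS[body])
    children = [transfer(child) for child in body]
    # Rule (7): [comp np v X] -> [comp np X v]
    if [child[0] for child in children[:3]] == ["comp", "np", "v"]:
        children = children[:2] + children[3:] + [children[2]]
    return (category, features, children)


def synthesise(node):
    """Monolingual step: inflect the leaves and read them off left to right."""
    _, features, body = node
    if isinstance(body, str):
        return [DUTCH_FORMS[(body, features)]]
    return [word for child in body for word in synthesise(child)]


print(" ".join(synthesise(transfer(ANALYSIS))))
```

Because rule (7) moves the verb past whatever follows it, the same code also handles a subclause in which X is a sequence of two noun phrases, as in (8).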
Notice that the division of labour between the three modules is flexible. We could, for instance, choose to transfer words instead of stems and drop the rules for morphological analysis. Conversely, we could choose to eliminate some of the syntactic operations in transfer by relating the sentences to more abstract representations in the monolingual modules (cf. section 2). In both cases there is a trade-off between monolingual and bilingual operations: the monolingual modules are made simpler at the expense of the bilingual ones, or vice versa.

As long as one translates between two languages the division of labour may seem of little import. When more languages are added, though, the situation changes. Indeed, in a multilingual system, dealing with n languages, there are n × 2 monolingual modules (one analysis and one synthesis for each language) and n × (n - 1) transfer modules (one for each language pair). A system which translates between more than three languages will therefore have (much) more bilingual than monolingual modules. It follows that there is a distinct advantage in shifting labour from the bilingual to the monolingual modules.

A second advantage concerns the reusability of the system. The bilingual modules of an MT system can only be used for translation, but the monolingual modules can be used for a host of other applications, such as style and grammar checking, information storage and retrieval, text generation, and so on. It follows that the more operations are performed in the monolingual modules, the more interesting they become for reuse in other language processing systems.

Given these advantages the ideal MT system is one with as few bilingual operations as possible. As a matter of fact, there are projects which explicitly aim at the elimination of all bilingual operations, such as Rosetta, and there are other projects which want to reduce them as much as possible, such as Eurotra (cf. 3.2).
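The module counts given above are easy to check for small numbers of languages. The following lines are only a worked illustration of the two formulas; the function names are chosen for the example.

```python
def monolingual_modules(n):
    """One analysis and one synthesis module per language."""
    return n * 2


def transfer_modules(n):
    """One transfer module per ordered language pair."""
    return n * (n - 1)


for n in range(2, 7):
    print(n, monolingual_modules(n), transfer_modules(n))
```

With three languages the two counts are equal (six modules each); from four languages onwards the transfer modules outnumber the monolingual ones, and the gap grows quadratically.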
Of course, it is easy to decree that in the best of all possible worlds there should be few or no bilingual operations, but it is less easy to see how this can be done in practice, and what its consequences are for the construction of the monolingual modules. These will be explored in the next section.

2. Four methods for reducing transfer

This section presents four methods for reducing the complexity of the bilingual modules. Before discussing their applicability in general terms I will prepare the ground with some case studies on selected topics. In order to get a representative sample I will successively discuss an instance of permutation (2.1. word order), an instance of deletion/addition (2.2. articles) and an instance of substitution (2.3. lexical ambiguity). Two of the four possible methods will be shown to have unwanted side effects; the two other ones will be examined more closely in paragraph 2.4 (normalisation and abstraction).
2.1. Word order

In the first section the discrepancy in word order between Dutch and English was treated by a bilingual movement rule. Given the fact that word order is language specific and that even the ordering of such basic constituents as Verb, Subject and Object is not the same in both languages, this seems like a plausible move. However, if the aim is to reduce transfer, the question arises whether the reordering could not be left to the monolingual modules.

One method for shifting the reordering to the monolingual modules is based on the anticipation of the target word order in the analysis of the source language. Taking Dutch as the target language and assuming that the basic order in Dutch is SOV (cf. Koster, 1975), anticipation amounts to the derivation of a Verb-final representation in English analysis. In this way one avoids the necessity of reordering in transfer. This method has some disadvantages, though. For one thing, the English analysis module has to assign representations which lack linguistic motivation: there is no syntactic evidence for assuming that English is Verb-final. Second, English analysis becomes target language dependent. As a result one will have to create as many English analysis modules as there are different target language orders. If the target language were French, for instance, the English analysis module would have to assign the order which is required for French, i.e. Verb-second, and when translating to a Verb-initial language, such as Samoan, the result of analysis should be different again.

The mirror image of anticipation is readjustment. It leaves the English Verb-second order unchanged in transfer and defers the necessary changes to Dutch synthesis. The disadvantages of this method are similar to the ones of anticipation: it leads to linguistically unmotivated representations in Dutch synthesis and it makes the latter source language dependent.
Speaking in general terms, anticipation and readjustment only seem to reduce the number of bilingual operations; in practice they simply shift them to the monolingual modules, making the latter as language pair specific as transfer itself.
There are, however, other ways of avoiding reordering in transfer. One is based on the assumption that there is some natural way of ordering constituents and that this order is common to all languages. What this natural order is, is a matter of debate, of course. Already in the eighteenth century Diderot pondered the question:
... in an interesting essay devoted largely to the question of how the simultaneous and sequential array of ideas is reflected in the order of words, Diderot concludes that French is unique among languages in the degree to which the order of words corresponds to the natural order of thoughts and ideas (Diderot, 1751). Thus 'quel que soit l'ordre des termes dans une langue ancienne ou moderne, l'esprit de l'écrivain a suivi l'ordre didactique de la syntaxe française' (p. 390); 'Nous disons les choses en français, comme l'esprit est forcé de les considérer en quelque langue qu'on écrive' (p. 371). [Chomsky 1965, 7]

Since French is Verb-second, Diderot would have advocated SVO as the basic order for all languages. In more recent times, McCawley has investigated the matter again and concluded that the basic order of subject, verb and object in all languages is VSO, i.e. Verb-initial [McCawley, 1970]. The important point, however, is not so much which basic order one chooses, but rather that there is one basic order and that it is uniformly applied to all languages. The name I will adopt for this method is normalisation.

Normalisation avoids one of the problems with anticipation and readjustment: it does not smuggle any language pair specific operations into the monolingual modules. The other problem remains, though. Since it imposes some fixed order on all languages, including languages which display a rather different surface order, the monolingual modules for the latter may lack linguistic motivation. Indeed, imposing a Verb-initial order on French, for instance, is in conflict with all syntactic evidence for treating it as a Verb-second language.

In order to avoid this problem with normalisation one could simply dispense with the whole issue of ordering by postulating a level at which the representations are unordered.
This may seem like a bold move, but notice that there are linguistic theories which postulate unordered representations: lexical-functional grammar, relational grammar and dependency grammar all contain levels at which the representations are not explicitly ordered from left to right. Notice, furthermore, that it is a natural way to treat languages with relatively free word order. The shift from ordered to unordered representations will cause a loss of information, though. It is comparable to the shift from words to stems in section 1. If all of 'ate', 'eat', 'eats', 'eating' and 'eaten' are reduced to the same stem 'eat', then the differences between them have to be recoded in some other way. An obvious way to do this is by means of such features as number, tense and person. How this recoding has to be done for word order is less obvious, since word order expresses various types of information, including among others grammatical function (subject, object) and pragmatic role (focus). Limiting ourselves, for the sake of brevity, to the former, the order could be recoded as in
(13)
v gov  eat
np subj
  n gov  heron
np obj
  n gov  fish

[gov = governor (the syntactic head of a phrase); subj = subject; obj = direct object]

In this representation each node is marked for syntactic function; by convention the syntactic head (the governor) takes the first position and is followed by its dependants in some canonical order. For a presentation of dependency grammar and its use in MT, see the chapter by Toni Badia in this volume. The name I will adopt for this method of recoding is abstraction.

A major advantage of abstraction is that it avoids the problem with normalisation. While the latter may lead to the imposition of a word order which is linguistically unmotivated, the former will never have this effect since it does not assign any specific order at all; it simply abstracts from order. There is just one caveat, albeit an important one: abstraction should only be applied if the information which is abstracted from and which is relevant for translation can be recoded in some other way; otherwise, it leads to a loss of information.
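Abstraction from word order can be illustrated with a small sketch. The representation below records only the grammatical functions of (13); the order of the words is decided separately for each language. The dictionary encoding, the function names and the two ordering tables are assumptions made for the example, and inflection is left out, so the output consists of stems.

```python
# Unordered representation after (13): functions are recorded, order is not.
REPRESENTATION = {"gov": "eat", "subj": "heron", "obj": "fish"}

STEMS_EN_NL = {"eat": "eten", "heron": "reiger", "fish": "vis"}

# Assumption: each synthesis module owns its own linearisation table.
ORDER = {
    "en": ("subj", "gov", "obj"),            # English: Verb-second
    "nl_subclause": ("subj", "obj", "gov"),  # Dutch subclause: Verb-final
}


def transfer(representation):
    """Transfer replaces stems only; no reordering rule is needed."""
    return {function: STEMS_EN_NL[stem]
            for function, stem in representation.items()}


def linearise(representation, order):
    """Monolingual synthesis: spell out the functions in the language's order."""
    return [representation[function] for function in order]


print(linearise(REPRESENTATION, ORDER["en"]))
print(linearise(transfer(REPRESENTATION), ORDER["nl_subclause"]))
```

The ordering tables mention one language each, so nothing language pair specific enters the monolingual modules. The caveat in the text shows up here as well: whatever word order expressed beyond subject and object, such as focus, is lost unless it is recoded as a further feature.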
Summarising, of the various ways in which a bilingual movement rule can be avoided there are two which lead to the incorporation of language pair specific information in the monolingual modules, i.e. anticipation and readjustment. As such, they may simplify transfer, but they do not reduce the number of bilingual operations; they just shift them to analysis or synthesis, respectively. Since this deficiency is not due to any particularities of the treatment of word order, but turns up in any application, these methods will not be investigated any further. Normalisation and abstraction do not have this deficiency and will further be examined in the rest of the section.

2.2. Articles

Addition and deletion are two sides of the same coin: if translation from A to B requires the addition of some element, then translation from B to A will usually
require the deletion of that same element. For that reason they will be treated in tandem. As an example let us consider the translation of

(14) herons eat fish

into French; this should not be

(15) *hérons mangent poisson

but rather

(16) les hérons mangent du poisson

Whereas English uses the bare plural for subjects in generic sentences, French uses the definite article, and whereas English uses no article for singular mass nouns in object position, French uses the 'article partitif'. It would be possible to treat this discrepancy in a language pair specific way by inserting the necessary articles in English-to-French transfer. But it would also be possible to avoid any insertion or deletion in transfer by applying abstraction or normalisation.

Abstraction would entail the postulation of a level at which the information conveyed by the articles is recoded in a way which does not make use of the articles themselves; an obvious way to do this is by means of features which are added to the NP nodes:

(17)
v gov plur pres  manger
np subj plur DEFIN count
  n gov plur count  héron
np obj sing PART mass
  n gov sing mass  poisson

[DEFIN = definite; PART = partitive]
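A sketch of how the features in (17) might be spelled out by the synthesis modules is given below. It covers only the two feature combinations of the example; the function names and both tables are assumptions, and the French table silently assumes a masculine noun for the partitive.

```python
# Article realisation from NP features, after (17). Illustrative tables only.
FRENCH_ARTICLES = {
    ("DEFIN", "plur"): "les",
    ("PART", "sing"): "du",   # assumption: masculine singular noun
}
ENGLISH_ARTICLES = {
    ("DEFIN", "plur"): "",    # bare plural, as for the generic subject in (14)
    ("PART", "sing"): "",     # bare singular mass noun, as for the object in (14)
}


def realise_np(noun_form, determination, number, articles):
    """Monolingual synthesis: decide whether and how an article surfaces."""
    article = articles[(determination, number)]
    return f"{article} {noun_form}".strip()


print(realise_np("hérons", "DEFIN", "plur", FRENCH_ARTICLES))
print(realise_np("poisson", "PART", "sing", FRENCH_ARTICLES))
print(realise_np("herons", "DEFIN", "plur", ENGLISH_ARTICLES))
```

The hard part, deciding in English analysis that a bare plural subject is DEFIN and a bare mass object is PART, is not shown; the sketch only illustrates that once the features are present, the articles themselves need not appear in the representation.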
Since the resulting representations do not contain any articles, there is no need for insertion or deletion in the bilingual modules.

Normalisation would entail the postulation of canonical representations for the articles. One could, for instance, require that every noun phrase contains one and only one article, and leave it to the monolingual modules to decide whether this article has to materialise at the surface or not. The articles of 'herons' and 'fish' in (14), for instance, should not materialise, whereas those of 'herons' and 'poisson' in (16) should. Notice that the notion of 'empty article', which is familiar from linguistics, derives from the application of normalisation. This method, too, makes addition or deletion in transfer superfluous: since it assigns one and only one article to every noun phrase (in all languages) there is no need to add or delete any articles in transfer. The difference between the two strategies is that the one avoids transfer by requiring the universal absence of articles, whereas the other achieves it through the universal presence of articles.

Comparing the two strategies, normalisation seems slightly less attractive because of the existence of languages with no articles at all, such as Latin and Russian. For such languages the postulation of normalised articles in all noun phrases seems less natural than the universal absence of articles.

2.3. Lexical ambiguity

In all representations which have been given so far the lexical items are represented by themselves; in order to arrive at their target language equivalents I postulated bilingual lexical mappings, as in

(18) heron -> reiger
     eat   -> eten
     fish  -> vis

If such mappings are one-to-one, it is easy to eliminate them from transfer: it suffices to replace each lexical item in the monolingual modules by an arbitrary number and to use this number in the representations. One only has to take care that the numbers match:

(19)          Ana       Tra       Syn
     heron    ->   3    =    3    ->   reiger
     eat      ->   4    =    4    ->   eten
     fish     ->   5    =    5    ->   vis
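The number-matching scheme in (19) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary and function names are hypothetical, and the numbers 3, 4 and 5 are the arbitrary ones used in (19).

```python
# Minimal sketch of (19): lexical items are replaced by arbitrary numbers
# in the monolingual modules, so that transfer reduces to identity.
# All names below are hypothetical; only the word/number pairs come from (19).

ANALYSIS_EN = {"heron": 3, "eat": 4, "fish": 5}      # English analysis (Ana)
SYNTHESIS_NL = {3: "reiger", 4: "eten", 5: "vis"}    # Dutch synthesis (Syn)


def translate(words):
    """Map English lexical items to Dutch ones via matching numbers."""
    numbers = [ANALYSIS_EN[w] for w in words]   # analysis: word -> number
    transferred = numbers                       # transfer: numbers are copied unchanged
    return [SYNTHESIS_NL[n] for n in transferred]  # synthesis: number -> word


print(translate(["heron", "eat", "fish"]))
```

The sketch only works because each number has exactly one source and one target realisation; it says nothing yet about mappings that are not one-to-one.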
In general, matters are not that simple, though. Just consider the translation of 'herons eat fish' in German. German, unlike English, makes a distinction
between eating by human beings and eating by animals. The former is expressed by the verb 'essen', the latter by the verb 'fressen'. The correct translation of

(14) herons eat fish

is therefore

(20) Reiher fressen Fisch

and not

(21) *Reiher essen Fisch

It follows that we need two rules for the translation of 'eat' in German:

(22) a. eat -> essen
     b. eat -> fressen

One way of handling this discrepancy is by means of language pair specific transfer rules as in (18) and (22). Another way would avoid lexical substitution in transfer and adopt an approach along the lines of (19). As is clear from a reformulation in terms of (19), the problem with lexical ambiguity does not concern the form of the lexical items, but rather the discrepancies in their degree of differentiation: German shows a higher degree of differentiation than English in the field of eating. I will now investigate how the methods of normalisation and abstraction can cope with such discrepancies.
Normalisation would postulate some basic degree of differentiation and impose it on all languages. This basic degree of differentiation may or may not coincide with the one which is found in the vocabulary of an existing language; this is of no importance, just as it is not important whether the 'basic' word order corresponds to the one in French or Dutch or Japanese (cf. 2.1). The important point is that one basic degree of differentiation is chosen and then uniformly applied to the vocabularies of all languages. An application of this strategy to (22) would run as follows. First, instead of simply adopting the degree of differentiation which is found in the vocabulary
of the source language (English), we postulate a set of semantic distinctions which induce a degree of differentiation of their own. Suppose, for instance, that the set of semantic distinctions includes {+/-human, +/-animate}. Next, we apply these semantic distinctions to the vocabularies of English and German. In combination with the distinctions which are expressed by the lexical items of these languages, the semantic distinctions induce a degree of differentiation which is either equal to or higher than the one which is expressed by the lexical items themselves. 'Heron' is an example of the former: it has the feature set {+animate, -human}; 'bachelor' is an example of the latter: it can be either {+animate, +human} (in the sense of 'unmarried male adult') or {+animate, -human} (in the sense of 'seal'). In the case of 'heron' the degree of differentiation is not increased by the addition of the semantic distinctions; in the case of 'bachelor' it is. Subsequently, we use the same semantic distinctions for typing the argument positions of the verbs in both languages. The relevant argument position in this case is the one taken by the subject (the eater); typing this position for both distinctions yields two possibilities, {+animate, +human} and {+animate, -human} (assuming that eaters have to be animate). Since the English verb 'eat' can have both types of subjects, it will be assigned two representations, whereas the German verbs 'essen' and 'fressen' will each be assigned one representation:

(23) eat, [subj {+human}] -> 41        41 -> essen, [subj {+human}]
     eat, [subj {-human}] -> 42        42 -> fressen, [subj {-human}]
Finally, in translation it becomes the task of the English analysis module to disambiguate between the two uses of 'eat'. The net result is that the lexical ambiguity is dealt with in the monolingual modules. It might be objected at this point that the resulting differentiation lacks linguistic motivation; indeed, monolingual English dictionaries do not treat the verb 'eat' as ambiguous between the two meanings which are distinguished in (23).3 This is a typical side-effect of normalisation. The imposition of some 'basic' degree of differentiation may result in unnatural choices for those languages of which the vocabulary displays another degree of differentiation.
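The division of labour in (23) can be sketched as follows. This is a hedged illustration, not a description of any existing system: the function names and the noun 'child' are hypothetical additions; 'heron', the feature sets, the numbers 41 and 42 and the German verbs are taken from the text.

```python
# Minimal sketch of (23): the English analysis module disambiguates 'eat'
# on the basis of the features of its subject; German synthesis then needs
# no bilingual rule. 'child' is a hypothetical example noun.

NOUN_FEATURES = {
    "heron": {"animate": True, "human": False},
    "child": {"animate": True, "human": True},   # assumption, not in the text
}

SYNTHESIS_DE = {41: "essen", 42: "fressen"}


def analyse_eat(subject):
    """Return the number of the reading of 'eat' selected by the subject."""
    features = NOUN_FEATURES[subject]
    if not features["animate"]:
        raise ValueError("eaters are assumed to be animate")
    return 41 if features["human"] else 42


def german_verb(subject):
    """Synthesise the German verb from the number produced by analysis."""
    return SYNTHESIS_DE[analyse_eat(subject)]


print(german_verb("heron"))   # subject is -human
print(german_verb("child"))   # subject is +human
```

Note that the choice between 41 and 42 is made entirely on the English side, which is exactly what makes the English description depend on a distinction that English itself does not lexicalise.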
Abstraction amounts to the postulation of a level of representation at which semantic differentiations are not made by means of lexical items, but at which
all the necessary information for the derivation of the lexical items is present. We could, for instance, imagine a set of semantic features, complemented with rules for combining them into representations which do not contain any lexical items, but which can be related in a meaning preserving way to language specific lexical items. Such de-lexicalised representations have actually been proposed in linguistics. The paradigm example is the representation which McCawley proposed for the verb 'to kill' (McCawley, 1968): (24)
     S
     |-- CAUSE
     |-- X
     `-- S
         |-- BECOME
         `-- S
             |-- NOT
             `-- S
                 |-- ALIVE
                 `-- Y
Notice that the basic elements in this representation do not stand for the English lexical items 'cause', 'become', 'not' and 'alive', but rather for language independent concepts; they are semantic primitives and do not belong to the vocabulary of any specific language. By removing all lexical items from the representation one loses their differentiating potential; as a consequence, all differentiations which are semantically relevant have to be encoded in the representation. This amounts to requiring from the representation that it provide an exhaustive decomposition of the meaning(s) of each lexical item in terms of a finite stock of semantic primitives. Such decomposition was advocated in the 1960s and the early 1970s (cf. Katz and Fodor, 1963; Katz, 1972). A good example of this method is shown in

(25) man   -> +hum +male +adult
     boy   -> +hum +male -adult
     woman -> +hum -male +adult
     girl  -> +hum -male -adult
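The decomposition in (25) can be checked mechanically. The sketch below is an illustration only; the function names are hypothetical, while the feature values are those of (25).

```python
# Minimal sketch of (25): each lexical item is a bundle of binary features.
# The helper functions are hypothetical; the feature values come from (25).
from itertools import combinations

LEXICON = {
    "man":   {"hum": True, "male": True,  "adult": True},
    "boy":   {"hum": True, "male": True,  "adult": False},
    "woman": {"hum": True, "male": False, "adult": True},
    "girl":  {"hum": True, "male": False, "adult": False},
}


def distinguishing_features(a, b):
    """Features on which two lexical items carry different values."""
    return sorted(f for f in LEXICON[a] if LEXICON[a][f] != LEXICON[b][f])


def all_distinct(lexicon):
    """True if no two lexical items share the same feature bundle."""
    return all(lexicon[a] != lexicon[b] for a, b in combinations(lexicon, 2))


print(all_distinct(LEXICON))
print(distinguishing_features("man", "boy"))
print(distinguishing_features("man", "girl"))
```

The feature 'hum' never shows up as distinguishing within this small set; it only contributes to the decomposition when the set is compared with non-human items.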
The features +/-male and +/-adult suffice to differentiate the four lexical items, and together with the feature +human they may be claimed to provide an exhaustive decomposition of the relevant meanings. For most lexical items, though, such decomposition is less feasible. Take, for instance, the words 'pigeon' and 'turtle'. Their representations will, presumably, contain such features as +animate, -human and +bird, but these are not sufficient to differentiate them. Of course, one might consult Linnaeus and
adopt his taxonomy of the world's avifauna in terms of ordines, familiae, subfamiliae, genera and species, incorporating the Latin names for these natural classes into the stock of semantic primitives. In this way one will be able to differentiate between the meanings of both words, but the way in which it is done deserves some closer scrutiny. As it happens, 'pigeon' and 'turtle' denote different genera within the same subfamily; the name for the genus will, hence, do the job of differentiation. What is remarkable, now, is that the features which correspond to the names of the subfamily, the family and the ordo, including the features +animate and +bird, have no differentiating function at all. As a matter of fact, they are all redundant, since their values can be derived from the name of the genus: every pigeon is a bird, every bird is an animal, every animal is animate, etc. In the end, the semantic analysis of 'pigeon' boils down to the assignment of one single feature, call it [+pigeonlike], and similarly for 'turtle'. At this point one may wonder, though, whether the lexical items 'pigeon' and 'turtle' could not serve the purpose of differentiating between pigeons and turtles in a less cumbersome way than the features [+pigeonlike] and [+turtlelike].

A similar observation can be found in J. D. Fodor (1977). Her example is 'cow' and her line of argumentation is somewhat different, but the bottom line is similar:

    At the extreme the property of being a cow could not be further analyzed at all; it would have to be one of the primitive elements of the semantic system. The conclusion would be that the primitive elements are sometimes quite large 'word-sized' ones. [Fodor 1977, 149]

In other words, the set of semantic primitives which one needs to differentiate between the meanings of different lexical items will have to include those lexical items themselves (or, at least, features which correspond one-to-one to those words).
This would be a good reason in itself to leave the whole enterprise of lexical decomposition for what it is, but - to make matters worse - what we need for translation is not a stock of features in terms of which we can express the same degree of differentiation as the one displayed by the lexical items of a particular language, but rather a set of terms which enable one to capture a higher degree of differentiation; otherwise, we have not even begun to address the problem of lexical ambiguity and of discrepancies in the use of lexical items between different languages. In short, apart from the methodological problems with lexical decomposition, there is no clear evidence that it will solve any translation problems. Tsujii is categorical about this: ... the lexical decomposition approach does not explain anything about lexical correspondence among different languages. On the contrary, it may increase the
difficulties of lexical choice in translation. In order to discriminate 'to assassinate' from 'to kill', 'to murder', etc., though we have a rather direct correspondence between 'to assassinate' in English and 'annsatsuru' in Japanese, we have to encode many kinds of information other than 'X cause Y to become not alive', such as Y's social status, the reason of 'killing' (political or not) and, in general, the speaker's conception of the 'killing' event in question. [Tsujii 1986, 660]

The point is clear: it is better to map 'to assassinate' directly onto 'annsatsuru' than to embark on an intricate and complex decomposition of both words. As it stands, the strategy of abstraction is not very attractive for the treatment of lexical ambiguity in MT.
Summing up the results of the three case studies, I conclude that anticipation and readjustment are not very attractive since they do not reduce the complexity of the bilingual operations; they only shift them to the monolingual modules, making the latter language pair specific and the whole of the system less modular. The two other methods are more promising, and will be investigated from a more general perspective in the next paragraph.

2.4. Normalisation and abstraction

Normalisation is familiar from linguistics. In transformational grammar it is the standard treatment of various types of movement and ellipsis. In the Government and Binding framework, for instance, the transformations map ordered D-structures onto (differently) ordered S-structures (cf. Chomsky, 1981). In a similar vein ellipsis is treated by undoing the effects of deletion, as in

(26) NL. de blauwe reiger eet meer vis dan de zwarte
     EN. the blue heron eats more fish than the black one

The noun phrase in the Dutch 'dan'-clause contains an article and an adjective, but no noun; the noun is understood to be identical to the one of the corresponding noun phrase in the main clause ('reiger'). A normalised representation in this case would contain an abstract or empty noun there, thus allowing for a general treatment of both full and elliptical noun phrases. In this case the application of normalisation is both monolingually motivated and useful for the reduction of transfer, since the English equivalent has to contain a surface trace of the abstract noun ('one'). In other cases, though, normalisation leads to the postulation of representations which lack - or contradict - linguistic evidence. The postulation of a VSO-order for all languages, for instance, may look like an efficient way to ban
reordering rules from transfer, but when applied to Dutch or English it does not look all that natural (cf. 2.1). It follows that normalisation should be applied with care.
Abstraction is also familiar from linguistics. It is the standard treatment of grammatical function in a number of linguistic theories, including lexical-functional grammar, relational grammar and dependency grammar. It is also the standard way of treating inflection. The affixes for number and tense, for instance, are usually represented in terms of features like singular, plural, present and past (cf. section 1). Characteristic for the method of abstraction is that the information from which one abstracts is recoded in some other format. In the case of word order the translationally relevant information is recoded in terms of grammatical functions; in the case of the articles it is recoded in terms of features like definite and partitive. An attractive side-effect of this recoding is that it facilitates the treatment of the interaction between different phenomena. Grammatical functions, for instance, can not only be used for recoding information which is conveyed by word order, but also for information which is conveyed by surface case and pre- or postpositions. As a matter of fact, word order, case and adpositions are three different ways of expressing similar information, and which of them is more important depends on the language. German, for instance, relies very much on surface case and has relatively free word order, whereas English and French rely more on word order and far less on surface case. All three of them, on the other hand, make ample use of adpositions, whereas a language like Finnish uses few adpositions but a plethora of different cases. This interaction between three different ways of expressing the same information can be investigated in a more comprehensive way when one abstracts away from the differences in their surface realisation. Normalisation does not have this effect: it leads to the postulation of standardised representations of case, order and adpositions, respectively, but not to a unified treatment of these phenomena.
It follows that for the purpose of comparing and translating languages the method of abstraction is more attractive than the one of normalisation. At the same time, it should be observed that abstraction has its limitations. The first limitation concerns its applicability. This derives from the requirement that the information conveyed by the original expressions has to be recoded without any loss of relevant information. This will be possible in general if the original expressions belong to a closed class and express distinctions which are few in number. The articles and most inflectional affixes are
good examples of such expressions. The lexical items, however, belong to an open class and are less amenable to abstraction (cf. 2.3). The second limitation concerns the result of its application. Even when abstraction can be successfully applied, it is by no means given that the resulting representations will require no further processing in transfer. The grammatical functions, for instance, may be used for abstracting from surface word order, case and (some) prepositions, but those functions are language specific themselves, as is demonstrated by the well-known pair

(27) a. I like him
     b. il me plaît

in which the English subject becomes the French object, and vice versa. A similar remark applies to the representations which result from recoding the articles in terms of features like 'definite' and 'partitive'. Such representations avoid the necessity of insertion and deletion in transfer, but this does not mean that transfer can simply copy them. As a matter of fact, many languages do not have a partitive article, and a noun phrase with a definite article in one language may correspond to a bare noun phrase in another language. As a consequence, one will have to allow for some non-trivial substitution of features in transfer. Elimination of such substitutions is only possible if the features which are used for recoding the articles are interlingual (cf. section 4).
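The residual transfer step for the pair in (27) can be sketched as a rule which swaps grammatical functions. The representation format, the function name and the labels for the arguments are hypothetical; only the like/plaire correspondence and the subject/object swap come from the text.

```python
# Minimal sketch of (27): even after abstraction to grammatical functions,
# transfer still has to swap subject and object for like/plaire.
# The frame format and the argument labels are assumptions for illustration.

def transfer_like_to_plaire(frame):
    """Map an English 'like' frame onto a French 'plaire' frame."""
    if frame["pred"] != "like":
        raise ValueError("this rule only applies to 'like'")
    return {
        "pred": "plaire",
        "subj": frame["obj"],   # English object becomes French subject
        "obj": frame["subj"],   # English subject becomes French object
    }


english = {"pred": "like", "subj": "speaker", "obj": "he"}
print(transfer_like_to_plaire(english))
```

The rule is language pair specific by construction, which is the point of the example: the functions abstract away from word order and case, but not from the lexical item that assigns them.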
Unlike anticipation and readjustment, abstraction and normalisation are not mere tricks for achieving simple transfer. They are both sound and well established methods in linguistics. As a matter of fact, it has been shown in this section that some familiar linguistic descriptions can be analysed as resulting from either abstraction or normalisation. Some further evidence for the soundness of both methods is apparent from the fact that they are not only applied but also discussed for their own sake in linguistics. In 'Two models of grammatical description', for instance, the American structuralist Charles Hockett makes a distinction between 'Item-and-Process' and 'Item-and-Arrangement', which is strikingly similar to my distinction between abstraction and normalisation. An example: in order to describe the formation of plural nouns in English, one can either start from one basic allomorph, say /z/, and then add rules which replace it - whenever necessary - by another one, such as /s/ in 'cats' and /iz/ in 'glasses' (this is Item-and-Process), or one can start from an abstract marker, say {Plural}, and add rules which map this abstract marker onto its appropriate phonological realisation, i.e. any of /z/, /s/, /iz/.
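The two descriptions of the English plural can be put side by side in a small sketch. It is an illustration under simplifying assumptions: stems are given in a rough one-character-per-sound transcription ('kat', 'dog', 'glas'), the sound classes are deliberately incomplete, and all function names are hypothetical.

```python
# Minimal sketch of the two models for the English plural.
# Assumptions: rough transcriptions, simplified sound classes.

SIBILANTS = set("szʃʒ")
VOICELESS = set("ptkfθ")


def plural_item_and_process(stem):
    """Start from the basic allomorph /z/ and replace it where necessary."""
    form = stem + "z"                    # basic allomorph
    if stem[-1] in SIBILANTS:
        form = form[:-1] + "iz"          # process: /z/ is replaced by /iz/
    elif stem[-1] in VOICELESS:
        form = form[:-1] + "s"           # process: /z/ is replaced by /s/
    return form


def realise_plural(stem):
    """Map the abstract marker {Plural} onto its phonological realisation."""
    if stem[-1] in SIBILANTS:
        return "iz"
    if stem[-1] in VOICELESS:
        return "s"
    return "z"


def plural_item_and_arrangement(stem):
    """Start from an arrangement of items containing the abstract marker."""
    items = [stem, "{Plural}"]
    return "".join(realise_plural(stem) if i == "{Plural}" else i for i in items)


for stem in ["kat", "dog", "glas"]:
    print(stem, plural_item_and_process(stem), plural_item_and_arrangement(stem))
```

Both functions yield the same surface forms; they differ only in whether one of the allomorphs is given a privileged status in the description.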
Hockett's distinction is made from the perspective of generation, whereas mine is made from the perspective of analysis, but they amount to the same thing: abstraction is the analytic counterpart of item-and-arrangement, and normalisation is the analytic counterpart of item-and-process.

As compared to one another, abstraction has an edge over normalisation, since it does not lead to linguistically unmotivated representations and since it facilitates the study of the interaction between different surface phenomena. Normalisation is not always inferior, though: it does not necessarily lead to linguistically unmotivated representations (cf. the undoing of ellipsis in (26)), and in cases where abstraction is not applicable, as in the treatment of lexical ambiguity, it may be the only viable alternative. Taken together, abstraction and monolingually motivated normalisation are a powerful device for reducing transfer, but there is no reason to believe that they lead to the elimination of all bilingual operations. In order to see more clearly what the limitations are I now turn to a discussion of transfer.

3. On the (in)dispensability of transfer

The previous section concentrated on how far one can go in reducing transfer while sticking to the requirement that analysis and synthesis be linguistically motivated. In this section I will concentrate on how far one has to go if one wants to dispense with transfer altogether.

3.1. A typology of bilingual operations

So far, I have distinguished four types of bilingual operations: permutation, deletion, addition and substitution. The border lines between these four types are not always clear cut, though. Substitution, for instance, may trigger any of the other three operations, as in

(28) EN. he happened to be in town
     NL. hij was toevallig in de stad
         'he was accidentally in the town'

Since there is no verb in Dutch which expresses the same meaning as 'happen to', it is rendered by means of an adverb.
Interestingly, this change of category triggers a structural change: whereas the English sentence is bi-clausal, its Dutch counterpart is mono-clausal. Changes like these do not only involve substitution, but also deletion (of the main verb node and its mother), addition (of the adverb node) and movement (of the subject). More generally, they show that there is no neat borderline between substitution and the other operations.
For this reason I will replace the original typology with one which distinguishes only two types of bilingual operations: those which affect the structure of the source language representation, and those which do not. The former will be called structure changing, the latter structure preserving. Permutation, deletion and addition are all structure changing. Substitution can be structure changing too, as in (28), but in most cases it will be structure preserving. Next to this distinction, which concerns the type of bilingual operations, there is another distinction which concerns the elements which are operated upon. The distinction to be made here is the one between lexical and grammatical elements. The former are the so-called content words; they typically belong to open classes (nouns, verbs, adjectives, adverbs...), whereas the latter typically belong to closed classes (articles, auxiliaries, affixes, pronouns...). Combining the two distinctions we get four types of bilingual operations:

                          bilingual operations

                  structure preserving   structure changing
     lexical             3.1.1                  3.1.2
     grammatical         3.1.3                  3.1.4
In the following paragraphs I will discuss each of these types in some detail. Special attention will be paid to whether and how they can be dispensed with in transfer.

3.1.1. Structure preserving operations on lexical elements

This type concerns the substitution of target lexical items for source lexical items. As demonstrated in 2.3, it is hard to see how such operations can be dispensed with. Indeed, neither of the two methods for reducing transfer is particularly adequate for coping with discrepancies in lexical differentiation. Abstraction and recoding in terms of semantic primitives is ultimately circular, since the set of semantic primitives will have to contain the lexical items themselves. Normalisation by means of the addition of sense differentiating semantic features is more promising, but it may lead to the imposition of distinctions which lack linguistic motivation (cf. the two meanings of 'eat'), and which are, furthermore, language pair specific: the assignment of two meanings to 'eat' may be expedient when translating to German, but it does not serve any purpose when translating to French or Italian. It seems to follow then that discrepancies in lexical differentiation can partly be dealt with in the monolingual modules - by cautious application of
normalisation - but that the majority of such cases requires a language pair specific treatment, i.e. in terms of substitution rules in transfer.

3.1.2. Structure changing operations on lexical elements

Structure changing operations on lexical elements occur when a target language lexical item requires another syntactic environment than its source language equivalent. Typical examples of such discrepancies are cases in which the target lexical item is of another syntactic category than its source language equivalent, as in (28), where the English verb 'happen to' corresponds to the Dutch adverb 'toevallig'; a similar example concerns the much-discussed correspondence between the Dutch adverb 'graag' and its English equivalent, 'like to'. In order to avoid structural transfer in such cases one could apply abstraction: instead of using lexical items which are marked for syntactic category, one could make use of more abstract 'higher predicates', and leave it to the monolingual modules to decide whether these predicates will surface as a verb or an adverb. One could, for instance, adopt the practice of generative semantics to reduce all syntactic categories to S (proposition), V (predicate) and NP (argument). Notice, however, that such a reduction leads to a massive loss of information, and, hence, to serious problems in synthesis: on what grounds will synthesis determine whether some abstract predicate is to be realised as a verb or an adverb or an adjective or a preposition or a relational noun or even as a derivational affix? It is precisely to avoid such problems that I stipulated in section 2 that abstraction should only be applied if it does not lead to a loss of relevant information. Normalisation may fare better in this case. Indeed, one could keep the lexical items and their syntactic category in the representations, and only standardise the representation of their syntactic environment.
One could, for instance, treat the adverbs 'graag' (Dutch) and 'gerne' (German) as two-place predicates, just like their English equivalent 'to like'. Similarly, one could treat the adverb 'toevallig' (Dutch) as a one-place predicate, just like the English raising verb 'to happen'. The viability of this proposal ultimately depends on whether the resulting representations for the Dutch and German adverbs can be linguistically motivated, i.e. motivated on monolingual grounds. In this particular case the monolingual evidence may indeed be there, or better - there may be no monolingual evidence against it, but there is no reason to believe that all structure changing operations on lexical items can be eliminated in this way. In order to say anything sensible on that matter one should first have an idea of what the relevant types of operations are. Change of syntactic category is one such type, and a rather important one as well, but it is not the only one. Another source of structural mismatches in transfer is the phenomenon of
incorporation, as exemplified by the translation of the French reflexive verb 'se suicider' into its English equivalent 'commit suicide': the direct object in the English phrase has been incorporated in the French verb. How these discrepancies can be dealt with in computational terms is discussed at length in the chapter by Louisa Sadler in this volume. Interestingly, she does not opt for normalisation in the monolingual modules but for a language pair specific treatment. The general conclusion is similar to the one of the previous paragraph: structural changes which are triggered by open class lexical items can partly be avoided by normalisation, but in most cases a language pair specific treatment is to be preferred, if one wants to stick to the requirement of linguistic motivation in the monolingual modules.

3.1.3. Structure preserving operations on grammatical elements

These operations concern the substitution of closed class elements or of the morphosyntactic features which represent them. Examples of such features have been mentioned throughout the text: grammatical function, tense, number, definiteness ... As pointed out in 2.4, those features result from abstraction, but this abstraction does not of itself lead to the elimination of transfer, since the distribution of the grammatical elements is language specific. This is not only true for the syntactic functions (cf. like-plaire) and the articles (cf. 14-16), but also for a feature like number: a source language plural does not always correspond to a target language plural. Obvious counterexamples are the pluralia tantum: 'scissors' is always plural in English, and the same holds for its Italian equivalent 'forbici', but its Dutch and German equivalents are singular: 'schaar', 'Schere'. A more systematic discrepancy is found in the translation of the Italian 'qualche + noun' construction. This quantifier invariably requires a singular noun

(29) IT. qualche volta, qualche settimana

whereas its equivalent in most other languages requires a plural

(30) EN. a few times, some weeks
     NL. enkele keren, enkele weken
     FR. quelques fois, quelques semaines

Shifting the treatment of such discrepancies to the monolingual modules could be achieved by means of abstraction: instead of using morpho-syntactic features in the representations which enter transfer, one could recode their content or their function in terms of semantic features which can simply be copied in transfer.
Taking into account the problems with the application of abstraction to the lexical items this may seem foolhardy. Notice, however, that there is an important difference between lexical and grammatical elements: whereas the former belong to open classes, and are used for expressing a multitude of concepts, the latter are limited in number and express a much more restricted range of concepts than the lexical items. As a consequence, they are more readily amenable to abstraction than the lexical items. A second difference concerns their treatment in general linguistics. Whereas the decomposition of lexical items in terms of semantic primitives has been given up in current mainstream linguistics, the semantic analysis of closed class elements attracts a lot of attention, especially in the paradigm of formal and model-theoretic semantics (cf. section 4). It seems then that abstraction in terms of semantic universals is more likely to be feasible for the grammatical elements than for the lexical items.

3.1.4. Structure changing operations on grammatical elements

Operations of this type concern the changes in structure which are triggered by the substitution of a closed class element. Some relevant examples have been discussed in section 2 - cf. the permutation of constituents (2.1), the deletion of articles (2.2) and the addition of noun place holders (2.4). In all of those cases it was possible to avoid structural change in transfer without spurning linguistic motivation in the monolingual modules. A more difficult problem in this field concerns the treatment of prepositions. The problem arises when a prepositional phrase in one language corresponds to a noun phrase in another language, as in

(31) EN. he answered my question
     IT. rispose alla mia domanda
         'pro answered to-the my question'

(32) FR. il arrive le lundi
         'he arrives the Monday'
     EN. he arrives on Monday

In order to avoid any deletion or insertion of prepositions in transfer there are two possibilities.
Either one opts for normalisation and adds empty prepositions to 'my question' and 'le lundi' (cf. the postulation of empty articles in 2.2), or one uses abstraction, reducing all prepositional phrases to noun phrases and recoding on the NP-nodes whatever information was conveyed by the preposition. Elsewhere in this volume Jacques Durand argues that the latter is a good solution for strongly bound prepositions, such as the Italian 'a(lla)' in (31), but
that there is good linguistic evidence for not extending this method to all uses of prepositions. The reason is that prepositions form a hybrid class: they can be used not only as grammatical markers, as in (31), but also as content words. In the latter case the preposition does not behave as a closed class element, and is therefore not amenable to abstraction (cf. 3.1.2). Whether all structure changing operations on grammatical elements can be dispensed with in transfer is an empirical question, but of the four types of bilingual operations they are the easiest to dispense with. Notice, for instance, that for the elimination of bilingual deletion and addition of articles, it is sufficient to represent the latter by means of language specific morphosyntactic features, whereas for the elimination of the bilingual substitution of those features one needs nothing less than an interlingual treatment of determination.

3.2. A hierarchy of transfer systems

In an unconstrained transfer system all four types of bilingual operations are allowed. At the other end of the scale are the interlingual systems which do not allow any operation in transfer. One of the few genuinely interlingual systems is Rosetta. It translates between English, Dutch and Spanish, and avoids all types of bilingual operations by attuning the grammars of the monolingual modules to one another (cf. Landsbergen, 1987). In practice, this amounts to a massive application of normalisation and anticipation, especially in the treatment of the lexical items. The English and Dutch first person pronouns, for instance, are systematically marked for gender, since this information is relevant for translation in Spanish when the pronoun is the subject of a [pro - Copula - Adj] construction:

(33) EN. I (masc.) am ill
     ES. pro estoy enfermo

(34) EN. I (fem.) am ill
     ES. pro estoy enferma

The net result of this attunement is that the monolingual modules become language set specific: instead of anticipation with respect to one given target language, as in 2.1, we get anticipation with respect to a set of target languages, and this may look like an improvement, but the difference is only one of quantity, not of quality. Indeed, attunement opens the door for the inclusion of rules and representations in the monolingual modules which lack linguistic motivation, and this jeopardizes both the construction and the reusability of the resulting grammars. In Landsbergen's own words:
A disadvantage is that writing corresponding grammars for a set of languages is more difficult than writing a grammar for each language separately. This may lead to more complex grammars. [Landsbergen 1987]

I think that this is an understatement: while attuning grammars for three - historically related - languages might still seem feasible, because one can find linguists who know all three of them, a similar attunement for a set of nine or more languages looks considerably less feasible. Indeed, where should one find a linguist who can foresee all possible types of discrepancies between a given source language and a set of eight or more target languages? Furthermore, once the carefully attuned grammars are up and running, there is no guarantee that they can be reused in a system which deals with other languages than the ones with respect to which the attunement was made. Suppose, for instance, that the original attunement had involved English, Dutch and German, rather than Spanish. In that case there would have been no reason to mark the first person pronoun for gender, and the resulting grammars would have lacked the appropriate expressive power to deal with Spanish. The addition of Spanish would, consequently, cause a revision of all monolingual grammars. It seems then that large-scale attunement is not only at odds with linguistic evidence, but also with the requirement of extensibility.

There is a second class of systems which are sometimes advertised as interlingual, but which actually make use of bilingual operations. A typical example is CETA (Centre d'Etudes de Traduction Automatique, Grenoble), which allows for the substitution of lexical items in transfer but tries to avoid all other types of bilingual operations.
In practice, this attempt to restrict transfer to lexical substitution has not been successful: the system also contains rules for the substitution of morpho-syntactic features like mood, tense and aspect, for instance. A third class of systems includes those which only allow for structure preserving operations in transfer. Eurotra would like to be such a system: The general Eurotra methodology consists in a stepwise refinement of the IS definition, allowing for simpler transfer, and ideally elimination of transfer (or at least elimination of change of structure in transfer). That is, the general Eurotra methodology contains a move from a transfer method towards an interlingual method.4 [Maegaard & Perschke 1991,10] While this may look like a plausible goal at first sight, the discussion in 3.1.2 has cast some doubts on its feasibility. Indeed, since changes in structure are often triggered by lexical items and since those lexical items are, in general, not amenable to abstraction without considerable loss of information, the only plausible way to avoid structure changing in transfer is through the systematic application of normalisation in the monolingual modules, and this will more often than not force one to spurn linguistic evidence.
It is not entirely surprising then, that the Eurotra system, while - by and large - sticking to the requirement that the monolingual modules be linguistically motivated, has not reached the goal of simple transfer (cf. Allegranza et al., 1991, 79-82). In practice, it allows all four types of bilingual operations in transfer, which makes it a member of the fourth class of systems.

The fourth class of systems puts no constraints on the expressive power of the transfer module: all four types of bilingual operations are allowed. Theoretically it is the least interesting type of system, but in practice it is by far the most common: Systran, Metal, Taum-Meteo, Eurotra and the majority of the Japanese systems belong to this class.

It could be asked at this point whether the quest for constraints on bilingual operations makes any sense. Indeed, if the systems which observe such constraints tend to spurn linguistic evidence, and if the systems which try to observe the constraints end up allowing for all types of bilingual operations when developed beyond the size of a small-scale prototype, one should start wondering about the empirical value of the constraints. The situation is, in fact, reminiscent of the attempts in mathematical linguistics to formulate constraints on the expressive power of generative grammars for natural languages. Having shown that finite state and context free grammars are not sufficiently powerful for the generation of natural languages, at least not if one wants the grammars to be descriptively adequate (i.e. in conformity with linguistic evidence), Chomsky was led to the conclusion that one needs the expressive power of a Turing machine for this purpose, which is tantamount to saying that there should be no formal constraints at all on the generative power of natural language grammars. It seems to follow then that the theory of formal grammars and automata has little empirical significance for the study of natural language.
In the same vein, it seems that the constraints discussed so far somehow lack empirical significance. Notice, though, that I have only discussed some types of constraints. The following table gives an overview:

Class                  Allowed
1. Interlingua         no bilingual operations at all
2.                     only lexical substitution (3.1.1)
3. Simple Transfer     only substitution (3.1.1 and 3.1.3)
4. Complex Transfer    all four types of bilingual operations
What has been attempted so far, is to formulate constraints on the power of the bilingual operations themselves, but given the discussion in the previous paragraphs, it might actually make more sense to formulate constraints in terms of the expressions on which the bilingual rules operate. Indeed, since abstraction can cope far better with closed class elements than with lexical
items, it may well be worth trying to restrict the bilingual operations to the lexical elements. Apart from being more feasible, such a constraint would also be more desirable. Indeed, the frequency of the individual grammatical elements is considerably higher than that of the lexical elements. The definite article or the present tense, for instance, occur much more frequently than any arbitrary noun or verb. The reason is that sentences - in order to be grammatical - have to contain grammatical markers of various sorts, and since the number of those elements is low, the frequency of each individual element is quite high. Of course, sentences have to contain lexical elements as well, but since their number is very high, the frequency of each individual lexical element is much lower. It follows that the pay-off for the reduction of bilingual operations on grammatical elements is significantly larger than that for the reduction of such operations on lexical elements.

Taking a lead from the discussion in 3.1 one can imagine two types of constraints: a stronger one which eliminates all bilingual operations on grammatical elements, and a weaker one which allows structure preserving operations, but excludes the structure changing ones. In terms of the new hierarchy of systems the former one is type-2 and the latter type-3:

Class                  Allowed
1. Interlingua         no bilingual operations at all
2. Lexical Transfer    only on lexical elements (3.1.1 and 3.1.2)
3.                     all except 3.1.4
4. Complex Transfer    all four types of bilingual operations
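The two hierarchies can be compared side by side in a small sketch. It is written in Python and is only an illustration: the names `Op`, `BY_OPERATION`, `BY_ELEMENT` and `classify` are assumptions, and the glosses attached to 3.1.1-3.1.4 are inferred from the way those sections are referred to in the tables and the surrounding text. The sketch assigns a system to the lowest class whose allowed operations cover the ones the system actually uses.

```python
from enum import Enum


class Op(Enum):
    """The four types of bilingual operations, keyed by section number.

    The glosses are inferred from the surrounding discussion (an assumption).
    """
    LEXICAL_SUBSTITUTION = "3.1.1"
    LEXICAL_STRUCTURE_CHANGE = "3.1.2"      # structure changes triggered by lexical items
    FEATURE_SUBSTITUTION = "3.1.3"          # substitution of morpho-syntactic features
    GRAMMATICAL_STRUCTURE_CHANGE = "3.1.4"  # structure changes on grammatical elements


ALL_OPS = frozenset(Op)

# First hierarchy: constraints on the power of the operations themselves.
BY_OPERATION = {
    1: frozenset(),
    2: frozenset({Op.LEXICAL_SUBSTITUTION}),
    3: frozenset({Op.LEXICAL_SUBSTITUTION, Op.FEATURE_SUBSTITUTION}),
    4: ALL_OPS,
}

# Second hierarchy: constraints on the elements the operations apply to.
BY_ELEMENT = {
    1: frozenset(),
    2: frozenset({Op.LEXICAL_SUBSTITUTION, Op.LEXICAL_STRUCTURE_CHANGE}),
    3: ALL_OPS - {Op.GRAMMATICAL_STRUCTURE_CHANGE},
    4: ALL_OPS,
}


def classify(used_ops, hierarchy):
    """Return the lowest class whose allowed operations cover the ones used."""
    used = frozenset(used_ops)
    for cls in sorted(hierarchy):
        if used <= hierarchy[cls]:
            return cls
    raise ValueError("operations outside the hierarchy")
```

A system that substitutes lexical items and lets them trigger structural changes lands in class 4 of the first hierarchy but in class 2 of the second, which is the point of reformulating the constraints in terms of the elements operated on.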
Neither of these reductions is trivial, especially if one does not want to impose any language pair specific or linguistically unmotivated operations in the monolingual modules. In its weaker version, though, the reduction may well be feasible: the combined application of abstraction and monolingually motivated normalisation may yield representations which are abstract enough to avoid the permutation, deletion or addition of grammatical elements and morpho-syntactic features in transfer. In its stronger version, the reduction is considerably more difficult to achieve. Indeed, the elimination of all bilingual operations on grammatical elements and morpho-syntactic features is only possible if the content or function of the latter is represented in interlingual terms. The task of defining such representations and their relation to surface text in computationally tractable terms is an ambitious enterprise. Its feasibility will be examined in some detail in section 4.

Before closing this section, I want to draw attention to an interesting analogy between my proposal to aim for an interlingual treatment of closed class elements and a current trend in the MT community to aim for interlinguality with respect to a restricted domain. In a recent paper on the 'research status of machine translation', Margaret King discusses the various ways in which MT systems deal with the semantic barrier (cf. the use of lexical decomposition, semantic roles, world knowledge, interaction with the user, etc.) and arrives at the conclusion that:

It may not make much sense to look for a universal set of primitives, but if the system to be designed is meant to operate in a closed world, it may make perfectly good sense to look for adequate primitives within that world. [King 1989, 12]

An existing example of such a system is Titus:

Titus ... makes use of a language independent representation as the result of analysis, and is able to do so both because the world it deals with is closed and therefore susceptible to formal modelling and because input to the system is very tightly controlled. [ibid., 11]

Indeed, most lexical items can have a large set of meanings and alternative translational equivalents, but within a restricted and highly structured domain, such as textile manufacturing in the case of Titus, the range of possible meanings and translational equivalents gets fairly limited and the meanings themselves get amenable to a representation in language independent terms. What I propose is not to aim for full interlinguality with respect to a restricted domain, but rather for partial interlinguality with respect to all possible domains; the restriction in my proposal does not concern the domain but the language. In this respect there is a clear difference between King's recommendation and mine. On the other hand, there is also a similarity: in both proposals the feasibility of an interlingual treatment is linked to the possibility of restricting attention to a closed world, the world of denotata in a restricted domain (King) or the 'world' of the closed class elements (Van Eynde).
There is another similarity as well, but that can only be made explicit at the end of section 4.

4. Interlingual information and semantic universals

For an interlingual treatment of the closed class items one applies abstraction in the monolingual modules and recodes the content or function of the items in terms of semantic universals. Not everything that has been called 'semantic universal' is useful for interlingual MT, though, and the main purpose of this section is to find out which type of semantic universals one actually needs. In order to keep the discussion concrete I will start from a well-known example of non-equivalence in translation, i.e. the forms of tense and aspect. Even for such closely related languages as French and English there is no one-to-one correspondence between their respective tense and aspect forms. The
French 'present simple', for instance, has at least four translational equivalents in English:

(35) elle fume des cigares
     she smokes cigars (simple present)
(36) attention, il s'évanouit
     *take care, he faints
     take care, he is fainting (present progressive)
(37) il vit à Paris depuis 1968
     *he lives in Paris since 1968
     he has lived in Paris since 1968 (present perfect)
(38) elle travaille à Louvain depuis 1980
     *she works in Leuven since 1980
     she has been working in Leuven since 1980 (present perfect progressive)

As the starred sentences show, the choice between the four equivalents is not a matter of taste or preference, but of content and grammaticality: the use of the simple present in (38) causes ungrammaticality, and the use of the present perfect in (35) or (36) would yield another meaning for the sentence. The interlingual approach to this case of non-equivalence between forms would be to relate the language specific forms, such as 'present simple' and 'present perfect', to language independent semantic representations in the monolingual modules, and to copy the latter in transfer. It is not a priori clear, though, what the relevant semantic representations are, and how they relate to the language specific forms of tense and aspect.

This raises the question of the relation between form and meaning, or - more specifically - of the relation between syntax and semantics. This relation has been the topic of many debates in linguistics and philosophy and has been studied in depth in various frameworks. In the context of this article there are three theories, or approaches, which are worth considering in some detail: the contrastive approach, the linguistic approach, as exemplified by Saussurean structuralism and generative grammar, and the model-theoretic approach (4.1). They each highlight an important aspect of the form-meaning relation, but none of them is sufficient in itself to provide the basis for an interlingual treatment of grammatical elements.
Such a basis can be provided by an alternative approach which combines some of the virtues of the three classical approaches (4.2).
4.1. Three classical approaches

From the discrepancies observed in (35-38) it is clear that the degree of differentiation which is provided by the French morpho-syntactic forms is not fine grained enough for translating into English; in order to be useful for MT an analysis in semantic terms should provide a higher degree of differentiation. To determine the proper degree of differentiation one could compare the languages involved, and derive a criterion of differentiation from the data themselves. In the case of the French 'present simple', for instance, the data observed in (35-38) would require the postulation of (at least) four meanings, each corresponding to one of the four equivalents in English. Notice, though, that this method is inherently language pair specific: if the target language had not been English, the result would have been different. Imposing such differentiations on the source language is, therefore, tantamount to anticipation. A possible remedy in multilingual MT would be to apply the method to all language pairs in the set and to define the proper degree of differentiation for the set as the combinatorial product of all language pair specific differentiations. However, apart from being extremely cumbersome, it is only quantitatively different from the original approach: instead of language pair specific, it is language set specific, and instead of causing anticipation, it leads to systematic normalisation, also in cases where normalisation is at odds with linguistic motivation (cf. the remarks on Rosetta in 3.2). Another problem with the contrastive approach is that it tends to treat all translational discrepancies in the same way, i.e. as signalling (yet) another differentiation, thus ignoring the fact that some of the equivalents may be in complementary distribution in the target language, so that the choice between them is a strictly monolingual matter.
The choice between the present perfect and the present perfect progressive in (37-38), for instance, is determined by the Aktionsart of the English sentence: a state in (37) vs. a process in (38). In short, while a contrastive analysis of the data is useful for pointing out some interesting discrepancies and problem cases, it does not in itself provide a good criterion for differentiation.
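The growth of the contrastive differentiation, and the way a monolingual criterion can absorb part of it, can be sketched as follows. The code is a Python illustration under stated assumptions: the English equivalents are those of (35-38), but the second target language and its two forms are purely hypothetical placeholders, and the function names are invented for the example.

```python
from itertools import product

# Equivalents of the French 'present simple' per target language.
# The English list is taken from examples (35)-(38); the second entry is
# HYPOTHETICAL and only serves to show how the product grows.
EQUIVALENTS = {
    "English": [
        "simple present",
        "present progressive",
        "present perfect",
        "present perfect progressive",
    ],
    "Language X (hypothetical)": ["form A", "form B"],
}


def differentiations(targets):
    """Combinatorial product of the language pair specific differentiations."""
    return list(product(*(EQUIVALENTS[t] for t in targets)))


def english_perfect_form(aktionsart):
    """Monolingual choice described in the text: state vs. process."""
    forms = {
        "state": "present perfect",
        "process": "present perfect progressive",
    }
    return forms[aktionsart]
```

With English alone the contrastive method postulates four meanings; adding the hypothetical second language already doubles that number. The second function shows why two of the four English equivalents need not count as separate source language meanings: the choice between them can be made inside the English module.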
A second approach is based on the detailed analysis of individual languages. The emphasis here is not on comparison, but on monolingual evidence. The method is characteristic for most variants of linguistic semantics, and its most prominent representatives include Saussure and Chomsky. Typical for the former is a strong commitment to the close correspondence between form and meaning, or - as Saussure used to call them - between signifiant and signifié: 'Un mot peut exprimer des idées assez différentes sans que son identité soit sérieusement compromise (cf. 'adopter une mode' et 'adopter un enfant', 'la fleur du pommier' et 'la fleur de la noblesse', etc.).' [Saussure 1916, 151]. In other words, the meanings cannot show a higher degree of differentiation than the forms themselves. There is just one small modification to be added: if an expression belongs to more than one paradigm, then it has a different signifié in each paradigm: 'Soient les deux membres de phrase: "la force du vent" et "à bout de force": dans l'un comme dans l'autre, le même concept coïncide avec la même tranche phonique [fors]; c'est donc bien une unité linguistique. Mais dans "il me force à parler" [fors] a un sens tout différent; c'est donc une autre unité.' [Saussure 1916, 147] This shifts the problem of differentiating meanings to the problem of differentiating between paradigms, but given Saussure's example, the latter must probably be understood in a rather down-to-earth manner; the relevant paradigms in this case simply correspond with morpho-syntactic categories: the noun 'force' vs. the finite verb 'force'.

The application of this method to the analysis of tense and aspect yields a set of one-to-one correspondences between forms and meanings; for some concrete examples, see Burger, 1961 and Burger, 1962 for French, and Janssen, 1990 for Dutch. For all its linguistic appeal, the relevance of this method for multilingual MT is rather limited. Indeed, the difficult cases in translation are those where there is no one-to-one correspondence between forms of different languages, and where one would like to have a higher degree of differentiation at one's disposal. For translation into English, for instance, one has to make a distinction between 'adopter une mode' (to follow a fashion) and 'adopter un enfant' (to adopt a child), but the Saussurean criterion for differentiation does not provide the possibility to distinguish between two meanings here.
Similarly, it postulates one and only one meaning for the French 'present simple' and one meaning for the English simple present. Since the distribution of these forms is crucially different, their meanings cannot be identical. The result is that one gets two language specific meanings which partially overlap, but which cannot be related in a one-to-one way in translation. It follows that one has to allow for many-to-many mappings between the representations of those meanings in transfer. Notice, furthermore, that in such cases one could just as well have done without semantics, since the forms themselves will stand in exactly the same many-to-many relation. In short, as long as one sticks to one-to-one correspondences between form and meaning in the monolingual modules, one cannot reasonably expect to arrive at a level of representation which displays the appropriate degree of differentiation for interlingual MT. Generative grammar takes a more relaxed position on the relation between
form and meaning: it does not postulate any one-to-one correspondence between syntactic and semantic representations, for instance. What is common with Saussurean structuralism, however, is the insistence on the primacy of the forms (autonomous syntax) and on the impossibility of an independent study of meaning(s):

The study of meaning is fraught with so many difficulties even after the linguistic meaning-bearing elements and their relations are specified, that any attempt to study meaning independently of such specification is out of the question. To put it differently, given the instrument language and its formal devices, we can and should investigate their semantic function; but we cannot, apparently, find semantic absolutes, known in advance of grammar, that can be used to determine the objects of grammar in any way. [Chomsky 1957, 101]

This quote dates from the early days of generative grammar and a lot has changed since then, but - throughout the various developments - mainstream generative grammar has stuck to the view that syntax is the core of any serious study of language, and that semantics - like phonology - has a derived status. In the 'Government and Binding' framework, for instance, the representations at the level of logical form are straightforwardly derived from syntactic representations; in its stricter versions this derivation is even claimed to be constrained by the same conditions on rules as those which hold for syntax proper (cf. Chomsky, 1975, 1981 and Jackendoff, 1983).

A good example of the application of these assumptions to the analysis of tense and aspect is Woisetschlager's dissertation, 'A semantic theory of the English auxiliary system' (1976). One of the principles on which his analysis rests is the so-called univocality assumption: 'Given a syntactic constituent there is exactly one semantic expression that specifies its meaning.' [op. cit., 1213].
Interestingly, this univocality holds more strictly for the grammatical than for the lexical elements: 'The more highly grammaticalised a morpheme is, the less it will tolerate polysemy.' [op. cit., 13]. This reminds one of the Saussurean postulate of one-to-one relations between form and meaning, but there is a difference. While Saussure postulated one meaning per form in a given paradigm, Woisetschlager postulates one meaning per form in a given syntactic function. This leads him to admit two meanings for the English progressive: an aspectual one, as in

(39) Walter was filing the day's mail

and an epistemic one, as in

(40) Bill is flying to Buffalo tomorrow
(41) Little Mary was seeing ships on the horizon
In (39) 'be' functions as a main verb and takes an '-ing' complement, just like the other aspectual verbs 'stop', 'begin', 'continue' and 'start'; in (40-41) it functions as an auxiliary. The criterion for differentiation is purely syntactic:

There is an important syntactic difference between them in point of 'degree of grammaticalisation': the aspectual progressive is clearly less grammaticalised than the epistemic progressive. This is correlated with a difference in generality of applicability as follows: the aspectual progressive is limited to being applied to propositional contents that represent events, while no such restrictions as to type of propositional content are imposed on the epistemic progressive. [op. cit., 121]

In other words, there are two meanings because there are two syntactic functions. From a monolingual point of view, there is nothing wrong with a criterion like this, but from the point of view of translation, it is less appealing, since the degree of syntactically motivated differentiation tends to be considerably lower than the one which is needed for interlingual MT.
The third approach is based on the assumption that meaning can be studied independently from its expression in individual languages. A good example of this approach is the framework which the logician Richard Montague developed in the late 1960s. In 'English as a formal language' he explicitly challenged the syntax-first assumption of generative grammar:

Some linguists ... have proposed that syntax - that is, the analysis of the notion of a (correctly formed) sentence - be attacked first, and that only after the completion of a syntactical theory consideration can be given to semantics, which would then be developed on the basis of that theory. Such a program has almost no prospect of success. There will often be many ways of syntactically generating a given set of sentences, but only a few of them will have semantic relevance; and these will sometimes be less simple, and hence less superficially appealing, than certain of the semantically uninteresting modes of generation. Thus the construction of syntax and semantics must proceed hand in hand. [Montague EFL, 210]

In practice, Montague turned the tables: instead of an autonomous syntax and a syntactically driven interpretive semantics he proposed an autonomous semantics and a semantically driven syntax. Put like this, it seems like a mere shift in emphasis, but its implications are far reaching and have led to a radical reorientation of semantics, at least in linguistics. Of special importance in the present context is that Montague provided a framework in which semantics can not only be studied as an autonomous discipline but also as a discipline which is no less precise or rigorous than syntax. Building on the latest developments in formal logic, he designed a number of models for the description of both the syntax and the semantics of fragments of English. The best known of these models is the one which he
presented in 'The proper treatment of quantification in ordinary language'. It is organised as follows:

syntax of                   syntax of                    semantics of
disambiguated English       intensional logic            intensional logic

basic expressions           basic expressions            entities
of English                  of intensional logic

syntactic rules             syntactic rules              semantic operations
of English                  of intensional logic

complex expressions    T    complex expressions     I    denotations of
of English           ---->  of intensional logic  ---->  the expressions
[T = rules of translation; I = rules of interpretation] There are two levels of syntactic representation, the one of disambiguated English and the one of intensional logic; they are related by means of rules of translation (T) which correspond one-to-one to the syntactic rules of disambiguated English. The semantics takes the form of a model-theoretic interpretation of the expressions of intensional logic. The rules of interpretation (I) which define this relation between the expressions and their denotations correspond one-to-one to the syntactic rules of intensional logic. The expressions of English are, hence, interpreted in two steps: first they are translated in expressions of intensional logic and then the latter are given a model-theoretic interpretation.5 The question to be asked at this point is whether this framework can provide a basis for interlingual MT; more specifically, whether the distinction between the syntax of English and the syntax of intensional logic can be used as a model for the relation between language specific and interlingual representations, respectively. What makes this question interesting is not so much the fact that the PTQ-grammar has two levels of representation, but rather that the semantically interpreted level is defined in a language independent way, or at least in a way which is independent from the peculiarities of any given natural language. This property might make it useful as an impartial standard for semantic differentiation in natural languages. The way it has been defined in Montague's work, however, the framework is both too general and too strict for this purpose.
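The two-step organisation can be made concrete with a deliberately tiny sketch in Python. It is not Montague's system: the fragment has a single syntactic rule, the logic is extensional rather than intensional, and proper names denote entities directly. All names (`LEXICON`, `MODEL`, `translate`, `interpret`) are assumptions made for the illustration. What the sketch retains is the rule-to-rule design: one translation rule per syntactic rule of the English fragment, and one semantic operation per syntactic rule of the logic.

```python
# Toy model (an assumption): denotations for the constants of the logic.
MODEL = {
    "john'": "j",
    "mary'": "m",
    "walk'": {"j"},   # the set of entities that walk
}

# Basic expressions of the English fragment and their translations.
LEXICON = {"John": "john'", "Mary": "mary'", "walks": "walk'"}


def translate(tree):
    """T: one translation rule per syntactic rule of the English fragment."""
    if isinstance(tree, str):                  # basic expression
        return LEXICON[tree]
    rule, subject, predicate = tree
    if rule == "S -> NP VP":                   # the only syntactic rule here
        return ("apply", translate(predicate), translate(subject))
    raise ValueError(f"no translation rule for {rule!r}")


def interpret(formula):
    """I: one semantic operation per syntactic rule of the logic."""
    if isinstance(formula, str):               # basic expression of the logic
        return MODEL[formula]
    op, function, argument = formula
    if op == "apply":                          # application as set membership
        return interpret(argument) in interpret(function)
    raise ValueError(f"no interpretation rule for {op!r}")


# Two-step interpretation of a disambiguated English expression.
john_walks = ("S -> NP VP", "John", "walks")
truth_value = interpret(translate(john_walks))
```

Because every English rule has exactly one translation rule, an ambiguous sentence can only receive two meanings if it has two distinct syntactic analyses in the fragment.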
It is too general, since there are no restrictions on the meanings which it assigns and operates with. This is an advantage for the study of logic, since logicians do not want to be constrained to those concepts which are found in natural language, but for the semantic analysis of natural languages it is a handicap. When faced with the task of analysing the meanings of the forms of tense and aspect, for instance, one is not primarily interested in the definition and interpretation of all conceivable temporal concepts and their mutual relations. One would rather want a relatively small but linguistically relevant subset of those concepts.

Montague's framework is also too strict. Since it adheres to a rather strong version of compositionality between the expressions of English and intensional logic, there have to be (at least) as many different syntactic representations for a given sentence as it has meanings. It follows that all possible ambiguities have to be anticipated at the level of English syntax. This is, indeed, the reason why Montague calls it the syntax of disambiguated English.6 From the point of view of translation, such a level of disambiguated syntactic expressions is not very useful; instead of having two levels with the same function, i.e. providing the basis for compositional interpretation, it would be more useful to have two levels with a different function: one which represents language specific morpho-syntactic structures, and one which consists of - partly interlingual - semantic representations. Given the discrepancies between languages at the morpho-syntactic level the relation between the two levels should, furthermore, not be one of strict compositionality, if one wants to use the semantic representations for MT.
In short, none of the three discussed approaches provides a level of representation which can serve as an interlingua for MT. Yet, as far as the issue of differentiation is concerned they each point at an important requirement: whatever the criteria for differentiation one chooses, the resulting differentiations should be expressed in language independent terms (cf. logical semantics), they should be related to the morpho-syntactic expressions of individual languages in a non-arbitrary way (cf. linguistic semantics), and they should, furthermore, capture the distinctions which one needs for translation (cf. the contrastive approach). In the following paragraph I will sketch the outline of an approach which meets these requirements.

4.2. A new approach to interlinguality

As suggested in the previous paragraph I will distinguish two levels of representation in the monolingual modules: a morpho-syntactic one which is
defined in terms of language specific elements, and a semantic one which is partly language specific and partly interlingual. The language specific part concerns the lexical items and their structural properties, the interlingual part concerns the closed class elements. The bilingual modules relate the semantic representations of different languages. They contain rules for the substitution of the lexical items (3.1.1) and for the changes in structure which they trigger (3.1.2); they do not contain any rules for the treatment of closed class elements. The resulting scheme can be summarised as follows:

                      model-theoretic objects
                                |
                              INTER
                                |
semantic representation                   semantic representation
(language specific |    <--- TRA --->     (language specific |
 interlingual)                             interlingual)
      ANA ^ | SYN                               ANA ^ | SYN
          | v                                       | v
language specific                         language specific
morpho-syntactic representation           morpho-syntactic representation
      ANA ^ | SYN                               ANA ^ | SYN
          | v                                       | v
        text                                      text
[ANA = analysis; SYN = synthesis; TRA = transfer; INTER = interpretation]

This scheme concerns translation between two languages, but it can easily be extended to three or more languages: one simply adds one 'column' per language. In the context of this chapter, the most interesting part of the scheme is the interlingual part of the semantic representations. In contrast to the language specific parts for which one could use existing linguistic descriptions (importing ideas from dependency grammar, lexical-functional grammar or head-driven phrase structure grammar, for instance), the interlingual parts cannot be borrowed from any existing linguistic or logical framework (cf. 4.1). One way to be more specific about them is to spell out the requirements which a representation has to fulfil in order to qualify as interlingual. The central requirement is, of course, that the representation can be copied in
transfer without causing any loss of information or anomalies in the target text. Given the discussion in the previous sections we can be more specific, though. First of all, the interlingual representations should be language independent; they should not be made to mirror the language specific distinctions and categories of any particular natural language, but rather provide a neutral standard in terms of which the different languages can be compared. The main problem in defining such representations is that they can take any form one wants and that many or even most of the possible representations and concepts will be of little use for the analysis of natural languages (cf. the remarks on generality at the end of 4.1). An interesting way of dealing with this problem has been developed in the research on generalised quantifiers. Starting from a characterisation of all logically possible meanings of quantifiers, one can formulate a number of constraints which reduce this set to those meanings which are actually expressed in natural languages (cf. Barwise and Cooper, 1981). From the point of view of linguistics these constraints are semantic universals. They are not necessarily sufficient for differentiating between the meanings of individual language specific expressions, but at least they provide a useful starting point for the non-arbitrary derivation of such differentiating properties.
Second, there is the relation between the language specific morpho-syntactic representations and their interlingual counterparts. Since I do not want to anticipate all semantic ambiguities at the level of morpho-syntactic representations, this relation cannot be strictly compositional. As pointed out in 4.1, it should allow for many-to-many mappings between form and meaning; at the same time, it should not be entirely arbitrary either.
More specifically, I would like the deviations from the one-to-one scheme to be linguistically motivated. For that reason I prefer to speak of relaxed compositionality, rather than of non-compositionality. Just how relaxed one has to be on this issue is an empirical matter; it depends on the peculiarities of the expressions one is dealing with, and on the choice of the set of meanings with respect to which they are described, but also on one's sense of what counts as linguistic evidence for making a certain distinction. On this score, generative grammar can serve as a source of inspiration, since it has developed techniques for relating syntactic and semantic representations in non-compositional, yet linguistically motivated, ways: cf. the relation between constituent structure and functional structure in lexical-functional grammar, or the relation between D-structure and S-structure in Government and Binding. There is, of course, a tendency in these frameworks to concentrate exclusively on syntactic evidence, and this is a limitation one should not accept when working with interlingual representations, but - from a technical point of view - the experience of generative grammar with relaxed versions of compositionality is highly relevant to the enterprise of interlingual MT.
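The strategy borrowed from generalised quantifier research, in which constraints cut the set of logically possible meanings down to those found in natural languages, can be illustrated with a small sketch. It assumes that conservativity (the 'lives on' property of Barwise and Cooper) is taken as one such constraint; the three-element universe, the dictionary of determiner meanings and the counterexample `only*` are hypothetical choices made for illustration only.

```python
from itertools import combinations


def subsets(universe):
    """Return all subsets of `universe` as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


# Determiner meanings as relations between a noun set A and a predicate set B.
QUANTIFIERS = {
    "every": lambda a, b: a <= b,
    "some": lambda a, b: len(a & b) > 0,
    "no": lambda a, b: len(a & b) == 0,
    "most": lambda a, b: len(a & b) > len(a - b),
    # Hypothetical determiner meaning, included only as a counterexample.
    "only*": lambda a, b: b <= a,
}


def is_conservative(q, universe):
    """Q is conservative iff Q(A, B) == Q(A, A & B) for all subsets A, B."""
    sets = subsets(universe)
    return all(q(a, b) == q(a, a & b) for a in sets for b in sets)


UNIVERSE = {1, 2, 3}
RESULTS = {name: is_conservative(q, UNIVERSE)
           for name, q in QUANTIFIERS.items()}
```

On this toy universe the four familiar determiner meanings pass the check and the hypothetical `only*` does not, which is the sense in which such a constraint narrows the space of candidate meanings without yet distinguishing between the surviving ones.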
Finally, there is the matter of interpretation, i.e. the relation between the semantic representations and the extra-linguistic objects they denote. From a translational point of view, this interpretation might seem superfluous. After all, what one needs is a mapping from morpho-syntactic on to semantic representations, and vice versa; how those representations relate to model-theoretic denotations is of no concern to MT, one would think. There are, however, two reasons for not sharing this view. First, if one wants to make use of extra-linguistic knowledge in MT, and there are good reasons for thinking that one should (cf. the introduction), one should also provide the means to integrate it with the linguistic knowledge. Now, if the representations which result from applying the linguistic knowledge to texts are - at least partially - interpreted in terms of extra-linguistic entities, the task of defining an appropriate interface between both types of knowledge will be facilitated. Second, the assignment of denotations is an important - if not indispensable - tool for the linguists who define the relation between morpho-syntactic and semantic representations. Indeed, if one abandons the constraint that each expression has one and only one meaning, semantic analysis can no longer be reduced to an exercise in finding nice labels for existing morpho-syntactic distinctions. Matters get more complex and if one wants to avoid arbitrariness in the assignment of meanings, it is of crucial importance that the latter are defined in a manner which is clear, explicit and, above all, language independent. Only then can the semantic representations be used consistently and uniformly by speakers of different languages. Good examples of semantic features which do not fulfil this requirement are the names of such thematic roles as Agens, Patiens, Instrument, Experiencer, and the like.
Intuitively appealing as they are, they prove difficult to apply in any consistent way on a large scale. As a matter of fact, linguists tend to disagree on their application, as soon as one leaves the domain of clear cut cases. What is needed to make such features useful for interlingual translation, is a definition in terms of non-linguistic entities, i.e. in terms of model-theoretic denotations (cf. Dowty, 1989 for an interesting proposal).
The new approach to interlingual MT can now be summed up as follows. The monolingual modules relate texts to representations which are partly language specific and partly interlingual. The interlingual parts concern the closed class elements; they are defined in language independent terms and receive a model-theoretic interpretation; the relation between the interlingual representations and their morpho-syntactic counterparts is one of relaxed compositionality, and should be in conformity with linguistic evidence; the latter should not be limited to syntactic evidence. This approach combines methods from both logical semantics (model-theoretic interpretation and language independence) and generative grammar
(relaxed compositionality). The third approach discussed in 4.1, i.e. the comparative one, offers a starting point and an independent measure for evaluation. It offers a starting point because it gives an idea of what kind and degree of differentiation one should aim for, and it offers a measure for evaluation because it gives an empirical standard against which one can check to what extent a concrete interlingual proposal provides the degree of differentiation which is needed for translation.
Phrased in such general terms, the new approach to interlinguality might seem like an exercise in wishful thinking, and, indeed, if all I could offer at this stage were the outline given in this paragraph, I would feel sceptical about its feasibility myself. However, since the approach has actually been applied already, resulting in an interlingual treatment of the grammaticalised forms of tense and aspect in the EC languages (cf. Van Eynde, 1988, 1991, forthcoming), and since the resulting treatment was not developed in a vacuum, but in the framework of a large-scale MT project, in which it was put to the test of implementation (cf. Eurotra Reference Manual 7.0, 1991), there is some concrete evidence for its feasibility as well.
As mentioned at the end of the previous section, there are some similarities between this new approach to interlinguality and the one described in King (1989). Apart from the one already discussed in 3.2, there is the common emphasis on a model-theoretic interpretation of the representations. This is absent from most classical approaches to interlingual MT, probably because the research in formal semantics was unknown or considered irrelevant in the MT community, but given the recent trend in formal semantics to deal with natural as well as logical languages, and given the obvious advantages of model-theoretic interpretation for MT (cf. supra), there are good reasons for adopting it.
The main difference between King's proposal and mine is, then, that she proposes to apply the interlingual approach to the whole of the language (lexical as well as grammatical expressions), but with respect to a highly restricted domain, whereas I want to apply it only to the grammaticalised expressions, but without any constraints on the domain. Both approaches reject the feasibility of an interlingua which is both complete and domain-independent, but they point in different directions for arriving at a feasible alternative. The reason why I prefer the second alternative is that I want to stick to the requirement that the monolingual modules in an MT system should be linguistically motivated. For that reason I have rejected the incorporation of language pair specific operations in analysis and synthesis, and for that same reason I would object to the incorporation of domain specific operations and representations. Indeed, the only way to arrive at reusable monolingual modules is to keep them free of rules and representations which are specific to one particular domain or language pair.
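The division of labour proposed in this section can be pictured with a minimal sketch: transfer substitutes the lexical items and copies the interlingual information unchanged. The class `SemRep`, the feature names and values in `interlingual`, and the English-Dutch dictionary are all hypothetical and stand in for whatever a real system would use; structure changing lexical rules (3.1.2) are left out.

```python
from dataclasses import dataclass, field


@dataclass
class SemRep:
    """A semantic representation with a language specific and an interlingual part."""
    predicate: str                                    # lexical item (language specific)
    args: list = field(default_factory=list)          # embedded SemRep objects
    interlingual: dict = field(default_factory=dict)  # closed class information


# Hypothetical bilingual dictionary for lexical transfer (English to Dutch).
LEXICON_EN_NL = {"sleep": "slapen", "child": "kind"}


def transfer(rep, lexicon):
    """Substitute lexical items; copy the interlingual part without any rule."""
    return SemRep(
        predicate=lexicon[rep.predicate],
        args=[transfer(arg, lexicon) for arg in rep.args],
        interlingual=dict(rep.interlingual),
    )


source = SemRep(
    predicate="sleep",
    args=[SemRep("child", interlingual={"number": "singular"})],
    interlingual={"tense": "past"},
)
target = transfer(source, LEXICON_EN_NL)
```

Note that the bilingual component in this sketch contains nothing but the dictionary: everything stored under `interlingual` passes through untouched, which is what makes the monolingual modules reusable across language pairs.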
5. Conclusions
1. The linguistic information which is needed for MT includes bilingual and monolingual information. The former is usually stored in language pair specific modules (transfer), the latter in monolingual modules (analysis and synthesis). In multilingual systems it pays off to keep the bilingual modules as simple as possible, even if this means that the monolingual modules grow more complex.
2. There are four possible ways in which the complexity of transfer can be reduced: anticipation, readjustment, normalisation and abstraction. The first two force one to postulate linguistically unmotivated representations in either analysis or synthesis. Normalisation is not necessarily at odds with linguistic evidence, but in many cases it is, and it should therefore be applied with care. Abstraction is the most adequate of the four, but it cannot properly be applied to open class elements (lexical items).
3. The bilingual operations can be classified according to two dimensions: the type of operation they perform (structure preserving vs. structure changing) and the type of the expressions on which they are applied (lexical vs. grammatical). A system which does not allow any bilingual operations on lexical elements has to resort to linguistically unmotivated normalisations in the treatment of the lexicon. A system which allows bilingual operations on lexical, but not on grammatical elements, does not necessarily have this deficiency. There are two variants of such systems: the ones which only avoid structure changing operations on grammatical information and the ones which avoid all operations on grammatical information; the latter requires an interlingual treatment of the grammatical elements.
4. To arrive at an interlingual treatment of grammatical elements one cannot simply apply any of the standard methods of contrastive linguistics, Saussurean semiotics, generative grammar or logical semantics. One rather needs an approach in which the most useful aspects of these methods are combined in a novel way. The result of this combination is a concrete proposal for constructing language independent representations in terms of which the language specific grammatical elements can be analysed and model-theoretically interpreted. Since the method is only meant to apply to grammatical elements, the resulting MT systems will be mixed: the lexical elements will be treated in a language pair specific way (transfer), whereas the grammatical elements will be treated in a language pair independent way (interlingua).
Notes
1. This is the translational variant of Chomsky's principle of the structure-dependence of transformations: 'For the transformational component to function in the generation of structured sentences, it is necessary for some class of initial phrase markers to be provided.' [Chomsky 1975, 80].
2. It is worth pointing out that the strict separation of bilingual and monolingual operations has recently been challenged in the paradigm of constraint-based language processing. While still accepting the distinction between both types of information, the adherents of this approach reject the view that there should be one level of representation which contains all translationally relevant information and that all of the analysis should be completed before starting transfer or generation. Instead, they propose a piece-meal translation of partial information (cf. Sadler, this volume).
3. In this particular case normalisation yields the same result as anticipation; the difference with anticipation would become clear when translating to French or Italian; in that case normalisation would still lead to the postulation of two senses of 'eat', whereas anticipation would not, since the target languages do not make the distinction either.
4. IS stands for 'interface structure'; it is the representation which mediates between transfer and the monolingual modules.
5. The role of the level of intensional logic in PTQ can be compared to that of the level of semantic representation or logical form in generative grammar. However, whereas generative grammar tends to identify semantics with the derivation of semantic representations or logical forms, the aim of logical semantics is not just to derive such representations but, above all, to assign them model-theoretic interpretations; the former is, after all, a matter of syntactic rearrangement, the real semantics concerns the latter (cf. Lewis, 1972).
6. As a consequence, the distinction between the two levels of syntactic representation, the one of disambiguated English and the one of intensional logic, is more a matter of presentation than of content. Since the former already contains all information which is needed for the interpretation, it is possible to skip the level of intensional logic and to assign interpretations to the expressions of disambiguated English directly. This direct approach is the one which Montague adopted in 'English as a formal language'.
Acknowledgements
I would like to thank Ludo Vangilbergen, Ineke Schuurman and Valerio Allegranza for their comments on previous versions of this text.
References
Allegranza, V., P. Bennett, J. Durand, F. Van Eynde, L. Humphreys, P. Schmidt and E. Steiner (1991), 'Linguistics for MT: the Eurotra linguistic specifications', in C. Copeland, J. Durand, S. Krauwer and B. Maegaard (eds), Studies in Machine Translation and Natural Language Processing, Vol. 1, Office for Official Publications of the European Communities, Luxembourg, 15-123.
Badia, T. (this volume), 'Dependency and machine translation'.
Barwise, J. and R. Cooper (1981), 'Generalised quantifiers and natural language', Linguistics and Philosophy 4.
Burger, A. (1961), Significations et valeur du suffixe verbal français -e-, Cahiers Ferdinand de Saussure 18.
Burger, A. (1962), Essai d'analyse d'un système de valeurs, Cahiers Ferdinand de Saussure 19.
Chomsky, N. (1957), Syntactic structures, Mouton, Den Haag/Paris.
Chomsky, N. (1965), Aspects of the theory of syntax, MIT Press, Cambridge (Mass.).
Chomsky, N. (1975), Reflections on language, Collins, Fontana.
Chomsky, N. (1981), Lectures on government and binding, Foris, Dordrecht.
Diderot, D. (1751), 'Lettre sur les sourds et muets'. Page references are to J. Assézat (ed.), (1875), Oeuvres complètes de Diderot, Vol. I, Garnier Frères, Paris.
Dowty, D. R. (1989), 'On the semantic content of the notion of "thematic role"', in G. Chierchia, B. Partee and R. Turner (eds), Properties, types and meaning, Vol. II, Kluwer, Dordrecht.
Durand, J. (this volume), 'On the translation of prepositions in multilingual MT'.
The EUROTRA Reference Manual 7.0 (1991), Commission of the European Communities, Luxembourg.
Van Eynde, F. (1988), 'The analysis of tense and aspect in Eurotra', Proceedings of Coling-88, Budapest, 699-704.
Van Eynde, F. (1991), 'The semantics of tense and aspect', in M. Filgueiras, L. Damas, N. Moreira and A. P. Tomas (eds), Natural language processing. Lecture notes in artificial intelligence, Vol. 476, Springer Verlag, Berlin.
Van Eynde, F. (forthcoming), 'An interlingual analysis of tense and aspect', in M. Kay, M. Nagao and A. Zampolli (eds), Essays in honour of B. Vauquois.
Fodor, J. D. (1977), Semantics: theories of meaning in generative grammar, Crowell, New York.
Hockett, C. (1954), 'Two models of grammatical description', Word 10.
Jackendoff, R. (1983), Semantics and cognition, MIT Press, Cambridge (Mass.).
Janssen, T. (1988), 'Tense and temporal composition in Dutch: Reichenbach's "point of reference" reconsidered', in V. Ehrich and H. Vater (eds), Temporalsemantik, Niemeyer, Tübingen.
Katz, J. and J. A. Fodor (1963), 'The structure of a semantic theory', Language 39, 170-210.
Katz, J. (1972), Semantic theory, Harper, New York.
King, M. (1989), 'Research status of machine translation'. Manuscript of a contribution to the IBM Europe Summer School on 'Machine translation of natural languages', Garmisch.
Koster, J. (1975), 'Dutch as an SOV language', Linguistic Analysis 1.
Landsbergen, J. (1987), 'Isomorphic grammars and their use in the Rosetta translation system', in M. King (ed.), Machine translation: the state of the art, Edinburgh University Press, Edinburgh.
Lewis, D. (1972), 'General semantics', in D. Davidson and G. H. Harman (eds), Semantics of natural language, Reidel, Dordrecht, 169-218.
Maegaard, B. and S. Perschke (1991), 'An introduction to the Eurotra programme', in C. Copeland et al. (eds), Studies in machine translation and natural language processing, Vol. 1, Office for Official Publications of the European Communities, Luxembourg.
McCawley, J. (1968), 'Lexical insertion in a transformational grammar without deep structure'. Papers from the 4th regional meeting of the Chicago Linguistics Society.
McCawley, J. (1970), 'English as a VSO language', Language 46.
Montague, R. (1970), 'English as a formal language', in R. Thomason (ed.), (1974), Formal philosophy. Selected papers of Richard Montague, Yale University Press, New Haven.
Montague, R. (1973), 'The proper treatment of quantification in ordinary English', in R. Thomason (ed.), (1974), Formal philosophy. Selected papers of Richard Montague, Yale University Press, New Haven.
Sadler, L. (this volume), 'Translation and co-description'.
de Saussure, F. (1916), Cours de linguistique générale, Payot, Paris (latest edn, 1972).
Tsujii, J.-I. (1986), 'Future directions of machine translation', Proceedings of Coling-86, Bonn, 655-68.
Woisetschlager, E. (1976), A semantic theory of the English auxiliary system, MIT, PhD thesis.
2 Co-description and translation
Louisa Sadler
This chapter discusses and evaluates a recent proposal for translation within the LFG formalism (Kaplan et al, 1989), by reference to both the 'classical' transfer model and recent proposals to use unification-based grammars for translation. We point out that the traditional classification of MT systems in terms of the transfer/interlingua distinction must be supplemented with a distinction between 'structural' or recursive approaches and constraint-based approaches. 'Structural' or recursive approaches involve the transformation of source text into target text by means of the recursive processing of textual representations. This structural characteristic is common to both 'classical' transfer and interlingual approaches. A constraint-based approach, on the other hand, involves a set of constraints on translational equivalence, with no commitment to sequential, recursive processing. The LFG proposal is one of the very few examples of the latter approach (see also Whitelock 1991). In principle, a constraint-based approach offers at least two advantages: (i) the ability to express and combine translational constraints referring to different 'levels' of linguistic information (without somehow combining these 'levels' in one representation); and (ii) the possibility of avoiding the form of the target structure being too heavily determined by the source language structure, which often results in translationese.
In the first section, we sketch out the standard transfer model and go on to discuss transfer in systems with 'sign-based' monolingual components. A 'sign-based' unification grammar produces a characterisation of a string along a number of linguistic dimensions in the same representation. The next section provides a detailed introduction to the proposal outlined in Kaplan et al, pointing out the properties of the approach.
We then go on to a practical discussion of the behaviour and adequacy of the LFG proposal with respect to some well-known cases of 'difficult' transfer, and conclude with a short discussion of the adequacy of the approach.
1. The transfer model
Throughout the 1970s and early 1980s, the basic distinction in MT architectures has been that between transfer-based and interlingual systems. In both these models, the crucial assumption is made that translation involves sequentially transforming a surface string into an abstract representation in analysis, and deriving a target string from an abstract representation in generation. The mapping between surface string and abstract representation can be mediated by the use of any number of intermediate representation languages. The transfer and interlingual models differ essentially in the assumptions they make with respect to the appropriate degree of abstractness for the 'deepest' representation. In a transfer model, this representation is firmly tied to the source or target natural language, with the result that rules must be written stating specific correspondences between elements of source and target abstract representations. A transfer module is a collection of such statements, and transfer involves recursively applying such rules to a source representation, deriving a representation of the target representation language. In an interlingual approach, on the other hand, the 'deepest' level of representation is neutral, and the target generator can apply directly to the abstract representation produced by the analysis module.
There are two key assumptions shared by these classical models of translation. The first is that all translationally relevant information can and must be provided at one level of representation, whatever the contents of this level are. The second is a commitment to a structural, or derivational model of translation, involving the processing of representations. We will discuss these assumptions with respect to the standard transfer model, ignoring interlingual variants in what follows.
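The architectural contrast drawn above can be sketched as two compositions of stages. This is a deliberately crude illustration under strong assumptions: the 'abstract representation' is reduced to a tuple of tokens, and the stage functions, the dictionaries and the concept labels (`DEF`, `CAT`, `SLEEP`) are all hypothetical.

```python
def compose(*stages):
    """Chain processing stages so that each consumes the previous output."""
    def run(value):
        for stage in stages:
            value = stage(value)
        return value
    return run


# Transfer model: the deepest representation is still tied to a language.
def analyse_en(text):
    return tuple(text.lower().split())


EN_FR = {"the": "le", "cat": "chat", "sleeps": "dort"}


def transfer_en_fr(rep):
    # Language pair specific correspondences between representations.
    return tuple(EN_FR[item] for item in rep)


def generate_fr(rep):
    return " ".join(rep)


# Interlingual model: analysis yields a neutral representation directly.
EN_TO_CONCEPT = {"the": "DEF", "cat": "CAT", "sleeps": "SLEEP"}
CONCEPT_TO_FR = {"DEF": "le", "CAT": "chat", "SLEEP": "dort"}


def analyse_en_neutral(text):
    return tuple(EN_TO_CONCEPT[w] for w in text.lower().split())


def generate_fr_neutral(rep):
    return " ".join(CONCEPT_TO_FR[c] for c in rep)


translate_via_transfer = compose(analyse_en, transfer_en_fr, generate_fr)
translate_via_interlingua = compose(analyse_en_neutral, generate_fr_neutral)
```

Both pipelines share the two assumptions named in the text: a single level carries all the information passed on to the next stage, and translation proceeds by processing one representation into another.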
As we pointed out above, the monolingual components in a transfer-based MT system typically involve a series of intermediate representations between surface string and abstract representation. Such systems are therefore stratificational, in that a number of different levels of linguistic representation are involved. In addition, they are generally also sequential, in that a representational level i + 1 is produced by recursively processing the representation produced at level i. Crucially, transfer is essentially the same type of operation - the abstract representation of the source string is traversed by some recursive procedure, rules are applied and a target or output representation is produced, either directly or by invocation of rules in the target grammar. Since the key characteristic of this model is the stepwise transformation of structures (both in transfer and in the monolingual components), we have called this model recursive or structural (see Sadler 1991 for further discussion). Because transfer takes place on the basis of one abstract representation of
the source text it follows that this representation must express all relevant facets of the input text in some way. It has often been observed (cf. for example Tsujii, 1986 and Sadler and Arnold, 1991) that different sorts of representation seem appropriate for the translation of phenomena of various kinds. The tension, in designing abstract representations as the basis for transfer, is precisely that of accommodating information of intuitively different types while also maintaining some coherent definition of the level of representation in question. The ability to combine in translation information belonging to different 'levels' of linguistic analysis is an important criterion for adequacy in translation systems. There are three further desiderata in the design of translation systems which relate directly to our discussion of the LFG proposal. These are that:
• rules should be bidirectional, in both monolingual grammars and in transfer. This permits the construction of a fully reversible MT system which has obvious advantages in terms of the efforts involved in linguistic description. In addition bidirectional rules allow one to easily test the correctness of rules.
• the linguistic description should be modular, especially in the sense that the monolingual component should not be contaminated with information oriented towards other languages; that is, in the terminology of van Eynde (this volume) there should be no anticipation and no readjustment.
• the linguistic descriptions at every level of analysis should be linguistically justifiable; that is, there should be no arbitrary normalisation (see van Eynde, this volume).
1.1. Unification formalisms and translation
In recent years, there has been a steady convergence of work in transfer based MT with that in Computational Linguistics in general.
As a result there are a number of approaches which essentially augment fairly standard unification-based formalisms for monolingual analysis and generation with a component for relating attribute-value structures (Estival et al, 1990, van Noord et al, 1990, Tsujii and Fujita, 1991). The monolingual component may be sign-based, in which case a surface string is associated with a representation classifying the string along a number of different dimensions which express information of different sorts, for example, orthographic, surface syntactic, deep syntactic, semantic, discourse-oriented, etc., associated with different leading or distinguished attributes. For example, a feature structure may have (complex) values for the leading features PHON, SYN and SEM. At first glance, such formalisms allow for the direct use of information associated with these different features of the representation in transfer without falling into the problems associated with the hybrid but abstract level of representation of the standard transfer model, by the use of bilingual signs, making reference freely to
any parts of the source or target representation in transfer. In fact, although such systems are often referred to as 'sign-based transfer systems' they do not generally exploit the sign-based nature of monolingual representations in transfer, and thus transfer information is restricted to the values associated with one leading attribute (e.g. SEM). A further important and related property of these systems is the transfer algorithm itself, which does not differ in any essential way from that associated with the standard transfer model. Typically, transfer is the recursive application of a rule set to a (feature) structure, and one attribute is used to provide the structure for recursion. (The precise mechanisms by means of which rules are matched against source representations and target representations built up in transfer differ but the general characteristics are the same). This recursive algorithm effectively limits the possibility (opened up by the use of sign-based representations), for the expression of correspondences along different dimensions, unless explicit mechanisms are used for passing this information around the feature structures in the derivation path as the recursive transfer algorithm is applied (see Whitelock, 1991 and Sadler, 1991 for further discussion of this point). Thus although such declarative, unification-based formalisms do in fact support a notion of constraint or translation correspondence, this is given a very restricted interpretation, because transfer processes the input structure and does so in one dimension. What is needed is a different control strategy which does not involve the application of transfer rules to the recursive structure of a leading feature. The LFG proposal discussed in this article is one such alternative strategy.
1.2. Test cases for translation
The presentation of work on transfer-based MT has typically been characterised by the discussion of rather well-known problem cases which involve complex structural or lexical changes. In arguing for or against a proposed framework or architecture, papers standardly show how well the formalism fares with respect to a 'shopping list' of such problem cases. These cases include argument switching (1a), dependent incorporation (1b) and head switching (1c):
(1a) John likes Mary; Marie plaît à Jean
(1b) commit suicide; se suicider
(1c) John has just arrived; Jean vient d'arriver
A typical example of this is Tsujii and Fujita (1991) who discuss the treatment of these three in their formalism (see also van Noord et al, 1990,
Estival et al, 1990, Arnold et al, 1988). Of course, there is no suggestion that transfer systems face challenges only with these much-discussed structural mismatches, and we can add to this list problems such as the translation of unbounded dependencies and anaphora of various sorts (see the chapter by Allegranza in this volume), and the treatment of various 'semantic' phenomena such as negation, quantifier scope, modality and temporal reference. In the sections that follow, we concentrate on the adequacy or otherwise of the LFG proposal in dealing with two of these phenomena (head switching and dependent incorporation).
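To make two of the test cases concrete, the following sketch hand-codes argument switching (1a) and head switching (1c) as functions over small dictionary-based predicate-argument structures. It is not drawn from any of the cited systems: the attribute names (`pred`, `subj`, `obj`, `a_obj`, `adv`, `xcomp`) and the name dictionary are hypothetical, and tense is ignored.

```python
# Hypothetical bilingual dictionary for the proper names in examples (1a) and (1c).
NAMES = {"John": "Jean", "Mary": "Marie"}


def argument_switch(src):
    """(1a): the source SUBJ surfaces as the à-object, the source OBJ as SUBJ."""
    if src.get("pred") != "like":
        raise ValueError("rule only applies to 'like'")
    return {
        "pred": "plaire",
        "subj": NAMES[src["obj"]],
        "a_obj": NAMES[src["subj"]],
    }


def head_switch(src):
    """(1c): the adverb 'just' corresponds to the target head 'venir';
    the source head becomes its embedded complement."""
    if src.get("pred") != "arrive" or src.get("adv") != "just":
        raise ValueError("rule only applies to 'just arrive'")
    subj = NAMES[src["subj"]]
    return {
        "pred": "venir",
        "subj": subj,
        "xcomp": {"pred": "arriver", "subj": subj},
    }


likes = argument_switch({"pred": "like", "subj": "John", "obj": "Mary"})
arrived = head_switch({"pred": "arrive", "subj": "John", "adv": "just"})
```

In both cases the target structure is not a node-by-node image of the source structure, which is why such examples are awkward for a transfer algorithm that recurses over the source representation.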
2. Projections and co-description
Kaplan et al (1989) present an approach to machine translation based on co-description, using the equality and description based mechanisms of LFG. The term co-description refers to the possibility in this theory of stating what are essentially inter-module or inter-level constraints which simultaneously describe elements of both domains, by setting them into correspondence with each other. This is achieved in LFG by means of projections which are linguistically relevant mappings or functions between levels. Mapping functions such as φ (from c- to f-structure) and σ (variously from c- to f- to semantic structure) are familiar from the recent LFG literature (Halvorsen 1988a, b, Halvorsen and Kaplan, 1988, Kaplan, 1987). As pointed out by Kaplan (1987) and Halvorsen (1988a), the familiar LFG symbols ↑ and ↓ are simply convenient notations for φ(M(*)) and φ(*) respectively, where M stands for 'mother' and * stands for the c-structure node to which the description is attached. This method of specifying the correspondences between elements of different representational levels has a number of nice features. Under previous proposals for specifying a level of semantic structure in LFG (for example, Halvorsen 1983), a technique known as 'description by analysis' was used. This essentially involves producing a representation (in this case an f-structure) and then matching patterns against this structure in order to derive the semantic structure. This technique, of course, assumes the prior existence of an f-structure in analysis. It also requires that all information relevant to the semantic structure must be present in the f-structure. This problem is familiar in the MT context as the problem of channelling all information through one level of abstract representation in transfer formalisms.
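The reading of ↑ and ↓ as φ(M(*)) and φ(*) can be spelled out in a few lines. The sketch below assumes a toy c-structure for a rule of the form S → NP VP, with the usual annotations (↑ SUBJ) = ↓ on NP and ↑ = ↓ on VP; the node names, the f-structure identifiers `f1` and `f2`, and the helper functions are hypothetical.

```python
# Mother relation M over the nodes of a toy c-structure.
MOTHER = {"NP": "S", "VP": "S", "V": "VP"}

# Projection phi from c-structure nodes to f-structure identifiers.
# It is many-to-one: S, VP and V all project to the same f-structure.
PHI = {"S": "f1", "VP": "f1", "V": "f1", "NP": "f2"}


def M(node):
    """Return the mother of a c-structure node."""
    return MOTHER[node]


def phi(node):
    """Return the f-structure that a c-structure node projects to."""
    return PHI[node]


def up(node):
    """The LFG up-arrow at `node`: phi(M(*))."""
    return phi(M(node))


def down(node):
    """The LFG down-arrow at `node`: phi(*)."""
    return phi(node)


def annotate_subj(node, fstructures):
    """Record the constraint (up SUBJ) = down attached to `node`."""
    fstructures.setdefault(up(node), {})["SUBJ"] = down(node)
    return fstructures


constraints = annotate_subj("NP", {})
```

The point of the notation is that a description attached to one node constrains the f-structures of both that node and its mother at once, without first building one structure and then pattern-matching against it.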
In the co-descriptional approach, information about a particular level can be stated by using a number of different projectors, each permitting the expression of linguistically significant mappings between different structures. To take a simple example, the surface constituent structure (c-structure) position of an adverbial might be relevant to the semantic structure alongside f-structure properties. Since the
Co-description and translation
co-descriptional approach allows both the composition and the inverse of projector functions, various descriptive possibilities open up, with the result that in principle the statement of correspondences should be both straightforward and elegant.

Work in this spirit by Kaplan et al (1989) defines two translation functions τ (between f-structures) and τ' (between semantic structures). By means of these functions, one can co-describe elements of source and target f-structures and s-structures respectively. Achieving translation can be thought of in terms of specifying and resolving a set of constraints on target structures, constraints which are expressed by means of these functions. The formalism permits a wide variety of source-target correspondences to be expressed: τ and φ can be composed, as can τ' and σ. The approach allows for equations specifying translations to be added to lexical entries and (source language) c-structure rules. For example:

(2) (τ(↑SUBJ)) = ((τ↑)SUBJ)

composes τ and φ, equating the τ of the SUBJ f-structure with the SUBJ attribute of the τ of the mother's f-structure. Thus (2) says that the translation of the value of the SUBJ slot in a source f-structure fills the SUBJ slot in the f-structure which is the translation of that source f-structure. Note, however, that capturing translation relations in this way in terms of τ and τ' equations added to the source lexicon and c-structure rules means that the proposed formalism is not bidirectional, which is a serious disadvantage.¹ Despite this, what results is an interesting approach to MT, with a number of apparent advantages:

• it is description based, rather than being concerned with the construction of structure. Translation is seen as the resolution of a set of constraints on the target language.

• it avoids the problem that arises in traditional transfer (or interlingual) systems where a variety of (often incompatible) kinds of information must be expressed in a single, linguistically hybrid structure, and yet still allows information from different linguistic levels of representation to interact to constrain the translation relation, by function composition.

• because it uses the formal apparatus of LFG, it is at least compatible with a large body of well worked out linguistic analyses.

• the examples that Kaplan et al give suggest that the notation is both natural and expressive: natural, in the sense that adequate τ relations can be stated on the basis of reasonable intuitive and well-motivated linguistic analyses; expressive, in the sense that it is powerful enough to describe some difficult translation problems in a straightforward way.
Louisa Sadler
We begin by giving some simple examples of the LFG translation formalism, by way of introduction to the detailed discussion of some problem cases below. Take the following pair, in which similar argument structures are mapped to different syntactic grammatical functions and surface realisations:

(3) John likes Mary
    Marie plaît à Jean

We will use this example to illustrate translation using τ (which maps between f-structures) and alternatively using τ' (which maps between semantic structures). In the Kaplan et al (1989) proposal, the semantic structure is projected by σ from the f-structure. The equational constraint language can be used to state inter-level as well as single level constraints, opening up various descriptive possibilities, for example:

• (φM* SUBJ) = (φ*),² or in more familiar notation (↑SUBJ) = ↓, equates the f-structure of the current node with the SUBJ attribute of the f-structure associated with the mother of the current node

• (σφM*) denotes the semantic structure corresponding to the f-structure associated with the mother of the node referred to

• (σφM* ARG1) = σ(φM* SUBJ) states that the ARG1 in the semantic structure corresponding to the f-structure associated with the mother of the current node is the semantic structure corresponding to the SUBJ of the f-structure associated with the mother of the current node. That is, this equation states that the semantics of the f-structure SUBJ constitutes the ARG1 in the semantics of the containing f-structure.

Projectors such as σ, τ and τ' can occur in equations on both c-structure rules and in the lexicon. With these preliminaries over, we can proceed to the translation case in hand (3). If we express the translational correspondences using τ, we set the SUBJ of like in correspondence with the AOBJ OBJ of plaire and the OBJ of like in correspondence with the SUBJ of plaire.
The lexical entries will contain the following (ignoring all σ equations):

(4) like, V: (↑PRED) = like<SUBJ, OBJ>
             (τ↑ PRED FN) = plaire<SUBJ, AOBJ>
             (τ↑ AOBJ OBJ) = τ(↑SUBJ)
             (τ↑ SUBJ) = τ(↑OBJ)
    John, N: (↑PRED) = john
             (τ↑ PRED FN) = jean
    Mary, N: (↑PRED) = mary
             (τ↑ PRED FN) = marie

The PRED values in LFG are special atoms known as semantic forms which indicate the grammatical functions which are subcategorised by this PRED. This information is important in determining the completeness and coherence of f-structures. The partial target f-structure described by the equations above is:

(5) [ PRED  plaire
      SUBJ  [ PRED marie ]
      AOBJ  [ OBJ [ PRED jean ] ] ]
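How the τ equations in (4) describe the target f-structure in (5) can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the proposal itself: f-structures are nested dictionaries, PRED holds only the function name (argument lists are omitted), and plain assignment stands in for equation solving, so no consistency checking is done. The names `TAU_LEXICON`, `set_path` and `translate` are hypothetical.

```python
def set_path(fs, path, value):
    """Set the value at an attribute path, creating intermediate structures."""
    for attr in path[:-1]:
        fs = fs.setdefault(attr, {})
    fs[path[-1]] = value


# Each entry turns its tau equations into (target structure, path, value)
# triples, given the source f-structure f that it heads.
TAU_LEXICON = {
    "like": lambda f, tau: [
        (tau(f), ("PRED",), "plaire"),               # (tau-up PRED FN) = plaire
        (tau(f), ("AOBJ", "OBJ"), tau(f["SUBJ"])),   # (tau-up AOBJ OBJ) = tau(up SUBJ)
        (tau(f), ("SUBJ",), tau(f["OBJ"])),          # (tau-up SUBJ) = tau(up OBJ)
    ],
    "john": lambda f, tau: [(tau(f), ("PRED",), "jean")],
    "mary": lambda f, tau: [(tau(f), ("PRED",), "marie")],
}


def translate(source):
    """Collect the tau constraints of every source unit and return tau(source)."""
    images = {}

    def tau(f):
        # Exactly one target f-structure per source f-structure.
        return images.setdefault(id(f), {})

    def visit(f):
        for target, path, value in TAU_LEXICON[f["PRED"]](f, tau):
            set_path(target, path, value)
        for value in f.values():
            if isinstance(value, dict):
                visit(value)

    visit(source)
    return tau(source)


source = {"PRED": "like", "SUBJ": {"PRED": "john"}, "OBJ": {"PRED": "mary"}}
target = translate(source)
```

Because the images of the source SUBJ and OBJ are shared objects, the order in which the lexical entries contribute their equations does not matter in this sketch: the structure is described piecewise rather than built by a transfer procedure.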
Alternatively, we could express the translation between semantic structures along the lines given below (ignoring equations):

(6) like, V: (σφM* REL) = like
             (σφM* ARG1) = σ(φM* SUBJ)
             (σφM* ARG2) = σ(φM* OBJ)
             (τ'σφM* REL FN) = plaire
             (τ'σφM* ARG1) = τ'(σφM* ARG1)
             (τ'σφM* ARG2) = τ'(

ludotheque
pierre tombale => gravestone
epine dorsale => backbone
bring together => rapprocher
aller en flottant => float
plante grasse => succulent
plante grimpante => creeper

Following the model of (18) we might assume (25) as the regular τ equation for toy and try (24) as the 'special' τ case for library:
(24) library: [PRED = toy] ∈c (↑ADJ)
              (τ↑ PRED FN) = ludotheque

(25) toy: (τ↑ PRED FN) = miniature

There are a number of problems with this. Note that τ equations for translating adjuncts are annotated to the c-structure rule, as in (26).

(26) NP →  AP*                  N
           ↓ ∈ (↑ADJ)
           (τ↓) ∈ (τ↑ ADJ)
Making this annotation optional, or disjoining the τ equation on toy in (25) with an equation translating it as 'nil', as in (22), would lead to a general problem of non-translation for adjuncts. The alternative will produce target f-structures corresponding to ludotheque miniature and so on. The problem, then, is specifying just which adjuncts are to remain untranslated in which contexts.

Furthermore there are serious problems with (24). LFG is formulated in such a way as to exclude reference to adjuncts (members of the set-valued ADJ attribute) in lexical entries:

... since there is no notation for subsequently referring to particular members of that set (i.e. the set of adjuncts), there is no way that adjuncts can be restricted by lexical schemata associated with the predicate ... Since reference to the adjunct via the 'down arrow' is not possible from other places in the string, our formal system makes adjuncts naturally context-free. (Kaplan and Bresnan: 216)

But we need to refer to the adjunct toy in (24), in order to give context for the translation ludotheque. In order to do so, we have been forced to introduce the membership constraint shown in (24). Some revision to allow one to refer to a particular member of the set of adjuncts is therefore necessary, and will be assumed in the following section.
3.3. Some extensions

This section has been concerned with the impossibility of adequately treating in the approach of Kaplan et al certain translational phenomena resulting from the differences between lexicalisation patterns between source and target languages. We will show in this subsection that there are two distinct features of the approach which contribute to the problem. They are:
• the integration of monolingual and bilingual information in the source lexicon and c-structure rules;

• the constraint-based nature of the approach.

Prior to discussing these features of the LFG approach it is illustrative to compare this model with standard representation-based transfer systems. In such systems, bilingual information is stated in a set of transfer rules (typically, there are both lexical and structural transfer rules). Translation is achieved by recursively applying transfer rules to a representation which is the output of the analysis phase. Hence these systems differ with respect to both the properties mentioned above, in that

• monolingual and bilingual information is separately stated (in the monolingual and transfer components, respectively);

• translation is a recursive procedure which transforms one representation into another.

Because transfer rules are matched against pieces of source representation, a transfer rule can relate a single lexical item in one language to a construction or a set of lexical items in the other language. In such formalisms it is generally relatively easy to state a translation relation between the subpart of a representation which has commit as head and suicide as object and the lexical head se suicider. For example, in the representation-based system MiMo (Arnold et al, 1988) the following rule extracts from the source representation and inserts into the target representation those parts (bound to !) which we wish to translate as a unit, specifying that commit and suicide together translate as se suicider and vice versa:

(27) !commit [arg2 = !suicide] <=> !se-suicider

In this formalism, as in some other structural or representation-based approaches (cf. also Estival et al, 1990), it is possible to organise a transfer component so that more specific rules (such as (27)) take priority over or suppress the 'regular' translation pair commit <=> commettre.
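The way a more specific transfer rule can suppress the regular lexical pair in a representation-based system can be sketched as follows. This is not MiMo notation or MiMo code: the dictionary encoding of representations, the one-directional rule format and the names `SPECIFIC_RULES`, `LEXICAL` and `transfer` are all assumptions made for the illustration.

```python
# Regular lexical transfer pairs (source head -> target head).
LEXICAL = {
    "commit": "commettre",
    "suicide": "suicide",
    "crime": "crime",
    "john": "jean",
}

# Specific rules are tried before the regular pairs. `absorb` names the
# dependents that are translated together with the head as one unit.
SPECIFIC_RULES = [
    {"head": "commit", "absorb": {"arg2": "suicide"}, "target": "se suicider"},
]


def transfer(node):
    """Recursively map a source representation onto a target representation."""
    for rule in SPECIFIC_RULES:
        matches = node["head"] == rule["head"] and all(
            node.get(role, {}).get("head") == head
            for role, head in rule["absorb"].items()
        )
        if matches:
            target, absorbed = rule["target"], set(rule["absorb"])
            break
    else:
        # No specific rule matched: fall back on the regular pair.
        target, absorbed = LEXICAL[node["head"]], set()

    out = {"head": target}
    for role, dependent in node.items():
        if role != "head" and role not in absorbed:
            out[role] = transfer(dependent)
    return out


suicide_case = transfer(
    {"head": "commit", "arg1": {"head": "john"}, "arg2": {"head": "suicide"}}
)
crime_case = transfer(
    {"head": "commit", "arg1": {"head": "john"}, "arg2": {"head": "crime"}}
)
```

The point of the sketch is that the rule is matched against a piece of the source representation, so the unit of translation (commit plus its object) need not coincide with any single lexical entry.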
On the other hand, in LFG MT representations play no part - the piece-wise mapping function τ involves equations which attach to source lexical items and to source c-structure rules, not to representations. This means that the bilingual statements are necessarily structured by the source language (lexicon and c-structure rules) - τ equations are given for the units (lexical or c-structural) of the source language. In effect, the bilingual statements do not really constitute a transfer lexicon; they are most naturally thought of as simply a further set of equations in the monolingual source lexicon - just another projection. This is the source of the problem in correctly stating the mapping for the incorporation cases, because the units for translation are not co-extensive with the units for monolingual analysis.

The alteration required is to propose a separation of the monolingual and translation equations, by including a true transfer lexicon in the model. This lexicon is simply a collection of τ equations, but it differs from the monolingual lexicon in terms of its units of organisation. There must be entries in the transfer lexicon for each of the cases of incorporation discussed above. For the 'regular' translations, the τ equations of (16) and (17) above will be given. We introduce in addition an entry which is organised around translationally relevant units. We illustrate this transfer lexicon with commit (of suicide). As a first approximation:

(28) commit: a.  (↑OBJ PRED) =c suicide
             b.  (τ↑ PRED FN) = se suicider
             c.  τ(↑SUBJ) = (τ↑ SUBJ)
             d.  τ(↑OBJ) = NIL or
             d'. τ(↑OBJ) = τ↑

The regular τ equation for suicide of course assigns the FN suicide in the translation. This information is inconsistent with (d) and must be overridden. We assume that priority union is necessary to prefer the information in the transfer lexicon over any inconsistent information from the regular τ equations. Thus an entry like (28) will bear the priority operator (see Kaplan 1987 for some brief discussion). Of course, such operators must be treated with care. As before, we produce commettre le suicide and se suicider as candidate translations (but see note 6). But we maintain the transparency of our lexical entries - most importantly, we do not alter the regular entry for suicide. It is easy to see that some mechanism to prefer target f-structures produced from the 'special' transfer lexicon can be added, if required.

Consideration of cases where non-subcategorised elements must be incorporated or fused in translation shows that the picture is slightly more complicated.
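The effect of priority union on the commit suicide case can be sketched roughly as follows. The sketch makes strong simplifying assumptions: it operates on already built target f-structures (nested dictionaries) rather than on sets of τ equations, it takes priority union to mean 'keep everything in the preferred structure and add from the other only what does not conflict', and it models NIL as a marker that removes the attribute afterwards. The names `NIL`, `priority_union` and `prune` are hypothetical.

```python
NIL = None  # stands in for the NIL value of equation (28d)


def priority_union(preferred, default):
    """Keep all of `preferred`; add from `default` only what does not conflict."""
    result = dict(default)
    for attr, value in preferred.items():
        if isinstance(value, dict) and isinstance(result.get(attr), dict):
            result[attr] = priority_union(value, result[attr])
        else:
            result[attr] = value  # the preferred value overrides the default
    return result


def prune(fs):
    """Drop attributes whose value is NIL, at every depth."""
    return {
        attr: prune(value) if isinstance(value, dict) else value
        for attr, value in fs.items()
        if value is not NIL
    }


# Target information contributed by the regular tau equations.
regular = {
    "PRED": "commettre",
    "SUBJ": {"PRED": "jean"},
    "OBJ": {"PRED": "suicide"},
}

# Target information contributed by the special entry (28), using option (d).
special = {"PRED": "se suicider", "OBJ": NIL}

merged = prune(priority_union(special, regular))
```

Here the regular entries stay untouched; only the combination step gives the information from the transfer lexicon precedence, which is the transparency property the text is concerned with.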
Above these cases were problematic because we could not constrain the null translation or the regular translation of the adjunct appropriately, producing for toy library both ludotheque and ludotheque miniature. The problem is that of specifying which adjunct serves as context and is to be 'untranslated', for we need to specify not a path in the source, but a value (roughly (τ↑ ADJ PRED toy) = NIL or τ(↑ADJ PRED toy PRED FN) = NIL). Such an equation would be inconsistent with the regular translation of toy as
miniature in the normal way, and could again be treated by priority union. It is clear that the current impossibility of referring to members of the ADJUNCT set is problematic and some extension of the notation may be required. However this could be simulated by structuring the transfer lexicon further to allow conjoined entries:

(29) [[library: (τ↑ PRED FN) = ludotheque] ∧ [toy: (τ↑) = NIL]]

The use of conjunctions of entries as in (29) can of course be extended to the subcategorised cases (28), obviating the need for priority union with the regular equations in that case too.

We have discussed some cases which seem to be problematic for the τ projection as presented in Kaplan et al. We have suggested that the problem arises because the transfer lexicon is organised solely around the lexical units of the source language, thus making it impossible to state context for some translations. We have presented some extensions to the framework which offer some hope of overcoming these problems.

4. Head-switching

There has been a good deal of discussion in the MT literature concerning the translational difficulties caused by cases in which languages do not share the same head-dependent relations. These cases are important in representing a pervasive structural difference between languages which a translation formalism should be able to deal with in a natural way. This section discusses the problems raised for Kaplan et al by these cases:

(30a) John just arrived.
      Jean vient d'arriver.
(b)   Jan zwemt graag/toevallig.
      John likes to/happens to swim.
(c)   I think that John just arrived.
      Je pense que Jean vient d'arriver.
(d)   Ik denk dat Jan graag zwemt.
      I think that John likes to swim.
(e)   Ich glaube dass Peter gerne schwimmt.
      I believe that Peter likes to swim.

Kaplan et al present two different approaches to these cases. We first show here that the proposals they make cannot in fact account for the full range of
data for this phenomenon. In 4.2 and 4.3 we discuss in a preliminary way alternatives which depart from or extend the formalism they propose in various respects, pointing out some directions for further research. Their discussion considers the problem of translating these pairs when the source is the language which uses an adverbial when the target language uses a verbal construction. Since this is in fact the direction of translation which raises problems for their approach, we will concentrate on this direction of translation in our discussion.

The first alternative they present assumes that the adverb is an f-structure head subcategorising for a sentential ARG. This has the effect of making the source and target f-structures rather similar to each other, so that no switching has to be allowed for in the bilingual mapping expressed by τ equations. This amounts to treating head-switching as a monolingual phenomenon, and it faces a number of serious problems in this formalism. Firstly, it violates the LFG assumption that the f-structure of the highest c-structure node contains the f-structures of all other nodes (sometimes referred to as the root-to-root condition). Secondly, while such an analysis might well be correct for semantic structure, it cannot be justified on monolingual grounds as a surface-oriented syntactic feature structure (an f-structure). For example, it is the verb, not the adverb, which is the syntactic head of the construction, carrying tense and participating in agreement phenomena. Thus the proposal would appear to embody incorrect assumptions about the nature of f-structure. In fact, this proposal is an instance of anticipation (van Eynde, this volume); it amounts to tuning the source and target grammars.
Thirdly, the approach simply shifts the problem of specifying a head-switching mapping to the monolingual mapping between c-structure and f-structure, and as a consequence cannot be extended to the data discussed here, for precisely the same reasons as the second alternative, which we discuss in detail below.

The second proposal Kaplan et al make involves treating head-switching in the mapping between f-structures, i.e. by means of τ equations. Notice that a third possibility, which they do not discuss, is to assume a flat f-structure in which the adverb is an f-structure SADJ(unct), but to analyse it as a semantic head at s-structure. In the model they assume, the arrangement of projections (shown in (4)) is such that a function σ maps f-structure units to units of semantic structure, with τ relating f-structures of different languages and τ' relating semantic structures across languages. Given these assumptions, treating the head-switching operation monolingually between f-structure and semantic structure simply means that the mapping problem we describe below arises in that monolingual mapping rather than in the τ mapping.⁹
4.1. Head-switching by τ equations

If head-switching is a τ operation the adverbs are f-structure SADJs. The τ annotation to ADVP in the source language states that the τ of the f-structure of the mother node is the XCOMP of the τ of the SADJ slot, as shown on the c-structure rule for the English sentence in (30a) (their fig. 26):¹⁰

(31) S →  NP              ADVP                          VP
          (↑SUBJ) = ↓     (↑SADJ) = ↓
                          (τ(↑SADJ) XCOMP) = (τ↑)
The essential idea is to subordinate the translation of the f-structure which contains the SADJ to the translation of the SADJ itself. Further τ equations come from the lexical entries for the source items (following Kaplan et al's fig. 21):

(32) arrive: V  (↑PRED) = arrive<SUBJ>
                (τ↑ PRED FN) = arriver
                τ(↑SUBJ) = (τ↑ SUBJ)
     just: ADV  (↑PRED) = just
                (τ↑ PRED FN) = venir
     John: N    (↑PRED) = john
                (τ↑ PRED FN) = jean

These equations and further equations from the target lexicon correctly relate (33) and (34).

(33) f3 [ PRED  arrive
          SUBJ  f4 [ PRED john ]
          SADJ  f5 [ PRED just ] ]

(34) [ PRED   venir
       SUBJ   τ4 [..]
       XCOMP  τ3 [ PRED  arriver
                   SUBJ  τ4 [ PRED jean ] ] ]
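The way the equations in (31) and (32) relate (33) to (34) can be traced in a short sketch. As before, f-structures are modelled as dictionaries and assignment stands in for equation solving. The last equation, identifying the SUBJ of venir with the SUBJ of its XCOMP, is an assumption about what the target lexicon contributes; the names `tau` and `images` are likewise hypothetical.

```python
# Source f-structure (33), with the labels f3, f4 and f5 used in the text.
f4 = {"PRED": "john"}
f5 = {"PRED": "just"}
f3 = {"PRED": "arrive", "SUBJ": f4, "SADJ": f5}

images = {}


def tau(f):
    """One target f-structure per source f-structure."""
    return images.setdefault(id(f), {})


# Lexical equations from (32).
tau(f3)["PRED"] = "arriver"   # arrive: (tau-up PRED FN) = arriver
tau(f3)["SUBJ"] = tau(f4)     # arrive: tau(up SUBJ) = (tau-up SUBJ)
tau(f5)["PRED"] = "venir"     # just:   (tau-up PRED FN) = venir
tau(f4)["PRED"] = "jean"      # John:   (tau-up PRED FN) = jean

# The annotation on ADVP in rule (31): (tau(up SADJ) XCOMP) = (tau-up).
tau(f3["SADJ"])["XCOMP"] = tau(f3)

# Assumed target-lexicon equation for venir: its SUBJ is the SUBJ of its XCOMP.
tau(f5)["SUBJ"] = tau(f5)["XCOMP"]["SUBJ"]

# The outermost target f-structure is the image of the SADJ, not of f3.
target = tau(f5)
```

The sketch makes the head switch visible: the image of the whole source clause ends up embedded as the XCOMP of the image of its own adjunct.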
In order to see the problem with this approach we need to look at cases in which these adverbs appear in embedded sentences, as illustrated in (30c, d, e). The English structure for (30c) is given in (35), and the additional lexical entries in (36).

(35) f1 [ PRED  think
          SUBJ  f2 [ PRED I ]
          COMP  f3 [...] ]
(36) think: V  (↑PRED) = think<SUBJ, COMP>
               (τ↑ PRED FN) = penser
               τ(↑SUBJ) = (τ↑ SUBJ)
               τ(↑COMP) = (τ↑ COMP)
     I: N      (↑PRED) = I
               (τ↑ PRED FN) = je

The equations in (31), (32) and (36) together specify a set of constraints on the target f-structure. The problem is that they require the translation of the f-structure immediately containing the SADJ attribute (τ(f3)) to be both a COMP (from (36)) and an XCOMP (from (31)) in the target f-structure. Substituting variables for clarity, the set of equations derived is given in (37).

(37) (τf1 PRED FN) = penser

projection and the ^-1 projection. Preliminary investigation of these assumptions suggests that difficulties will still arise in these cases, however.

10. Note that here and subsequently, following Kaplan et al, we ignore the monolingual potential for more than one ADVP with its attendant problems for translation. For some discussion of these cases in another framework, see Arnold et al (1988).

References

Arnold, D. J., S. Krauwer, L. Sadler and L. des Tombe (1988), 'Relaxed Compositionality in Machine Translation', Proceedings of the Second Conference on Theoretical and Methodological Issues in Machine Translation, CMU, Pittsburgh, page numbers not integrated.

Estival, D., A. Ballim, G. Russell and S. Warwick (1990), 'A Syntax and Semantics for Feature-Structure Transfer', Proceedings of the Third International Conference on Theoretical and Methodological Issues in Machine Translation, Linguistic Research Center, Austin, Texas, page numbers not integrated.

Halvorsen, P-K. (1983), 'Semantics for Lexical-Functional Grammar', Linguistic Inquiry, vol. 14/4, pp. 567-616.

Halvorsen, P-K. (1988a), 'Algorithms for Semantic Interpretation', presented at Workshop on Computational Linguistics and Formal Semantics, Lugano, Switzerland.

Halvorsen, P-K. (1988b), 'Semantic Interpretation', presented at the International Conference on Fifth Generation Computer Systems, Tokyo, Japan, pp. 471-8.

Halvorsen, P-K. and R. M. Kaplan (1988), 'Projections and Semantic Description in Lexical-Functional Grammar', presented at the International Conference on Fifth Generation Computer Systems, Tokyo, Japan, pp. 1116-22.

Kaplan, R. and J. Bresnan (1982), 'Lexical-Functional Grammar: A Formal System for Grammatical Representation', in J. Bresnan (ed.), The Mental Representation of Grammatical Relations, MIT Press, Cambridge, Mass.

Kaplan, R. (1987), 'Three seductions of computational psycholinguistics', in P. Whitelock, M. M. Wood, H. L. Somers, R. Johnson and P. Bennett (eds), Linguistic Theory and Computer Applications, Academic Press, London, pp. 149-88.

Kaplan, R., K. Netter, J. Wedekind and A. Zaenen (1989), 'Translation by Structural Correspondences', Proceedings of the 4th European ACL, UMIST, Manchester, pp. 272-81.

van Noord, G., J. Dorrepaal, P. van der Eijk, M. Florenza and L. des Tombe, 'The MiMo2 Research System', Third International Conference on Theoretical and Methodological Issues in Machine Translation, Linguistic Research Center, Austin, Texas, page numbers not integrated.

Sadler, L., I. Crookston, D. Arnold and A. Way (1990), 'LFG and Translation', Third International Conference on Theoretical and Methodological Issues in Machine Translation, Linguistic Research Center, Austin, Texas, page numbers not integrated.

Sadler, L. (1991), 'Structural Transfer and Unification Formalisms', Applied Computer Translation, 1/4, pp. 5-21.

Sadler, L. and H. S. Thompson (1991), 'Structural Non-correspondence in Translation', Proceedings of the 5th European ACL, Berlin, Germany, pp. 293-8.

Sadler, L. and D. Arnold (1991), 'A Constraint-Based Approach to Translating Anaphoric Dependencies', Working Paper in Language Processing, No. 29, Department of Language and Linguistics, University of Essex.

Tsujii, J-I. (1986), 'Future Directions of Machine Translation', Proceedings of COLING, Bonn, Germany, pp. 655-68.

Tsujii, J-I. and K. Fujita (1991), 'Lexical Transfer based on bilingual signs: Towards interaction during transfer', Proceedings of the 5th European ACL, Berlin, Germany, pp. 275-80.

Whitelock, P. (1991), 'Shake and Bake Translation', manuscript.

Zajac, R. (1990), 'A Relational Approach to Translation', Proceedings of the Third International Conference on Theoretical and Methodological Issues in Machine Translation, Linguistic Research Center, Austin, Texas, page numbers not integrated.
3 The interaction of syntax and morphology in machine translation Paul Bennett
Morphology has been relatively neglected within MT research. For instance, the few references in Hutchins (1986) are to morphological processing, the treatment of inflection, spelling rules, and so on. In this chapter, however, we shall explore the possibility of translating subparts of words along the same compositional lines as the constituents of sentences. It should be immediately obvious that a phenomenon realized morphologically in one language may be a syntactic construct in another. For instance, the comparative is represented by a separate word in French (plus grand), by a suffix in German (größer), and by either in English depending on the adjective (bigger, more attractive). Sometimes, moreover, adjective and comparative morpheme are fused together in a non-compositional form (meilleur, besser, better). The examination of compositional translation, therefore, cannot be confined to syntax, but must consider word structure also. The problems encountered in morphology, though, may be rather different from those occurring in the sentence domain. A subsidiary aim of the chapter is to discuss a number of proposals in the literature on theoretical morphology in order to see how helpful they are. We shall take our examples from the Eurotra languages, and from two other languages that have been the subject of MT research, Chinese and Japanese.

This chapter is structured as follows. In section 1, we survey the various kinds of morphological formation which occur, and examine how they have been handled within linguistics. We then focus on translation problems and examine cross-linguistic differences, comparing them with those found in syntax. Sections 2-4 consider how different morphological constructions should be represented in MT, and the contribution which can be made to simplifying transfer; section 2 deals with inflection, section 3 with derivation and section 4 with compounding.
In section 5, we discuss various general issues, and section 6 forms the conclusion. A topic which we shall leave aside entirely here is that of morphographemics and the segmentation of surface forms. A number of proposals to handle such problems are now implemented and available (e.g. Pulman et al, 1988). MT
does not in itself introduce any new problems in this area - though solutions need to be adaptable to more than one language (cf. Thurmair 1984) - so it seems reasonable to concentrate on more pressing and relevant problems.

1. The linguistic and translational background

This section presents the linguistic and contrastive background to the MT treatment of word structure. In 1.1 we discuss theoretical linguistic work on morphology, and in 1.2 sketch some translational problems.

1.1

Morphology is traditionally divided into three spheres. Inflection deals with the formation of different forms in the paradigm of a lexeme, e.g. the plural of a noun, different cases, the various tenses and person-number forms of verbs. Derivation involves the creation of a new lexeme via affixation (e.g. happiness, unsafe). Compounding is where two stems are combined (e.g. hotel room, stir-fry). There are a number of demarcation problems, in the areas of derivation vs. inflection, derivation vs. compounding and compounding vs. syntax (cf. Bauer 1988, chs 6 and 7), but on the whole these notions are fairly clear. These three types require distinct consideration in an MT context.

The development of generative grammar provided a formal theory within which precise hypotheses about linguistic structure and interpretation could be tested. Interest in the morphological aspect of language has varied within the generative paradigm over time (cf. the opening remarks of Anderson 1982), but over the last decade morphology has become an important area of research. A crucial issue has always been the relation of morphology to syntax, and the extent to which morphological rules are autonomous of syntactic ones. Increasingly, the independence of morphology has been asserted, and the idea that syntactic rules can affect the internal structure of words has become less popular, though it still has its adherents.
Instead, the undeniable parallels between morphological and syntactic constructs have been captured by lexical rules and principles of argument inheritance which operate in the lexicon. We cannot give a full account of such developments here. Instead we offer a summary with references to some key works (textbook accounts include Scalise 1984, Spencer 1991; and Anderson 1988 is an excellent survey article). Lees (1960) proposed to derive compounds in the syntax from sentential structures; a comparable approach was followed within the Generative Semantics movement, with derivational affixes (and sometimes semantic components with no actual morphological reflex) seen as 'higher predicates'. It was also common to view inflectional affixes as manipulated by syntactic rules (e.g. the Affix Hopping transformation). Chomsky (1970) argued that the relation between John refused the offer and
John's refusal of the offer should be treated by relating the lexical entries for refuse and refusal rather than by deriving both from the same deep structure. This paper may be said (cf. Hoekstra et al, 1980) to have launched the lexicalist research programme, which centres on the claim that syntactic rules cannot refer to word-internal structure; this notion, which is formulated in various ways, is often referred to as the Lexicalist Hypothesis (LH). Halle (1973) proposed an autonomous morphological component, while Aronoff (1976) studied the properties of word formation rules, concentrating on derivation. Selkirk (1982) provided a comprehensive lexicalist approach to the three areas of morphology, making use of context-free rules and interpretive principles. Lieber (1983) gives a detailed account of compounding, claiming that no special rules or principles are needed. Di Sciullo & Williams (1987) propose that words are syntactic atoms, to which syntactic rules do not apply. This thesis of the atomicity of words is a strong form of the Lexicalist Hypothesis.

Lieber's work is an example of how notions pervasive within generative grammar (e.g. that of using general principles rather than stipulative rules) have influenced work on morphology. Other popular notions have been applied in morphology, too. An important concept is that of headedness (cf. Williams 1981, Selkirk 1982, Zwicky 1985, Di Sciullo & Williams 1987): many derivational affixes have been analysed as heads of their words. Equally, the notion of feature percolation has been employed in morphology (Lieber 1983, Di Sciullo & Williams 1987). The distinction between arguments and modifiers or adjuncts, familiar from syntax, is also useful in morphology, especially in the domain of compounding. There may, for instance, be different principles for interpreting compounds depending on whether the non-head is an argument of the head (Selkirk 1982, Lieber 1983).
This work has, naturally, not been without its critics: for instance, Botha (1984) makes some severe criticisms of lexicalist work on compounding, while acknowledging its value, and Bauer (1990) argues that the notion of 'head' has no place in morphology. It should also be stressed that the LH, while apparently endorsed by most generative morphologists, is not accepted by all. Anderson (1982) considers it a defining characteristic of inflection that it is 'syntactically relevant': derivation operates in the lexicon, independently of syntax, but inflection does not. Botha (1981) and Roeper (1988) propose that synthetic compounds are generated from syntactic underlying structures (in some ways a return to the ideas of Lees, but less unconstrained). Baker (1988) develops a sophisticated theory of incorporation which involves the incorporated noun being moved into the verb in the syntax. Kageyama (1982) proposes that a number of Japanese compounds are derived from corresponding sentential constructions. Shibatani & Kageyama (1988) identify a set of Japanese compounds which, they argue, are derived post-syntactically (in the phonological component). Marantz (1984) sees causative affixes (among others) as
The interaction of syntax and morphology in MT
'higher verbs' which 'merge' with the lower lexical verb at some point in a derivation. Such ideas are not confined to GB work: Gunji (1987) makes a GPSG-style analysis of Japanese in which the causative affix is viewed as a higher verb. It has even been claimed that 'the pendulum is swinging back away from strong lexicalism and in the direction of allowing syntactic word formation' (Shibatani & Kageyama 1988, p. 452).
Perhaps it will be useful at this point to distinguish different interpretations of the LH, and to suggest how they might be interpreted in computational terms. Among other things, the Lexicalist Hypothesis might mean either of:
(1) a. No rule may mention both elements of syntactic structure and elements of morphological structure.
b. Word boundaries must be respected at all levels of representation.
It should be noted that (1a) does not ban a rule which describes some feature of a sentence and refers to a verb with the feature {vform = past}; it just disallows a rule which looks to see if that verb contains a separate past-tense affix. So percolating such a feature between the verb and the sentence-node does not contravene any interpretation of the LH. We shall see that whether a system violates the LH depends on how this hypothesis is interpreted (see 5.2). It might also be worth adding that the LH concerns the morphology-syntax relation, and need have no direct relevance to semantics. So a semantically-based abstract level need not encounter any constraints from lexicalism (unless one wants to maintain a sentence/word dichotomy in semantics also).
1.2
It is uncontroversial that the number of sentences in any language is infinite and, hence, that translation via a 'sentence dictionary' is impossible. It is not so obvious, however, that mapping source- to target-language words is quite so unmanageable. We shall therefore begin this section by explaining why compositional translation of word-forms is necessary.
The key notion here is a much abused one, that of productivity. This has been used in a number of senses, and we shall draw a useful distinction (cf. Bauer 1988, ch. 5) between productivity and generality. (Anderson (1985, pp. 19-20) makes essentially the same distinction, but in less clear terms: for him, generality is 'productivity sense 1', while productivity is 'productivity sense 2'.) A morphological process is said to be productive if it can be used in the formation of new words. A process is said to be generalized to the extent that it can be seen occurring in existing words. Bauer (1988, p. 61) cites the English nominalizing suffix -ment as an example of a form which is highly generalized but not productive. For a reverse example, we may consider the adjective must-see (as in a must-see film, from Ayto 1989): forming an adjective by compounding modal and verb is productive, but is hardly generalized at all.
Paul Bennett
It is productivity that necessitates the treatment of morphology in MT systems. If new words can be formed, words cannot all be entered in dictionaries (monolingual or bilingual). One frequently encounters productive uses of derivational affixes (e.g. boomlet, murderee, Murdochisation, unbiodegradable). Consider, too, the words with the prefix eco- coined over the last few years. Moreover, derivation is, in principle anyway, recursive: countercounter adaptation, re-rewrite (and cf. Carden 1983). The main area of productivity, however, is compounding: expressions such as goal-oriented and work environment can be coined very freely. And the number of compounds in English is infinite, since two nouns can be combined to form a compound noun, which can then be combined with another noun, and so on, giving sequences such as project management information. Inflection, too, is productive, as it can apply to new words: boomlets. We may refer at this point to Pulman (1987), who argues that there is a potentially infinite number of passive verbs in English, since proper names can be concatenated to form new proper names, which can be made into verbs by adding -ize, and all these verbs can be passivized (e.g. The system was Hewlett-Packard-ized). So it is impossible in principle to list all (passive) verbs. Generality is also a problem, however. A generalized process will be responsible for a mass of words (e.g. noun plurals, adverbs in -ly, adjectives prefixed with un-). It may be theoretically possible to list all such forms, but such a task would be enormous, and would fail to take advantage of the regular relations (in terms of both form and meaning) which exist between stems and derived words. For instance, rather than write bilingual dictionary entries which treat all of happy, unhappy, happily, unhappily, happiness, unhappiness as wholes, it would at least be worth exploring the possibility of analysing each word into morphemes and translating these (viz. 
happy, un-, -ly, -ness). In a sense, the generality of morphological processes provides the basis for resolving the problem of having enormously large dictionaries. A related notion here is that of lexicalization, which is again used in different senses. For instance, Bauer (1988, pp. 61 and 247) regards words like warmth as lexicalized, since the formation of nouns by suffixing -th is neither productive nor generalized. Additionally, a word such as research may be regarded as lexicalized for semantic reasons: its meaning is not a function of the meaning of re- and search. This leads us appropriately to the notion of compositionality, which is a central concept in this volume. Many words (e.g. research) must be lexicalized because they are not compositional: it is a condition that an expression can only be produced by generative rules if its meaning can be derived compositionally by rule as well. Lexicalization is far commoner among words than among syntactic constructs (cf. Di Sciullo & Williams 1987, p. 14). As will be seen, this last fact imposes special demands on the representation of complex words for MT purposes.
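The idea of analysing words into morphemes while keeping lexicalized words as atoms can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy stem list, the affix lists, the function names and the single spelling rule (final y written as i before a suffix) are all hypothetical and are not taken from any system discussed in this chapter.

```python
# Toy data, assumed purely for illustration.
STEMS = {"happy", "search", "write"}
LEXICALIZED = {"research", "warmth"}   # stored as atoms, never decomposed
PREFIXES = ["un", "re"]
SUFFIXES = ["ness", "ly"]


def restore_stem(form):
    """Map a written stem back to its lexical form (happi -> happy)."""
    if form in STEMS:
        return form
    if form.endswith("i") and form[:-1] + "y" in STEMS:
        return form[:-1] + "y"
    return None


def analyse(word):
    """Return the morphemes of `word`, or None if no analysis is found."""
    if word in STEMS:
        return [word]
    # Prefixation is recursive: re-rewrite -> re- + re- + write.
    for prefix in PREFIXES:
        if word.startswith(prefix):
            rest = analyse(word[len(prefix):])
            if rest is not None:
                return [prefix + "-"] + rest
    # This sketch only strips a single suffix.
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            stem = restore_stem(word[: -len(suffix)])
            if stem is not None:
                return [stem, "-" + suffix]
    return None


def segment(word):
    """Lexicalized and unanalysable words are returned as single atoms."""
    if word in LEXICALIZED:
        return [word]
    return analyse(word) or [word]


for w in ["happy", "unhappy", "happily", "unhappily",
          "happiness", "unhappiness", "research"]:
    print(w, segment(w))
```

Checking the lexicalized list first is what keeps research from being analysed as re- plus search, even though both pieces are otherwise known to the analyser.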
We will take lexicalization to mean that a word is represented as an atom, with no internal structure. So research will simply be an ordinary lexical unit with the feature {lu = research} (cf. {lu = search} in the case of search, like all single-morpheme words). It is pointless to propose representations and rules for the internal make-up of lexicalized expressions (as done by Lieber 1983, among others). Such internal structure could only be used in the case of calques such as skyscraper and French gratte-ciel, which we take to be coincidences of no relevance to compositional translation. Despite the existence of lexicalization, however, it is clear that there is a need for translation to consider parts of words, and we can now begin to consider particular translation problems.
Just as some syntactic processes are more common in some languages than in others (e.g. the various raising processes are more productive and generalized in English than in German; cf. Hawkins 1986), so some morphological processes are more common in some languages than in others. For instance, it is often stated in descriptions of Romance languages that they make far less use of compounding than English does (e.g. Lang 1990). In contrast, the Romance languages have a wider range of 'emotive' suffixes than English (e.g. Italian ragazzo 'boy', ragazzino 'little boy', ragazzaccio 'nasty boy'). And affixes which are translationally equivalent may vary in generality. We shall therefore now survey the range of problems which arise, though making no attempt at completeness.
Inflection is perhaps less problematic in this regard than other areas of morphology. It is quite common for ideas expressed inflectionally in one language to be expressed some other way in another. For instance, the definite article in Danish appears as a suffix unless the noun is modified by an adjective (though this is sometimes seen as an example of cliticization rather than inflection, e.g. Sadock 1991, p. 115):
(2) a. hus
house
b. huset
house-the
c. det store hus
the big house
Future tense is expressed by an auxiliary in the Germanic languages (John will leave), but by an affix in Romance (e.g. French Jean partira). Passive voice may be formed by an auxiliary verb plus past participle, or by a special form of the verb incorporating a passive morpheme (-rare in Japanese, -s in some Danish passives). Inflectional forms of common words are quite often lexicalized (cf. went with asked).
Derivation is awkward in a number of ways. One is that the same meaning may be represented by different affixes, e.g. the various nominalizing affixes in English, and one must be careful not to generate (say) *acceptation instead of acceptance. Affixes can vary in their generality, as already mentioned. Judge and Healey (1985, p. 293) note that there are more adjectives in English which permit suffixation of -ly than there are in French permitting suffixation of -ment. Thus French may have to have recourse to a variety of syntactic equivalents, especially with prepositions (e.g. avec concision 'concisely', à juste titre 'deservedly'). Many languages allow the formation of causative verbs which have to be expressed syntactically elsewhere (cf. Italian imbruttire 'to make ugly'); it can be said that English has a 'lexical gap' in this case. Japanese is quite striking here, as virtually any verb can have a causative affix added to it:
(3) a. Hanako wa hon o yon-da
Hanako TOP book ACC read-past
'Hanako read a book'
b. Taroo wa Hanako ni hon o yom-ase-ta
Taroo TOP Hanako to book ACC read-cause-past
'Taroo caused Hanako to read a book'
Or the translation of a derived word may be lexicalized (cf. German verbessern 'to improve'). Sometimes one derived word may be translated by another, with both appearing to be compositional and yet neither being a compositional translation of the other (e.g. English unemployed is translated into German as arbeitslos, lit. 'workless'). The translation of a prefix by a suffix (and vice versa) seems less common than one might expect: but, for instance, English -able is rendered in Chinese by the prefix ke- (as in ai 'love', keai 'loveable'). So we conclude that by no means all derived words have straightforward compositional translations as derived words.
The same conclusion holds for compounding: not all compounds can be translated by compounds, which is a natural consequence of the fact that generality of compounding varies from language to language. Chinese allows compounding of two verbs, in the so-called resultative construction, e.g. pao-lei ('run till one is tired', lit. 'run-tired'; see Thompson 1973, Li 1990). Many English NN compounds are translated by N + preposition + N sequences in Romance languages, though here we encounter definitional problems, since it is not clear how many (if any) of such Romance sequences are compounds themselves. For instance, a war story may be translated into French as une histoire de guerre, and the cupboard door as la porte de l'armoire (Judge & Healey 1985, p. 21): can one simply say that presence vs. absence of the article in French before the second noun determines whether one is dealing with a
compound? Even if the answer to this question is 'yes', the fact remains that an English compound is translated by a syntactic construction in French (and vice versa, of course, a point often overlooked). Sometimes the translation may also be a compound, but with a different structure from the source: e.g. English can-opener is Spanish abrelatas (lit. 'opens-cans', i.e. 3rd-person singular verb plus plural noun, with no explicit equivalent of English -er). This last example nicely makes the point that compounding and derivation can interact, in ways that can create problems for translation. Many compounds (whether themselves lexicalized or not) may receive lexicalized translations, e.g. German Schreibtisch 'desk' is literally 'writing-table'.
At this point it will be instructive to compare the kinds of translational problem sketched above with those which arise in the translation of the constituents of sentences. Structuralists claimed that 'languages differ more in morphology than in syntax' (Bloomfield 1935, p. 207). Translationally, though, the morphological differences are probably less challenging than the syntactic ones. At the syntactic level, languages differ in such matters as the applicability of processes which disrupt a neat linkage between a predicate and its dependants, e.g. dative shift, passive, raising, wh-movement. The idea of neutralizing such differences via a canonical form-type representation has arisen in order to solve these problems and contribute to simple transfer (cf. Allegranza et al. 1991, Durand et al. 1991, Badia, this volume). Morphology, however, appears to offer little along these lines. Rather, the problems thrown up are of three broad types: (a) a morphological construct may be translated by another of a different type (e.g. can-opener, abrelatas) - this is perhaps the nearest to the syntactic problems just referred to; (b) a morphological construct may be translated by a syntactic one (e.g.
concisely, avec concision) - also comparable to the problems just discussed, and perhaps just a special case of them; (c) a morphological construct may be translated by a lexicalized expression (e.g. Schreibtisch, desk). The existence of regularly-created neologisms is, as mentioned, sufficient to justify the endeavour of translating the components of words. However, there is a big difference between morphology and syntax (cf. Halle 1973), viz. that most sentences used are new, while most words used are familiar. The fact that a word can be decomposed does not necessarily mean that it should be decomposed for MT purposes; just as people may both store complex words as wholes and yet be able to decompose them (Aitchison 1987, p. 161). This is a point we shall return to below. Another special characteristic of morphology is the distinction between actual and possible words (whereas separating actual from possible sentences makes no sense).
1.3
The idea of translating the parts of words, and taking the morpheme as the basic unit of translation, has received little close attention in MT work. In
connection with the DLT system, Schubert (1987, pp. 85-6) raises the possibility of employing dependency trees where the nodes are occupied by morphemes rather than words. He does not adopt this proposal, however, largely on the grounds that some morphemes cannot be translated directly, as they simply represent features of content morphemes. This objection is forestalled by the featurization of inflectional morphemes proposed in section 2 below, which ensures that it is largely content morphemes that are represented in abstract structures.
2. Inflection
In this and subsequent sections, we discuss some principles for the representation of word structure in MT, structuring the discussion according to the three traditional kinds of morphology. An initial issue which should be briefly addressed concerns the relations between morphological and syntactic trees. We shall assume that morphological trees are simply an extension 'downwards' of syntactic trees, and thus reject the view of Sadock (1991) that morphological and syntactic structures are strictly separate, perhaps involving differing constituency and subject to constraints on the links between them. This is because the kinds of morphology-syntax interaction at work in MT require access to a single integrated representation for a sentence, and linguistic trees do not permit crossing of branches.1
We begin with a consideration of inflection. Our central point here will be that representation of inflectional affixes depends entirely on the treatment of the constructions in which they occur, and that there is no independent theory of inflectional representation. We assume an approach in which much information is shown at the abstract level (for which we use the convenient abbreviation IS) in the form of features, even if it is realized by a separate constituent on the surface. Many inflectional features appear on words for agreement purposes (e.g.
verbs agree with subjects in person and number, adjectives agree with nouns in number and gender). Such features simply have no place at a level designed with transfer in mind. In fact, we can propose the following principle, which uses the standard concept of an agreement target agreeing with a source or controller: (4) An inflectional feature F will not appear on a constituent which at other levels is an agreement target for F. F may at most appear on a constituent which at other levels is an agreement controller for F. So the number and person of a verb's surface subject will not appear on that verb at IS.
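Principle (4) can be illustrated with a small sketch. The node names, the feature names and the encoding of an agreement relation as a (controller, target, feature) triple are assumptions made purely for illustration; the chapter does not prescribe any particular data structure.

```python
def strip_agreement_targets(nodes, agreements):
    """Build IS feature sets by removing each agreement feature from its target.

    nodes: dict mapping a node name to its surface feature dict
    agreements: iterable of (controller, target, feature) triples
    """
    is_nodes = {name: dict(features) for name, features in nodes.items()}
    for _controller, target, feature in agreements:
        # Only the target loses the feature; the controller may keep it.
        is_nodes[target].pop(feature, None)
    return is_nodes


# Surface analysis of "The dogs bark" (hypothetical feature names).
surface = {
    "subject": {"lu": "dog", "number": "plural", "person": "3"},
    "verb": {"lu": "bark", "number": "plural", "person": "3",
             "tense": "present"},
}
agreements = [("subject", "verb", "number"), ("subject", "verb", "person")]

is_level = strip_agreement_targets(surface, agreements)
print(is_level["verb"])     # number and person are gone, tense survives
print(is_level["subject"])  # the controller is unchanged
```

The sketch deliberately leaves the controller untouched: whether a feature is retained there is a separate question from its removal on the agreement target.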
However, some inflectional features must be encoded in some form at IS, and we propose the following:
(5) An inflectional feature F will only occur at IS if it has contrastive values which are semantically or syntactically relevant.
So number must appear on nouns, since it is semantically relevant; but it will appear as a feature on the NP node, and a plural affix will not appear as such at IS. The Danish inflectional definite article (see (2)) will be featurized like all articles, thus neutralizing the surface differences between Danish and (say) English. Surface morphological case will not be represented at IS, as it adds neither semantic nor syntactic information. The various realizations of passivization illustrated earlier will be reduced to a single {voice = passive} feature on the sentence node: the fact that a clause is passive is far more important than whether a language encodes passive via an affix or a separate auxiliary (and in Chinese, neither of these methods is used, there just being a change in grammatical relations). Passive would be an example of syntactically-relevant information which may be inflectionally encoded on the surface and which survives in some form at IS. In the case of tense and aspect affixes and auxiliaries, too, the meanings these express must be represented in features, preferably drawn from a universal system (cf. van Eynde, this volume). We see, then, that the representation of inflectional information is dependent on how the construction it marks is handled, and that the same principles apply where the information is marked in other ways, e.g. by auxiliaries. Functional morphemes, such as auxiliaries and tense affixes, are prime candidates for featurization in a dependency-based framework. The inflectional features themselves will be transferred by default, and the lexical units by straightforward lexical statements. It will then be up to the target synthesis module to produce the correct forms.
3. Derivation
The basic hypothesis to be explored in this section is the extent to which the parts of words formed via derivation can be translated compositionally, and the structures of the words simply transferred by default.2
3.1
If one looks at simple examples, e.g. English attain/attainable, German erreichen/erreichbar, and English blue/blueish, Danish blå/blålig, translation of derived words looks straightforward, as if it can be done on the basis of ordinary morphological trees:
(6) [ADJ [V attain] [suf able]]
[ADJ [V erreich] [suf bar]]
Specific transfer rules would be needed to handle cases where either source or target word was lexicalized. However, to conclude from examples such as these that simple trees are enough for transferring derived words would be as wrong as concluding from examples like The cat chased the rat that surface structure is always sufficient for translating sentences. The problems essentially centre on the affixes and their translation. The limited generality of affixes, and the (often complex) conditions to which they are subject mean that an affix may well have a range of equivalents in the target language, as in these Dutch and English examples:
(7) a. on-aandachtig / in-attentive
on-beducht / un-afraid
b. zeker-heid / certain-ty
treurig-heid / sad-ness
c. bezoek-er / visit-or
lop-er / runn-er
To attempt to state all the possible many-to-many correspondences during transfer would be virtually impossible, and would in any case be quite misguided, since affix choice is a clear example of a monolingual matter which should be dealt with in synthesis, not in transfer. The key to this problem is to recognize the distinction between the formal and functional aspects of word formation (to use the terminology of Anderson 1985, pp. 6-7). For example, the derivation of singer from sing in English can be seen from two perspectives. Formally, the suffix -er has been added; functionally, an agent noun has been formed from a verb. But -er also has other functions (e.g. to form instruments, as in mixer), while the agentive function can take other forms (e.g. conversion in cook). Spencer (1991, pp. 428-33) discusses this idea under the name of the Separation Hypothesis. The solution in an MT context is to render each affix in some abstract manner such that all affixes which are functionally identical are treated the same way. Each surface affix can be treated as the realization of some 'abstract affix' which is defined in terms of semantics and the operation it performs.
So -er might be described as an agentive or subject nominalizer; likewise the various ways of forming deverbal process nominals would be given the same abstract affix (for a brief discussion of this way of regarding the semantics of affixes, see Aronoff
1974, p. 50). A partial list of abstract affixes, with exemplification from a variety of languages, is:
(8) a. #neg 'not': un-happy, in-acceptable, in-útil
b. #re 'again': re-draft, re-devenir
c. #subj 'performer of the action': drink-er, Lehr-er, emplea-dor
d. #action 'the process or result of some activity': develop-ment, observa-ción, Frischhalt-ung
e. #state 'the state or quality of having some property': sad-ness, posibil-idad, égal-ité
f. #cause 'cause to be': simpl-ify, ver-einfachen
g. #become 'become, INCHOATIVE': dark-en, palid-ecer, er-röten
h. #adverb 'in such a way': quick-ly, rapide-ment, felice-mente
Some of these may need to be further specified, e.g. the difference between single and collective performers (manager vs. management), between processes and results (the different senses of examination), or different kinds of negation (non-British vs. un-British). It is not necessary that an abstract affix always attach to a single class of stem, e.g. the non- in non-smoker may also be an instance of #neg. At IS, there will be abstract affixes, while at other levels concrete affixes will occur. This is entirely parallel to the framework of Zwanenburg (1988), who distinguishes two morphological levels, a 'bare' level which contains affix-labels such as 'action' and 'agent', and an 'enriched' level which contains individual affixes such as -tion and -er. Booij (1986) objects to such an approach on the grounds that affixes which realize the same abstract type may differ in distribution and productivity - but this is really an argument for the approach in question in an MT context, since these differences can be expressed in monolingual components and transfer can profit from their neutralization. With the device of abstract affixes, translational problems are much reduced. In analysis, an affix is just replaced by a specification of the abstract affix it realizes.
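The division of labour that abstract affixes imply between analysis, transfer and synthesis can be sketched as follows, using the Dutch and English pairs in (7). Everything in the sketch is an assumption made for illustration: the table names, the representation of a derived word as a (stem, affix) pair, and above all the use of a per-stem exception table with a default for choosing the English affix, which stands in for the phonological and semantic conditions a real synthesis component would have to state.

```python
# Analysis (Dutch): each surface affix is replaced by its abstract affix.
ANALYSIS_NL = {"-heid": "#state", "-er": "#subj"}

# Transfer: only stems need lexical statements; abstract affixes are
# copied across by default.
LEXICAL_TRANSFER = {
    "zeker": "certain",
    "treurig": "sad",
    "bezoek": "visit",
    "lop": "run",
}

# Synthesis (English): the choice of surface affix is a monolingual matter.
SYNTHESIS_EN = {
    "#state": {"default": "-ness", "certain": "-ty"},
    "#subj": {"default": "-er", "visit": "-or"},
}


def analyse(stem, affix):
    """Surface (stem, affix) pair -> source IS with an abstract affix."""
    return (stem, ANALYSIS_NL[affix])


def transfer(source_is):
    """Translate the stem; the abstract affix is transferred by default."""
    stem, abstract = source_is
    return (LEXICAL_TRANSFER[stem], abstract)


def synthesize(target_is):
    """Target IS -> (stem, surface affix) pair; no spelling rules applied."""
    stem, abstract = target_is
    choices = SYNTHESIS_EN[abstract]
    return (stem, choices.get(stem, choices["default"]))


def translate(stem, affix):
    return synthesize(transfer(analyse(stem, affix)))


for pair in [("zeker", "-heid"), ("treurig", "-heid"),
             ("bezoek", "-er"), ("lop", "-er")]:
    print(pair, "->", translate(*pair))
```

Note that the transfer table mentions no affixes at all: the many-to-many affix correspondences are confined to the two monolingual tables, which is the point of the abstract-affix approach.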
In transfer, the affix is (except where the target word is lexicalized) translated by default. The task of synthesis is to ensure that the right affix is
chosen. To a great extent, then, the problem is made very similar to that of translating inflectional affixes, with both kinds of affix being rendered in more semantic terms at IS, and more of a processing burden being put on the monolingual components. In the case of derivation, however, choice of correct affix in synthesis is by no means straightforward. Many affixes combine only with stems which meet certain conditions, often phonological conditions which are not easy to state in the case of systems which only deal with written language. Consider, for instance, English -en (as in soften), which only attaches to adjectival stems which are monosyllabic and end in a plosive or fricative. Such information can indeed be given in lexical entries, but an awful lot of specific information may need to be included (an alternative might be to try to derive such information from the written form of a stem, but it remains to be seen how accurate and time-consuming this would be). In other cases, affixes are subject to semantic constraints, e.g. English un- does not (with some exceptions) attach to adjectives with negative meanings (cf. unhappy with *unsad).3 An idea from theoretical linguistics which does seem likely to be useful in this regard, when dealing with instances of multiple affixation in a single word, is that of level-ordering morphology (Siegel 1979). On this view, affixes are divided into two classes, class I and II (a distinction independently motivated by facts related to stress), and class I affixes cannot attach outside class II affixes. *Fearlessity is excluded, since -less is class II but -ity is class I; instead, the form fearlessness is found, containing the class II suffix -ness. By incorporating such restrictions into the rules building up word structure, a number of incorrect affix choices can be excluded fairly simply. Comparable level-ordering analyses have been proposed for Italian and Dutch (Scalise 1984, pp.
87-90), and for Japanese (Kageyama 1982), but it remains to be seen whether they are extendable to all relevant languages. It has also been objected (Fabb 1988) that level-ordering is too weak, in that there are large numbers of ill-formed affix sequences which it fails to rule out. Fabb's own solution is to specify the combinatory restrictions of affixes in greater detail. For instance, a number of affixes (e.g. adjectival -ish) never attach outside another affix, while -ity only attaches to Latinate stems which end in a consonant. It may well be that this kind of approach is more promising than level-ordering, but again its applicability to languages other than English would need to be investigated. So far, then, we have argued that affixes are to be represented by abstract affixes which represent their meaning. An implicit assumption is that derived words analysed thus show a high degree of cross-linguistic correspondence. Perhaps the best way of illustrating this point is to discuss an example where it does not obtain, as in these English/German examples:
(9) a. beauty - Schönheit
b. beautiful - schön
In English the adjective is derived from the noun, but vice versa in German. If such examples were widespread, the idea of compositional translation of derived words would be unappealing, but in fact such cases are uncommon. The next topic we can tackle is that of the relevance of the prefix/suffix distinction to IS. As was mentioned earlier, prefixes generally translate by prefixes, and suffixes by suffixes, though we did give an example where this is not the case. Because of this last kind of example, and the fact that morpheme order is a superficial phenomenon that is irrelevant to abstract representations, we shall assume IS structures which neutralize the distinction between prefixes and suffixes. IS structures, then, consist of a stem and an affix, and it is a task for the synthesis component to order the affix correctly (just as it has to order, say, modifying adjectives correctly). Infixes and circumfixes, where they occur, can be handled in the same way - at IS they are just affixes. Equally, other morphological processes can be treated like ordinary affixes at an abstract level. For instance, one method of adverb-formation in Chinese includes reduplication, as in man 'slow' and manmande 'slowly'; at IS this will just be considered as having #adverb as a sister of man, just as slowly is represented with #adverb as a sister of slow. There remains the problem of parasynthesis, whereby a prefix and suffix are added simultaneously to a stem, giving a tripartite structure rather than the binary trees common in morphology.4 Possible examples would be Italian invecchiare 'to become old', from vecchio 'old', and French antiparlementaire 'antiparliamentary' from parlement 'parliament'. It is in fact controversial whether the proposed examples should really be handled as involving parasynthesis (cf. Scalise 1984, pp. 147-51, Durand 1982).
The point we would make here, however, is that parasynthesis analyses are not appropriate at IS, since a single affix can be posited instead. So invecchiare is simply the inchoative of vecchio, and there is no need to complicate transfer by proposing a specially-complex structure. Analyses involving parasynthesis are never motivated in semantic terms - i.e. that the meaning of the complex word must be seen as determined by the interplay of stem, prefix and suffix - but rather on the grounds that adding only a single affix does not yield a well-formed word. We would suggest that this latter criterion is irrelevant at IS. In particular, the presence of the infinitival affix is not an IS matter, as it is inflectional. So, for instance, German vereinfachen 'simplify' (with stem einfach 'simple') would be represented as:
(10) [v [adj einfach] [aff #cause]]
(There is no reflex here of the infinitive, which merely happens to be the citation form.5)
3.2
We now turn to the issue of dependency and headedness within word structure. As mentioned in 1.1, the notion of 'head' has been employed extensively in morphology, though its use is controversial. Rather than survey the various theoretical approaches, we shall consider the implications for MT. If we extend the dependency-based representations adopted in other chapters (especially by Badia) to the realm of word structure, we shall need to identify the head of a complex word and its frame and dependants. It seems clear that both stem and affix contribute to the properties (syntactic and semantic) of a derived word. But if, as an example, teach is the head of both teacher and teachable, then there is no explanation for the differing categories of the latter pair. Hence, if the stem is the head, the idea that complex unit and head are categorially linked cannot be maintained. We may thus regard the affix as the head, and say that, for instance, -er actually belongs to the category of nouns, giving us the following kind of representation (with the head or governor conventionally ordered on the left): (11)
[n [gov,n #subj] [arg1,v teach]]
The frame of -er is that it selects for a verb (which is here regarded as an argument of the affix). We give the IS for the German translation Lehrer, to show how these representations simplify transfer: (12)
[n [gov,n #subj] [arg1,v lehren]]
Only the translation of teach to lehren, which is needed independently, is required here. However, it is not so clear that all affixes merit treatment in this same way. It is, in particular, class-maintaining affixes which seem to call for a different
approach, since with them the category of the derived word is the same as that of the stem. It is here that an asymmetry between prefixes and suffixes shows: most prefixes (though not quite all) are class-maintaining. Among suffixes, it is in particular (though not only) the evaluative suffixes that are class-maintaining. It seems reasonable, therefore, to argue that certain affixes, because they simply modify the stem and do not fundamentally change its semantics, should not be analysed as heads. Here, too, the hyponymy test for headedness has its clearest results: if a ragazzino is a kind of ragazzo, then it seems unexceptionable to take the stem as the head of the former. Translational equivalence would support this, since boy is the head of the English translation, viz. 'little boy'. However, it is not the case that all class-maintaining affixes should necessarily be analysed as non-heads (e.g. adjectival negative un- might well be regarded as a head).
3.3
We have dealt above with one problem of synthesis, viz. that of determining the correct affix, a difficulty which, incidentally, exists irrespective of whether the 'abstract affix' approach is employed. Another problem which arises is that of blocking, where a compositional form does not occur, apparently because another form (lexicalized or not) has the same meaning and 'usurps' the slot into which the regular formation would fit. Aronoff (1976, pp. 44-5) suggests that the formation of *gloriosity from glorious is blocked because the abstract nominal from glorious is glory. However, gloriousness is allowed, because -ness forms (unlike -ity forms) are not listed in the lexicon and so cannot be overridden. Unfortunately, this idea of listedness is not compatible with the approach taken here of explicitly not listing compositional derived words. On the other hand, it may be that, in this case, the restrictions on the attachment of -ity mentioned earlier will suffice.
It is generally acknowledged that blocking is an ill-understood notion, and it does not seem to be straightforwardly amenable to computational implementation. Andrews (1990, p. 519) attempts to give a formal Morphological Blocking Principle within a unification-based framework, but it cannot be described as efficiently computable. What is needed, we suggest, is a mechanism for transferring a complex word into a lexicalized one which automatically involves the prevention of a compositional translation. It may be that the hierarchy of priority mentioned by Schubert (1987, p. 162), by which a more specific rule prevails over a more general one, is on the right lines, but his description is not precise enough. (For another approach, see Sadler (this volume) on incorporation.)

We have discussed the neutralization of positionally-defined affixes and of affixes with differing form but the same function. Our next topic is zero-affixation or conversion, where no explicit affix is identifiable. This occurs both systematically (e.g. German and Dutch adverbs being formally identical to
Paul Bennett
adjectives) and sporadically (e.g. the English verb saw formed from the noun saw). Conversion is an awkward topic, which has received little attention within recent theoretical work on morphology. Lieber (1981) argues that most zero-affixation analyses be outlawed, on the grounds that the putative zero affixes do not behave like bona fide affixes (they neither define a unique lexical class nor have a consistent effect on argument structure). Her solution is to propose that neither word in a conversion relation is derived from the other. Rather, both occur in the lexicon and are related by a redundancy rule, which is highly rated by the evaluation metric. Unfortunately, this solution - and the whole idea of redundancy rules relating independently-existing entries - appears to be quite unhelpful in a computational situation, where the redundancy rules serve no useful purpose. So we shall attempt a rather different approach here. Let us recognize a zero affix as one possible realization of an abstract affix if and only if the relation between source word and derived word is semantically the same as that which obtains when there is an overt affix (this principle is made explicit as the Overt Analogue Criterion by Sanders 1988, but we do not require that the overt affix occur in the same language). So English free (as a verb, meaning 'make free') may be analysed as follows: (13)
    v
        gov,v      #cause
        arg1,adj   free
This simplifies transfer to languages where the equivalent has an overt affix, e.g. German befreien (with stem frei). Under this approach, a zero affix will be used for those Danish adverbs which are identical in form to adjectives. Danish falls between English and German in having sizeable numbers of adverbs formed by both zero- and overt-affixation (e.g. glad 'happy, happily', dejlig 'wonderful', dejligt 'wonderfully'). In addition, German adverbs will be analysed as involving a zero affix, on translational grounds, even though it is rare for a German adverb to have an explicit affix. But idiosyncratic examples like English saw and paint will simply be treated as instances of lexical ambiguity, with no statement relating them at all. Conversion represents a problem in synthesis, however, when the zero affix is one of a number of possible realizations of an abstract affix. We cannot see any prospect for describing the appropriate conditions for synthesizing conversion forms, other than stipulating that the zero affix cannot attach to a form containing an overt affix. We may have to fall back on lexical specification, i.e. stating in the target language dictionary that the causative of free is just free, and relying on some version of blocking to exclude *enfree etc.
This of course is not an attractive solution. Especially in a bilingual system, it might be preferable in such a case to simply translate befreien as free, on the grounds that, while compositional translation is possible in principle, it is in practice so difficult that the advantages are outweighed by the problems. This last point leads us to a more general conclusion to this section. Just because compositional translation of the parts of a word is feasible, it may not in every case be advisable. Even if (say) quickly can be handled by translating quick and -ly, it may be easier and safer to translate it as a whole. Certainly this would seem a useful strategy for very common words. We suggest that the extent of adopting this approach should be decided on practical rather than theoretical grounds, influenced by considerations such as frequency of words, properties of the language involved, size of lexica, and so on.

4. Compounding

We made the point about derivation that it is the translation of affixes which is hardest, as the open-class morphemes can more or less be left to look after themselves, i.e. can be transferred by means of the same lexical transfer rules needed when they occur as words in their own right. This might suggest that the translation of compounds is relatively easy, perhaps complicated only by interaction with derivation. However, we shall see that there are a number of problematic areas in the translation of compounds, even though one can to a large extent rely on standard transfer rules to translate the parts of compounds.⁶ We can first point out that the identification of compounds is not just a theoretical problem facing the linguist, but also a practical one facing any NLP system. The problem of identifying a compound in a text is not the same as that of identifying one in the abstract. It has to be acknowledged that sequences containing blanks can constitute compounds (e.g.
project announcement); different languages have different orthographic conventions, and languages like English are not internally consistent, with different speakers following different rules. The problem is even worse in Chinese and Japanese, where the scripts do not mark word divisions at all. In fact, it cannot even be assumed in all cases that sequences written without a blank are words not phrases: Trommelen & Zonneveld (1986, pp. 157-8) argue that such Dutch expressions as stokdoof 'stone-deaf' are adjective phrases, not adjectival compounds. It is also difficult to decide on internal bracketings when a compound has more than two elements; cf.
(14) a. [book review] section
     b. newspaper [review section]
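The bracketing problem in (14) can be illustrated by enumerating every binary bracketing of a compound's elements. The sketch below only shows the size of the search space; the function name `bracketings` and the nested-tuple encoding are assumptions made for illustration, and choosing among the candidates would need lexical or semantic information that is not modelled here.

```python
def bracketings(words):
    """Return all binary bracketings of a list of words as nested tuples."""
    if len(words) == 1:
        return [words[0]]
    results = []
    # Split the sequence at every position and combine the bracketings
    # of the left part with those of the right part.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                results.append((left, right))
    return results


for tree in bracketings(["newspaper", "review", "section"]):
    print(tree)
```

A three-element compound has two candidate structures, corresponding to the two patterns in (14); the number of candidates grows quickly with the length of the compound.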
4.1 We shall concentrate, for practical reasons, on compound nouns. Let us begin with a relatively simple example. Comparing English toothbrush with German Zahnbürste, we seem to have straightforward compositional translation which can be done on entirely superficial trees:
(15)
    n
        n   tooth
        n   brush
    ⇔
    n
        n   zahn
        n   bürste
Nevertheless, we do need rather more sophisticated representations than these. In the first place, the notion of head seems to be essential in compounding: not only does the head determine the category of the compound, it also determines its semantic features and (in some languages) its gender. There will therefore be very general percolation of features between compound-node and head. The examples here are non-synthetic compounds, where the non-head is not an argument of the head, but a modifier. Introducing the notions of head and modifier, and ordering the head canonically on the left, gives us (for just the English example, now): (16)
    n
        gov,n   brush
        mod,n   tooth
This also neutralizes ordering differences between languages which differ in the position of the head in compounds, e.g. English and French. Let us stick with this toothbrush example, and consider now its French translation, brosse à dents. If we assume that this is indeed a compound (cf. earlier discussion), we must ask ourselves how the preposition à is to be represented (cf. Durand, this volume). Even though de and à may contrast in such structures (e.g. tasse de thé 'cup of tea' vs. tasse à thé 'teacup'), we shall regard such prepositions as semantically empty, like the strongly-governed on in rely on, and hence as not meriting structural representation at IS. This is because they cannot be modified or coordinated, nor do they have their general lexical value (which in the case of à may be taken as 'motion towards'). So brosse à dents would be shown as follows (with the modifier lemmatized to its singular form): (17)
    n
        gov,n   brosse
        mod,n   dent
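The head-first representations in (16) and (17) can be sketched as a small data structure, with transfer translating the parts and synthesis restoring the surface order of the target language. Everything here is illustrative: the `Compound` class, the toy English-German lexicon `EN_DE` and the function names are assumptions, and only the head-final German case is synthesized. A head-initial target such as French would need a different synthesis step, including the choice of preposition and the number of the modifier.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    gov: str  # head lemma, stored first (canonical order at IS)
    mod: str  # modifier lemma


# Toy lexical transfer rules; the same entries would serve for free words.
EN_DE = {"tooth": "zahn", "brush": "bürste"}


def analyse_en(first, second):
    """English NN compounds are head-final: the second element is the head."""
    return Compound(gov=second, mod=first)


def transfer(compound, lexicon):
    """Translate head and modifier with ordinary lexical transfer rules."""
    return Compound(gov=lexicon[compound.gov], mod=lexicon[compound.mod])


def synthesise_de(compound):
    """German NN compounds are head-final and written as one capitalized word."""
    return (compound.mod + compound.gov).capitalize()


is_en = analyse_en("tooth", "brush")
print(is_en)
print(synthesise_de(transfer(is_en, EN_DE)))
```

Because the head is stored in a fixed position, the transfer step never needs to know whether the source or target language is head-initial or head-final.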
In analysis, the feature {pform = à} would occur on the 'dent' node; this might render transfer to another Romance language easier. In translation into French, it would be the task of the synthesis component to produce the correct preposition here, perhaps relying on the semantic features of the two nouns. An alternative approach would attempt to specify far more precisely the relation between head and non-head in these examples, by classifying the modifier relation. In this case one might propose: (18)
    n
        gov,n           brush
        mod,purpose,n   tooth
The kind of modification in question would then determine the generation of the correct preposition. In this case, à occurs rather than de when the modifier describes the use or purpose of the head (this is among the criteria on which the choice between these prepositions is standardly explained in grammars, e.g. Judge & Healey 1985, p. 333).

Unfortunately, this solution encounters a major obstacle, namely that the possible semantic relations between the parts of an NN compound are virtually unlimited: even a single compound may be compatible with a variety of relations, which context and pragmatic inferencing can help the hearer to choose between (cf. Downing 1977, Sparck Jones 1983). It is extremely difficult to formulate a system of classifying such relations that will be both consistent and objective, let alone one that an NLP system can successfully exploit. Moreover, there is little reason to think that such a system would in its entirety be needed for translation. Consequently, we propose that no such fine-grained set of semantic relations be employed in MT; rather, a system should attempt to do what it can on the basis of the simple 'modifier' relation and the appeal to semantic features in synthesis. This is admittedly not perfect, but is undoubtedly the better of two unsatisfactory solutions. It illustrates a point where the idea of compositional translation begins to break down: the interpretation of NN compounds is constrained by the meanings of the parts, but is not totally determined by them (unless one regards the interpretation as being an anodyne 'related to'). The idea of not attempting to specify in a grammatical description the myriad possible interpretations of NN compounds is accepted by many authors (e.g. Booij 1979, p. 993; Selkirk 1982, pp. 22-3; Lieber 1983, p. 260).

As an example of how semantic features can be exploited in synthesis, we give some Dutch-English examples.
Most Dutch NN compounds can be translated simply as NN compounds in English, but some require or prefer the use of an of-phrase:
(19) a. bedrijfsniveau
        industry-level
        'level of industry'
     b. herhalingsgraad
        repetition-degree
        'degree of repetition'
It is abstract nouns involving some notion of measurement which resist compounding. We have dealt above with compounds where the non-head is not an argument of the head, and have suggested that simply describing it as a modifier may be sufficient. Now, however, we turn our attention to synthetic compounds, where the non-head is an argument of the head, e.g.:
(20) a. truck-driver
     b. can-opener
     c. drug-resistant
     d. theory-development
It is clear that one of the arguments of a frame-bearing predicate can be filled inside a compound; this same argument slot cannot then be filled outside the compound (*truck-driver of cars). Various proposals have been made to account for frame-satisfaction in both compounds and syntax along similar lines (e.g. Selkirk 1982, Lieber 1983). A proposal following the representations given for compounds above might handle (20a) as follows: (21)
    n
        gov,n    driver
        arg2,n   truck
Truck is shown as the arg2 (deep object) of driver, which is assumed to be an argument-bearing head (cf. the driver of the truck). (However, there is a problem which we ignore here, concerning the internal structure of this compound and the interaction between the derivational affix -er and the remainder.) The other cases in (20) could also be represented in a comparable way, e.g.: (22)
    adj
        gov,adj   resistant
        arg2,n    drug
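The constraint that an argument slot filled inside a compound cannot be filled again outside it can be sketched as simple bookkeeping over a frame. The `Predicate` class and its methods are hypothetical; the slot name merely mirrors the arg2 label used in (21) and (22), and nothing here is meant as a description of an existing system.

```python
class Predicate:
    """A frame-bearing head with a fixed set of argument slots."""

    def __init__(self, lemma, slots):
        self.lemma = lemma
        self.filled = {slot: None for slot in slots}

    def fill(self, slot, filler):
        """Fill a slot once; a second attempt on the same slot is rejected."""
        if slot not in self.filled:
            raise ValueError(f"{self.lemma} has no slot {slot}")
        if self.filled[slot] is not None:
            raise ValueError(
                f"{slot} of {self.lemma} is already filled by {self.filled[slot]}"
            )
        self.filled[slot] = filler


driver = Predicate("driver", ["arg2"])
driver.fill("arg2", "truck")  # truck-driver: arg2 is satisfied inside the compound
try:
    driver.fill("arg2", "cars")  # *truck-driver of cars
except ValueError as error:
    print("rejected:", error)
```

The same bookkeeping applies whether the first filler comes from a compound or from a syntactic phrase, which is the point of treating frame-satisfaction along similar lines in both domains.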
Another problem we can identify here is that of discovering the precise status of the non-head:
(23) a. book donations
     b. library donations
Information about semantic features is needed to determine that book is logical direct object in (23a), but library is logical indirect object in (23b). As we saw above when discussing modifier non-heads, the exact relation of a dependant to its head is usually far less explicit in morphology than in syntax, a kind of neutralization which creates problems for parsing and hence for MT. An important issue which we have left aside here is that of the number and definiteness of the non-head noun. Sometimes a singular noun must be translated as a plural, as in this English-Spanish example:
(24) a. car production
     b. producción de coches
        production of cars
Such examples provide a difficult test for interlingual accounts of number and determination, since 'singular' and 'absence of article' do not have the same interpretation inside compounds as outside them.

4.2 We have assumed above that translation of compounds is indeed compositional, including the putative fact that a lexical unit receives the same translation irrespective of whether it is part of a compound or part of a syntactic construction. Thus tooth is rendered as Zahn or dent in both He lost a tooth and He lost a toothbrush. This is an important claim, and one that appears to be generally correct. However, we are aware of three kinds of counterexample, which merit some discussion. The first kind consists of items which have special translations in compounds. For instance, German Haupt when a word in its own right is translated into English as 'head', but when a modifier in a compound it is usually translated as main, as in Haupteingang 'main entrance'. Our second kind of counterexample is taken from Chinese, and is based on the different occurrences of bound and free morphemes in this language. Consider the following possible and impossible words:
(25) a. shu         book
     b. shudian     book-shop
     c. *dian       shop
     d. shangdian   shop
As a bound morpheme, dian 'shop' can only be used as part of a compound (cf. (25b)); when a syntactic free form is needed, shangdian must be used. It may be sufficient here to translate shop as dian or shangdian, leaving it to the synthesis module to choose the appropriate form. The third problem is one which touches on other areas and so invites a lengthier discussion. We refer here to the phenomenon of relational adjectives, adjectives which are semantically related to nouns and may indeed be derived from nouns, and which in some ways behave more like nouns than adjectives. An example would be industrial, derived from industry (with which it is synonymous) and occurring in attributive position only, where it has to follow other classes of adjective:
(26) a. They want to avoid industrial unrest
     b. They want to avoid industry unrest
     c. *The unrest is industrial
     d. They want to avoid severe industrial unrest
     e. *They want to avoid industrial severe unrest
Other examples are solar, annual and atomic. The concept of relational adjective is a familiar one in French linguistics (cf. Judge & Healey 1985, pp. 281-3, Durand 1982, pp. 19ff). The term is not common in the English grammatical tradition, though the notion has been referred to quite frequently (e.g. the nominal non-predicating adjectives of Levi 1978). Booij (1979, pp. 994-5) uses for Dutch the term 'denominal, reference-modifying adjectives', and assigns them the interpretation of the underlying noun. From an MT point of view, the problem is that an NN compound may sometimes need to be translated as relational adjective + noun (or some other structure). That is, the non-head noun in NN may not get the translation it receives when it functions in syntax. Lambrecht (1984, p. 773) does not use the term 'relational adjective', but notes that a German NN compound very often translates as an English relational adjective + noun expression:
(27) a. Judenfrage
        Jews-question
        'Jewish question'
     b. Orientteppich
        Orient-rug
        'Oriental rug'
     c. Kernphysiker
        nucleus-physicist
        'nuclear physicist'
A French construction involving a relational adjective may be rendered by an NN compound in English:
(28) a. consommation énergétique
        consumption energetic
        'energy consumption/consumption of energy'
     b. marché mondial
        market world (i.e. adjective from noun monde 'world')
        'world market'
Ordinarily, world is to be translated as monde; but it may be translated as mondial when it functions as non-head in an NN compound. Very often, a French expression involving a relational adjective is a better equivalent of an English NN compound than is N + prep + N, since the relational adjective can express a wider range of relations between the two nouns (van Baardewijk-Resseguier 1983, p. 79). One way of simplifying transfer here is to claim that relational adjectives just are nouns at IS, so paralleling the deep structures of Levi (1978), who gives plenty of evidence for the nominal origins of these adjectives. It also reflects the semantic interpretations suggested by Booij (1979), who argues that a sequence of relational adjective + noun is semantically equivalent to an NN compound, i.e. 'N bears some relation to N'. The synthesis module would then have the task of providing a relational adjective where needed, though the conditions under which an adjective or noun is preferred here in English are complex and fuzzy (Levi 1978, pp. 224-5). We are assuming, then, structures and transfer mappings such as: (29)
    n
        n,gov   market
        n,mod   world
    ⇔
    n
        n,gov   marché
        n,mod   monde
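A mapping of this kind, together with the treatment of relational adjectives as nouns at IS, could be sketched as follows. The tables are toy data and all names (`RELATIONAL_FR`, `FR_EN`, `analyse_fr`, `transfer`, `synthesise_en`) are hypothetical. In particular, English synthesis here always produces an NN compound, whereas the actual conditions for preferring an adjective or a noun are complex.

```python
# French relational adjective -> noun lexeme used at IS (toy analysis table).
RELATIONAL_FR = {"mondial": "monde", "énergétique": "énergie"}

# Ordinary lexical transfer rules, shared with free-standing words (toy data).
FR_EN = {
    "monde": "world",
    "marché": "market",
    "énergie": "energy",
    "consommation": "consumption",
}


def analyse_fr(head, dependent):
    """Return (gov, mod) at IS, replacing a relational adjective by its noun."""
    return head, RELATIONAL_FR.get(dependent, dependent)


def transfer(pair):
    """Translate governor and modifier with the same rules used in syntax."""
    gov, mod = pair
    return FR_EN[gov], FR_EN[mod]


def synthesise_en(pair):
    """Realize the IS pair as an English NN compound (modifier first)."""
    gov, mod = pair
    return f"{mod} {gov}"


print(synthesise_en(transfer(analyse_fr("marché", "mondial"))))
print(synthesise_en(transfer(analyse_fr("consommation", "énergétique"))))
```

The transfer table contains no entry for mondial or énergétique at all: once analysis has mapped the adjective to its noun, the ordinary noun-to-noun rules suffice.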
This seems fairly promising, but there are a number of potential difficulties. One is that of recognizing relational adjectives in analysis, since (a) not all denominal adjectives are relational, and (b) there can be cases of polysemy/homonymy between relational and non-relational adjectives (cf. a nervous illness and a nervous child). Another is that of synthesizing the correct relational adjective, since these are not always derived straightforwardly from the noun, e.g. solar is not a derivative of sun, nor annual of year. It seems that the English module would have to contain a statement to the effect that solar is in certain circumstances the surface realization of the IS lexeme sun, a statement which would preferably apply in both analysis and synthesis (cf. Levi 1978, p. 152).

In the last few paragraphs, then, we have examined potential problems for the hypothesis that the same lexical transfer rules operate word-internally as operate in sentence translation. We have seen that the extent of these problems can be reduced (though probably not eliminated) by positing less surface-tied IS representations.

The final issue to be addressed in this section is as follows. The whole idea of compositional translation of compounds (and, indeed, of most adjective-noun combinations) rests on the assumption that there is a sizeable amount of crosslinguistic correlation in the use of structured vs. unstructured names for objects and concepts. That is to say, within a lexical hierarchy organized on the principle of hyponymy, it is rare for all items to bear an unanalysable label, for at some point a compound or phrase will be used. Compare the right-hand ends of the following hierarchies:
(30) a. physical object - artifact - furniture - chair - armchair
     b. physical object - living thing - fruit - apple - cooking apple
In these cases, the lowest level with a single-morpheme name (chair and apple) is that of the basic level category, a perceptually-salient category which combines a number of properties such as early acquisition and ease of ostensive definition.
These ideas have been extensively explored by cognitive psychologists (e.g. Rosch 1977). Our point is that it does appear to be generally true that compounds are only (but not invariably) used for categories subordinate to the basic category. This is not logically necessary, for one can easily conceive of a language in which a chair was termed 'sitting furniture'. If languages used compounds at any random slot on such semantic hierarchies, the idea of compositional translation of compounds would be far less attractive.⁷

5. Two issues

5.1 We first address the problem of bracketing paradoxes (Williams 1981, Pesetsky 1985), cases where morphological and semantic considerations suggest differing structures for complex words. For ungrammaticality, facts about level-ordering require (31a), while the compositionality principle implies (31b):
(31) a. [un [grammatical ity]]
     b. [[un grammatical] ity]
Un- must attach outside -ity, as the former is level II, yet the meaning is clearly 'state of being ungrammatical'. Pesetsky's solution is to make use of two different morphological representations, one of which satisfies level-ordering and other morphophonological requirements, and the other of which meets semantic compositionality. It is the bracketing at the latter, semantically-relevant, one (Pesetsky's Logical Form) which is relevant to IS (though the traces proposed by Pesetsky are not needed). However, the other representation must be used to help choose the correct affix. Bracketing paradoxes have also been identified in the realm of compounding - or, more precisely, within the sphere of compounding-derivation interactions. Consider, for instance, the expression urban sociologist (Spencer 1988), in the reading 'student of urban sociology'. This appears to call, on semantic grounds, for the bracketing seen in (32a):
(32) a. [[urban sociolog] ist]
     b. [[urban] [sociolog ist]]
The problem with the alternative, (32b), is that it risks eliciting a translation equivalent to 'sociologist who is urban'. This seems to be similar to such examples as ungrammaticality, viz. that the more semantically-oriented structure is needed at IS, whereas the alternative structure may be better at other levels. However, this account finesses the problem of the morphological relation between sociology and sociologist, specifically of the final -y in the former. As Spencer points out, one can find comparable examples where no straightforward morphological relation holds:
(33) a.  moral philosopher
     a'. moral philosophy
     b.  electrical engineer
     b'. electrical engineering
     c.  southern Dane
     c'. southern Denmark
(See further Spencer 1991, ch. 10.) The first of each pair is a personal noun describing someone related in some way to the prime examples. Spencer's solution is to propose that these personal nouns are formed via a kind of
productive back-formation. For instance, (33a) is possible provided that (33a') exists in the permanent lexicon, as also moral and philosopher. But there appears to be no computational equivalent of this analysis, and we shall leave this topic with the conclusion that these examples represent a real problem for the idea of semantically-motivated ISs which also respect morphological facts. In many cases here one is dealing with relational adjectives, but it is not clear how this observation helps to solve the problem.

5.2 So far we have assumed that we are still representing words as words, i.e. that a word like simplify still forms at IS a constituent belonging to the category verb. So the representation for this word would be as follows: (34)
    v
        gov,v      #cause
        arg1,adj   simple
where the mother V node would itself be the head of a sentence, as in this representation for The students simplify the problem: (35)
    gov,v
        gov,v      #caus
        arg1,adj   simple
    arg1,np
        gov,n      student
    arg2,np
        gov,n      problem
We have also taken a kind of 'minimal' approach to the representation of compounds, assuming that they get their own special representations at IS, e.g. a noun compound belongs to the category noun and has an internal structure. So a fuller structure for a sentence such as The man bought a toothbrush would be: (36)
    v,gov      buy
    np,arg1
        n,gov      man
    np,arg2
        n,gov
            n,gov      brush
            n,mod      tooth
However, it may be objected that such representations do not prevent complex transfer where a non-simplex word has to be translated by a syntactic construction. So an alternative kind of representation may be preferable, which rejects the idea that words maintain their integrity at IS. Rather, in derived words, the affix is (in this case, as the details vary from one affix to another) regarded as the head of a clause, and as taking a number of phrasal arguments, one of which is the stem. Our example would, on this conception, be analysed as follows (of course, there are other possibilities for the frame of #caus): (37)
    gov,v      #caus
    arg1,np
        gov,n      student
    arg2,np
        gov,n      problem
    arg AO,ap
        gov,adj    simple
(37) could be paraphrased as the students made the problem simple or the students caused the problem to be simple. Thus this approach neutralizes the distinction between syntactic and morphological means of expressing such notions as causation, agency, stative nominalization, etc. The price is that monolingual processing must be more complex, but transfer appears to be simplified. We shall refer to this as the 'syntactic' approach to morphological representation, as opposed to the 'word-based' approach outlined earlier. Under the syntactic approach no word is assigned a complex internal structure at IS (i.e. in effect there is no IS morphology). This of course implies contravening version (1b) of the Lexicalist Hypothesis as outlined earlier.⁸ Likewise, in compounds we could claim that non-heads project a phrasal node of their own (just as a modifier expensive would project an AP node of its own). This gives the structure: (38)
    v,gov      buy
    np,arg1
        n,gov      man
    np,arg2
        n,gov      brush
        np,mod
            n,gov      tooth
There would then no longer be separate IS representations for compounds; they would be assimilated to the syntactic gov-arg-mod model. The syntactic approach would seem to have two main advantages: (a) it
facilitates transfer to languages where a complex word has to be rendered via a syntactic construct, and (b) it makes the thorny issue of compound identification less problematic, since whether or not some structure is to be regarded as a compound has only a small effect on its IS representation. Roeper (1988) argues for his transformational derivation of compounds such as bike-riding partly on the basis of the Uniformity of Theta Assignment Hypothesis (UTAH) of Baker (1988, p. 46):

    Identical thematic relationships between items are represented by identical structural relationships between those items at the level of D-structure.

Replacing 'D-structure' in this formulation by 'IS' gives a general theoretical underpinning for the syntactic approach to compounds, viz. that frames are satisfied in a uniform way, not one way in phrases and another way in compounds - just as they are not satisfied one way in a passive and another way in an active. Roeper (1988, p. 194) regards this as 'the classic argument from simplicity' in favour of his approach. Note that it is not possible to maintain both UTAH and the Lexicalist Hypothesis.⁹

We would emphasize here that these two approaches, the word-based and syntactic-based, while differing on an important point, are by no means diametrically opposed to each other. Both agree on the use of abstract affixes, both distinguish arguments and modifiers in compounds, both rely on the notion of headedness. We suggest that the choice between them depends on the properties of the languages involved in the system. For instance, the syntactic approach offers little advantage in the case of closely-related languages which differ little in the extent to which they use morphological mechanisms. But it becomes more useful as the languages handled become less similar.
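The difference between the two approaches can be made concrete by encoding the word-based analysis of simplify in (35) and deriving the syntactic analysis in (37) from it. The nested-tuple encoding, the names `WORD_BASED`, `PHRASE` and `to_syntactic`, and the spelling `arg_AO` for the role of the stem phrase are assumptions made for this sketch. The frame of the abstract affix in particular could be set up differently.

```python
# A node is (role, category, body); body is a lexeme or a list of child nodes.
WORD_BASED = [
    ("gov", "v", [("gov", "v", "#caus"), ("arg1", "adj", "simple")]),
    ("arg1", "np", [("gov", "n", "student")]),
    ("arg2", "np", [("gov", "n", "problem")]),
]

# Phrasal category projected by a stem of a given category (toy table).
PHRASE = {"adj": "ap", "n": "np"}


def to_syntactic(clause, stem_role="arg_AO"):
    """Promote the abstract affix to clause head and the stem to a phrase."""
    head, stem, others = None, None, []
    for role, category, body in clause:
        if role == "gov" and isinstance(body, list):
            # Complex governor: split it into affix (head) and stem (argument).
            for sub_role, sub_category, lexeme in body:
                if sub_role == "gov":
                    head = ("gov", sub_category, lexeme)
                else:
                    stem = (
                        stem_role,
                        PHRASE[sub_category],
                        [("gov", sub_category, lexeme)],
                    )
        else:
            others.append((role, category, body))
    if head is None:
        # Simplex governor: there is nothing to unpack.
        return list(clause)
    return [head] + others + [stem]


for node in to_syntactic(WORD_BASED):
    print(node)
```

In the output the material belonging to simplify is no longer a single constituent: the affix heads the clause and the stem appears as the last phrasal argument.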
A general problem with the syntactic approach, however, is that it proposes IS representations where morphologically complex words have more complex structures than in the word-based approach and may not even be shown as constituents (cf. (37), where the representation of simplify is spread across the structure). It therefore makes it more cumbersome and long-winded to state mappings from complex to lexicalized words - and such mappings must be repeated each time such a translation is called for, whereas a general rule transferring a morphological to a syntactic structure can be stated just once and for all. This, then, is a general argument against the syntactic approach. Such mappings of structured to atomic objects in transfer are typical of morphology, and mark one respect in which this domain differs translationally from syntax.¹⁰

We can add that the simple fact that morphological constructions are sometimes translated by syntactic ones implies that (except possibly in the case of featurized inflectional morphology) there must be rules at some point in an MT system that refer to both syntactic and morphological objects. It is
therefore not possible for an MT system as a whole to respect version (1a) of the LH, though individual components may do so.

6. Conclusion

Let us now summarize our main proposals and conclusions. The existence of neologisms requires MT systems to handle translation of subparts of words; considerations of reducing dictionary-size also point in this direction. Morphology reveals a number of translational problems, not all of which are of the same type as occur in translation of sentence parts. Inflectional morphology can mostly be treated fairly straightforwardly by means of featurization. Derivation and compounding require structured representations in which governors, arguments and modifiers play a similar role as in syntax. Derivational affixes can be interpreted as surface realizations of abstract affixes, thus simplifying transfer considerably; conversion is recognized in certain cases; bracketing paradoxes are resolved by using the more semantically-oriented structure at IS. Compounds can be represented without delving closely into the interpretation of non-synthetic types, while synthetics may involve argument satisfaction inside the compound. There are both 'word' and 'syntactic' approaches to the representation of complex words. We gave an argument for the word-based approach which relies on the frequency of correspondence between compositional and lexicalized forms, a consideration which suggests a difference between translation of syntax and morphology. More generally, however, we hope to have demonstrated both the necessity and the feasibility of applying to word structure the same principles of compositional translation that have been applied to sentence structure. The importance of this issue to MT research and development can now be seen as incontestable.

Notes

* Thanks are due to the authors of other papers in this volume, and especially to the editor, Frank van Eynde.
I am more generally indebted to all those who have worked on problems of morphology in Eurotra, who are too numerous to list here - some (but not all) are mentioned in the notes.
1. This is not to say that Sadock's ideas are of no relevance, as it may well be that some of his work on 'morphosyntactic mismatches' can be incorporated into a stratificational framework in the form of different structures at different levels (cf. the discussion in section 5.1 on bracketing paradoxes).
2. A great deal of work on the translation of derivational morphology in Eurotra has been carried out by Coby Verheul, to whom this section is much indebted.
3. It is because of these constraints on generality and use of affixes that Whitley, in a language-teaching context, is led to the conclusion that 'pedagogy cannot treat word formation like sentence formation' (1986, p. 320). MT is not the same as second-language teaching, but both are concerned with providing knowledge of a target language, and Whitley's point that morphology is in a certain sense more restricted than syntax deserves to be borne in mind.
4. Clear distinctions between parasynthesis and circumfixing are not always drawn. Here we take parasynthesis to involve the simultaneous attachment of two affixes whose independent existence is clear, e.g. anti- and -aire in antiparlementaire, while in circumfixing the two parts of the single affix have no independent existence.
5. There is a problem here, viz. the thematic vowel which in the Romance languages occurs between stem and inflectional suffixes (it may be said to determine the conjugation class to which a verb belongs) - it would be a in Italian invecchiare. But this is obviously just the kind of surface phenomenon which has no relevance at IS. Of course, in synthesis it is necessary to produce the correct thematic vowel. Since choice of vowel appears to be arbitrary, we can only suggest that it be held as a feature in the lexical entry for the stem (here vecchio). The idea of parasynthesis structure at IS makes no contribution to solving this problem, and complicates transfer.
6. For an outline of some Eurotra work on compounds, see Ananiadou & McNaught (1990).
7. Lang (1990, p. 34) observes that Spanish often prefers to employ derivation where English would use a compound, e.g. naranja 'orange', naranjo 'orange tree'. This might be problematic for the position taken in the text.
8. The syntactic approach to derivation and compounding has been explored in Eurotra by Achim Blatt, Marta Carulla, Dieter Maas and Fernando Sanchez.
9. Though it should be noted that Baker himself (1988, p. 71) rejects an incorporation-style approach to English compounds.
10. For another discussion of these two approaches, see Allegranza et al (1991, pp. 69-75).
References
Aitchison, J. (1987), Words in the Mind, Blackwell, Oxford.
Allegranza, V., P. Bennett, J. Durand, F. van Eynde, L. Humphreys, P. Schmidt and E. Steiner (1991), 'Linguistics for machine translation: the Eurotra linguistic specifications', in C. Copeland, J. Durand, S. Krauwer and B. Maegaard (eds), The Eurotra Linguistic Specifications, 15-123, Office for the Official Publications of the European Communities, Luxembourg.
Ananiadou, S. and J. McNaught (1990), 'Treatment of compounds in a transfer-based machine translation system', Proceedings of 3rd International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Language, 57-63.
Anderson, S. (1982), 'Where's morphology?', Linguistic Inquiry 13, 571-612.
Anderson, S. (1985), 'Typological distinctions in word formation', in T. Shopen (ed.), Language Typology and Syntactic Description, vol. 3, 3-56, Cambridge University Press, Cambridge.
Anderson, S. (1988), 'Morphological theory', in F. Newmeyer (ed.), Linguistics: The Cambridge Survey, vol. 1, 146-91, Cambridge University Press, Cambridge.
Andrews, A. (1990), 'Unification and morphological blocking', Natural Language and Linguistic Theory 8, 507-57.
Aronoff, M. (1976), Word Formation in Generative Grammar, MIT Press, Cambridge (Mass.).
Ayto, J. (1989), The Longman Register of New Words, vol. 1, Longman, London.
Baardewijk-Resseguier, J. van (1983), 'La non-alternance entre syntagme prépositionnel et adjectif de relation', Cahiers de Lexicologie 43.2, 73-84.
Baker, M. (1988), Incorporation, University of Chicago Press, Chicago.
Bauer, L. (1988), Introducing Linguistic Morphology, Edinburgh University Press, Edinburgh.
Bauer, L. (1990), 'Be-heading the word', Journal of Linguistics 26, 1-31.
Bloomfield, L. (1935), Language, Allen and Unwin, London.
Booij, G. (1979), 'Semantic regularities in word formation', Linguistics 17, 985-1001.
Booij, G. (1986), 'Form and meaning in morphology: the case of Dutch "agent nouns"', Linguistics 24, 503-17.
Botha, R. (1981), 'A base rule theory of Afrikaans synthetic compounding', in M. Moortgat et al (eds), The Scope of Lexical Rules, 1-77, Foris, Dordrecht.
Botha, R. (1984), Morphological Mechanisms, Pergamon Press, Oxford.
Carden, G. (1983), 'The non-finite-state-ness of the word formation component', Linguistic Inquiry 14, 537-41.
Chomsky, N. (1970), 'Remarks on nominalization', in R. Jacobs and P. Rosenbaum (eds), Readings in English Transformational Grammar, 184-221, Ginn, Waltham (Mass.).
Di Sciullo, A. and E. Williams (1987), On the Definition of Word, MIT Press, Cambridge (Mass.).
Downing, P. (1977), 'On the creation and use of English compound nouns', Language 53, 810-42.
Durand, J. (1982), 'A propos du préfixe anti- et de la parasynthèse en français', University of Essex Department of Language and Linguistics Occasional Papers 25, 1-34.
Durand, J., P. Bennett, V. Allegranza, F. van Eynde, L. Humphreys, P. Schmidt and E. Steiner (1991), 'The Eurotra linguistic specifications', Machine Translation 6, 103-47.
Fabb, N. (1988), 'English suffixation is constrained only by selectional restrictions', Natural Language and Linguistic Theory 6, 527-39.
Gunji, T. (1987), Japanese Phrase Structure Grammar, Reidel, Dordrecht.
Halle, M. (1973), 'Prolegomena to a theory of word formation', Linguistic Inquiry 4, 3-16.
Hawkins, J. (1986), A Comparative Typology of English and German, Croom Helm, London.
Hoekstra, T., H. van der Hulst and M. Moortgat (1980), 'Introduction', in T. Hoekstra et al (eds), Lexical Grammar, 1-48, Foris, Dordrecht.
Hutchins, W. (1986), Machine Translation: Past, Present, Future, Ellis Horwood Limited, Chichester.
Judge, A. and F. Healey (1985), A Reference Grammar of Modern French, Arnold.
Kageyama, T. (1982), 'Word formation in Japanese', Lingua 57, 215-58.
Lambrecht, K. (1984), 'Formulaicity, frame semantics, and pragmatics in German binomial expressions', Language 60, 753-96.
Lang, M. (1990), Spanish Word Formation, Routledge, London.
Lees, R. (1960), The Grammar of English Nominalizations, Mouton, Den Haag/Paris.
Levi, J. (1978), The Syntax and Semantics of Complex Nominals, Academic Press, London.
Li, Y. (1990), 'On V-V compounds in Chinese', Natural Language and Linguistic Theory 8, 177-207.
Lieber, R. (1981), 'Morphological conversion within a restrictive theory of the lexicon', in M. Moortgat et al (eds), The Scope of Lexical Rules, 161-200, Foris, Dordrecht.
Lieber, R. (1983), 'Argument linking and compounds in English', Linguistic Inquiry 14, 251-85.
Marantz, A. (1984), On the Nature of Grammatical Relations, MIT Press, Cambridge (Mass.).
Pesetsky, D. (1985), 'Morphology and logical form', Linguistic Inquiry 16, 193-246.
Pulman, S. (1987), 'Passives', Proceedings of 3rd European ACL, 306-13.
Pulman, S., G. Russell, G. Ritchie and A. Black (1988), 'Computational morphology of English', Linguistics 26, 545-60.
Roeper, T. (1988), 'Compound syntax and head movement', in G. Booij and J. van Maarle (eds), Yearbook of Morphology 1988, 187-228.
Rosch, E. (1977), 'Human categorization', in N. Warren (ed.), Studies in Cross-Cultural Psychology, vol. 1, 1-49, Academic Press, London.
Sadock, J. (1991), Autolexical Syntax, University of Chicago Press, Chicago.
Sanders, G. (1988), 'Zero derivation and the overt analogue criterion', in M. Hammond and M. Noonan (eds), Theoretical Morphology, 155-75, Academic Press, London.
Scalise, S. (1984), Generative Morphology, Foris, Dordrecht.
Schubert, K. (1987), Metataxis: Contrastive Dependency Syntax for Machine Translation, Foris, Dordrecht.
Selkirk, E. (1982), The Syntax of Words, MIT Press, Cambridge (Mass.).
Shibatani, M. and T. Kageyama (1988), 'Word formation in a modular theory of grammar: postsyntactic compounds in Japanese', Language 64, 451-84.
Siegel, D. (1979), Topics in English Morphology, Garland, New York.
Sparck Jones, K. (1983), 'So what about parsing compound nouns?', in K. Sparck Jones and Y. Wilks (eds), Automatic Natural Language Parsing, 164-8, Ellis Horwood Limited, Chichester.
Spencer, A. (1988), 'Bracketing paradoxes and the English lexicon', Language 64, 663-82.
Spencer, A. (1991), Morphological Theory, Blackwell, Oxford.
Thompson, S. (1973), 'Resultative verb compounds in Mandarin Chinese', Language 49, 361-79.
Thurmair, G. (1984), 'Linguistic problems in multilingual morphological decomposition', Proceedings of 10th Coling, 174-7.
Trommelen, M. and W. Zonneveld (1986), 'Dutch morphology: evidence for the righthand head rule', Linguistic Inquiry 17, 147-69.
Whitley, M. (1986), Spanish/English Contrasts, Georgetown University Press, Georgetown.
Williams, E. (1981), 'On the notions "lexically related" and "head of a word"', Linguistic Inquiry 12, 245-74.
Zwanenburg, W. (1988), 'Morphological structure and level ordering', in M. Everaert, A. Evers, R. Huybregts and M. Trommelen (eds), Morphology and Modularity, 395-410, Foris, Dordrecht.
Zwicky, A. (1985), 'Heads', Journal of Linguistics 21, 1-29.
4
Dependency and machine translation
Toni Badia
In a transfer-based machine translation system (MT system, henceforth) the structure of the representation upon which the translation applies (i.e., the interface structure) is a matter of crucial importance. Its formal characterization, its level of depth and the linguistic features of the text that it intends to capture are all issues that have to be settled when starting to design such a system. In particular, the need to sort out the syntagmatic relations between the constituents in the interface structure is a central point in this enterprise. In this chapter, a proposal for the representation of these relations is offered. One of the fundamental aspects of the proposal is the determination of the appropriate level of depth of the linguistic analysis underlying the interface representations, so that an adequate formal definition of them can be achieved. Although many of the points raised here have been developed within the linguistic theory of Eurotra, a neutral point of view has been taken, so that the resulting discussion is general and does not depend on a particular MT system.
The chapter is organized as follows. In the first section I introduce a distinction between two sorts of linguistic phenomena, singling out those that belong to the predicative core. The next two sections state some requirements that the representations of the predicative core have to fulfil. Section four discusses the level of depth which the analysis process has to reach. The two following sections are devoted to the formal development of a dependency grammar and the conditions that can be imposed upon the structures it produces. Next, section seven points out a few problems and difficulties in applying the resulting grammar to large parts of linguistic utterances. Finally, in section eight a particular instance of dependency grammar is formalized and exemplified.
1. Two sorts of linguistic phenomena
The abstract representations upon which the transfer operation is performed are drawn from two different sorts of linguistic phenomena. On the one hand there are the main building blocks of sentences that, as a consequence of the particular linguistic relations they entertain with one another, express some given relations between objects in the world, and on the other there are some linguistic elements that qualify, and even modify, these relations. Consider, for example, sentence (1):
(1) The girls do not seem to have understood the tale
This sentence expresses a relation (namely, one of understanding) between some given entities (the girls and the tale). The linguistic elements that denote the entities involved in the relation are ordered in the linguistic string; if their positions were interchanged (and the necessary adjustments were made in order to preserve grammaticality) the resulting sentence would be (2):
(2) The tale does not seem to have understood the girls
When comparing it with (1) it is easy to see that the difference lies in what is related to what. Since the understanding relation is not a symmetric one, to assert that the girls are in the understanding relation to the tale is not equivalent to asserting that the tale stands in that understanding relation to the girls.
However, this is not the only information conveyed by that sentence. Its meaning is not wholly described if only this relation is taken into account, as can be seen when comparing it to other sentences in which the same relation between the same entities is expressed, as in (3):
(3) a. The girls have understood the tale
    b. Did the girls understand the tale?
    c. The girls will not understand the tale
    d. It is impossible that the girls understand the tale
Part of the meaning of these sentences is identical to that of (1): the relations between the linguistic elements that denote, respectively, the girls, the tale and the understanding relation remain constant. Nevertheless there is no doubt that these sentences differ in meaning from (1) and that they also differ from one another. For example, by uttering (1) the speaker casts doubt on the girls' having understood the tale at some time fairly close to the utterance time, whereas by uttering (3a) one simply states that they have understood it. Using (3b) one asks whether the girls understood the tale at some time not very close to the utterance time; in contrast, when one utters (3c) one asserts that it is not true that they will understand the tale at some time posterior to the utterance time. Finally, the sentence in (3d) expresses the impossibility that the girls understand the tale.
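The contrast between (1), (2) and (3) can be pictured with a small data structure. The following is only an illustrative sketch: the class names, the lemmatized arguments and the operator labels ('neg', 'seem', 'perfect', 'future') are assumptions made for the example, not a notation used in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    """Predicative core: a predicate and its ordered arguments."""
    predicate: str
    arguments: tuple


@dataclass(frozen=True)
class Sentence:
    """A core plus operators that apply to the core as a whole."""
    core: Core
    operators: frozenset = frozenset()


# Hypothetical encodings of sentences (1), (2), (3a) and (3c).
understand = Core("understand", ("girl", "tale"))
swapped = Core("understand", ("tale", "girl"))

s1 = Sentence(understand, frozenset({"neg", "seem", "perfect"}))
s2 = Sentence(swapped, frozenset({"neg", "seem", "perfect"}))
s3a = Sentence(understand, frozenset({"perfect"}))
s3c = Sentence(understand, frozenset({"neg", "future"}))


def same_core(a: Sentence, b: Sentence) -> bool:
    """True when two sentences share predicate and argument order."""
    return a.core == b.core
```

Under these assumptions, (1) and (3a) share a core and differ only in their operators, while (1) and (2) share their operators and differ in the core, because the argument order is significant.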
These few examples suffice to show that tense, modality and negation, among others, qualify and modify the meaning carried by the predicative core of the sentence. We are then faced with two kinds of linguistic phenomena. The fundamental distinction between them lies in the nature of the relations that the constituents of the sentence establish with one another. Firstly, there are those constituents that contribute to the predicative core of the sentence by being either the head of the predication (i.e., the predicate) or one of the constituents related to it. Secondly, there are some constituents that are external to the predicative core and related to it as a whole.
In some cases, the second class of relations can be treated by means of an abstraction process, so that the differing features of the surface strings of various languages can be overcome. This approach is possible with respect to negation, modality and some temporal operators, because they are expressed at the surface by means of closed classes of words. This is however not generally true for all sentential operators, since there are many that are construed out of predications or contentful phrases.
On the other hand, the treatment of the relations between the constituents within the predicative core always demands an analysis process that is linked to the normalization or canonicalization of the relations between the constituents. This requirement derives from the fact that the relations among the basic units that constitute the predicative core of sentences are free and certainly linked to the ways speakers of different languages conceptualize the world.
Attempting a process of abstraction in this field would certainly mean facing lexical decomposition, as in the famous pair kill - cause to die, and also in the case of nouns, since they do not have full equivalents in different languages: the noun 'field', for instance, can be translated into Japanese in at least six different ways (Nagao and Tsujii, 1986). However, at the present state of the art doubt can be cast on the usefulness of lexical decomposition for MT systems in large scale implementations. Firstly, the theoretical status of componential analysis is not clear. As Lyons (1977) points out, very often the semantic features are more a consequence of the lexical differences they are supposed to explain than a real reason for these differences. Secondly, there is no explanation for the fact that some of the combinations allowed by the semantic features are lexicalized and some are not. Finally, the development of an MT system necessarily involves linguists belonging to different schools and traditions, and it is doubtful that enough cohesion could be obtained in lexical decomposition for the resulting abstract representations to be common to some extent.
Note that to disallow lexical decomposition implies that MT systems are structured as transfer systems. The interface representations of the source language texts are not identical to the ones of the target language because of, at least, the differences in the open class lexical elements and the relations they bear with one another. Hence a transfer device is needed that converts the former into the latter.
This chapter presents a proposal for the representation of the predicative core of a sentence. The problems relating to the representation of the other linguistic phenomena that contribute to the sentence meaning are not discussed.
2. Lexical selection in transfer
One of the aims when defining the representation of the predicative structure of sentences is to get as near as possible to the ideal situation in which all information apart from the lexical units is represented alike in the abstract representations for translational equivalents. The advantage would be that transfer rules would need to be written only to translate lexical units. This would mean that they are framed in a structure that is interlingual (with respect to the languages involved in the system).
In order to approach this ideal as much as possible, there are two sets of requirements that need to be fulfilled by the representation of the predicative core of sentences and that, at first sight, can be respectively characterized as external and internal to the representation of the predicative structure. Among the former there is the need that it be compatible with the treatment of the sentential operators; at the same time, it has to enable the formulation of the relations between elements belonging to different predicative structures (so that both long distance dependencies and anaphoric relations are easily formulated). Some of these aspects will be mentioned later in this chapter; others are the topic of separate articles in this volume (Allegranza, this volume).
However, the requirements whose fulfilment makes it possible to formulate a definition of the representation of the predicative structure are the internal ones, that is to say, those that derive from the particular characteristics of the lexical units constituting the main building blocks of the predicative core and from the relations established among them. By considering them, some conditions can be fixed that this definition has to comply with. As we shall see in this section and the next one, these requirements are basically related to lexical selection, which is one of the operations that must be performed during the analysis and transfer processes.
As mentioned above, if decomposition of content lexical elements is not allowed in the system, a transfer module is required that relates the lexical units of the source language to those of the target one. Hence, even in the ideal situation in which all general linguistic phenomena could be dealt with interlingually, the transfer module would be an essential part of the overall system architecture. The transfer rules could be written as follows:
(4) a. en_lu = house -> fr_lu = maison
    b. en_lu = book -> fr_lu = livre
They describe how English lexical units (en_lu) have to be translated into French lexical units (fr_lu). The transfer module between English and French would then be a list of pairings of English and French lexical units. Thus a representation originally stemming from an English sentence could be mapped onto a representation of a French sentence by simply translating the lexical units from one into the other, by means of these rules.
However, in many, if not most, cases the lexical transfer rules cannot be formulated in such a simple way. A first source of difficulty appears when two different lexical units have a single graphic (or phonetic) form. Consider for instance the word 'book'. Its translation into French varies according to whether it is a noun or a verb, so that in the first case it has to be translated as 'livre' and in the second one as 'louer'. The simple rules in (4) become in this case a bit more complex, since they have to contain information about the category of the lexical unit that is being translated, as can be seen in the following rules:
(5) a. en_lu = book, cat = n -> fr_lu = livre
    b. en_lu = book, cat = v -> fr_lu = louer
Obviously the category specification in these rules will only be useful if, in the construction of the abstract representation by the English analysis module, the category information can be specified at each appearance of the word 'book'. This can reasonably be expected given the central role the category information plays in any linguistic analysis, so that the rules in (5) can be naturally used in the transfer process.
Nevertheless syntactic category is not the only, nor even the most common, source of discrepancies in the translation of lexical units from one language to another. A quick glance at any dictionary shows that lexical units, even though belonging to a single category class, do not have a unique meaning.
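The rules in (4) and (5) can be read as a lookup table keyed on the lexical unit and, where needed, its category. The sketch below is only one possible rendering under that assumption; the table name EN_FR, the function transfer_lexical and the use of None for an unspecified category are hypothetical choices made for the example.

```python
# Hypothetical rule table: (English lexical unit, category) -> French lexical unit.
# A category of None stands for a rule that does not mention the category, as in (4).
EN_FR = {
    ("house", None): "maison",   # rule (4a)
    ("book", "n"): "livre",      # rule (5a)
    ("book", "v"): "louer",      # rule (5b)
}


def transfer_lexical(lu, cat=None):
    """Return the French lexical unit for an English one, or None if no rule applies."""
    # Prefer a rule that mentions the category; otherwise fall back
    # to a rule that leaves the category unspecified.
    if (lu, cat) in EN_FR:
        return EN_FR[(lu, cat)]
    return EN_FR.get((lu, None))
```

With this table, 'book' can only be translated once the analysis has supplied its category, which mirrors the condition stated for the rules in (5).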
Consider, for example, the English noun 'bank', some of the meanings of which are listed in (6):
(6) a. a river bank
    b. a bank of sand
    c. a bank in a road
    d. the Bank of England
Each noun phrase in (6) exemplifies a different meaning of the noun (as a matter of fact, they correspond to different entries in a dictionary). Consequently, each of these meanings is appropriate to a particular context; thus it is not adequate to talk of walking along a bank (meaning along a banking institution), nor of having money in the bank (meaning in the bank of a road). If we now turn to the Spanish noun 'banco', which is a good translation of 'bank' in some of its uses, we find out that it also has different meanings, some of which are listed in (7):
(7) a.  un banco para sentarse
    a'. a sitting bench
    b.  un banco de carpintero
    b'. a work bench
    c.  el Banco de Santander
    c'. the Bank of Santander
    d.  un banco de peces
    d'. a shoal of fish
    e.  un banco de arena
    e'. a bank of sand
A comparison of the lists in (6) and (7), corresponding to (some of) the meanings of 'bank' and 'banco' respectively, shows that in some cases one is the translation of the other and in some others this is not the case; that is to say, 'bank' in some of its meanings has to be translated into Spanish by a noun other than 'banco', and inversely in some uses of 'banco' it cannot be translated by 'bank'. The correspondences holding between the different uses of 'banco' and 'bank' are listed in (8):
(8) a. a river bank         - orilla
    b. a bank of sand       - banco
    c. a bank in a road     - pendiente
    d. the Bank of England  - banco
    e. a sitting bench      - banco
    f. a work bench         - banco
    g. a shoal of fish      - banco
As this is a phenomenon that affects most content lexical units of languages, it is clear that lexical transfer rules are to be written in such a way that they relate, not just lexical units, but meanings of lexical units. Hence they have to specify on their left-hand side, by means of some feature(s), to which particular meaning they must apply. Thus, some of these rules could be written in the following way, in which some lexical semantic features are used in an obvious way:
(9) a. sp_lu = banco, furniture = yes -> en_lu = bench
    b. sp_lu = banco, institution = yes -> en_lu = bank
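Rules such as those in (9) can be sketched as entries whose left-hand side pairs a lexical unit with the features that pick out one of its meanings. This is a minimal illustration, assuming that the analysis delivers a feature dictionary for each lexical unit; the names SP_EN_RULES and transfer_reading are hypothetical.

```python
# Hypothetical rule list: (Spanish lexical unit, required features, English lexical unit).
SP_EN_RULES = [
    ("banco", {"furniture": "yes"}, "bench"),    # rule (9a)
    ("banco", {"institution": "yes"}, "bank"),   # rule (9b)
]


def transfer_reading(lu, features):
    """Return the target unit of the first rule whose left-hand side matches."""
    for source, required, target in SP_EN_RULES:
        # A rule matches when the lexical unit is the same and every
        # feature it requires is present with the required value.
        if source == lu and all(features.get(k) == v for k, v in required.items()):
            return target
    # No rule applies, for example when analysis has not selected a meaning.
    return None
```

If the analysis leaves the meaning of 'banco' undetermined, no rule fires in this sketch, which is the situation the constraint on the abstract representation is meant to exclude.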
Just as before with the syntactic category, the efficiency of the system obviously relies on the possibility that, in the construction of the abstract representation of the source language sentences, the right meaning is selected for each of the lexical units that occur in them. This is then a constraint that must be imposed upon the abstract representation of sentences, particularly of their predicative structure.
3. A dependency structure
To select the appropriate reading of a lexical unit is one of the functions that must be performed during the analysis process, i.e., before the transfer module starts operating. Naturally this selection can only be done by taking into account the context in which the lexical unit occurs. This holds both for the cases of different interpretations based on the syntactic category and for those based on simple meaning differences. In the first case the selection is nearly always determined by the syntactic constituents of the sentence. Consider the following sentences:
(10) a. I couldn't book such a car
     b. John has read a book
The occurrence of the auxiliary 'couldn't' just in front of 'book' in (10a) determines that 'book' has to be interpreted as a verb, and that of the article 'a' in (10b) compels the interpretation of 'book' as a noun. The constituency structure of the English sentences suffices to determine the adequate interpretation of 'book' in both cases. What is interesting in this example is that it shows that the constituency structure has to be taken into account, not only to obtain the abstract representation of sentences, but also to perform part of the lexical selection.
This structure however cannot contribute to lexical selection when one has to choose between different meanings of the same syntactic category, as in the example of 'bank'. Although they are certainly funny, there is nothing ungrammatical about sentences like those in (11):
(11) a. I've deposited some money in the (road) bank
     b. He's been walking all day along the bank (institution)
The constituent structure of these sentences does not favour, nor prevent, any particular interpretation of 'bank', because in all of them it is a noun (which therefore can be the head of a nominal phrase, with or without complements), and even a noun that can have a locative value (which is supposed to be necessary in the sort of complements 'bank' heads in (11)). So, if the sentences in (12) are translated into Spanish the correct translation is going to be obtained in addition to those deriving from the interpretations in (11).
(12) a. I've deposited some money in the bank.
     b. He's been walking all day along the bank of the canal.
On the other hand, there is no doubt that these sentences are not ambiguous to a human speaker. The context, the rest of the elements in them and the relations they bear to one another, determine that the interpretation of 'bank' is a unique one. To these relations then we turn now, to see whether they can be of any help in performing lexical selection.
The first idea that has to be extracted from the previous reasoning is that syntactic relations (as they are expressed in the constituent structures of sentences) are of no use to perform lexical selection in these cases. To establish that a verbal phrase has to contain a verb and, possibly, a list of complements, and even to determine the class of these complements, does not help in selecting the right interpretation of 'bank' in the sentences in (12) above. More important is the fact that in (12a) the verb is 'deposit' and in (12b) 'walk'. Depositing money is something that is usually done in public institutions, and walking can be done in different places (among which, along a bank). In the second case there is the complementary information that it is the bank of a canal, along which he has been walking all day (and not, say, along the bank of this or that river). Therefore, the relevant information for choosing the adequate interpretation of 'bank' is relational in nature: in (12a) the bank is related to depositing, and in (12b) it is related to walking and to the canal. Hence, in the analysis process this relational aspect has to be taken into account, besides the configurational one of course.
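The relational reasoning applied to (12) can be sketched as a filter over the readings of a word, driven by the word it depends on and by the words that depend on it. The reading labels and the two preference tables below are invented for the illustration and are not part of the proposal; a real system would derive such information from its lexicon.

```python
# Hypothetical readings of 'bank', each named by one illustrative label.
READINGS = {"bank": {"institution", "waterside", "slope"}}

# Hypothetical preferences contributed by a governor (the word 'bank' depends on)
# and by a dependent (a word that depends on 'bank').
GOVERNOR_PREFERS = {"deposit": "institution"}
DEPENDENT_SUGGESTS = {"canal": "waterside", "river": "waterside"}


def select_reading(word, governor, dependents=()):
    """Narrow the readings of a word using its dependency relations."""
    candidates = set(READINGS[word])
    preferred = GOVERNOR_PREFERS.get(governor)
    if preferred in candidates:
        candidates = {preferred}
    for dependent in dependents:
        hint = DEPENDENT_SUGGESTS.get(dependent)
        if hint in candidates:
            candidates = {hint}
    return candidates
```

In this sketch 'deposit' alone settles the reading in (12a), whereas in (12b) 'walk' leaves all readings open and it is the dependent 'canal' that settles it, in line with the discussion above.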
In particular, then, we want the representations to express the relations between the lexical units in such a way that lexical selection is facilitated. Those relations will, moreover, have to be oriented, so that it becomes clear which element is dependent on which, or which one is related to which other; consequently, a sort of hierarchy is established between the lexical units of a sentence, and between the relations they bear to one another. We can then say that in the sentence in (12b) 'along the bank' depends on 'walk' and that 'of the canal' depends on 'the bank'. In order to express these relations we will follow Tesnière's dependency grammar.
Another aspect that is worth considering with respect to these relations is that they are often useful for interfacing with conceptual, or world knowledge, structures. On the one hand, it might prove necessary to access such an extralinguistic level of representation in order to fix once and for all the adequate interpretation of a sentence. On the other, the closer the representations are (from a structural point of view), the easier it is going to be to frame the interface between them.
With respect to lexical selection, there is a limit to the possibility of dividing and subdividing interpretations of a particular lexical unit in a linguistically motivated way (Van Eynde, this volume). However, very often these subdivisions prove to be useful, if not necessary, at least from an MT perspective. Consider the Spanish noun 'reloj', which is used to refer both to clocks and to watches. There is obviously no linguistic motivation for a distinction here, but they have a different translation into English (and indeed in many other languages): i.e., 'clock' vs. 'watch'. There certainly are some cases in which it is possible to determine which interpretation is intended. For example, if 'reloj' appears as the object of a sentence with a reflexive pronoun (as in (13)), it has to be interpreted as 'watch':
(13) a. Se quitó el reloj
        lit.: (s/he) took him/herself the watch off
        s/he took off his/her watch
     b. Me he mirado el reloj
        lit.: (I) have looked myself at the watch
        I have looked at my watch
There are many cases, however, in which no decision can be arrived at during the analysis process. Then access to a world knowledge module might help in deciding which one is the adequate interpretation. For example, if 'reloj' appears in a context in which other furniture-denoting words occur, the preferred interpretation is that of a clock, whereas when the context includes some clothing the selected interpretation is that of a watch. This obviously implies that, in addition to the semantic features which are linguistically motivated, some encyclopedic features should be available (Moreno, 1990), which make it possible to express the differences in interpretation that show up during the access to the world knowledge module.
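The two-step decision described for 'reloj' can be sketched as follows: a linguistic cue (the reflexive construction) is tried first, and only then a lookup of encyclopedic features of the surrounding words. The word lists and the function name are assumptions made for the example; they stand in for whatever a world knowledge module would actually provide.

```python
# Hypothetical encyclopedic classes for a few Spanish context words.
FURNITURE = {"mesa", "silla", "armario"}     # table, chair, cupboard
CLOTHING = {"camisa", "chaqueta", "zapato"}  # shirt, jacket, shoe


def translate_reloj(reflexive_object=False, context=()):
    """Choose 'clock' or 'watch' for 'reloj', or None when undecided."""
    # Linguistic cue, as in (13): object of a clause with a reflexive pronoun.
    if reflexive_object:
        return "watch"
    # Encyclopedic cue: the kind of objects mentioned in the context.
    words = set(context)
    if words & FURNITURE and not words & CLOTHING:
        return "clock"
    if words & CLOTHING and not words & FURNITURE:
        return "watch"
    # Mixed or empty context: the choice stays open in this sketch.
    return None
```

Returning None for mixed or empty contexts is a design choice of the sketch; it keeps the undecided cases visible instead of forcing a default translation.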
On the other hand, two different sorts of relations can be found in a conceptual structure that is built to capture the relations that objects bear with one another in the real world. Firstly, there are the paradigmatic relations between conceptual entities which can be substituted for one another in the predicative structures. Among them can be included the relations that hold between, say, 'bench, chair, table, cupboard', and so on, and that can be structured according to a certain set of characteristics (for example, furniture for sitting, dining room furniture, etc.). And secondly, there are the syntagmatic relations between different conceptual entities, which enable them to be articulated and hence to express different situations of the real world. We are interested in this second type of relations, because they are organized around some key concepts, i.e., the predicative elements of the resulting structure (which is usually called frame or conceptual structure).
The interesting point about frames is that there is a parallelism between the relations that the conceptual elements bear with one another in a frame and those that the linguistic elements bear in the structure shown by a predicate and its arguments. As a matter of fact, some authors have brought up the possibility of stating a mapping between linguistic and conceptual structures (Allen, 1987), so that when the linguistic head has been mapped onto the relational head of the conceptual structure the actual projection of the different arguments onto the different roles can be directed. As mentioned by Allen, the fact that the two structures are independent from one another makes it possible for the conceptual representation to be more fine-grained than the linguistic one, so that different elements that stand in a unique relation of linguistic dependency can be assigned different roles in the conceptual structure. At the same time, this duality allows the conceptual structure to include the projection of elements that are not present in the frame of the linguistic head (even if they are in a dependency relation to it).
There are reasons for thinking that the interface representation takes the form of a dependency structure. Firstly, we thus ensure that the relations between the content lexical units are structurally represented and that therefore the (semantic or otherwise) selectional restrictions between them can be easily stated. And secondly, the interface between these structures and those stated in a conceptual level of representation can also be easy to design. The next task is then to characterize as completely as possible how these dependency structures are to be shaped.
4. Level of depth of dependency structures

As we have just seen, there are reasons to think that a dependency structure is adequate for the interface representations in an MT system. The question, however, remains of what kind of dependency structures we have in mind, since different levels of depth in linguistic analysis entail different sorts of relations between the constituents of a sentence and consequently a different kind of dependency. In the linguistic tradition there is a level of syntactic representation that integrates the notion of dependency rather superficially, namely the level in which the syntactic functions of the constituents are the key relations in the process of building linguistic representations. In fact, to say that a constituent is a subject, a direct object or an indirect object is to maintain that it stands in a particular relation to its head (that is to say, to the verb). These are then dependency relations. Consider for example the sentence in (14).
Dependency and machine translation
(14) John has given those books to his sister

The main building blocks of the predicative structure of this sentence are the verb 'has given' and the three noun phrases that occur in it ('John', 'those books' and 'to his sister'). Each one of the latter stands in a particular relation to the verb and instantiates a different grammatical function: 'John' is the subject, 'those books' is the direct object, and 'to his sister' is the indirect object. A graphical interpretation of this structure could be the one in (15), where the grammatical functions are labels of the different dependency relations.

(15) has given
       subject: John
       direct object: those books
       indirect object: to his sister
Such a structure might seem to enable the correct formulation of the selection restrictions that the verb imposes upon its arguments: for example, that the subject and the indirect object have to be animate and that generally the direct object has to be concrete. This, however, is not true, since the verb 'to give' can be realized in the passive voice with an agent (16a) or without one (16b), and in these cases the subject is not animate anymore. (16) a. Those books were given by John to his sister b. Those books were given to his sister In a similar way a structure like the one we are discussing, based on syntactic functions, cannot be mapped uniformly onto a conceptual structure. Let us suppose that the conceptual structure corresponding to an action of giving only has three participants: the giver, the given object, and the receiver. It is not possible that the sentences in (14) and (16) can be mapped in a similar way to the conceptual structure, if the syntactic functions are the distinguishing element in the linguistic dependency structure: in (14) the subject has to be mapped onto the giver role, whereas in the two sentences in (16) it has to be mapped onto the given object role. A dependency structure based on syntactic functions therefore does not fulfil the requirements that we have imposed on the interface structure. Moreover such a structure is not adequate for MT purposes, since syntactic functions vary enormously from one language to another. Even when considering a verb like 'give' which has the same subcategorization structure in English and in
Spanish (that is to say, subject, object and indirect object), there are many cases in which the most natural translation is not that which maintains the source language syntactic functions of its constituents, but one that changes them because of a change in the verb's voice. Thus, in normal conditions the sentence in (16a) is not translated into Spanish as a passive sentence, but as an active one, with possibly the object topicalized or dislocated, like the ones in (17).

(17) a. Aquellos libros dio Juan a su hermana
lit.: those books gave John to his sister
b. Aquellos libros los dio Juan a su hermana
lit.: those books them gave John to his sister

Since the syntactic functions of the constituents vary according to the voice of the head verb, in order to obtain the most natural translation of a sentence like the one in (16a) the syntactic function of some of its constituents has to be changed. On the other hand, the verb 'give' can be the head of a sentence having two object noun phrases, thus yielding a construction exemplified in (18a) that does not have an equivalent in Spanish.

(18) a. John gave his sister those books
b. His sister was given those books by John

There is then another situation in which a change in grammatical functions is necessary in order to obtain a translation (not the most natural one, but simply a translation) of that sentence into, say, Spanish. But English can also have those sentences in the passive voice, as in (18b), which results in still another change of syntactic functions, if the translation has to be performed from English into Spanish. A dependency structure based on syntactic functions then is not fit for an interface representation in an MT system.
The basic problem is that syntactic functions are too close to the surface form of sentences to be common to translational equivalents in different languages, since several morpho-syntactic factors (like the voice of the main verb) determine the syntactic function of the constituents. A step has to be taken towards a deeper level of representation which complies with the requirements that we have set forth for an interface structure. In our previous discussion of syntactic functions we tacitly assumed the existence of some relation between the constituents of a sentence that is uniformly kept even if the syntactic function varies. For example, no one would accept (19b) as a Spanish translation of the English sentence in (19a), even if the syntactic functions of the constituent noun phrases are maintained in the two.
(19) a. His sister was given those books
b. Su hermana dio estos libros
lit.: his sister gave those books

What makes us think that they are not translations of one another is that the deep relations between the head verb and the subject noun phrases are different. Although the object noun phrase ('those books' and 'estos libros', respectively) has the same relation in both sentences, the subject in (19a) is the beneficiary of the action of giving and the one in (19b) is the agent. A change in a deep relation is enough to determine that the two sentences do not have the same meaning and that consequently they cannot be translations of one another.

This notion of a deep relation has been developed in various grammatical systems and theories under different names: deep grammatical functions (Perlmutter, 1983; Perlmutter and Rosen, 1984), thematic relations (Jackendoff, 1983; Wilkins, 1988), cases (Fillmore, 1968), valencies (Somers, 1986; 1987; Kirschner, 1987; Schubert, 1987; Gebruers, 1988), transitivity roles (Halliday, 1967/8; 1970), etc. With these notions linguists intend to capture those regularities in the relations between the head verb of a sentence and the noun phrases that are its dependants, even if their surface syntactic functions vary. There is a point then in having them as the starting point of a dependency structure that fulfils the requirements we have set forth. The corresponding structure of the sentence we have been using as an example is the following:

(20) has given
       agent: John
       theme: those books
       beneficiary: to his sister
In this structure we find the same constituents as in (15). The difference lies, of course, in the labelling. The present labelling does not change even if the syntactic function changes. The structures corresponding to some of the sentences we have previously discussed would be the following: (21)
a. were given
     theme: Those books
     agent: by John
     beneficiary: to his sister
b. were given
     theme: Those books
     beneficiary: to his sister
c. was given
     beneficiary: His sister
     theme: those books
Although the grammatical functions of the constituents vary from one particular realization to another, their deep relations, on which the dependency structure is based, remain constant; that is to say, 'those books' is always the theme, and 'his sister' the beneficiary of the act of giving. This regularity is what makes this particular level of analysis appropriate for MT. Firstly, the selection restrictions can be easily stated with a maximum of generality. With respect to predications headed by the verb 'give' it is the theme which has to be concrete (and not the direct object), so that this restriction also applies to the subjects in (21a, b), since they are labelled as themes (and not as subjects). On the other hand, the restriction on animacy applies to both the agent and the beneficiary of the predication (and not to its subject and indirect object), so that it correctly applies in the examples in (21), even if the grammatical functions do not correspond. Secondly, it is easy to state the mapping rules from this representation to a conceptual structure, because it can also be done in a uniform and general way. Different mapping rules are not needed anymore according to the syntactic function of the constituents (which of course would render impracticable the proposal), but general rules can be stated: the agent always maps onto the giver (have it the syntactic function of subject or that of by-object), the theme always maps onto the given object (be it the object or the subject), and the beneficiary always maps onto the receiver (be it the indirect object, the subject or the direct object). Finally, such a structure is also suitable for MT purposes, since what made the previous proposal inadequate was the fact that the syntactic functions of the constituents could be different in various languages (even when meaning was preserved through the translation process). 
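The uniform mapping rules just described can be made concrete with a short Python sketch. The triples, the dictionary and the function name are assumptions introduced for illustration; the point is only that the mapping consults the deep relation and never the syntactic function.

```python
# Mapping rules stated over deep relations, as described for 'give'.
ROLE_TO_PARTICIPANT = {
    "agent": "giver",
    "theme": "given object",
    "beneficiary": "receiver",
}


def to_conceptual_frame(dependants):
    """Map (deep_relation, syntactic_function, phrase) triples to frame slots."""
    frame = {}
    for deep_relation, _syntactic_function, phrase in dependants:
        # The syntactic function is deliberately ignored by the mapping.
        frame[ROLE_TO_PARTICIPANT[deep_relation]] = phrase
    return frame


# (14): John has given those books to his sister
active = [
    ("agent", "subject", "John"),
    ("theme", "direct object", "those books"),
    ("beneficiary", "indirect object", "his sister"),
]
# (16a): Those books were given by John to his sister
passive = [
    ("theme", "subject", "those books"),
    ("agent", "by-object", "John"),
    ("beneficiary", "indirect object", "his sister"),
]
# (18b): His sister was given those books by John
double_object_passive = [
    ("beneficiary", "subject", "his sister"),
    ("theme", "direct object", "those books"),
    ("agent", "by-object", "John"),
]

frames = [to_conceptual_frame(s) for s in (active, passive, double_object_passive)]
```

All three sentences yield the same frame, which is the regularity the deep-relation labelling is meant to capture; a mapping keyed on the second element of each triple would instead need a separate rule per voice.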
In the examples we have examined the variations in syntactic function were determined by the voice of the head verb of the sentence. By establishing the dependencies at a deeper level of analysis we have succeeded in labelling the dependants in a way that is independent from the voice of the verb; hence the translation of the dependants can be done independently from the rules that govern the translation of voice.
In general, dependency structures are adequate for MT because they are based on relations that are deep enough to overcome the differences in their surface realization in different languages. This is particularly significant with respect to syntactic functions as we have just seen. As a matter of fact a linguistic model based on dependency enables a nice formulation of the interaction between lexical properties and grammatical structure, so that a unique characterization is obtained of the relational properties of lexical units that enter into different surface realizations which nevertheless represent identical relations. Note also that in dependency structures the order of the elements is independent from surface order; as a consequence no special rules for order are needed in transfer.

A suitable level of depth for an interface structure in an MT system is that which is determined by the deep relations between constituents, which roughly correspond to cases and valencies. We need, however, to generalize over this level of depth, since it is generally agreed that not all dependants of a verb are given case by it. Consequently, in the next sections we are going to adopt dependency grammar as the formal definition of the interface level of representation and try to design it in such a way that all sorts of constituents of the predicative core of sentences are easily represented.

5. A dependency grammar

The dependency grammar to be presented in this section is a derivation of classical dependency grammar (for example, Hays, 1964; Hudson, 1984; Mel'cuk, 1988), worked out so that it is suitable for the interface representations in MT systems, such as those developed in Grenoble (Vauquois, 1975) and Eurotra (Allegranza et al., 1991; Durand et al., 1991). The starting point is a recursive definition of dependency constructions:

(22) A DEPENDENCY CONSTRUCTION consists of a governor together with zero or more dependants which are DEPENDENCY CONSTRUCTIONS.
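The recursive definition in (22) translates almost directly into a recursive data type. The following Python sketch is one possible rendering; the class name, field names and helper method are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DependencyConstruction:
    """A governor together with zero or more dependants (definition (22))."""
    governor: str
    # Each dependant is itself a dependency construction: the recursive step.
    dependants: List["DependencyConstruction"] = field(default_factory=list)

    def lexical_units(self):
        """Collect the governor and, recursively, all units below it."""
        units = [self.governor]
        for dependant in self.dependants:
            units.extend(dependant.lexical_units())
        return units


# A single lexical unit with no dependants is already a construction.
cars = DependencyConstruction("cars")
# 'blue cars': the noun governs the adjective.
blue_cars = DependencyConstruction("cars", [DependencyConstruction("blue")])
```

The base case of the recursion is the construction with zero dependants, so no separate type is needed for leaves.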
This definition relies heavily on the concepts of governor and dependant. What is called the governor here is also known as the head in other traditions. We may characterize it by means of a set of factors which are relevant to its identification (see also Zwicky (1985) and Hudson (1987)): a. the governor determines the type of the denotation of the whole construction; for example, the denotation of 'blue cars' is a subset of the denotation of 'cars'.
b. the governor determines the agreement features of (some of) its dependants; for example, in French noun phrases the noun determines the gender and number feature of its adjectival dependants.
c. in a construction X + Y, X is the governor if the distribution of X equals the distribution of X + Y.
d. the governor is the frame bearer. The lexically-specified frame states its idiosyncratic combinatorial properties; it exerts lexically-specified selectional restrictions on its frame-bound elements, and it conditions the morpho-syntactic realization of its dependants.

Although these criteria seem to presuppose that the governor represents both a syntactic and a semantic point of articulation in linguistic constructions, it is not always the case that the governor identified by each of these notions for a given construction is coincident with one single element. For example, in a construction with the verb 'to live' we naturally consider that the verb subcategorizes for a locative prepositional phrase, as is shown in (21).

(21) a. John lives in Paris
b. *John lives

But even if the verb exerts selectional restrictions on the prepositional phrase (the preposition must be locative, and the whole phrase must denote a place), it does not determine its morpho-syntactic realization, since any locative preposition would do. On the other hand, there are cases in which the governor (as a bearer of syntactic properties) does not coincide with the semantically determined one, as appears with attributes:

(22) John painted the wall red

Clearly both 'the wall' and 'red' are some sort of dependants of the verb: 'the wall' is the object, and it is allowed as such because it is an NP (morphosyntactic frame of the verb) and it is concrete (selectional restrictions of the verb); 'red' is an attribute of the object and though it is not morphosyntactically determined by the frame of the governor, it is certainly subject to verb selectional restrictions.
Yet there is also a relation between 'the wall' and 'red', which is clear in languages in which adjectives are inflected, as is shown in the Spanish translation of (22):

(23) Juan pintó roja la pared
However, the discrepancies in the application of the factors of governor detection are greater when we consider a surface-oriented representation of a string than a more abstract one. For example, at a morpho-syntactic level attributes to the subject may pose a problem for governor determination; at this level, the verb has clear morpho-syntactic properties which make it a governor of the copular construction, although it does not exert any selectional restriction either on its object or on its subject. But at a deeper level, the copula may be considered not to belong to the predicative core of the sentence; hence, it would not be present at the level of depth considered and therefore there would only be one candidate for governorhood, namely the attribute.

Since we are trying to define dependency with respect to a relatively deep level of analysis, most of these discrepancies will not show. So we accept as valid the list of factors that contribute to the determination of the governor of a phrase or clause. From that list, a characterization of dependants emerges:

a. the dependants do not determine the type of the denotation of the construction; for example, if they can be interpreted intersectively they restrict the denotation provided by the governor.
b. the dependants are subject to the agreement features that the governor imposes upon them.
c. the distribution of a dependant does not necessarily coincide with that of the whole construction; it may coincide however, since a dependant can be of the same syntactic and semantic class as the whole construction (an NP can be a dependant of a noun).
d. dependants are subject to the frame restrictions imposed by the governor.

Interestingly, the last factor does not hold for all types of dependants. This leads us to recognize two different types of dependency relations, depending on whether they are subject to frame restrictions by the governor or not. We thus distinguish between:

(24) a.
ARGUMENTS : dependants restricted by their heads. b. MODIFIERS : dependants not restricted by their heads. As a consequence of the definition of a dependency construction (see (22) above) and of the characterization of both the governor and the dependants, we can formulate as corollaries the two following principles: (25) a. Every construction must have (at least) one governor. b. Every dependant must be in a dependency relation to a governor. An example may be useful in order to illustrate the application of our characterization of dependency constructions. Consider the sentence in (26).
(26) Today John has given a broken car to the new secretary of the committee

Since we are dealing with a representation defined at a certain level of depth, it is clear that some elements in the surface string do not play a role in our dependency structure; this applies to the auxiliary, the articles and the prepositions in sentence (26). In the dependency structure we will build these elements are not present as separate lexical units, i.e. the information they carry is conveyed in some other way (either as features or as operators). This gives us as input to the dependency grammar an abstraction of the string in (26), which we may represent by a simplified constituency structure as in (27), where the elements that do not constitute a component of the dependency structure have been eliminated.

(27) [[Today] [John] [give] [[broken] [car]] [[new] [secretary] [committee]]]

The first step is to look for the governor of the whole sentence. This clearly is the verb 'give', since its frame indicates that two noun phrases and one prepositional phrase headed by 'to' can be its arguments; moreover these constituents satisfy the selectional restrictions the verb imposes on its arguments, namely that the subject and the indirect object are animate, and that the object is concrete. A further dependant of the verb is to be found in the adverb 'today', which is not subcategorized for by the verb and has thus to be seen as a modifier. This gives us a preliminary dependency structure, shown in (28), in which the main dependency relations are represented.

(28) [Today] [John] [give] [broken car] [new secretary committee]

However there are still some constituents in (28) that do not enter any relation. For each of the dependants of the verb that contains more than one open class lexical element, we have to decide what is the governor and what are its dependants.
So for example, in 'broken car', the noun is the governor because it is the one that distributionally behaves in a similar way as the whole constituent. An interesting example is provided by the other complex dependant, namely the one which has the syntactic function of indirect object. Once the abstraction process has been completed and the closed class words have been eliminated, its constituency structure has the form shown in (29): (29) [[new] [secretary] [committee]] If we now apply to this constituent the criteria we have just mentioned, both 'secretary' and 'committee' are candidates for being the governor of the
construction, since they are both nouns and behave distributionally in a similar way as the whole construction. This however is wrong, because among the eliminated elements there is the preposition 'of', which clearly determines the orientation of the dependency relation between 'secretary' and 'committee'. It appears then that the dependency relations between the main constituents of a phrase or sentence are often determined by their surface morpho-syntactic features and that consequently the meaning of these has to be carried over as input to the dependency grammar that builds their dependency structures. After applying these criteria to the complex phrases in (28) we obtain a more complete dependency structure, in which every single constituent formed by a lexical element is in a particular dependency relation to some governor, with the exception of course of the verb governor of the whole sentence. This yields a structure like the one in (30).
(30) [[Today] [John] [give] [[broken] [car]] [[new] [secretary] [committee]]]

In the discussion of the example we have just considered, there has been the underlying assumption that the general architecture of the system is multilayered. That is to say that there is some grammar which produces a constituent structure to which the dependency grammar is applied, so that the final dependency construction is achieved. Actually, this overall design is common to Eurotra and GETA, although it does not materialize in the same way in the two systems: in Eurotra different structures are created at each level of analysis whereas in GETA only one structure is created to which further information is added.

Since the dependency grammar we have defined corresponds uniquely to the last level of analysis, our claim that a dependency grammar is fit to build the interface representations of MT systems cannot be taken to be a claim about its usefulness for an overall NLP system. In particular it does not claim anything about the grammar that captures the surface constituency of linguistic expressions. Ours is then a claim that differs considerably from that in Mel'cuk (1988:13): 'Dependencies are much better suited to the description of syntactic structure (of whatever nature) than constituency is'. Dependencies, we claim, are suited to build the monolingual abstract constructions that are used in MT systems to interface with one another. As these dependency relations are defined only at this deep level of analysis, we can do with only two such relations, namely arguments and modifiers. This again shows that they are rather different from the larger sets of surface relations found in both the Dependency Grammar and MT literature (Mel'cuk, 1988; Schubert, 1987).
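To make the result of the worked example tangible, the following Python sketch encodes the complete dependency structure of sentence (26) using only the two relations argument and modifier. The class and function names are assumptions, and so is the choice to label 'committee' as an argument of 'secretary': the text fixes only the direction of that dependency, not its label.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    lexeme: str
    # Each dependant is stored with its relation: "argument" or "modifier".
    dependants: List[Tuple[str, "Node"]] = field(default_factory=list)


def triples(node):
    """List (governor, relation, dependant) triples, depth first."""
    result = []
    for relation, child in node.dependants:
        result.append((node.lexeme, relation, child.lexeme))
        result.extend(triples(child))
    return result


car = Node("car", [("modifier", Node("broken"))])
# Assumption: 'committee' is treated as an argument of 'secretary'.
secretary = Node("secretary", [("modifier", Node("new")),
                               ("argument", Node("committee"))])
give = Node("give", [("modifier", Node("today")),
                     ("argument", Node("John")),
                     ("argument", car),
                     ("argument", secretary)])

structure = triples(give)
```

Every lexical unit except the sentence governor 'give' appears exactly once as a dependant in `structure`, which is what principles (25a) and (25b) require of a well-formed construction.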
Furthermore, there is nothing in the notion of dependency thus defined that prevents it from being applied to structures other than trees. For example, the MiMo system operates with an interface structure grammar very similar to the Eurotra one and yet it is implemented in a PATR-like formalism, thus creating feature structures instead of trees (Arnold and Sadler, 1989; van der Eijk, 1989).

6. Conditions on dependency structures

The definition we have given of dependency grammar, and of its resulting structures, is very loose, since different applications can yield very different results. Some examples of alternative ways of implementing the relational notions of dependency can be the following: only lexical heads are allowed, or else empty elements are permitted as governor; stress is put on the syntactic conditions for determining the governor of a phrase, or alternatively the semantic selectional restrictions have a greater weight; governors can specify features only at the top node of their dependants, or otherwise they can inspect into their sub-structure. Although there is nothing in a definition of dependency like ours that favours one or the other of these possibilities, it is essential to raise the question of their validity in an MT system because they determine to a large extent the level of depth at which the grammar applies (and, consequently, the level of depth of the interface representations themselves). As a result of considering these questions we will be able to formulate some further conditions on dependency structures.

Classical dependency grammars embody what we may call a strong lexical governance condition, which can be stated as follows:

(31) STRONG LEXICAL GOVERNANCE CONDITION
All and only lexical elements are governors.

For example, Schubert (1987) commits himself to this condition by stipulatively excluding any representation of empty elements.
However, some linguistic phenomena cannot be treated at a certain level of depth without postulating empty elements. One such phenomenon is control of the subject of infinitival complements, as in

(32) John promised Mary to go.

The constituent 'John' in (32) is an argument of 'promise', but also of 'go'. This can be seen by considering both the syntactic frame of those verbs and the selectional restrictions they impose upon their arguments. 'To promise' subcategorizes for three arguments, namely the agent, the theme and the beneficiary, and 'to go' subcategorizes for a single argument (which may be agent or not). On the other hand, the (simplified) selectional restrictions of 'to promise' are that the agent and the beneficiary must be human and the theme propositional; those of 'to go' are that its argument must be mobile. The sentences in (33) fulfil these conditions.

(33) a. The train goes
b. John promises Mary that the train is going very soon

In (33a) the argument of 'to go' is mobile, though not human. The whole sentence of (33a) can be the theme of 'promise', as in (33b). But when the argument of 'go' is not human (i.e., it is just mobile), it cannot be at the same time the agent of 'promise':

(34) *The train promises Mary to go very soon

In a control structure then there is a constituent that is a dependant of two verbs. In order not to have double dependencies for a single constituent, we allow empty nodes which can be coindexed with a full one whenever a control relation exists between them. Thus the proper dependency representation of (32) is (35), in which the re-entrancy is expressed by coindexation of the full node and the empty one.

(35)
[[John_i] [promised] [Mary] [[e_i] [go]]]
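The coindexation used for control structures such as (32) can be sketched in Python as follows. The node class, the convention that an empty node has no lexeme, and the lookup function are all assumptions introduced here for illustration; they are not the notation of any of the systems mentioned.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    lexeme: Optional[str]          # None marks an empty node
    index: Optional[int] = None    # coindexation label, if any
    dependants: List["Node"] = field(default_factory=list)


def full_nodes_by_index(root):
    """Map each index to the lexeme of the full node that carries it."""
    table = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.lexeme is not None and node.index is not None:
            table[node.index] = node.lexeme
        stack.extend(node.dependants)
    return table


def antecedent(empty_node, root):
    """Return the lexeme an empty node is coindexed with, or None."""
    return full_nodes_by_index(root).get(empty_node.index)


# John promised Mary to go: 'John' depends on 'promise' only; 'go' governs
# an empty node coindexed with 'John'.
john = Node("John", index=1)
empty = Node(None, index=1)
go = Node("go", dependants=[empty])
promise = Node("promise", dependants=[john, Node("Mary"), go])
```

Because 'John' occurs as a full node only once, the structure remains a tree; the second dependency is recoverable through the shared index rather than through a second attachment.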
We thus reformulate the lexical governance condition in order to allow empty nodes as dependants in the interface constructions: (36) WEAK LEXICAL GOVERNANCE CONDITION All lexical units are governors. Only lexical units and co-indexed elements are governors. It has sometimes been claimed that phrasal nodes are needed, in addition to lexical units and co-indexed empty elements, in order to deal with composite predicates. For example, the sentence in (37) is said to have a composite predicate, which is formed by the verb 'take' and the noun phrase 'a walk': (37) John has not taken a walk since last Monday.
This type of predicate poses a problem for MT because the verb cannot be translated on its own, but only in conjunction with the object noun phrase. Consequently, it is claimed, to deal with them in a composite way (i.e., as a composite governor of the sentence) eases transfer, because the transfer rules can operate upon pairs of verbs and nouns. However, there do not seem to be reasons enough to introduce this new relaxation on the governorhood condition, since from a pragmatic point of view there are means of dealing with this phenomenon without the need of phrasal heads and from a theoretical point of view composite predicates belong to a larger class of phenomena which pose general problems for translation, namely collocations.

A natural corollary of the definition of dependency construction we have provided is that a construction has only one head, which we may express by the following condition:

(38) UNIQUE GOVERNOR CONDITION
A construction has one and only one governor.

The implementation of this condition at large, however, runs into a new difficulty. In some cases the syntactic and the semantic criteria for governorhood do not point to a single element. This is particularly striking in the case of attributes. Consider the sentence in (39).

(39) John is ill

From a syntactic point of view, the head of the sentence seems to be the verb 'to be'. However, from a semantic point of view there is no doubt that the governor is the adjective 'ill', because the selectional restrictions are imposed upon the subject by the adjective, and not by the verb:

(40) a. *That house is ill
b. That house is green
c. *This proposal is green

Since 'ill' selects an animate argument, (40a) is wrong and (39) is correct. On the other hand, 'green' selects concrete arguments, thus yielding the acceptable sentence in (40b) and the unacceptable one in (40c).
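The judgements in (39) and (40) follow from restrictions stated on the adjective alone, which a few lines of Python can illustrate. The feature sets and the function name are assumptions made for this sketch and are not drawn from any actual lexicon.

```python
# Illustrative semantic features for the subject nouns.
NOUN_FEATURES = {
    "John": {"animate", "concrete"},
    "house": {"concrete"},
    "proposal": set(),
}
# The feature each predicative adjective requires of its argument.
ADJECTIVE_RESTRICTION = {"ill": "animate", "green": "concrete"}


def attributive_ok(subject, adjective):
    """Check the adjective's selectional restriction against the subject."""
    # The copula plays no part in the check: the adjective is the governor.
    return ADJECTIVE_RESTRICTION[adjective] in NOUN_FEATURES[subject]
```

Under these assumed features, `attributive_ok("John", "ill")` holds while `attributive_ok("house", "ill")` does not, matching (39) and (40a); no entry for 'to be' is needed at any point.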
Secondly, the use of non-predicative adjectives (relational adjectives in Bennett, this volume), in attribute position results in ungrammatical sentences: (41) a. *The trip was presidential b. *The pollution is industrial
This is a consequence of the adjectives not being predicates at all. Therefore the acceptability, or otherwise, of attributive sentences depends on the adjective: it must be predicative and its argument (i.e., the subject of the sentence) must comply with the selectional restrictions it imposes. If we further consider a series of sentences like the ones in (42), there seems to be no doubt that the verb 'to be' in attributive sentences has to be analysed as a raising verb (i.e., as not belonging to the predicative core of the sentence).

(42) a. It seems that John is ill
b. John seems to be ill
c. John seems ill
d. John is ill
In all of them the propositional core is the same, namely the one expressed in the dependency structure in (43), even though the propositional attitude of the speaker is not the same in all of them.
(43) [[John] [ill]]
[...] is neither possible nor necessary for this treatment within a context-free grammatical system. The node-by-node recursion w.r.t. SLASH occurrence is achieved therein by exploiting the recursion w.r.t. structure building as performed by context-free phrase structure rules (or similar). In a sense, the SLASH treatment of UDCs is 'spread' throughout the grammar, so as to ensure appropriate specifications in any rule capable of building a depth = 1 subtree that may occur within the scope of an unbounded dependency. In this respect, the SLASH treatment is basically unmodular and bound to language- and grammar-specific rules.15 Actually, the situation appears more complex in Gazdar et al.'s (1985) most mature version of the framework: in a GPSG, as we define the notion [...], there are no phrase structure rules whatsoever. The class of admissible structural descriptions is determined jointly by a component of rule-like grammatical statements [...] and a set of universal principles governing the way in which they define structures (ibid., pp. 43-4). According to the new organization of GPSG systems, the 'immediate dominance (ID) rules' that replace ordinary phrase structure rules need not specify all features required by the corresponding subtrees, a task which is partly taken over by universal feature instantiation principles of depth = 1. One of these principles, the Foot Feature Principle (FFP) for the percolation of SLASH and similar features, makes it possible to achieve modularity in the GPSG treatment of UDCs. However, the resulting system, although equivalent in generative power to traditional context-free grammars, has lost the property
176 Valerio Allegranza
of a biunique correspondence between grammatical representation statements and structure-building instructions which makes such grammars so appealing for processing purposes (cf. Winograd 1983, pp. 327-8). What's more, a direct computational implementation of Gazdar et al.'s (1985) framework - preserving, for example, the assumption that feature principles select admissible subtrees among those obtained as extensions of the underspecified ID rules - would be practically unfeasible, because of a combinatorial explosion of possibilities to be checked (Busemann & Hauenschild 1988, Hauenschild & Busemann 1988). A way of dealing with this kind of problem is
a procedure which converts GPSG grammars into grammars written in a unification-based formalism [...] which has its own declarative semantics, and [...] a clear constructive interpretation, unlike that used in Gazdar et al., thus making the system more amenable to computational implementation (Shieber 1986).
In practice, the procedure will restore in context-free structure building rules those pieces of information - e.g. SLASH specifications - that were taken over by general principles according to Gazdar et al.'s (1985) framework. Adopting the source and the target formalism as interfaces to the user and the machine respectively, such an approach may well be appropriate for applied NLP systems, modulo an automated off-line compilation of the former formalism into the latter. However, from the viewpoint of theoretical linguistics as we have outlined it in section 1, one would assume that two different grammars to be related by some procedure are required in order to account for a modular/universal linguistic model and its computationally plausible processing.
A general theory of language implying the above assumption would be epistemologically inferior to one that simply assumes a single grammar for both purposes, under the same empirical conditions of descriptive adequacy.16 Interestingly, the empirical predictions of GPSG confirm - rather than compensate for - the methodological drawbacks we have discussed in the light of meta-descriptive (and ultimately, explanatory) requirements. There is an empirical adequacy problem with the SLASH treatment as developed in GPSG, which is acknowledged by Gazdar et al. (1985) themselves:
Suppose we try to derive via feature instantiation a tree with the category PP as the value for SLASH on one daughter, and the category NP as the value for SLASH on another. The FFP will require the mother to have both as the value of SLASH. But this is impossible [...]. NP and PP are distinct, and the feature SLASH can only be assigned one value by a given category (ibid., p. 81).
Now, 'multiple UDCs' with a common domain affected by two gaps of unlike category are allowed in Italian (Rizzi 1982):
A new approach to unbounded dependencies 177
(9) Tuo fratello, a cui mi domando che storie abbiano raccontato, era molto preoccupato
(Your brother, [to whom]=X1 I wonder [which stories]=X2 they told Y2 Y1, was very worried)
and even trickier examples are found in Scandinavian languages (cf. Engdahl & Ejerhed 1982, Miller 1991); some cases are marginally possible also in dialects of English (as mentioned by Kaplan & Bresnan 1982, Gazdar et al. 1985, Pollard & Sag 1987; 1991). To cover such data while retaining a version of the SLASH approach, one has to increase its complexity by allowing the SLASH feature to take a list of categories as value and increment or decrement it (cf. a similar use of complex set-valued features in Pollard & Sag's 1991 HPSG). As a result one gets a grammatical system inconsistent with one of GPSG's basic assumptions in that the system is not context-free any more, being at least as powerful as an 'indexed grammar' (see, for example, Gazdar 1988, Gazdar & Mellish 1989).17 Along such lines, the SLASH treatment shades off into 'gap threading' techniques, where list-valued features are used to collect and pass in structure-building rules 'gap elements' that are subtracted from the list at corresponding trace nodes. These techniques, implemented e.g. for PATR- or DCG-type formalisms, become increasingly complex for various kinds of unbounded dependencies (Pereira & Shieber 1987) and share the aforementioned drawbacks of unmodularity and unnatural impact on the whole grammar, inherent to any feature passing treatment of UDCs; cf. Sadler & Arnold (1992) for similar remarks. To sum up, versions of trace theory that are found in grammatical frameworks popular in present-day computational linguistics appear to be unsatisfactory for our purposes. In our opinion, a better track can be followed by going back to the original trace theory envisaged for the S-structure level of EST/GB.
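The contrast between an atom-valued and a list-valued SLASH discussed above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function names (`percolate_atomic`, `percolate_list`, `discharge`) and the encoding of categories as plain strings are hypothetical and do not reproduce the GPSG Foot Feature Principle or any HPSG mechanism in detail.

```python
from typing import List, Optional


class SlashConflict(Exception):
    """Raised when two daughters require different atomic SLASH values."""


def percolate_atomic(daughter_slashes: List[Optional[str]]) -> Optional[str]:
    """Atom-valued SLASH: the mother carries the single value shared by
    all gapped daughters; two unlike values cannot both be carried."""
    value: Optional[str] = None
    for slash in daughter_slashes:
        if slash is None:          # daughter contains no gap
            continue
        if value is not None and value != slash:
            raise SlashConflict(f"{value} vs {slash}")
        value = slash
    return value


def percolate_list(daughter_slashes: List[List[str]]) -> List[str]:
    """List-valued SLASH: the mother collects the gaps of all daughters."""
    collected: List[str] = []
    for slash in daughter_slashes:
        collected.extend(slash)
    return collected


def discharge(slash: List[str], filler: str) -> List[str]:
    """A displaced phrase of category `filler` removes one matching gap."""
    if filler not in slash:
        raise ValueError(f"no {filler} gap to bind")
    remaining = list(slash)
    remaining.remove(filler)       # removes the first matching gap only
    return remaining


# A domain with a PP gap on one daughter and an NP gap on another,
# as in the Italian example (9).
gaps = percolate_list([["PP"], ["NP"]])
after_np_filler = discharge(gaps, "NP")
after_pp_filler = discharge(after_np_filler, "PP")
```

With the atomic version, `percolate_atomic(["PP", "NP"])` raises `SlashConflict`, which mirrors the empirical problem quoted from Gazdar et al. (1985); the list version accepts the same configuration at the price of a more powerful feature system.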
The related account of unbounded dependencies can be schematized by additionally specifying our meta-theoretical representation (4') as in (10) below, where 'i' stands for a numerical index univocally used to co-index in the tree all and only the constituents belonging to the same 'chain' (in a sense that we will clarify soon).
[...] of a σ = (α/0, α/1, ... α/m≥1, ... α/n≥m), where any α/h [...] m fills an A-position. It is this notion of 'UDC chain' that our previous scheme (10) mirrors (see section 2.2). The distinction between null and non-null middles is also reflected in the 'UDC chain' notion. Indeed, the null-middle example (11) above clearly involves just an m = 1 UDC chain (as part of a broader coindexation chain). However, in section 2.1 we came across non-null-middle examples, which can be interpreted now as involving UDC chains where m>1, under the so-called 'COMP-to-COMP' or 'successive cyclic' movement hypothesis (cf. Chomsky 1973; 1977). For example, the displaced wh-phrase 'chi' in the Italian sentence (3), section 2.1, is understood in traditional EST/GB terms as moving from its A-position at the D-structure level to the closest complementizer site COMP, and from the latter to the next COMP, and so on, thus yielding the unbounded displacement effect at S-structure as the result of a series of locally bounded movements. As each movement, whether from an A- or an Ā-position, leaves a trace behind, a corresponding chain is built, namely σ' = σ = (Xi/0, ti/1, ti/2, ti/3 = m = n) in the example:
(12) [chi_i [pro vuoi [t_i che [pro pensino [t_i che [t_i dorma]]]]]]
(chi and the first two traces occupy COMP positions; the last trace is in subject position)
Without resorting to actual movement transformations, the chain can also be achieved by superimposing a UDC scheme on structural descriptions, according to the approach suggested at the end of section 2.2 (see section 4.2 below for details). Of course, an alternative to the COMP-to-COMP analysis is technically possible here, modulo a UDC scheme that differs from (10), section 2.2, only because no ti is envisaged in COMP, while retaining all constraints on the applicability of the scheme to linguistic data.
The new approach would make the same empirical predictions as the COMP-to-COMP one, and in fact would be descriptively equivalent to it, yielding simply a notational variant (without traces in COMP) of the same 'disambiguated sentential expressions'. However, we will show that the two approaches (i.e. their respective 'notations') have different explanatory power. The point is important, because the reasons for certain assumptions within the framework of EST/GB may not be fully appreciated if one does not consider that this theory is expected not only to make the correct empirical predictions about linguistic phenomena, but also to
explain how they follow from a restricted number of universal principles and parameters. We will thus continue our elaboration of the UDC scheme, showing how an analysis that involves traces in COMP is motivated by the explanatory purposes of the theory. Let us consider one of EST/GB's universals, the Subjacency Condition, a version of which is expressed in (13) below by referring to the general characterization of chains we have previously adopted.
(13) SUBJACENCY CONDITION: For any subsequence (α/k, [...]
[...] where e is an SBAR [COMP, S] construction and Ri + j stands for the target rules Ri and Rj building SBAR and COMP nodes respectively. As a result of applying Ri, the target COMP will be built on the top of the displaced phrase. By simply ignoring in translation rules the local translations of D-structure phrases 'in situ' that are marked for the original co-indexation, the necessary gaps will be created and the S-structure grammar will build its chains in the same way as in analysis. So far we have seen how movements are 'undone' in analysis and 'redone' in synthesis by virtue of a storage mechanism exploiting co-indexations. One could wonder whether some application of this mechanism is also relevant for the D-structure to D-structure mapping, given the 'generalized compositional approach' we have advocated in section 3.2. In order to answer, we have to consider the so-called 'Missing-object (unbounded dependency) constructions' (cf. Gazdar et al. 1985), originally treated in transformational grammar as undergoing 'Tough'-Movement (see Lasnik & Uriagereka 1988 for a discussion). The idea of 'Tough'-Movement was to capture the intuitive relation between a. and b. in (33) below by assuming that the former derives from b'., modulo movement of 'John' from the (embedded clause) object position to the (matrix clause) subject position.
(33) a. John is tough/easy to please
b. It is tough/easy to please John
b'. NP[] is tough/easy [PRO to please John]
In a principled, restrictive framework such as EST/GB, this transformation is ruled out, because movement between A-positions is explained as a way to assign Case to a phrase lacking it in its original position, whereas the object position is marked for Case by verbs in the active voice.33 More generally, as noted by Pollard & Sag (1991), the fact that the two A-positions at issue in Missing-object UDCs are conflicting Case assigners is an argument against any analysis implying that these positions exchange or share - by movement or unification - a constituent or, at least, its 'root category' features. Compare:
(34) a. He (NOM) is tough/easy to please
b. It is tough/easy to please him (ACC)
Chomsky's (1977) solution (followed, in essence, also by Chomsky 1981) was to assume that the subject of the copular construction is directly generated in that position, its link to the missing-object gap being treated as an anaphorical co-reference relation to an 'empty element' displaced by Wh-Movement from the gap position to the COMP position of the embedded clause:
(35) [John is tough/easy [[wh-e] PRO to please t]]
(wh-e is in COMP; the index subscripts of the original are not recoverable) ...
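[Diagram residue, apparently from the general architecture (38): only target-side fragments survive in the source, namely the levels L2' => L1' --> text', the translators Tn/n' and Tn'/n'-1, and the grammar G1'.]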
The EST-GB based instantiation of (38) we are concerned with in the present study is one in which there are only two monolingual levels of representation relevant to a stratificational model of linguistic translation: S-structure and D-structure. (We take for granted a 'lower' level of representation of the text strings in terms of morphemes and ultimately phonemes/graphemes, assuming it will be encoded and related to syntax through specific mechanisms that are beyond the scope of a G-T architecture.) Interestingly, the EST-GB framework lends itself to a formulation conforming to a very strong sort of 'constructivism'. Given some version of Emonds' (1976) 'structure-preserving hypothesis', so as to allow movement only to positions that are independently generated by the phrase structure component, and Chomsky's (1981) 'extended projection principle', according to which the predicate-argument structure projected from the lexicon is invariant across the levels (modulo trace theory), it is possible to assume that a single grammar and a single lexicon as part of this grammar define both S- and D-structure.36 It is up to input data - whether surface strings at the beginning of analysis or translation constraints on the output side of translator rules - to determine which structures are actually built by the same grammar at different stages of analysis or synthesis. The intended architecture is as follows: (39)
       ANALYSIS   TRANSFER   SYNTHESIS
           G                  G'
         V    V            V     V
text --> S => D     =>     D' => S' --> text'
           |        |         |
          Ts/d    Td/d'     Td'/s'
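As a reading aid, the flow of (39) can be sketched as a chain of stages, each mapping one level of representation onto the next. The sketch below is a toy under explicit assumptions: `make_stage`, `run_pipeline` and the tagging of every representation with the name of its level are hypothetical devices, and the stages do no linguistic work; they only check that each component receives the level it expects.

```python
from typing import Callable, List, Tuple

# A representation is tagged with the level it belongs to (assumption).
Rep = Tuple[str, str]
Stage = Callable[[Rep], Rep]


def make_stage(source_level: str, target_level: str) -> Stage:
    """Build a placeholder stage that accepts `source_level` only and
    re-tags its payload as `target_level` (no real translation is done)."""
    def stage(rep: Rep) -> Rep:
        level, payload = rep
        if level != source_level:
            raise ValueError(f"expected {source_level}, got {level}")
        return (target_level, payload)
    return stage


# Analysis (grammar G), transfer, synthesis (grammar G'), as in (39).
STAGES: List[Stage] = [
    make_stage("text", "S"),     # parsing into S-structure with G
    make_stage("S", "D"),        # translator Ts/d
    make_stage("D", "D'"),       # translator Td/d' (transfer)
    make_stage("D'", "S'"),      # translator Td'/s'
    make_stage("S'", "text'"),   # generation of the target string with G'
]


def run_pipeline(rep: Rep, stages: List[Stage]) -> List[str]:
    """Apply the stages in order and return the sequence of levels visited."""
    visited = [rep[0]]
    for stage in stages:
        rep = stage(rep)
        visited.append(rep[0])
    return visited


levels = run_pipeline(("text", "John is ill"), STAGES)
```

Keeping each translator as a separate function between adjacent levels reflects the stratificational design: no stage needs to know about levels other than its own source and target.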
In section 4.3 we will offer a notation for the translator components of this model, along the lines of the approach to (relaxed) compositionality developed in previous sections. Here, drawing on recent frameworks that are much more formally explicit than 'mainstream' GB theory, we want to address the crucial issue of a formalism for specifying the grammatical components presupposed in our class of 'translation machines'. A warning is in order: it is not the purpose of this study to develop a full-fledged formalism, nor do we think that the formal system that will be described below is the optimal one for expressing the theoretical approach at issue. We simply aim at showing that the basic aspects of our approach can be formally represented and computed, although we are well aware that the way of formulating the pertinent representations and organizing the system as a whole can be considerably improved in simplicity and modularity by adopting devices that we don't consider here. These
exclusions will help us to keep the presentation relatively short and self-contained, but additional devices and alternative formulations, whenever relevant, will be cursorily mentioned, usually in notes. In any case, a grammar formalism suitable for our purposes should allow 'a specification [...] intermediate between the abstract principles of GB theory and the particular automatons that may be used for parsing or generation of the language' (Correa 1987). Of course, all three kinds of system involved - (i) the linguistic principles characterizing the 'language acquisition device' (cf. section 1 above and note 5), (ii) the formal metalanguage to express explicitly competence grammars compatible with (i), and (iii) the algorithms to apply (ii) in performance - are universal in that they are expected to hold for any natural language. However, their methodological status is rather different and the sense of 'universality' that is implied may vary accordingly. We assume that (i) is the ultimate source of 'linguistic universals' as empirical statements determining what may or may not be a natural language. On the other hand, unlike GPSG, the framework we envisage does not require 'formal systems that have putative universals as CONSEQUENCES, as opposed to merely providing a technical vocabulary in terms of which autonomously stipulated universals can be expressed' (Gazdar et al. 1985, p. 2). A purely 'logical' view of universality in linguistics (cf. also Montague 1970) does not seem to us a more sensible position than that of disdaining some physical laws because they are expressed in the formal language of a branch of mathematics not entailing them as consequences. In our approach, universal principles (e.g. the Subjacency Condition) are constraints on the possible structure of sentence descriptions, constraints enforced through substantive linguistic assumptions (e.g.
the repertoire of bounding nodes) taken together with devices of the formal representation system (e.g. 'structural uncertainty'), in the light of empirical explanatory criteria; cf. section 2.3 above. When possible and convenient, we can also envisage substantive assumptions built into the formalism itself, but this should not obscure their empirical (rather than formal) origin and motivation. We want the grammar formalism to be representation-based, in the sense of 'representation' adopted throughout the chapter (see note 7). That is, a representation encodes some grammatical statement (rule, principle, etc.) by characterizing a class of structural descriptions; such a characterization is simply an underspecified version of those descriptions themselves, one that focuses on the aspects they have to share in order to belong to the relevant class. The leading idea that grammatical statements should be encoded by formal constructs of the same kind as the descriptions of specific linguistic data has been advocated by Martin Kay, in a particularly radical form, for 'functional' grammars and descriptions built up through recursive feature structures (cf. Kay 1984, Shieber 1986, Pollard & Sag 1987). One difference, inter alia, is that our approach is restricted to tree-configurational descriptions with features playing
a role only as components of the 'categories' occurring as tree nodes, and representations are conceived accordingly. More precisely, a category is an unordered set {F1, F2, ... Fn} of n≥0 feature specifications meeting certain consistency conditions to be clarified in a moment. A feature specification (or simply, a feature) is an a:v pair, where a is an attribute and v is a value. A value is a constant or a variable, variables being used to share (unknown) type-identical constants between distinct feature tokens. Notationally, we assume that attributes are strings of lower case letters, and constants are either strings of lower case letters or integers; variables are strings of upper case letters (optionally suffixed by £, a marker which will be discussed afterwards). The feature specifications included by a category must be consistent in that each single category cannot include two or more feature specifications having the same attribute but different constant values. Accordingly, the basic operation on categories is simply to 'unify' them (see, for example, Gazdar et al. 1985, Shieber 1986, Gazdar et al. 1988), yielding a category C U C' - the 'unification' of the categories C and C' - defined as follows:
(i) if a feature specification a:v is in C (conversely, in C') and the same value v or any variable or no value is specified for the attribute a in C' (conversely, in C), a:v is also in C U C';37
(ii) nothing else is in C U C';
(iii) the unification operation fails when a:c is in C and a:c' is in C', where c and c' are any two different constants.
Now, a description D is a formal construct of the following kinds:
(40) a. C
b. C[D1, D2, ... Dn]   for n≥0
where C is a category and [D1, D2, ... Dn] is an ordered list of descriptions immediately dominated by C. We draw a distinction here between the depth=0 constructs C and C[ ] in that the former is just a category that may occur as any description node, whether terminal or not, whereas a category dominating the empty list is understood as a terminal node describing a null string, or '[ ]-terminal'. The (40) notation corresponds to the 'labelled bracketing' that is customary in linguistics as an alternative to tree diagrams.38 It is also a particular case of the relatively more complex notation required here for representations. Indeed, a (sub-)tree representation R is a formal construct of the following kinds:
(41) a. N
b. N[R'1, R'2, ... R'n]   for n≥0
where N is a node representation and [R'1, R'2, ... R'n] is an ordered list of embedded subtree representations R'. On the one hand, a node representation N is a formal construct of the following kinds:
(42) a. C
b. C(K1, K2, ... Kn)   for n≥1
where C is a category in the usual sense and (K1, K2, ... Kn) is the 'constraint part' of a node representation, each K being a feature constraint. A feature constraint is an a:w pair, where a is an attribute and w is a value constraint. Value constraints are thus defined: if v is a value, @v (obligatory value) and ~v (negation of value) are value constraints; if c1, c2, ... cn are n≥2 constants, c1/c2/... cn (alternation of constants) is a value constraint. A feature specification a:v satisfies a feature constraint a:w with the same attribute iff v is a constant and is in the 'domain' of w. Assuming that the domain function d yields {c} for a constant c or a variable instantiated as c, and the class T of all constants for a non-instantiated variable, d is defined as follows for constraint values: d(@v)=d(v), d(~v)=T-d(v) and d(c1/c2/... cn)={c1}+{c2}+ ... {cn}. A broader range of constraints would be possible, but these ones are enough, e.g. to call for constant values (number:@X states that some unknown constant value for 'number' must be present), and to express alternatives as such (gender:neut/fem) or modulo negation of some value (gender:~masc).39 A category satisfies the constraint part of some N when all the feature constraints of the latter are satisfied by feature specifications of the former. On the other hand, an embedded subtree representation R' is a formal construct of the following kinds:
(43) a. R
b. N1/N2/... Nn   for n≥2
c. ON
d. O'R   for R ≠ N[]
e. #1N[R'1, R'2, ... R'j-1, P, R'j+1, ... R'j+m]   for j≥1 and m≥0
that is, either a representation R as defined in (41), or the result of generalizing over representations by using various operators. We allow the alternation of node representations, the use of prefix operators of two classes O and O' to be discussed soon, and the #1 marker to introduce a path relevant to 'structural uncertainty'. A P(-representation) involved by such a path is a formal construct of the following kinds:
(44) a. N[R'1, R'2, ... R'j-1, P, R'j+1, ... R'j+m]   for j≥1 and m≥0
b. #2R
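The definitions of category unification, value constraints and constraint satisfaction given above can be tried out in a few lines. The following is a minimal sketch under stated assumptions, not the chapter's implementation: categories are modelled as Python dictionaries, variables as all-upper-case strings, value constraints as hypothetical tagged tuples (`("oblig", v)`, `("neg", v)`, `("alt", (...))`), and domains as membership predicates so that the class T of all constants never has to be enumerated. Variable sharing across features is not modelled.

```python
from typing import Callable, Dict, Optional, Tuple, Union

Value = Union[str, int]
Category = Dict[str, Value]
Constraint = Tuple[str, object]


def is_variable(v: Value) -> bool:
    """Variables are strings of upper case letters; all else is a constant."""
    return isinstance(v, str) and v.isalpha() and v.isupper()


def unify(c1: Category, c2: Category) -> Optional[Category]:
    """Clauses (i)-(iii): a constant wins over a variable or a missing
    value; two different constants make unification fail (None)."""
    result: Category = dict(c1)
    for attr, v2 in c2.items():
        v1 = result.get(attr)
        if v1 is None or (is_variable(v1) and not is_variable(v2)):
            result[attr] = v2
        elif is_variable(v2) or v1 == v2:
            continue
        else:
            return None
    return result


def domain(v: Value) -> Callable[[Value], bool]:
    """d(v): {c} for a constant, every constant for a free variable."""
    if is_variable(v):
        return lambda c: True
    return lambda c: c == v


def constraint_domain(w: Constraint) -> Callable[[Value], bool]:
    """d(@v) = d(v), d(~v) = T - d(v), d(c1/.../cn) = {c1} + ... + {cn}."""
    kind, arg = w
    if kind == "oblig":
        return domain(arg)
    if kind == "neg":
        inner = domain(arg)
        return lambda c: not inner(c)
    if kind == "alt":
        return lambda c: c in arg
    raise ValueError(f"unknown constraint kind: {kind}")


def satisfies(cat: Category, constraints: Dict[str, Constraint]) -> bool:
    """Every feature constraint must be met by a constant-valued feature."""
    for attr, w in constraints.items():
        v = cat.get(attr)
        if v is None or is_variable(v) or not constraint_domain(w)(v):
            return False
    return True


unified = unify({"person": "1st"}, {"gender": "fem", "cat": "np"})
ok = unified is not None and satisfies(
    unified, {"gender": ("alt", ("neut", "fem"))}
)
```

The last two statements unify {person:1st} with {gender:fem, cat:np} and check the result against the constraint gender:neut/fem, which succeeds.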
To terminate the path, (44)b. requires that a representation R (in the sense of (41) above) be prefixed by a #2 marker. In other words, #1 and #2 (the 'recursion markers' introduced by Allegranza 1988 and further discussed by Allegranza & Bech 1989; 1991) are defined as a pair, providing a simple notation to express the 'bracketing in depth' of a structural unit specified as 'uncertain' in some representation. This means that the two schemata below are understood as perfectly equivalent, the former being directly evocative of the intended interpretation, but notationally more cumbersome; the latter is equally interpreted according to the chart in Fig. A, section 2.1, by equating the #1-#2 'bracketing' to the 2* one.
[Two tree schemata; diagrams not recoverable from the source.]
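One possible reading of a #1-#2 'bracketing in depth' is a search for a bottom substructure reached through an unknown number of intermediate nodes, each of which must be an admissible 'middle' node. The sketch below illustrates only that reading and rests on assumptions: trees are (label, children) pairs, the admissibility tests are arbitrary predicates, and the function name `uncertain_paths` is hypothetical.

```python
from typing import Callable, List, Tuple

Tree = Tuple[str, list]   # (node label, list of daughter trees)


def uncertain_paths(
    node: Tree,
    middle_ok: Callable[[str], bool],
    bottom_ok: Callable[[str], bool],
) -> List[List[str]]:
    """Return every path from `node` down to a node accepted by
    `bottom_ok`, passing only through nodes accepted by `middle_ok`.
    The depth of the path is not fixed in advance."""
    label, children = node
    paths: List[List[str]] = []
    if bottom_ok(label):
        paths.append([label])
    if middle_ok(label):
        for child in children:
            for sub in uncertain_paths(child, middle_ok, bottom_ok):
                paths.append([label] + sub)
    return paths


# Toy description with three nested clauses and a trace 't' at the bottom.
innermost = ("SBAR", [("COMP", []), ("S", [("t", []), ("VP", [])])])
middle = ("SBAR", [("COMP", []),
                   ("S", [("NP", []), ("VP", [("V", []), innermost])])])
top = ("SBAR", [("COMP", []),
                ("S", [("NP", []), ("VP", [("V", []), middle])])])

found = uncertain_paths(
    top,
    middle_ok=lambda lab: lab in {"SBAR", "S", "VP"},
    bottom_ok=lambda lab: lab == "t",
)
```

Here the single path found runs through nine nodes; adding or removing an embedded clause changes its length without any change to the search itself, which is the point of leaving the depth uncertain.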
Structural uncertainty is just a special case of the general property of a linguistic formalism to permit the expression of a (possibly infinite) class of alternatives for a structural component of a representation. For example, the context-free backbone of LFG allows daughter nodes in a rule to be marked for alternation, optionality, and Kleene closure (see Kaplan & Bresnan 1982). In our notation, we use the infix operator / for alternation, while optionality and Kleene closure are notated by the prefixes ? and *, respectively; these and other devices yield 'regular expressions' over nodes (cf. (43)b.-c.) or subtrees (cf. (43)d.).40 Whereas the use of alternation and optionality is obvious, it is worth noting that the 'Kleene star' operator makes it possible to simplify the representation of structures in which an unknown number of constituents of some type may occur. A clear example is given by pre- and post-modifiers: e.g. NBAR (or another equivalent N projection) can be preceded in English by an indefinite number of adjectival phrases, notated as *{cat:adjp}; likewise, there can be any number of prepositional or adverbial phrases or clauses following a VP (or another equivalent V projection) and these can be captured by *{}(cat:pp/advp/sbar) as a first approximation. It is true that we could build such structures by recursive application of some NBAR → ... NBAR or VP → VP ... rule introducing a single modifier in turn, but the result would be the production of structures of unpredictable depth which could be subsequently checked only by making more complex and pervasive the use of structural uncertainty in general schemata. For example, one would need #1-#2 representations of
modification constructions embedded into the #1-#2 representation of a UDC middle, and so on. To the extent that a linear Kleene star representation is simpler, it must be preferred by the 'principle of minimal effort' - the structural uncertainty apparatus being reserved for phenomena such that a linear representation is impossible. An O-class operator not commonly found in other formalisms (but cf. Bech & Nygaard 1988 for a precedent) is ! as a marker for 'virtual alternation'. We define this notion as follows: !N is interpreted as an alternation between N1 and N2[], where N1=N and N2 is like N1 but with category C2 = C1 U Ce. Ce is a category consisting of all and only the features that characterize a phonetically and lexically empty node as such. (Rather than specify Ce as built into the formalism, we assume it as a parameter to be set according to the grammar under consideration.) As the discussion of some examples will show in section 4.2, to use virtual alternation instead of directly representing a []-terminal allows significant generalizations in the treatment of 'empty elements'. Indeed such elements generally occur as alternatives to a corresponding overt constituent; see, for example, empty pronouns (pro and PRO) with the same distribution of overt ones, or empty COMP positions that could have been filled by an overt complementizer, etc. Among the O'-class operators that (43)d. allows, we envisage 'structural macros', i.e. shorthand devices that corresponding definitions expand to more complex pieces of representation. A structural macro operator is a string of lower case letters followed by a sequence (V1, V2, ... Vn) of n≥0 variables, and its definition comes down to an equality whose left-hand side or 'definiendum' is the operator itself, possibly followed by {}. In unexpanded representations, the expression {} (the maximally underspecified category, cf. Gazdar et al. 1985, p.
27) will occur as a terminal node prefixed by the operator, whereas macros not restricted to {} may prefix any R. The 'definiens' of a {}-type macro is any formal construct of the R' kind, and the other structural macro definitions just differ because of an instance of the place-holder $ used as a representation N at one terminal node position in the R' (see section 4.2 below for an example). Furthermore, for any macro definition to be well-formed, all the variables occurring in the definiens must be tokens of those listed by the macro, so as to preserve the indication of relevant variable sharing also in unexpanded representations. The result of expanding a macro must be a well-formed R in which the definiens replaces the definiendum; in case of $-type macros, the definiens in its turn embeds at the $ position the subtree representation originally prefixed by the macro. There is another kind of macros we allow, viz. 'feature macros' (cf. Gazdar & Mellish 1989), but in our notation they don't require any special operator, being just single features that a definition expands to n>2 other features. So far we have only considered a general representation system appropriate
for matching arbitrarily complex descriptions, but we need also some structure building component to produce recursively such complex descriptions from their constituents. As in most grammatical frameworks (one exception being TAGs, cf. note 8), we expect the linguistic knowledge required for structure building to be encoded by (some equivalent of) context-free phrase structure rules. Therefore, in our approach, a corresponding class of representations must be distinguished as a particular case of the general definition of R given in (41) above. A 'context-free (rule) representation' (CFR) is a formal construct of the following kind:
(46) C[N'1, N'2, ... N'n]   for n≥1
where C is a category and [N'1, N'2, ... N'n] is an ordered list of embedded node representations such that each N' is a formal construct of either the (42) or the (43)b.-c. kinds (i.e. a 'regular expression' over nodes). Actually, a CFR definition can say more than that, imposing further requirements on the components of (46) in order to build into the CFR formalism some version of the 'X-bar syntax' (Chomsky 1970, Jackendoff 1977). This entails that the X-bar information be integrated into the feature system and that the 'head' projecting relevant features up to the mother node be distinguished from the other daughters in a rule (cf. Gazdar et al. 1985, pp. 50-52). In the following presentation we will simply presuppose that head feature percolations follow from some X-bar definition of CFRs, while keeping for convenience the traditional 'phrasal categories' as macros to be defined by appropriate feature specifications (e.g., cat:np = n:yes, v:no, bar:2). Now, it is obvious that a linguistic formalism involves not only well-formedness conditions, but also an interpretation to specify how the well-formed formulae can be applied to data. We have implicitly assumed throughout this section that the data relevant to S-structure and D-structure are always given in the form of descriptions (in our sense); at the beginning of analysis, some string of morphologically marked lexical entry descriptions of the (40)a. kind, associated with the actual text by mechanisms we neglect here. Therefore, the question is how to apply representations to input descriptions, at each given stage of processing. In the case of occurrence of 'recursion-marked paths', 'regular expressions', and more generally, structural components of a representation that are marked by operators not allowed in resulting descriptions, the R or CFR at issue is actually a 'representation-generator', whose application to data is mediated by the delivery of derived representations that such a generator defines.
Thus, we can simply neglect the relevant operators/ markers while presenting a data-matching operation, because what is ultimately applied to some input description is always a 'bare' representation without any of them (formally speaking, an R involving R's of the (43)a. kind only, or a CFR involving N's of the (42) kind only).41
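The idea of a 'representation-generator' delivering bare derived representations can be illustrated for the daughter list of a rule containing the operators /, ? and *. This is a sketch under stated assumptions rather than the chapter's procedure: daughters are encoded as hypothetical tagged pairs, node representations are reduced to plain strings, and the Kleene star is cut off at `max_repeat` repetitions purely to keep the enumeration finite, whereas the generator proper defines an unbounded class.

```python
from itertools import product
from typing import List, Sequence, Tuple

# ("node", n), ("opt", n), ("star", n) or ("alt", [n1, n2, ...])
Item = Tuple[str, object]


def alternatives(item: Item, max_repeat: int) -> List[List[str]]:
    """List the bare node sequences that a single marked daughter stands for."""
    kind, arg = item
    if kind == "node":
        return [[arg]]
    if kind == "opt":                       # prefix ?
        return [[], [arg]]
    if kind == "star":                      # prefix *, truncated (assumption)
        return [[arg] * k for k in range(max_repeat + 1)]
    if kind == "alt":                       # infix /
        return [[n] for n in arg]
    raise ValueError(f"unknown operator: {kind}")


def expand(daughters: Sequence[Item], max_repeat: int = 2) -> List[List[str]]:
    """Derive every bare daughter list defined by the marked daughters."""
    derived: List[List[str]] = []
    choices = [alternatives(d, max_repeat) for d in daughters]
    for combo in product(*choices):
        derived.append([node for part in combo for node in part])
    return derived


# An N projection preceded by any number of adjectival phrases.
nbar_daughters = expand([("star", "adjp"), ("node", "n")], max_repeat=2)

# An optional determiner followed by a noun or a pronoun.
np_daughters = expand([("opt", "det"), ("alt", ["n", "pron"])])
```

Only the derived lists, which contain no operators, would then be matched against input descriptions.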
We can define the following notion of 'matching' for nodes: a node representation N and a category C in a given description match iff this C unifies with the category of N and the result of the unification satisfies the constraint part of N (if any). For example, a node representation {person:1st}(gender:neut/fem) and a category {gender:fem, cat:np} match in that the unification of the latter with {person:1st} yields the new category {gender:fem, cat:np, person:1st}, which satisfies (gender:neut/fem). Matching can be recursively extended to (sub-)trees of depth ≥1 in the case of general representations of the R kind. Consider a (sub-)tree representation N[R1, R2, ... Rn], for n≥1, and a (sub-)tree description C[D!k1!, D!k2!, ... D!km!], where !ki! and !ki+1!, for any 0 [...]
[...] (arg1) to the young man (arg2)
* The book (arg((>) was given the young man (arg2) (arg1)
* It (argc))) was given the young man (arg2) the book (arg1)
Since passive forms are not case assigners, the examples c. and d. are ruled out by the lexical constraint case:@K which requires that an overt NP-arg2 adjacent to the verb according to our CFR (49) be assigned case. The other NP position is inherently marked for accusative case (cf. Chomsky 1981, pp. 170-71, on 'structural' and 'inherent' case marking). As for the dative PP, it is not assigned case in English, but the correctness of a given phrase occurring in that position can be checked through the 'pform' feature, assuming the prepositional form is percolated up to the top of each PP by the CFR that builds it (cf. Gazdar et al. 1985, p. 132). Now, a (non-transformational) treatment of movements should complement subcategorization frames in order to account for dependents of a given head that do not occur in their real θ-assignment position. We have already mentioned that passive constructions involve NP-Movement (see section 2.3 and note 18). Other bounded movements are possible, like the 'Heavy NP Shift' that in English clauses moves some NP argument to the right if it is considerably longer than other arguments or modifiers (cf. note 39). In the approach envisaged here (cf. Sells 1985, p. 55), the displaced 'heavy NP' is found at the bar-level where modifiers of the verb(-projections) occur, and is co-indexed to a trace filling the gap in argument position:
(51) [S John [VP gave t_i to Ellen] for her birthday [NPi an old book he had bought in the bookstore on 17th Street]]
Along similar lines, a treatment of variations in word-order can be developed for English and other languages.42 The basic difference from the UDC case is that bounded movements do not involve structural uncertainty in co-indexation schemata applied to match a description so as to link the moved
phrase to the gap. In both cases, the approach to subcategorization sketched above entails that the co-indexation schemata themselves are responsible for the introduction of the trace. This is not the only possible approach, of course. In other treatments of movement, the subcategorization frame could be made the triggering factor of gap-filling operations that would introduce traces just on the basis of unfilled frame slots. However, many of the structures with traces thus created would have to be rejected at some later stage of processing, in view of the requirement that a displaced phrase has to be found elsewhere in the sentence and correctly linked to the trace. Indeed, a frame slot corresponding to some optional argument may be unfilled not only because of some movement, but also because the phrase is simply lacking in the sentence under consideration. In order to know that the first possibility is the case, we have to wait until an appropriate phrase is found in displacement position. Displaced modifiers raise the same problem, with additional complications, as the number of such phrases in a dependency construction is not predicted by the subcategorization frame. Finally, the range of alternative hypotheses about possible gaps to be taken into account grows if one does not adopt the simplifying assumption that no more than one gap due to movement can be foreseen for each subcategorization domain. As we have discussed in sections 2.2-2.3, 'multiple UDCs' that contradict such an assumption are possible in various languages. Thus, a treatment of unbounded dependencies that selects the candidate bottom substructures through purely 'local' criteria entails an overgeneration of alternatives to be ruled out during monostratal processing, or by mapping to another representation level, or similar.43 The burden of this overgeneration-and-filtering process is often underestimated, in our opinion.
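The growth of alternatives described above can be made concrete with a small counting sketch. It is only an illustration under stated assumptions, not part of the formalism: each unfilled optional slot is treated as a binary choice (phrase simply absent, or trace due to movement), and the function names and the sample domains are hypothetical.

```python
from itertools import product

def local_hypotheses(unfilled_slots, one_gap_per_domain=True):
    """Enumerate the readings of one subcategorization domain when traces are
    posited purely locally: every unfilled optional slot is either 'absent'
    (the phrase is simply lacking) or a 'trace' (the phrase has moved).
    This binary model is an assumption made for illustration."""
    readings = []
    for choice in product(("absent", "trace"), repeat=len(unfilled_slots)):
        # Simplifying assumption: at most one movement gap per domain.
        if one_gap_per_domain and choice.count("trace") > 1:
            continue
        readings.append(dict(zip(unfilled_slots, choice)))
    return readings

def total_hypotheses(domains, one_gap_per_domain=True):
    """Number of combined readings over several independent domains."""
    total = 1
    for slots in domains:
        total *= len(local_hypotheses(slots, one_gap_per_domain))
    return total

# Three hypothetical domains with 2, 1 and 3 unfilled optional slots.
domains = [["arg1", "arg2"], ["arg1"], ["arg1", "arg2", "arg3"]]
print(total_hypotheses(domains, one_gap_per_domain=True))   # 3 * 2 * 4 = 24
print(total_hypotheses(domains, one_gap_per_domain=False))  # 4 * 2 * 8 = 64
```

Under these assumptions most of the enumerated readings would be discarded later, since at most a few displaced phrases are actually present to license a trace.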
At the same time, the 'local' approach to traces fails to exploit linguistic properties of UDCs which could be taken into account in order to reduce the indeterminacy of processing (i.e. the number of wrong alternatives to be tested) according to strategies more directly compatible with realistic models of the speaker/hearer's performance (see, for example, De Vincenzi 1991). Under the plausible assumption that human beings must avoid useless overgeneration in order to process sentences as efficiently as they do, a straightforward alternative to the overgenerate-and-filter-out approach to UDCs seems to be a treatment that delays the prediction of uncertain gaps until an actual top sub-structure is built. Indeed, the identification of top sub-structures is generally easier than the identification of bottom ones, as a consequence of the configurationally detectable occurrence of a constituent in displaced position.44 A treatment that takes this fact into account operates by looking for a gap only when a displacement is really at stake, and fills the gap and builds the correct link to the displaced constituent simultaneously. The result is achieved in the present framework by virtue of a unification-based formalism whose representation schemata are flexible enough to encode linguistic conditions over unbounded distances, without resorting to procedural facilities depending on the architecture of the parsing automaton as such (cf. Woods' 1970 HOLD mechanism for ATNs or the 'buffer' found in Marcus' 1980 deterministic parser). Our purpose now is to exemplify how UDC schemata that lend themselves to straightforward processing applications, along the lines sketched above, can be encoded in the representation format available in this chapter (but see also Allegranza & Bech 1989; 1991 for a variant version of essentially the same approach). In order to generalize over common components of the representations we introduce a 'structural macro' and some 'feature macros'. The former is an operator corresponding to the UDC middle-unit representation (16) sketched in section 2.3, with an additional feature 'that-trace' which we will motivate soon:
(52) middle = #1{cat:s, wh-scope:I, that-trace:no, that-trace:T} {}(cat:~np, cat:~s) {cat:sbar} [{cat:comp} [!{gap-cat:C, wh-gap:G, coindex:I}, ?{cat:comp}]> #2$ L ] ]
The feature macros allow us to compactly express agreement features as well as others (case, θ-role and possibly null prepositional form) that are relevant for sharing between traces and displaced phrases:45
(53) a. agr:A = number:AN, person:AP
     b. gap:G = agr:GA, role:GR, pform:GF
     c. wh-gap:W = gap:WG, case:WK
We also make the following assumptions concerning features to be automatically induced by the grammar system:
(a) the parameter Ce for the []-terminal case of the 'virtual alternation' (cf. section 4.1) is set as {empty:yes, gap-cat:X, cat:X};
(b) an indexing function assigns a different integer value to every nodal category in a description, so as to make it possible for any node, and the constituent possibly rooted in it, to be univocally referenced by sharing its index value through a co-indexation feature;
(c) whenever the feature 'type' is not otherwise specified in a category acting as a lexical entry or the root node of a CFR, 'type:full' is assigned by default therein.
Finally, in Fig. B hereafter we offer two tentative UDC schemata for a fragment of English, expressing i) co-indexation for 'internal arguments' (possibly involving preposition stranding) and ii) co-indexation for pre-verbal subjects. (For convenience, we neglect the modifiers, which could be treated essentially along the same lines as far as the UDC machinery is concerned, but would require more sophisticated θ-role assignment devices.) The two (or more) UDC schemata are expected to apply as alternatives in a grammar, modulo creation of different copies of the original input description, one for each scheme to be applied; only copies that have been successfully matched (and consequently enriched) by a corresponding scheme survive (and replace the original). Concerning the displaced element, the two schemata clearly differ because ii) is restricted to NPs whereas i) allows an alternation between NPs and PPs. In both cases, via 'wh-gap', the θ-role assigned to the trace in A-position is transmitted to the whole chain and ultimately to the displaced phrase. Neglect for the moment the possibility that this constituent be empty (a possibility witnessed by the use of !-marking for the corresponding node representation).
We assume that the 'role' feature on it is assigned a @-variable by the CFR building COMP, thus enforcing the well-formedness condition that the feature must be instantiated by a constant before the end of the monostratal processing. Provided that the sharing of a θ-role with a trace is the only way to perform such an instantiation according to the grammar, the displaced phrase not linked to any trace in the ungrammatical examples 'Who did Mary kiss John?' (no gap) and 'Who are you reading a book that criticizes?' (unacceptable island violation) would determine ill-formedness of the final, global description of the sentence. As mentioned while discussing subcategorization, @-variables are used in a similar fashion for unfilled obligatory frame slots: thus, to ensure that a trace can satisfy the obligatoriness requirement of a corresponding slot on the lexical head, in the case of scheme i) a feature value on the !-marked node is transmitted to the slot feature that is relevant for the interpretation under consideration. (E.g. to 'slot2', when the alternation on the lexical head yields a version in which a θ-role is assigned from 'arg2'.) This value transmission is also exploited
to ensure the rejection of !-marked nodes in candidate bottom subconstructions already displaying an argument captured locally by some CFR, which has determined the instantiation of the relevant slot feature to a constant value for 'cat' (cf. (47), (48) and (49) above). Indeed the constant values transmitted by the UDC scheme are different, coming from 'type' or 'empty', instead of 'cat'.

i) co-indexation for internal arguments
   {cat:sbar, that-trace:no}
   [{cat:comp}(type:@F)
      [!{cat:C, wh-gap:G, coindex:I}(cat:np/pp), ?{cat:comp}],
    middle{cat:s, wh-scope:I}
      [*{},
       {}(cat:vp/adjp)
         [{arg1:R, slot1:E, case1:K, p-form1:P}/
          {arg2:R, slot2:E, case2:K, p-form2:P}/
          {arg3:R, slot3:E, case3:K, p-form3:P},
          ?{cat:np},
          ?{cat:pp, cat:X},
          !{cat:X, gap-cat:C, wh-gap:G, coindex:I, role:R, type:E, empty:E, case:K, p-form:P},
          *{}(cat:pp/sbar)],
       *{}]]

ii) co-indexation for pre-verbal subjects
   {cat:sbar, that-trace:T}
   [{cat:comp}(type:@F)
      [!{cat:np, wh-gap:G, coindex:I}, ?{cat:comp}],
    middle{cat:s, wh-scope:I, comp:T}
      [{cat:np, wh-gap:G, coindex:I, type:empty}, *{}]]

Fig. B - Sample UDC schemata

Let us consider in more detail the various kinds of cases that are possible according to the 'virtual alternation' of the !-marked node (N hereafter) and the input data for a given position: (a) N as a []-terminal marked for empty:yes (a.1) yields a failure when the position is not a gap, or when the θ-role to be received corresponds to an already filled slot (which creates a conflict with empty:yes), but
(a.2) may yield a successful result when the position is a gap and the relevant slot is unfilled; (b) N as a node to be matched, not marked for empty:yes, (b.1) yields a failure when the position is a gap, or is occupied by a constituent marked type:full (which creates a conflict with the corresponding slot feature), but (b.2) may yield a successful result when the position is occupied by a constituent unmarked for 'type' and 'empty'. The potentially successful cases, (a.2) and (b.2), are subject to further conditions, checking pertinent features and selecting the correct position(s) for a node that is expected to receive the given θ-role. In essence, the bottom substructure in scheme (i) states that within a subcategorization domain for internal arguments, a trace can never occur on the left of NPs and on the right of clauses; moreover, the trace must be a PP if a PP occurs on its left. These restrictions for English reflect a canonical 'linear precedence statement' NP < PP < SBAR (see note 40 and Gazdar et al. 1985, p. 110), as well as the constraint that Wh-Movement cannot extract the NP-arg2 in dative constructions (cf. Who did you give a book to t? vs. *Who did you give t the book?). Whereas the introduction of a trace according to case (a.2) should be clear by now, case (b.2) requires some additional provisos. For the fragment of English we are concerned with here we want (b.2) to be restricted to preposition stranding constructions, so that the node already filling the position matched by the relevant version of the !-marked node must be a PP immediately dominating a trace. A stranded preposition certifies the presence of a gap as well as its category (only NPs being extracted from PPs, cf. Koster 1986, p. 164), and therefore these constructions do not give rise to the problems of indeterminacy that in general prevent us from positing traces on purely 'local' grounds.
We assume that a CFR for building PPs will introduce a trace when no overt object of the preposition is captured, and will override the default assignment of 'type' by specifying the percolation of the corresponding information from the embedded constituent up to the PP root. In this way, a full embedded constituent will transmit 'type:full' upwards, yielding a (b.1) kind of failure in case of attempted co-indexation of the PP. On the other hand, an embedded trace, being unmarked for 'type', will allow the co-indexation, and inherit both the relevant index and the 'gap-cat' and 'wh-gap' features, under the assumption that they be percolated too.46 To sum up, the requisite CFR is roughly as follows:
(54) {cat:pp, type:X, coindex:I, gap-cat:C, wh-gap:G,...} [{cat:p,...}, !{type:X, coindex:I, gap-cat:C, wh-gap:G,...}]
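The percolation behaviour of a rule like (54) can be approximated with flat feature dictionaries. The following is a minimal sketch under several assumptions: categories are plain Python dicts, the helper names (`unify`, `build_pp`, `coindex_pp`) are hypothetical, and the (b.1) failure, which in the text arises from a feature conflict on the slot, is simplified here to a direct test on 'type'.

```python
def unify(a, b):
    """Unify two flat categories (dicts); return None on a value clash."""
    out = dict(a)
    for key, value in b.items():
        if key in out and out[key] != value:
            return None
        out[key] = value
    return out

# Features assumed to be shared between the PP root and its second daughter.
PERCOLATED = ("type", "coindex", "gap-cat", "wh-gap")

def build_pp(pform, obj=None):
    """Toy counterpart of a PP-building rule: with no overt object a trace
    (unmarked for 'type') is introduced; an overt object carries the default
    'type:full'. Percolated features are copied up to the PP root."""
    if obj is None:
        daughter = {"empty": "yes"}               # trace, no 'type' feature
    else:
        daughter = unify(obj, {"type": "full"})   # default assignment
    pp = {"cat": "pp", "pform": pform}
    for feature in PERCOLATED:
        if feature in daughter:
            pp[feature] = daughter[feature]
    return pp

def coindex_pp(pp, index, gap_cat="np"):
    """Attempt co-indexation of a PP with a displaced phrase.
    Simplification: a PP that percolated 'type:full' is rejected outright."""
    if pp.get("type") == "full":
        return None
    return unify(pp, {"coindex": index, "gap-cat": gap_cat})

stranded = build_pp("to")                  # as in 'Who did you give a book to t?'
full = build_pp("to", {"cat": "np"})       # as in 'to Ellen'
print(coindex_pp(stranded, 7))             # enriched PP carrying the index
print(coindex_pp(full, 7))                 # None: co-indexation rejected
```

In the chapter's formalism the sharing is expressed by variables in a single rule, so information flows in both directions; the copy-up loop above only mimics the upward half.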
The pre-verbal subject corresponds to another configurationally and categorially univocal argument position which it is plausible to fill without waiting for the application of some co-indexation scheme. Actually, this is the most straightforward way to account for empty pronouns as an alternative to full subjects, given the possibility of resorting to 'virtual alternation' in CFRs too. The CFR building an S node will also express subject-predicate agreement and perform case assignment and 'argφ' value transmission to the subject, whether full or empty. (Note that the use of variables allows the same CFR to cover null-value instances of 'argφ', due to passive or raising verbs.) Thus, all these operations need not be repeated in the co-indexation scheme (ii), which simply ensures feature sharing between a displaced phrase and an empty subject already present at the bottom. Moreover, in Passive or Raising constructions such that the subject target of 'NP-Movement' is further displaced by another movement (see, for example, (11), section 2.3), an NP-Movement scheme will establish a co-indexation between two empty elements, an empty subject and the NP-trace; the former can be interpreted as a wh-trace in its turn by appropriately applying the UDC scheme (ii) to it. On the other hand, whereas a 'pro-drop' language like Italian will just interpret as 'pro' the finite-clause empty subjects that are not co-indexed as some sort of trace, English requires that the final description of a sentence including any of them be ruled out. As usual, the result can be achieved through a @-variable, assigned to a pertinent feature on the subject node (e.g. 'type', which is instantiated as 'empty' in scheme (ii), or as 'full' by a full subject). We want to conclude this discussion of English UDC-related phenomena by addressing the issue of the empty COMP positions.
In the first place, we will account for the so-called 'that-trace effect', i.e., 'when that is immediately followed by a trace, ungrammaticality results, hence the descriptive name of the phenomenon' (Lasnik & Uriagereka 1988, p. 94). Compare:
(55) a. Who do you think (that) John saw t?
     b. Who do you think (*that) t saw Bill?
If we look at the UDC middle representation in (52), we see that a non-null middle yields the instantiation of a feature that-trace:no, whose value is transmitted to the top and the bottom of the UDC only by the co-indexation scheme (ii). Now, we assume that a feature 'comp:that' on a COMP node indicates the presence of a corresponding overt complementizer and is transmitted to the adjacent S node in the CFR joining the two nodes, whereas 'comp' is not instantiated when the COMP position is empty. Given the sharing of a variable between 'comp' and 'that-trace' in scheme (ii), a conflict of values rejects the extraction of a subject from a bottom introduced by 'that' if the middle is non-null (comp:that vs that-trace:no). The internal arguments are excluded from the
that-trace effect simply because they are treated by scheme (i), which does not enforce variable sharing between the relevant features.47 Thus, empty COMP positions are allowed (or even called for, in case of subject extraction) adjacently to the bottom of non-null middle UDCs. We still have to account for empty COMP positions at the top of a UDC, though. At first sight, our Fig. B co-indexation schemata might look inconsistent in this connection: on the one hand, they can potentially fill an empty COMP position at the top (the displaced phrase being !-marked), but on the other, they seem to require an overt displaced constituent because of the constraint type:@F. Indeed, such a constraint (which is not automatically satisfied for []-terminals) prevents the two schemata from overgenerating wrong UDC chains triggered by any empty COMP position that happens to c-command some gap accessible to a co-indexation relation. However, the possibility of interpreting an empty COMP position as filled by an empty wh-element can be restored by embedding the schemata into higher-structure ones, to match some surrounding construction univocally indicating the presence of a UDC. For example, the higher structure in Relativization or It-Clefting displays a modified noun(-projection) and a clefted phrase, respectively:
(56) a. The politician [who Sandy loves ]
     b. The politician [Sandy loves ]
(57) a. It's Kim [who Sandy loves ]
     b. It's Kim [Sandy loves ]
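Returning briefly to the that-trace effect in (55), the value clash it relies on can be sketched in a few lines. This is only an illustration under assumptions: an uninstantiated feature is modelled as Python `None`, the function names are hypothetical, and only the sharing of one variable between 'comp' and 'that-trace' in scheme (ii) is modelled, not the rest of the schemata.

```python
def unify_value(x, y):
    """Unify two atomic feature values; None models an uninstantiated variable.
    Returns (value, ok)."""
    if x is None:
        return y, True
    if y is None:
        return x, True
    return x, x == y

def subject_extraction_ok(overt_that, non_null_middle):
    """Scheme (ii): 'comp' and 'that-trace' share a single variable, so two
    different constants clash."""
    comp = "that" if overt_that else None            # empty COMP leaves 'comp' open
    that_trace = "no" if non_null_middle else None   # set only by a non-null middle
    _, ok = unify_value(comp, that_trace)
    return ok

def internal_argument_extraction_ok(overt_that, non_null_middle):
    """Scheme (i): the two features are not shared, so no clash can arise."""
    return True

# (55b) 'Who do you think (*that) t saw Bill?'
print(subject_extraction_ok(overt_that=True, non_null_middle=True))    # False
print(subject_extraction_ok(overt_that=False, non_null_middle=True))   # True
# (55a) 'Who do you think (that) John saw t?'
print(internal_argument_extraction_ok(overt_that=True, non_null_middle=True))  # True
```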
This two-step approach is reminiscent of the distinction between 'strong' and 'weak' UDCs proposed by Pollard & Sag (1991) in that we acknowledge that an unmoved antecedent must be crucially referred to in the treatment of the latter (e.g., the b. cases in (56)-(57) above). However, as discussed already in section 3.3 for the treatment of 'Missing object UDCs' (an example of constructions that allow the 'weak' version only), we follow Chomsky's (1977) idea that a generalization over seemingly heterogeneous UDC types can be achieved by positing an empty wh-element that mediates the relation between an anaphorical antecedent and a gap-filler trace. Along these lines, the sub-relation going from the empty wh-element to the trace (i.e. the real unbounded dependency) can be simply treated by the same scheme that would apply to a corresponding 'strong UDC'. In the present framework, the result is achieved by invoking such a scheme (one of those in Fig. B) through a {}-type macro used in a higher-structure representation to be matched by the 'weak-UDC antecedent', whether a modified noun(-projection), a clefted phrase, a 'tough'-type adjective, or similar.48 The antecedent will transmit appropriate feature values to the COMP position found at the top of the embedded scheme, including a constant value for 'type', so as to satisfy
the constraint type:@F and allow a successful interpretation of the corresponding !-marked node qua empty (i.e. []-terminal) wh-element. On the other hand, the embedding into an appropriately specified 'higher structure' will induce restrictions that hold for weak UDCs only. This involves drawing from a 'tough'-type adjective the specification that the case of a missing-object gap is always accusative, or rejecting a relative clause whose 'that-trace' feature on the root node is not instantiated to a constant (which is what happens in case of subject extraction in a null-middle UDC not introduced by overt complementizer or wh-phrase, cf. *NP[The man [ saw Bill]]).

4.3 A 'translator' notation

In the present framework, a 'translator' component is a set of translation equations of the kinds specified by a 'relaxed compositionality definition' like the one previously discussed. The translator notation we will sketch here is just a compact way to state (bilingual or monolingual) inter-level correspondences that are expected to satisfy the equations. The statements take the form of explicit 'translation (henceforth, t-)rules', but we also assume that some of the equations are built into the system and simply activated by parameter-setting. One relevant parameter specifies the index-valued attribute to identify chains (e.g. 'coindex' in section 4.2); different chains are distinguished according to the constant value their elements have received for that attribute, and this makes it possible for an appropriate algorithm to build and use the 'chain store' in the way required by (29), section 3.2, an equation which we consider built into the system. (See also section 3.3 for other examples.) There are three kinds of explicit t-rules: lexical t-rules, feature t-rules, and structural t-rules. All t-rules are notated as LHS → RHS correspondences, where an arrow roughly meaning '... translates as ...'
links a left-hand side (LHS) and a right-hand side (RHS) specifying source-level and target-level conditions respectively. For example, a lexical t-rule is a formal construct of the following kind:
(58) L → L1, L2, ... Ln    for n≥0
where L and any Li, for 1≤i≤n, [...]; the n=0 and n>1 cases allow null translation and multi-word translation, respectively. In (58) and the other kinds of t-rules, variable sharing between LHS and RHS is allowed, so as to enforce translational constraints by transmitting relevant feature values from source to target. This is especially the task of feature t-rules, that is, one-to-one correspondences between single features or categories that must be unified with corresponding source and target nodes
as the translation procedure is in progress; {role:X} → {role:X} is an example relevant to the discussion in section 3.1. Structural t-rules are more complex. The LHS is a representation of the CFR kind (see (46), section 4.1, for the well-formedness conditions), which can be replaced by its rule-identifier when the intended representation is a CFR actually used as structure-building rule by the source grammar. On the other hand, the RHS of a structural t-rule is basically a formal construct of the following kind: (59) RID0
Here RID is a name for a set of target CFR-identifiers (in the extreme cases, a name for the empty set or for the target grammar CFR component as a whole), and any !ki!, Kim is the number of 'embedded node representations' occurring as daughters in the CFR displayed or named on the LHS of the t-rule.49 Intuitively the interpretation of this notation is that the target structure-building rules identified through RID must be applied in all possible ways (according to the target grammar) to a sequence of component translations, which are identified through positive integers. Such numbers can be also called 'position indices' since they indicate the position of corresponding 'embedded node representations' in the LHS CFR, assuming these are implicitly numbered from 1 to n in a left-to-right fashion; since any order of the m0
where C and any Ci, for l