Functional Grammar in Prolog
Natural Language Processing 2
Editorial Board Hans-Jürgen Eikmeyer Maurice Gross Walther von Hahn James Kilbury Bente Maegaard Dieter Metzing Makoto Nagao Helmut Schnelle Petr Sgall Harold Somers Hans Uszkoreit Antonio Zampolli
Managing Editor Annely Rothkegel
Mouton de Gruyter Berlin · New York
Functional Grammar in Prolog An Integrated Implementation for English, French, and Dutch
by
Simon C. Dik
Mouton de Gruyter Berlin · New York 1992
Mouton de Gruyter (formerly Mouton, The Hague) is a Division of Walter de Gruyter & Co., Berlin.
Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.
Library of Congress Cataloging in Publication Data Dik, S. C. (Simon C.) Functional grammar in Prolog : an integrated implementation for English, French, and Dutch / by Simon C. Dik. p. cm. — (Natural language processing ; 2) Includes bibliographical references and index. ISBN 3-11-012979-5 1. Functionalism (Linguistics) — Data processing. 2. Prolog (Computer program language) 3. Languages, Modern — Data processing. I. Title. II. Series. P147.D525 1992 418'.02'0285635 - dc20 91-45020 CIP
Deutsche Bibliothek Cataloging in Publication Data Dik, Simon C.: Functional grammar in prolog / Simon C. Dik. — Berlin ; New York : Mouton de Gruyter, 1992 ISBN 3-11-012979-5
© Copyright 1992 by Walter de Gruyter & Co., D-1000 Berlin 30. All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Printing: Werner Hildebrand, Berlin. — Binding: Dieter Mikolai, Berlin. Printed in Germany
Foreword
This book gives a detailed description of a computer program called ProfGlot, written in Prolog and using the theory of Functional Grammar in the version described in Dik (1989f). ProfGlot is an integrated system in two senses: it can deal with three different languages in terms of very similar structures and procedures; and it not only has the capacity of producing linguistic expressions in these languages, but also of parsing them, of translating in all six directions between the three languages, and of drawing certain logical inferences on the basis of given linguistic expressions. ProfGlot thus simulates some essential components of the linguistic competence of a trilingual speaker.

The actual computer programs described in this book are distributed by Amsterdam Linguistic Software, P.O. Box 3602, 1001 AK Amsterdam (email [email protected]). ALS also distribute an elementary computational Prolog course for linguists (Dik and Kahrel 1991).

I should like to thank the participants in the research program FG*C*M*NLU (Functional Grammar Computational Model of the Natural Language User, see Connolly and Dik 1989) for the inspiration derived from our discussions on computational Functional Grammar, and Dik Bakker, Helma Dik, Kwee Tjoe Liong, Chris Mellish, Tilly Ruitenberg, Hans Weigand, and Remmelt van Wetter for helpful suggestions on particular points of programming and on the linguistic analyses underlying ProfGlot. Peter Kahrel contributed some important improvements in the formulation of certain procedures. Finally, I am grateful to Inge Genee, Sabine Rummens, and Peter Kahrel for helping me in producing the final camera-ready text, and to Dick Roozendaal for help with the Figures.
Holysloot, August 1991
Simon C. Dik
Table of Contents

Foreword

1. Introducing ProfGlot
   1.0. Introduction
   1.1. Some practical points
   1.2. Previous work on computational Functional Grammar
   1.3. The structure of this book

2. Some elements of Prolog
   2.0. Introduction
   2.1. Declarative and procedural programming
   2.2. Facts, rules, and questions
   2.3. Prolog lists
   2.4. Recursive definitions
   2.5. Looking into Prolog atoms
   2.6. Modifying a program during execution
   2.7. Some built-in features of LPA-Prolog

3. Introducing Functional Grammar
   3.0. Introduction
   3.1. Outline of the Functional Grammar generator

4. Overall structure of the ProfGlot program
   4.0. Introduction
   4.1. Structure of the program
   4.2. The separate modules
   4.3. Some special features
      4.3.1. Linguistic complexities
      4.3.2. Avoiding infinite recursion
      4.3.3. Paradigms

5. BasFac: basic facilities
   5.0. Introduction
   5.1. The module BasFac
      5.1.1. Settings
      5.1.2. General relations and operations
      5.1.3. Outputting a sentence

6. EngLex: the English lexicon
   6.0. Introduction
   6.1. The module EngLex
      6.1.1. Basic predicate frames
      6.1.2. Lexical satellites
      6.1.3. Meaning postulates
      6.1.4. Paradigms

7. FreLex: the French lexicon
   7.0. Introduction
   7.1. The module FreLex
      7.1.1. Basic predicate frames
      7.1.2. Lexical satellites
      7.1.3. Meaning postulates
      7.1.4. Paradigms

8. DutLex: the Dutch lexicon
   8.0. Introduction
   8.1. The module DutLex
      8.1.1. Basic predicate frames
      8.1.2. Lexical satellites
      8.1.3. Meaning postulates
      8.1.4. Paradigms

9. UniGen: the universal generator
   9.0. Introduction
   9.1. The module UniGen
      9.1.1. Extending predicates by redundancy rules
      9.1.2. Choosing an arbitrary predicate
      9.1.3. Predicate formation
         9.1.3.1. The status of predicate formation
         9.1.3.2. Predicate formation rules
      9.1.4. The core predication schema
      9.1.5. The extended predication schema
      9.1.6. Creating the extended predication
      9.1.7. Terms and term formation
         9.1.7.1. Basic terms
         9.1.7.2. Derived terms
      9.1.8. Operations on term positions
      9.1.9. Building the extended predication into the proposition
      9.1.10. Creating predicational and propositional terms
      9.1.11. Building up the clause structure
      9.1.12. Creating the fully specified clause
         9.1.12.1. Subject and Object assignment
         9.1.12.2. Verb agreement
         9.1.12.3. Anaphora resolution
         9.1.12.4. Marking reflexive arguments
         9.1.12.5. Marking equi arguments
         9.1.12.6. Copula support
   9.2. Conclusion

10. ParSel, UniExp and EngExp: universal and English expression
   10.0. Introduction
   10.1. The expression rules
      10.1.1. Expressing terms
      10.1.2. Full term expression
      10.1.3. Expressing embedded propositions and predications
      10.1.4. Expressing satellite terms
      10.1.5. Expressing the verbal complex
      10.1.6. Formally expressing the clause
      10.1.7. Placement rules
      10.1.8. Full clause expression
      10.1.9. Go!
      10.1.10. Pseudo-phonology

11. FreExp: French expression
   11.0. Introduction
   11.1. The module FreExp

12. DutExp: Dutch expression
   12.0. Introduction
   12.1. The module DutExp

13. UniPar: universal parser
   13.0. Introduction
   13.1. The module UniPar
      13.1.1. Introduction
      13.1.2. Finding and interpreting terms
      13.1.3. Finding and interpreting satellites
      13.1.4. Finding and interpreting predicate complexes
      13.1.5. Integrating underlying structures
      13.1.6. Matching terms with argument positions
      13.1.7. Returning sentences
   13.2. Some performance features

14. UniTra: a trilingual translator
   14.0. Introduction
   14.1. The module UniTra
      14.1.1. The translator
      14.1.2. The adjustments
      14.1.3. The equivalences

15. UniLog: universal logic
   15.0. Introduction
   15.1. The module UniLog
      15.1.1. English basic logical relations
      15.1.2. French basic logical relations
      15.1.3. Dutch basic logical relations
      15.1.4. Universal logical relations and operations
         15.1.4.1. Logical properties of and relations between predicates
         15.1.4.2. Entailments between propositions
         15.1.4.3. Properties of and relations between full clause structures
      15.1.5. Output mechanisms
      15.1.6. Knowledge base management
   15.2. Conclusion

Notes

References

Index and brief explanation of Prolog predicates

Index of names and subjects
Chapter 1. Introducing ProfGlot
1.0. Introduction

ProfGlot simulates some aspects of the linguistic competence of a trilingual English-French-Dutch speaker. It has the following capacities:

- ProfGlot can generate a great variety of grammatical construction types in the three languages;
- ProfGlot can analyse or parse an extension of a subset of the constructions which it can generate, again in the three languages (note 1);
- ProfGlot can translate the construction types generated in all six directions between the three languages;
- ProfGlot can perform a number of logical operations on the constructions which it can generate (e.g. it can paraphrase these constructions and otherwise infer a number of things from them);
- ProfGlot can also combine these operations: e.g., it can parse an English sentence, translate it into French, and infer a number of things from the French translation.

Some important features of ProfGlot are the following:

- ProfGlot is composed of a number of separate modules which can communicate with each other since they "speak the same language": the theoretical language of Functional Grammar.
- Thanks to the design features of Prolog, the program is largely formulated in a declarative mode, containing facts and rules that together define wellformed constructions and permissible actions. This "declarative" knowledge can, however, be put to "procedural" tasks in ways which will become clear in the actual program description.
- In the process of constructing the trilingual competence of ProfGlot, care has been taken to separate the language-independent rules and principles as much as possible from the language-particular facts and rules. This has led to a system in which the English, French, and Dutch modules are comparatively small, and the greater part of the program consists of rules and principles which could also be used for other languages. This means that it is not too difficult to extend the competence of ProfGlot with further languages (note 2).

The aims of the present computational exercise are mainly theoretical. The project is part of a wider endeavour to model the linguistic capacities of natural language users by means of the theory of Functional Grammar (for other studies in this direction, compare Connolly and Dik 1989). As the reader will see, however, many grammatical and computational problems have been tackled through strategies which may well be useful to whoever is working on computational linguistic topics, be they of a more practical or of a more theoretical nature.
1.1. Some practical points

Prolog has several dialects. The programs described here have been written in standard Prolog (also called "Edinburgh Prolog"), as described in Clocksin and Mellish (1987, 3rd edition) and Bratko (1986). Side by side with Dutch introductions to Prolog by Bakker (1988b) and Des Tombe (1987), these two books have also been the most useful to me in learning Prolog.

Even for standard Prolog there are several different interpreters and compilers available. These may differ in minor points, but they will support the present programs, sometimes after minor adaptations. Most Prolog interpreters meant to run on a PC do not offer sufficient space for exploiting the full capacities of the program. The most recent version (note 3) of LPA Prolog Professional Compiler, however, does allow ProfGlot to be run even on a PC (note 4). Some use has been made of certain built-in facilities of LPA-Prolog. These facilities will be described in section 2.7. They can be easily converted into standard Prolog conventions available in other interpreters.

Obviously, ProfGlot can also be run on bigger systems. Moving to such bigger systems will become necessary when the limited lexica of the present version are extended beyond a certain limit.

It is important to note that, even within the constraints imposed by the linguistic theory (Functional Grammar) and the programming language (Prolog), many rules and principles can be formulated in different ways and according to different algorithms. This means that even rules which do work satisfactorily could be formulated in other, and perhaps better ways. "Better" in this context is a difficult notion, since it is composed of values on the dimensions "space" (length of the program) and "time" (processing time). Simpler, more elegant formulations may require more processing time, and therefore we cannot apply a simple one-dimensional simplicity metric to these programs.
One of the advantages of programming linguistic phenomena is that any relevant feature of linguistic organization must be made explicit and can thus be studied with an eye on improvement of either the overall architecture or the local formulation of the rules and principles involved. For this reason, it is my conviction that computational programming of this kind will develop (in fact, will continue to develop) into an essential tool for the theoretical linguist. In a real sense this work fits into a tradition which was inaugurated by the Indian grammarian Panini who, several centuries B.C., had the ideal of formulating the principles underlying the grammatical structure of linguistic expressions in explicit rules or "sutras" such that, through applying these rules under stated conditions and in the relevant order, the correct linguistic forms were automatically generated.
1.2. Previous work on computational Functional Grammar

Pioneering work on computational Functional Grammar was done by Kwee (1979), who first designed a Functional Grammar generator in the computer language Algol68, and elaborated on this in several later studies (1981, 1987, 1988a, 1988b). The first study using Prolog is Connolly (1986), who developed rules for English constituent ordering by means of Prolog testing. Gatward, Johnson and Connolly (1986) showed how Functional Grammar could be used in a natural language processing system. Van der Korst (1987, 1989) first designed a translator English-French. Work on Functional Grammar-Prolog generators was done by Samuelsdorff (1989), Bakker (1988a, 1989), and Bakker, Van der Korst and Van Schaaik (1988). Parsing strategies were discussed in Janssen (1989), Gatward (1989), Dignum (1989a), and Kwee (1989). Voogt-Van Zutphen (1987, 1989) showed how the information contained in Longman's Dictionary of Contemporary English could be automatically converted into Functional Grammar predicate frames. Meijs (1988, 1989) and Vossen (1989) discussed how the network of definitional relations within such a dictionary can be exploited for semantic analysis. Dik (1987a, 1987b, 1987c, 1987d) sketched how Functional Grammar could be used in a wider cognitive environment; compare also Dik (1989a, 1989b). The application of Functional Grammar ideas within a knowledge base environment was developed in Weigand (1986, 1987, 1989, 1990), Dignum et al. (1987), Capel and Westra (1987), and Dignum (1989b). Much of this work was collected in Connolly and Dik (1989). For further work related to the present enterprise, see the References.
1.3. The structure of this book

The structure of this book is as follows. Chapter 2 gives a thumbnail introduction to the Prolog language. Chapter 3 discusses the general structure of Functional Grammar, as implemented in this program. Chapter 4 describes the overall structure of the ProfGlot program, its different component modules, and the way they interact. Later chapters then present the program listings of these different modules, with comments on the why and how of each of the rules of the program.
Chapter 2. Some elements of Prolog
2.0. Introduction

Prolog is a computer language designed around 1972 by Alain Colmerauer of the University of Aix-Marseille. The language turned out to be an excellent medium for implementing symbolic (as against numerical) problems in general, and linguistic structures in particular. This chapter gives a thumbnail sketch of those features and properties of Prolog which are most important for understanding the ProfGlot program described in this book. For a more extensive introduction to Prolog I refer to Clocksin and Mellish (1987), Bratko (1986), and Covington et al. (1988).
2.1. Declarative and procedural programming

Most programming languages have a strongly "procedural" character. They require the formulation of instructions for doing certain things. Prolog, on the other hand, is a strongly "declarative" language. It allows for the formulation of "facts" (unconditionally true statements) and "rules" (conditionally true statements). On the basis of these facts and rules, Prolog is able to derive inferences. This procedural side of the program is largely hidden from the user, and is activated when we ask questions about whether something is or is not the case, or for which entities something is or is not the case, as judged in terms of the facts and rules contained in the program.

Fed with these questions, Prolog computes the answers by processes of pattern matching (checking two structures for whole or partial identity), unification (unifying two partially specified structures which are compatible with each other into one new structure), and backtracking (systematically trying out alternatives when a solution has been found or when a dead end has been reached).
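The three mechanisms just named can be tried out in isolation. The following is only a minimal sketch, assuming a standard Prolog system; the structure point/2 and the predicate colour/1 are hypothetical names invented for this illustration and do not occur in ProfGlot. The notation for facts and questions is introduced in 2.2.

```prolog
% Pattern matching: two fully specified structures are compared.
% ?- point(1,2) = point(1,2).
% yes
% ?- point(1,2) = point(1,3).
% no

% Unification: two partially specified but compatible structures
% are merged into one.
% ?- point(X,2) = point(1,Y).
% X = 1, Y = 2.

% Backtracking: colour/1 is a hypothetical predicate with three facts.
colour(red).
colour(green).
colour(blue).

% ?- colour(C), C \== red.
% C = green ;     (red is tried first and rejected, then Prolog backs up)
% C = blue.
```

Many current systems print "true" and "false" where the text has "yes" and "no".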
2.2. Facts, rules, and questions

Prolog facts take the following form:

(1)  man(socrates).
     = 'Socrates is a man'

(2)  wife(xanthippe,socrates).
     = 'Xanthippe is the wife of Socrates'

Prolog facts are statements which are taken to be unconditionally true. They consist of a predicate applied to one or more arguments. In examples (1) and (2) the arguments are individual constants, and the facts code properties of and relations between individual entities.

Prolog rules state that a certain statement holds if certain conditions are fulfilled. They take the following form:

(3)  mortal(X) :- man(X).
     = 'Any X is mortal if X is a man.'
In this rule, the property "mortal" is conditionally assigned to some X, the condition being that X should be a man. The sign :- can be read as "if". X is here a variable; all variables start with a capital letter. All constants start with a lower-case letter. For example:

(4)  Constants:  a, b, socrates, xanthippe, john
     Variables:  X, Y, A, B, Head, Tail, City
In most versions of Prolog, predicates are always constants. LPA-Prolog, however, offers a possibility of using variables for predicates.

The facts of (1)-(2) and rule (3) can together be considered as a (very simple) Prolog program. Presented with the Prolog prompt ?- we can make the program do things by asking questions. We can ask Yes-No questions, such as:

(5)  ?- wife(xanthippe,socrates).
     'Is Xanthippe the wife of Socrates?'

To this question the program will answer yes. The answer is computed through simple pattern-matching with fact (2). More interestingly, we can ask:

(6)  ?- mortal(socrates).
     yes

Note that in this case the answer is not coded in a fact of the program, but computed through unification and pattern-matching. First of all, the predicate "mortal" is identified in rule (3). The variable X is unified with "socrates" (X=socrates). The rule says that "mortal(X)" is true if "man(X)" is true. Therefore, "mortal(socrates)" is true if "man(socrates)" is true. Therefore, a search is made for whether "man(socrates)" is true. This is true by virtue of fact (1). Therefore, it is concluded that "mortal(socrates)" is also true, and the answer is yes.

Besides Yes-No questions we can also ask specific or question word questions such as:

(7)  ?- mortal(X).
     'Who is mortal?'
     X=socrates

(8)  ?- wife(X,socrates).
     'Who is the wife of Socrates?'
     X=xanthippe

(9)  ?- wife(xanthippe,X).
     'Who is Xanthippe the wife of?'
     X=socrates

(10) ?- wife(X,Y).
     'Who is the wife of who?'
     X=xanthippe
     Y=socrates
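A question word question may have more than one answer. The sketch below extends the program of (1)-(3) with one hypothetical extra fact, man(plato), which is not in the text and is added only so that question (7) has two answers. It assumes a standard Prolog system, where typing ";" after an answer asks for the next one and findall/3 collects all answers in a list.

```prolog
% Facts (1)-(2) and rule (3) from the text.
man(socrates).
man(plato).            % hypothetical extra fact, for illustration only
wife(xanthippe, socrates).
mortal(X) :- man(X).

% ?- mortal(X).
% X = socrates ;
% X = plato.

% Collecting every answer at once with the standard predicate findall/3:
% ?- findall(X, mortal(X), Mortals).
% Mortals = [socrates, plato].
```

The answers come back in the order in which the matching facts appear in the program.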
Note that the program only responds positively if it either knows or can infer the relevant fact. Thus, we get a negative response to the question:

(11) ?- mortal(xanthippe).
     no

since, for all the program knows, the predicate "mortal" does not apply to the argument "xanthippe". We can also ask questions such as: "Is anybody mortal?". This is done in the following way:

(12) ?- mortal(_).
     yes

This question contains the so-called "anonymous variable", written "_", in the place of the argument. The question is interpreted as asking whether there is any constant "c" such that "mortal(c)" is true. This is the case, since "mortal(socrates)" is true. Therefore, the answer is yes. Note that no value of "c" is returned for the anonymous variable.

Simple questions can be combined into more complex queries:

(13) ?- mortal(X), wife(Y,X).
     'Is there a mortal X such that Y is the wife of X?'
     X=socrates
     Y=xanthippe
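The negative answer in (11) means "not derivable from the program", not "false in the world". As a small sketch of this point, assuming a standard Prolog system, the operator \+ (negation as failure) can be used to ask the opposite question; it is not used in the examples of this section.

```prolog
% Facts (1)-(2) and rule (3) from the text.
man(socrates).
wife(xanthippe, socrates).
mortal(X) :- man(X).

% ?- mortal(xanthippe).
% no        (nothing in the program lets Prolog derive it)

% ?- \+ mortal(xanthippe).
% yes       (\+ Goal succeeds exactly when Goal cannot be proved)

% ?- \+ mortal(socrates).
% no
```

Adding a fact or rule from which "mortal(xanthippe)" follows would reverse the first two answers, which is why a "no" should be read relative to the current program.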
Prolog facts and rules can take any degree of complexity. For example (note 1):

(14) happy(X) :-
         X=john,
         nice(weather),
         is_lying_in(X,grass),
         is_close_to(mary,X).
     'John is happy when the weather is nice, he is lying in the grass and Mary is close to him'
As these examples show, Prolog offers great freedom in the formulation of facts and rules. For example, fact (2) might just as well have been formulated as follows:

(15) wife(socrates,xanthippe).
     'The wife of Socrates is Xanthippe'

where the "husband" takes the first, the "wife" the second argument position. This makes no difference for Prolog computation, as long as we are consistent in applying a choice once it has been made. For example, if we have chosen (15), then all other instantiations of and references to the predicate wife must likewise have the husband in the first, and the wife in the second position. We get into trouble only when we mix different implementations of the same predicate.

Prolog rules can be used to define auxiliary notions which can then be used in other facts and rules. For example:

(16) father(X,Y) :-
         male(X),
         parent(X,Y).
     'X is father of Y if X is male and X is parent of Y'

(17) grandfather(X,Z) :-
         father(X,Y),
         parent(Y,Z).
     'X is grandfather of Z if X is father of Y and Y is parent of Z'
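Rules (16) and (17) only yield answers once the program also contains facts for "male" and "parent". The following sketch supplies a few; the names tom, bob, and ann are hypothetical and were invented here purely to exercise the two rules.

```prolog
% Hypothetical facts, invented to exercise rules (16) and (17).
male(tom).
male(bob).
parent(tom, bob).
parent(bob, ann).

% Rule (16): X is father of Y if X is male and X is parent of Y.
father(X, Y) :- male(X), parent(X, Y).

% Rule (17): X is grandfather of Z if X is father of Y and Y is parent of Z.
grandfather(X, Z) :- father(X, Y), parent(Y, Z).

% ?- father(tom, Who).
% Who = bob.

% ?- grandfather(G, ann).
% G = tom.
```

Note that grandfather/2 never mentions "male" itself: that condition is inherited through father/2, which is the sense in which the higher-level notion is built on the lower-level one.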
This possibility of defining higher-level notions in terms of lower-level ones is extremely useful in terms of efficiency. For example, if at several points in a program an identical series of clauses is to be evaluated, as in:

(18) have(A,B), cost(B,C), give(D,C,A), give(A,B,D).

it would be expedient to formulate a rule to the effect that:

(19) sell(A,B,D) :-
         have(A,B),
         cost(B,C),
         give(D,C,A),
         give(A,B,D).

Instead of repeating the conjunction of clauses in (18) over and over again, we can now use sell(X,Y,Z) to "call" (18). We have, in a sense, pushed (18) down into the program, and we do not have to worry about its complexity any more. (18) will be automatically "run" whenever we use sell(X,Y,Z). In this way, we can formulate higher and higher-level predicates which may eventually stand for quite complicated sub-programs.
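To see rule (19) at work, it needs facts for "have", "cost", and "give". The facts in the sketch below are hypothetical (john, mary, book, and ten_guilders are invented for this illustration); only the rule itself comes from the text.

```prolog
% Hypothetical facts, chosen only so that rule (19) has something to work on.
have(john, book).
cost(book, ten_guilders).
give(mary, ten_guilders, john).   % Mary gives the price to John
give(john, book, mary).           % John gives the book to Mary

% Rule (19) from the text.
sell(A, B, D) :-
    have(A, B),
    cost(B, C),
    give(D, C, A),
    give(A, B, D).

% ?- sell(Seller, Item, Buyer).
% Seller = john, Item = book, Buyer = mary.
```

The price C appears only inside the body of the rule, so a caller of sell/3 never sees it; this is one concrete sense in which the complexity of (18) has been "pushed down".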
2.3. Prolog lists

Arguments of Prolog facts and rules can take the form of "lists". A list is an ordered series of elements (a sequence), written as follows:

(20) a.  [a,b,c,d]
     b.  [a,X,b,Y,c]
     c.  [[a,b],X,b,[aa,p,q],Z,c]

The members of lists can thus be individual constants, variables, or mixtures of both. Members of lists may also be facts or rules, or again lists. A special list is the "empty list" []. This is a list without members. In grammars it can be used to indicate dummy elements or zero expression.

A list can be divided into one or more Heads (= one or more initial members) and a Tail (= the remaining list). In order to single out the first member from a list, we write: [Head|Tail]. In the case of (20), this effects the following divisions:

(21)     Head     Tail
     a.  a        [b,c,d]
     b.  a        [X,b,Y,c]
     c.  [a,b]    [X,b,[aa,p,q],Z,c]

We could also write:

(22) [a,b,c,d] = [H1,H2|Tail]

This will lead to the following division:

(23) H1 = a, H2 = b, Tail = [c,d]
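The divisions in (21) and (23) can be checked directly at the prompt. In the sketch below, first_and_rest/3 is a hypothetical helper name used only here; it does nothing more than give the pattern [Head|Tail] a name so that it can be queried.

```prolog
% first_and_rest/3 is a hypothetical helper, not part of ProfGlot.
first_and_rest([Head|Tail], Head, Tail).

% ?- first_and_rest([a,b,c,d], H, T).
% H = a, T = [b,c,d].

% ?- first_and_rest([[a,b],X,b,[aa,p,q],Z,c], H, T).
% H = [a,b], T = [X,b,[aa,p,q],Z,c].

% ?- first_and_rest([], H, T).
% no        (the empty list cannot be divided into a Head and a Tail)

% The same pattern also builds a list when Head and Tail are known:
% ?- L = [a|[b,c]].
% L = [a,b,c].
```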
Note that a one-member list such as [X] has a Head X and a Tail [] (the empty list). The following mini-grammar illustrates some of the notions introduced:

(24) sentence([the,Adj,N,V,in,the,P]) :-
         adjective(Adj),
         noun(N),
         verb(V),
         place(P).
     adjective(old).
     adjective(young).
     noun(man).
     noun(gorilla).
     verb(works).
     verb(lives).
     place(park).
     place(zoo).

When we now ask:

(25) ?- sentence(X).

we will get a one-by-one enumeration of all the sentences which can be formed according to grammar (24), beginning with:

(26) [the,old,man,works,in,the,park]

and ending with:

(27) [the,young,gorilla,lives,in,the,zoo]

We can also use the built-in Prolog predicates write, nl, and fail, in the following way:

(28) ?- sentence(X), write(X), nl, fail.
     'For any X which is a sentence according to the grammar, write X on the screen, go to a new line, and do this again and again until no more sentences can be formed'

This will enumerate all the sentences generated by grammar (24) until the grammar is exhausted. Thus, these built-in predicates have the following effects:

(29) write(X) : write the expression "X" on the screen.
     nl       : go to a new line.
     fail     : go on enumerating all the possible solutions until no more solutions are found.
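The combination in (28) is often wrapped in a predicate of its own. The sketch below does this for grammar (24); print_sentences/0 is a hypothetical name introduced here, and findall/3 and length/2 are assumed to be available, as they are in current standard Prolog systems.

```prolog
% Grammar (24) from the text.
sentence([the,Adj,N,V,in,the,P]) :-
    adjective(Adj), noun(N), verb(V), place(P).
adjective(old).   adjective(young).
noun(man).        noun(gorilla).
verb(works).      verb(lives).
place(park).      place(zoo).

% print_sentences/0 is a hypothetical wrapper around query (28).
% The first clause prints one sentence and then fails, which sends Prolog
% back into sentence/1 for the next alternative; the second clause lets the
% call as a whole succeed once every alternative has been used up.
print_sentences :-
    sentence(X), write(X), nl, fail.
print_sentences.

% Counting instead of printing:
% ?- findall(S, sentence(S), All), length(All, N).
% N = 16.     (2 adjectives x 2 nouns x 2 verbs x 2 places)
```

Without the second clause, the query would print all sixteen sentences and then answer "no", because "fail" itself never succeeds; it only forces Prolog to look for further solutions.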
2.4. Recursive definitions

Many Prolog procedures can be most elegantly defined by means of recursive definitions. Recursive definitions are used in many places in the ProfGlot program. The concept of recursive definition can be clarified with the following example. Suppose we wish to define a procedure which, when applied to arbitrary lists of letters, doubles all the letters in the list, so that "a" becomes "[a,a]", "b" becomes "[b,b]", etc. For example, the following inputs should yield the corresponding outputs:

(30) input        output
     [p]          [[p,p]]
     [a,b,c]      [[a,a],[b,b],[c,c]]
     [x,x,a,f]    [[x,x],[x,x],[a,a],[f,f]]

The problem in formulating this procedure is that we do not know beforehand how long the input list of letters is going to be. Therefore, if we tried to enumerate all the possible cases, as in:

(31) input        output
     [X]          [[X,X]]
     [X,Y]        [[X,X],[Y,Y]]
     [X,Y,Z]      [[X,X],[Y,Y],[Z,Z]]
     etc.

then we would have to continue formulating new rules for every number of items in the input list. And, however many rules we formulated, say 77, they would not account for an input list with a number of members greater than 77. In such a situation we formulate a recursive procedure. First of all, we formulate a rule for doubling single occurrences of any letter:

(32) double(X,[X,X]).

We then recursively formulate a procedure "list_double", as follows:

(33) a.  list_double([Head|Tail],[Head1|Tail1]) :-
             double(Head,Head1),
             list_double(Tail,Tail1).
     b.  list_double([],[]).
Thus, in order to "list_double" a list consisting of a Head and a Tail, we double the Head into Head1 according to (32), and then (recursively) list_double the Tail into Tail1. In order to prevent the procedure from continuing its doubling activity beyond the end of the list, we must have a "stop condition" which terminates the procedure when the end of the list has been reached. The last possible Tail of a list is the empty list []. (33b) says that [] doubles onto itself, and thus the procedure terminates.

This type of recursive definition is extremely useful if we want to apply the same operation or test to the members of lists with variable length. Many Prolog rules are formulated in this recursive form. Consider some further examples:

[i] X is a member of list L

When is an item X a member of a list L? One thing we know for certain is that X is a member of L if X is the Head of L (34a). But X is also a member of L when X is the Head of the Tail of L. This we can use recursively to search through the whole list by sequentially chopping off the Head, checking whether it is identical to X, and then continuing with the Tail (34b):

(34) a.  member(X,[X|Tail]).
         'X is a member of a list if X is the Head of that list'
     b.  member(X,[Head|Tail]) :-
             member(X,Tail).
         'X is (also) a member of a list if X is a member of the Tail of that list'

Let us see how this works in the following concrete example:

(35) ?- member(c,[a,b,c]).
     'Is "c" a member of the list [a,b,c]?'

In evaluating (35) we first try (34a): is "c" the Head of the list [a,b,c]? No. Therefore, we try (34b): is "c" a member of the Tail [b,c]? Again, we try (34a): is "c" the Head of the list [b,c]? No. Therefore, we try (34b): is "c" a member of the Tail [c]? Again, we try (34a): is "c" the Head of the list [c]? Yes, because this single-member list divides into a Head "c" and a Tail []. Therefore, "c" is a member of [c]. Therefore, "c" is a member of [b,c]. Therefore, "c" is a member of [a,b,c]. Therefore, the answer to question (35) is "yes".

[ii] Concatenate two lists L1 and L2 into one list L

In order to concatenate two lists L1 (e.g. [a,b,c]) and L2 (e.g. [d,e]) into one list ([a,b,c,d,e]) we can formulate the following recursive rule (note 2):

(36) a.  conc([],L,L).
     b.  conc([X|L1],L2,[X|L3]) :-
             conc(L1,L2,L3).

Let us apply this to the concrete example:

(37) a.  ?- conc([a,b,c],[d,e],L).
         'To which list L do [a,b,c] and [d,e] concatenate?'

Rule (36) says the following:

     [a|[b,c]] and [d,e] concatenate to [a|L3] if [b,c] and [d,e] concatenate to L3 (36b);
     [b|[c]] and [d,e] concatenate to [b|L33] if [c] and [d,e] concatenate to L33 (36b);
     [c|[]] and [d,e] concatenate to [c|L333] if [] and [d,e] concatenate to L333 (36b).

By (36a) we know that [] and [d,e] concatenate to [d,e]; therefore,

     [c|[]] and [d,e] concatenate to [c|[d,e]];
     [b|[c]] and [d,e] concatenate to [b|[c,d,e]];
     [a|[b,c]] and [d,e] concatenate to [a|[b,c,d,e]];

and this is the list [a,b,c,d,e]. Therefore, L = [a,b,c,d,e].

More informally, we can explain the working of this rule of concatenation as follows: in order to concatenate [a,b,c] and [d,e], first go to the end of the list [a,b,c] (this is []) and from there going backwards add the elements of this list one by one to the list [d,e]:

(37) b.  [a,b,c|[]] to be concatenated with [d,e]
         [a,b,c] --> []  conc [d,e]       = [d,e]
         [a,b]   --> [c] conc [d,e]       = [c,d,e]
         [a]     --> [b] conc [c,d,e]     = [b,c,d,e]
                 --> [a] conc [b,c,d,e]   = [a,b,c,d,e]

As these examples make clear, recursive definitions of this kind may require quite a bit of computation. The advantage is, however, that they specify the relevant relationships for lists of arbitrary length and arbitrary membership. The principle of recursive definition will be further clarified by many examples in the actual ProfGlot program.
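The recursive definitions of this section can be loaded and queried as they stand. The sketch below repeats rules (32), (33), and (36), and adds one query that is not in the text: asking conc "in reverse", with the result given and the two input lists left open. This is offered only as an illustration of how a declaratively stated relation can be put to more than one procedural use.

```prolog
% Rules (32) and (33) from the text.
double(X, [X,X]).

list_double([Head|Tail], [Head1|Tail1]) :-
    double(Head, Head1),
    list_double(Tail, Tail1).
list_double([], []).

% Rule (36) from the text.
conc([], L, L).
conc([X|L1], L2, [X|L3]) :-
    conc(L1, L2, L3).

% ?- list_double([a,b,c], D).
% D = [[a,a],[b,b],[c,c]].

% ?- conc([a,b,c], [d,e], L).
% L = [a,b,c,d,e].

% Because (36) states a relation rather than a procedure, the same two
% clauses also answer the question "which lists concatenate to [a,b]?",
% one solution at a time on backtracking:
% ?- conc(L1, L2, [a,b]).
% L1 = [],    L2 = [a,b] ;
% L1 = [a],   L2 = [b] ;
% L1 = [a,b], L2 = [].
```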
2.5. Looking into Prolog atoms Prolog predicates and arguments (constants and variables) are treated as "atoms". The internal structure of atoms is normally not accessible: we cannot immediately apply operations to strings of characters. Among the built-in predicates of Prolog, however, there is a predicate "name", which takes atoms as arguments and yields a list consisting of the ASCII codes of the letters which make up the atom. For example: (38)
?- name(Simon,X). X=[115,105,109,111,110]
Using these ASCII code lists we can perform phonological, orthographical, and morphological operations on atoms. The use of the predicate "name" is exemplified in the following program: (39) [1] [2]
conc ([ ], L, L). conc([XIL1],L2,[X|L3]) conc(L1,L2,L3).
:-
14 Some elements of Prolog
[3] diminutive(Dim) : stem(N), name(N,NName), ending(Ε), name(E,EName), conc(NName,EName,DName), name(Dim,DName). [4] stem(boek). stem(huis). stem(kop). [5] e n d i n g ( J e ) . This program forms the Dutch diminutives boekje, huisje, kopje by concatenating the stems defined in [4] with the ending defined in [5]. In order to do so we must, according to [3], take a stem and an ending, then compute the "name" of both, which yields two ASCII code lists, then concatenate these two lists into one list, and then compute what atom the latter list is the name of. Note that we use "concatenate" as defined in (36). The process of "affixation" defined in (39) is a rather complicated matter, due to the fact that we cannot "look into" Prolog atoms without stepping down to the ASCII code through the predicate "name". In order to simplify matters, we could define a relation "affix" as follows: (40)
    affix(A,B,C) :-
        name(A,AName), name(B,BName),
        conc(AName,BName,CName),
        name(C,CName).
Through this higher-level definition, we can push the complexity of the clauses connected with the "name" relation down into the program, and then simplify (39)[3] to:
(41)
    [3*] diminutive(Dim) :-
             stem(N), ending(E), affix(N,E,Dim).

This again exemplifies how a program can be simplified through defining higher-level notions. Wherever we want to apply some sort of affixation in the program, we can now use the predicate "affix", which will then call the more complicated procedure in (40). The LPA-Prolog interpreter in fact offers some "clever" predicates which can be used to circumvent much of this complexity of string-handling. For this, see 2.7. below.
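The affixation procedure of (39) to (41) can be tried out in a present-day interpreter with a minimal sketch like the following. It assumes an ISO-style Prolog such as SWI-Prolog, in which atom_codes/2 plays the role of "name" and append/3 the role of "conc"; these substitutions are assumptions of the sketch, not part of ProfGlot.

```prolog
% Sketch only: assumes an ISO-style Prolog such as SWI-Prolog.
% atom_codes/2 stands in for "name", append/3 for "conc".
:- use_module(library(lists)).

stem(boek).
stem(huis).
stem(kop).

ending(je).

% affix(+A, +B, -C): C is the atom spelled by A followed by B.
affix(A, B, C) :-
    atom_codes(A, ACodes),
    atom_codes(B, BCodes),
    append(ACodes, BCodes, CCodes),
    atom_codes(C, CCodes).

diminutive(Dim) :-
    stem(N),
    ending(E),
    affix(N, E, Dim).

% ?- findall(D, diminutive(D), Ds).
% Ds = [boekje, huisje, kopje].
```

The three diminutives come out in the order in which the stems are listed, since Prolog tries the stem facts from top to bottom.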
2.6. Modifying a program during execution

During execution a program can be modified by adding clauses to, or retracting clauses from the program. Unless special measures are taken these modifications are not retained when the execution is ended. When "Clause" is some Prolog fact or rule, the following instructions can be used for modification:
(42)
    assert(Clause).     Clause is added to the program.
    asserta(Clause).    Clause is added at the beginning of the program.
    assertz(Clause).    Clause is added at the end of the program.
(43)
    retract(Clause).    Clause is retracted from the program.

These instructions can be used for modifying the program "from the outside" while it has been activated. They can also be part of the program itself in such a way that, when certain predicates are called, this will automatically cause certain modifications to be effected. For one example of this, consider the following definition of a "counter" (from Clocksin and Mellish 1987), which, starting at 1, will generate a series of increasing integers 1, 2, 3, 4, ... etc. (44)
    current_counter(0).
    counter(M) :-
        retract(current_counter(N)),
        M is N+1,
        asserta(current_counter(M)), !.
When "counter(X)" is called for the first time, the current counter (0) is retracted, the counter is fixed at 0 + 1 = 1, and asserted into the program as the new current counter. The second time "counter(X)" is called it will get the value 2, then 3, etc. The value of "!" will be explained below, in 2.7. This is the most important Prolog machinery which I have used in the computational Functional Grammar to be discussed in this book. What has not been discussed here will be clarified in the commentary on the program itself.
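As a hedged illustration, the counter of (44) can be run as follows. The sketch assumes an ISO-style system such as SWI-Prolog, where a predicate that is modified at run time is normally declared dynamic; that declaration is an addition of the sketch and does not appear in (44).

```prolog
% Sketch only: the counter of (44), plus the dynamic declaration that
% ISO-style systems (e.g. SWI-Prolog) expect for modified predicates.
:- dynamic current_counter/1.

current_counter(0).

counter(M) :-
    retract(current_counter(N)),    % remove the old value
    M is N + 1,                     % compute the new value
    asserta(current_counter(M)),    % store it as the current counter
    !.

% ?- counter(A), counter(B), counter(C).
% A = 1, B = 2, C = 3.
```

Each call leaves exactly one current_counter fact in the program, so the state survives between queries for as long as the session lasts.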
2.7. Some built-in facilities of LPA-Prolog

Most predicates of the ProfGlot program will be accepted by any standard Prolog interpreter. Some use has been made, however, of built-in facilities specific to the LPA-Prolog interpreter, since these facilities led to considerable simplifications of the program, and to a speeding up of its operation. This mainly concerns the following predicate: (45)
concat(X,Y,Z).
This is a "clever" concatenation predicate, since it can be applied both to lists and to strings. Thus, the following are both evaluated as true:
(46)
    a. concat([a,b,c],[d,e],[a,b,c,d,e]).
    b. concat(pig,let,piglet).
The predicate "concat" thus "recognizes" the type of its arguments (list or string), and responds accordingly. This means that it can be used both for list concatenation and for string concatenation. The predicate "concat" can also be used for testing the composition of both lists and strings. Consider the following examples of tests on lists:
(47)
    a. ?- concat(L,[d,e],[a,b,c,d,e]).
       'Which list L concatenates with [d,e] to yield [a,b,c,d,e]?'
       L = [a,b,c]
    b. ?- concat(_,[d,e],[a,b,c,d,e]).
       'Does any list concatenate with [d,e] to yield [a,b,c,d,e]?'
       = 'Does [a,b,c,d,e] end in [d,e]?'
       yes

And the following examples of tests on strings:
(48)
    a. ?- concat(X,let,piglet).
       X = pig
    b. ?- concat(_,let,piglet).
       yes

The predicate "concat" can thus be used both for testing the composition of a form and for affixing elements to it. Consider the following (partial) rule for creating the third person singular present tense form of English verbs:
(49)
    [1] stem(S) :-
            member(S,[walk,kiss,buzz,bleach]).
    [2] sibilant(Sib) :-
            member(Sib,[s,z,ch]).
    [3] pres_3sg(Stem,Form) :-
            stem(Stem), sibilant(Sib),
            concat(_,Sib,Stem),
            concat(Stem,es,Form), !.
    [4] pres_3sg(Stem,Form) :-
            stem(Stem),
            concat(Stem,s,Form).
In words:
(50)
    [1] S is a stem if S is a member of [walk,kiss,buzz,bleach].
    [2] Sib is a sibilant if Sib is a member of [s,z,ch].
    [3] We get the present 3 singular Form of a Stem if Stem is a stem, Sib is a sibilant, Stem ends in Sib, and Form is created from Stem by suffixing es.
    [4] Otherwise, we get the present 3 singular Form of a Stem if Stem is a stem, and Form is created from Stem by suffixing s.

The exclamation mark "!" in (49)[3] is the Prolog "cut". It means roughly: when you have found a solution up to this point, stop and do not backtrack to find any other solution. It can be used to effect what in linguistics is known as "disjunctive rule application" (either (49)[3] or (49)[4] should be applied to any stem, but not both), and what in other programming languages would look like an "IF x THEN y; ELSE z" procedure. The properties of the Prolog "cut" will be further discussed in the commentary on the program.
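For readers without LPA-Prolog, rule (49) can be approximated as in the sketch below. It assumes that the ISO built-in atom_concat/3 may stand in for the string use of "concat"; this is an assumption about an equivalent, not a description of LPA's own implementation.

```prolog
% Sketch only: atom_concat/3 (ISO) stands in for the string use of
% LPA "concat"; member/2 comes from library(lists).
:- use_module(library(lists)).

stem(S) :- member(S, [walk, kiss, buzz, bleach]).
sibilant(Sib) :- member(Sib, [s, z, ch]).

pres_3sg(Stem, Form) :-
    stem(Stem),
    sibilant(Sib),
    atom_concat(_, Sib, Stem),      % Stem ends in a sibilant
    atom_concat(Stem, es, Form),
    !.                              % do not fall through to the default
pres_3sg(Stem, Form) :-
    stem(Stem),
    atom_concat(Stem, s, Form).

% ?- pres_3sg(walk, F).     F = walks.
% ?- pres_3sg(bleach, F).   F = bleaches.
```

The sketch is meant to be called with a given stem. If Stem is left unbound, the cut commits to the first stem in the list that ends in a sibilant, and no further forms are produced.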
Chapter 3. Introducing Functional Grammar
3.0. Introduction

ProfGlot implements the theory of Functional Grammar (FG), in the version described in Dik (1989f)1, adapted to the requirements of Prolog programming. The adaptations are of different kinds:
— all FG structures, rules, and principles have been cast in a format which can be interpreted by Prolog;
— many rules and principles have been formulated more precisely than in the earlier informal descriptions, so as to make them work within a computational model;
— at a number of points, programming in Prolog has led to modifications, simplifications, and substantial improvements in the Functional Grammar formalism.
For other exercises in computational Functional Grammar, the reader is referred to Connolly and Dik (1989).
3.1. Outline of the Functional Grammar generator

The generator forms the central component of ProfGlot. Other modules (the parser, the logic, the translator) use the rules and the output of the generator. The overall structure of the generator is laid out in Figure 1. The brief explanation given here will be further detailed in the commentary on the actual program. The main steps in the construction of a linguistic expression are the following:
— building up a clause structure;
— turning the clause structure into a fully specified clause through a series of operations together called the specification;
— mapping the fully specified clause onto a linguistic expression through a system of expression rules.
The nucleus of the clause structure consists of a predicate frame. Predicate frames may be basic or derived. Basic predicate frames are contained in the lexicon. The lexicon contains all the basic, non-derivable contentives of the languages, with a specification of their non-derivable formal and semantic properties. An example of a basic predicate frame is the following: (1)
bpredv(eng,[[walk],[act,mve],[[[anim],t,[ag]]]]).
This predicate frame defines the structure:
(2)
    [[walk],[act,mve],[[[anim],t,[ag]]]]
as a basic verbal predicate (bpredv) of English (eng).

[Figure 1. Lay-out of the Functional Grammar generator. Stages shown, from top to bottom: extended predication; proposition; clause structure; specification (1. subj obj assignment, 2. verb agreement, 3. anaphora resolution, 4. reflexive marking, 5. equi marking, 6. copula support); fully specified clause; expression rules (1. form, 2. order, 3. sandhi); linguistic expression.]

The structure takes the form of a Prolog list, consisting of three sublists, as follows:
(3)
    [ [walk],              : the form
      [act,mve],           : the type
      [[[anim],t,[ag]]]    : the argument positions
    ]
The first sublist defines the form of the predicate (generally, the stem from which the other forms can be most easily derived). The second sublist defines the type of the predicate. The type consists of features (in this case "action" and "movement") which are essential to the behaviour of the predicate in the grammar. The third sublist contains one or more sublists defining the argument positions of the predicate. Each argument position consists of three parts, as follows: (4)
    [ [anim],    : the selection restriction
      t,         : the term position
      [ag]       : the semantic function
    ]

The selection restriction is used to block undesired output such as:
(5)
    a. The table walks.
    b. The cloud kisses the car.
The term position is the place where a term may be inserted. The semantic function specifies the role of the entity in question in the State of Affairs designated by the predicate frame. The predicate walk has only one argument position (it is a "one-place" predicate). The following is an example of a two-place predicate: (6)
    bpredv(eng,[[kiss],[act],
        [[[anim],t,[ag]],[[anim],t,[pt]]]]).

In (6), kiss is defined as a two-place relation between an animate agent and an animate patient. Adjectives and nouns are likewise coded in predicate frames:
(7)
    bpreda(eng,[[clever],[grad,eval,comm],
        [[[anim],t,[zero]]]]).
(8)
    a. bpredn(eng,[[man],[hum,masc]]).
    b. bpredn(eng,[[john],[hum,masc,proper]]).
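Because predicate frames are ordinary Prolog lists, their parts can be picked out by unification. The following is a minimal sketch of this idea; frame_parts/4 and arity/2 are hypothetical helper names introduced here for illustration and are not predicates of ProfGlot. The two facts are copied from (1) and (6).

```prolog
% Sketch only: frame_parts/4 and arity/2 are hypothetical helpers.
bpredv(eng, [[walk], [act, mve], [[[anim], t, [ag]]]]).
bpredv(eng, [[kiss], [act], [[[anim], t, [ag]], [[anim], t, [pt]]]]).

% frame_parts(+Frame, -Form, -Type, -Args): split a verbal frame into
% its three sublists by unification.
frame_parts([Form, Type, Args], Form, Type, Args).

% arity(+Frame, -N): N is the number of argument positions.
arity(Frame, N) :-
    frame_parts(Frame, _, _, Args),
    length(Args, N).

% ?- bpredv(eng, F), frame_parts(F, [kiss], _, _), arity(F, N).
% N = 2.
```

The helper only fits frames with three sublists, that is, verbal and adjectival frames; nominal frames as in (8) have no list of argument positions.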
Just like verbs, adjectives consist of a form, a type, and a list of argument positions. Thus, (7) defines clever as a one-place, gradable, evaluative, and commentative predicate applicable to animate entities. (8a) defines man as a basic nominal predicate of the type "human, masculine". (8b) similarly defines the proper noun john. Nouns have no argument positions, for reasons to be explained later. Associated with the predicate frames the lexicon contains meaning postulates/definitions, and paradigms. An example of a meaning postulate is:
(9)
    mean(eng,
        [[kiss],[act],[[[anim],X1,[ag]],[[anim],X2,[pt]]]],
        [_,[[touch],[act],[[[anim],X1,[ag]],[[concr],X2,[pt]]]],
         [_,[[idiom],'the lips',[instr]],_,_]]).

In a way to be clarified later, this meaning postulate claims that to say that X1 kisses X2 is to say that X1 touches X2 with the lips. The general structure of meaning postulates is: (10)
mean(L,Definiendum,Genus,Differentia).
In this case, "kiss" (Definiendum) is defined as 'touch' (Genus) 'with the lips' (Differentia). The combination of Genus and Differentia forms the Definiens. Note that the Definiens has the same syntactic properties as the Definiendum. This means that in a structure in which we find the Definiendum, we can replace it by the Definiens, thus arriving at a paraphrase of the original structure. This type of paraphrasing will be discussed in chapter 15. The paradigms contain the irregular, unpredictable forms of a predicate. Examples are: (11) a. b.
    paradigm(eng,[child,children]).
    paradigm(eng,pres,be,[is,am,are]).
According to the principle of "lexical priority", the expression rules which define the forms of constituents will first check whether the required form is contained in a paradigm. Only when no such form is found will the regular rule be applied. Derived predicate frames contain those predicates which can be productively formed by means of predicate formation rules. Derived predicate frames have the same syntax as basic predicate frames. The "types" of predicate frames only contain the non-redundant semantic features characterizing the predicate. Redundant features (in the sense in which "human" entails "animate", and "action" entails "control" and "dynamism") are introduced by means of redundancy rules. The abstract underlying clause structure which is built up around a predicate frame has a standardized syntax for all linguistic expressions of the three languages which ProfGlot has knowledge of. This standardized syntax is outlined in Figure 2.
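The principle of lexical priority described above can be pictured with a small sketch: a stored paradigm form is tried first, and a regular rule applies only when no such form exists. The predicates plural/2 and regular_plural/2 are hypothetical names chosen for this illustration, and atom_concat/3 is assumed as the string concatenation built-in; the actual expression rules are discussed later in the book.

```prolog
% Sketch only: plural/2 and regular_plural/2 are hypothetical names.
paradigm(eng, [child, children]).

% Lexical priority: use a stored irregular form if there is one ...
plural(Noun, Plural) :-
    paradigm(eng, [Noun, Plural]),
    !.
% ... otherwise fall back on the regular rule.
plural(Noun, Plural) :-
    regular_plural(Noun, Plural).

regular_plural(Noun, Plural) :-
    atom_concat(Noun, s, Plural).

% ?- plural(child, P).   P = children.
% ?- plural(book, P).    P = books.
```

The cut in the first clause is what keeps the regular rule from also producing a form such as "childs" on backtracking.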
[Figure 2. Standardized syntax for clause structures. Surviving labels: CLAUSE; nucleus taken from LEXICON or PREDICATE FORMATION.]
The over-all structure of the clause can be represented as follows: (12)
    [P4,[P3,[P2,[P1,[NUCLEUS],S1],S2],S3],S4]
The nucleus or nuclear predication consists of a predicate frame as selected from the lexicon or created through predicate formation. Predicate frames have up to three argument positions (in this program). The "P" elements symbolize grammatical operators at four different levels, and the "S" elements symbolize lexical modifiers or satellites at these same levels.2 The elements of the different layers have the following semantic import: (13)
    nucleus: defines a type of State of Affairs (SoA).
    predicate operators P1, predicate satellites S1: give additional specifications of the nature of the nuclear SoA.
    predication operators P2, predication satellites S2: define the "location" of the SoA with respect to spatial, temporal, and objective-cognitive coordinates.
    propositional operators P3, propositional satellites S3: define the Speaker's attitude towards and evaluation of the content of the proposition.
    illocutionary operators P4, illocutionary satellites S4: define, specify, and motivate the illocutionary force or speech act value of the clause as a whole.
In the present program these various elements have been specified as follows:
(14)
    core predication = [P1,[NUCLEUS],S1]
    P1 = Progressive Aspect or empty ([ ]).
    S1 = Satellites of Manner, Instrument, Direction, and Beneficiary, possibly empty ([ ]).
(15)
    extended predication = [P2,[CORE PREDICATION],S2]
    P2 = Tense, Polarity (empty/positive/negative), Perfect Aspect or empty, and the SoA variable "e", with a random integer as identifier.
    S2 = Satellites of Place, Time, and "Cognition" (Reason, Condition, Concession), possibly empty.
(16)
    proposition = [P3,[EXTENDED PREDICATION],S3]
    P3 = Modal operators (empty/predictive/possibility), and propositional variable "xx", plus random identifier.
    S3 = Modal/attitudinal satellites such as probably and cleverly, possibly empty.
(17)
    clause = [P4,[PROPOSITION],S4]
    P4 = Illocutionary operators, illocutionary variable "ee" plus random identifier.
    S4 = Illocutionary satellites, possibly empty.
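The layering in (12) and (14) to (17) means that the nucleus can be reached by a single unification which peels off the four layers. The sketch below illustrates this under two assumptions: the bracketed [NUCLEUS] of (12) is taken to be the predicate frame itself, and the atoms p1 ... s4 are mere placeholders for operator and satellite specifications. The names nucleus_of/2 and example_clause/1 are hypothetical.

```prolog
% Sketch only: nucleus_of/2 and example_clause/1 are hypothetical, and
% p1..p4, s1..s4 are placeholder atoms for operators and satellites.

% nucleus_of(+Clause, -Nucleus): peel off the four layers of (12).
nucleus_of([_P4, [_P3, [_P2, [_P1, Nucleus, _S1], _S2], _S3], _S4],
           Nucleus).

example_clause([p4,
                [p3,
                 [p2,
                  [p1, [[walk], [act, mve], [[[anim], t, [ag]]]], s1],
                  s2],
                 s3],
                s4]).

% ?- example_clause(C), nucleus_of(C, [[Form]|_]).
% Form = walk.
```

The same pattern, with fewer layers, would reach the core predication or the extended predication instead of the nucleus.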
Within the over-all structure of the clause, there are certain dependencies between different elements at the same layer or at different layers. These dependencies have been indicated by dotted lines in Figure 2. For example, the specification of Time satellites should correlate with the specification of the Tense operator, as is clear from:
(18)
    a. John arrived yesterday.
    b. *John arrived tomorrow.
(19)
    a. John will arrive tomorrow.
    b. *John will arrive yesterday.
The way in which these dependencies have been captured in the actual program will be clarified later on. In order to form a clause structure we take the following steps:
[1] take a basic predicate frame from the lexicon or create a derived predicate frame through predicate formation;
[2] add redundant features to the Type of the predicate frame through redundancy rules;
[3] create a core predication schema through specifying P1 operators and S1 satellites around the nuclear predicate frame;
[4] create an extended predication schema through specifying P2 operators and S2 satellites around the core predication schema;
[5] turn the extended predication schema into an extended predication through inserting appropriate terms into its argument and satellite slots;
[6] create a proposition by specifying P3 operators and S3 satellites around the extended predication;
[7] create a clause structure through specifying P4 operators and S4 satellites around the proposition.
At step [5], terms are inserted into the open argument and satellite slots of the extended predication schema. Terms are expressions which can be used to refer to (concrete or abstract) entities of different types. They range from simple pronouns such as he to complex noun phrases such as: (20)
the man who thought that Sarphati Street was the most beautiful street in Amsterdam
Just like clause structures, terms have been assigned a uniform syntax, as outlined in Figure 3. A term structure is a list, consisting of two sublists, one for term operators, and one for one or more restrictors: (21)
term structure = [Operators,Restrictors]
The term operators specify parameters relevant to the set of intended referents as a whole, the restrictors specify properties that an entity must have in order to qualify as a potential referent of the term. The term operators have positions for "definiteness" (including demonstratives and quantifiers) and "number", and a position for the term variable "x", provided with a random identifier. The restrictors (in this program) are divided over four potential positions R1, R2, R3, and R4:
(22)
    term structure = [[Def,Num,Var],[R1,R2,R3,R4]]

— R1 is always filled and consists of a nominal predicate plus its type.
— R2 is a position for attributive adjectives and participles; no iteration of these attributive modifiers has been allowed so far.
— R3 is a position for adpositional attributes such as John's, of John, in the city (as in: the house in the city), and with the red dress (as in: the girl with the red dress). Note that R3 contains terms, and that further terms can be embedded into these terms, as in: the house of the father of the boy with the blue eyes. This creates a danger of "infinite recursion" which Prolog is unable to handle unless certain measures are taken in order to stop the recursive iteration at a certain point. Such measures have indeed been taken, so that quite complex R3 modifiers can be formed without the system running into infinite loops (see 4.3.2.).
— R4 is a position for relative clauses, which themselves have the syntax of extended predications (see Figure 2). Again, measures have been taken to control infinite recursion without disallowing relative clauses within relative clauses within ... relative clauses.
There are certain dependencies within the structure of terms, indicated by dotted lines in Figure 3. Adjectives and relative clause structures have been made sensitive to the type of the head noun R1, so as to prevent output such as: (23)
*the clever stones who kiss the book
And the whole term has been made sensitive to the selection restriction of the term position which it is to be inserted into, so as to prevent output such as: (24)
*The book kissed the stones.
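The danger of infinite recursion noted above for R3 and R4 can be illustrated, and tamed, with a depth bound. The sketch below is one possible way of stopping recursive embedding; it is an assumption made for illustration and not the measure taken in ProfGlot (for that, see 4.3.2.). All predicate names and the flat list representation of terms are hypothetical.

```prolog
% Sketch only: a depth bound as one possible guard against unbounded
% embedding. noun/1 and term/2 are hypothetical names.
noun(house).
noun(father).
noun(boy).

% term(+Depth, -Term): a noun, optionally followed by an "of"-attribute
% which itself contains a term, with at most Depth levels of embedding.
term(_, [N]) :-
    noun(N).
term(Depth, [N, of, Sub]) :-
    Depth > 0,
    Depth1 is Depth - 1,
    noun(N),
    term(Depth1, Sub).

% ?- findall(T, term(0, T), Ts), length(Ts, N).   N = 3.
% ?- findall(T, term(1, T), Ts), length(Ts, N).   N = 12.
```

Without the test on Depth, a request for all solutions would never terminate, since every term can be embedded into a larger one.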
[Figure 3. Syntax of term structures. Surviving labels: term positions (arguments, satellites); selection; t; function; predn type; preda; term.]
diminutive maantje, man 'man' --> diminutive mannetje. The affixation of -etje rather than -tje depends on complex phonological factors (including features of accent and syllabification) which have not yet been captured in the program. Through the "2", the correct behaviour of these forms is simply stipulated. All other variants of the diminutive ending are correctly produced by rule.

    bpredn(dut,[[jan],[hum,masc,proper]]).
    bpredn(dut,[[marie],[hum,fem,proper]]).
    bpredn(dut,[[zondag],[time,proper]]).
    bpredn(dut,[[emmer],[inanim,concr]]).
    bpredn(dut,[[zak],[inanim,concr]]).
    bpredn(dut,[[jongen],[hum,masc]]).
    bpredn(dut,[[boek],[inanim,neut,concr]]).
    bpredn(dut,[[professor],[hum,masc]]).
    bpredn(dut,[[professor],[hum,fem]]).
    bpredn(dut,[[schilder],[hum,masc]]).
    bpredn(dut,[[schilderes],[hum,fem]]).
    bpredn(dut,[[vrouw],[hum,fem]]).
    bpredn(dut,[[kind],[hum,neut,masc]]).
    bpredn(dut,[[kind],[hum,neut,fem]]).
    bpredn(dut,[[meisje],[hum,neut,fem]]).
    bpredn(dut,[[dame],[hum,fem]]).
    bpredn(dut,[[moeder],[hum,fem]]).
    bpredn(dut,[[student],[hum,masc]]).
    bpredn(dut,[[studente],[hum,fem]]).
    bpredn(dut,[[vader],[hum,masc]]).
    bpredn(dut,[[stad],[place,[]]]).
    bpredn(dut,[[tuin],[place,[]]]).
    bpredn(dut,[[dinges],[inanim,concr,instrument]]).

The last noun is the dummy noun corresponding to English thingummy and French machin. All Dutch nouns can be divided into "neuter" and "non-neuter" nouns. This Gender difference is important for the selection of articles and demonstratives, and for the formal expression of attributive adjectives and participles. The difference has here been coded by assigning "neut" to the type of neuter nouns, and leaving non-neuter nouns unmarked. Note that the difference masculine/feminine is important as well, and interacts with the neuter/non-neuter distinction. This can be seen in:
(1)
    a. De dame (non-neut, fem) laat haar hondje uit.
       'The lady walks her dog'
    b. Het meisje (neut, fem) laat haar hondje uit.
       'The girl walks her dog'
(2)
    a. De zeeman (non-neut, masc) laat zijn hondje uit.
       'The sailor walks his dog'
    b. Het jongetje (neut, masc) laat zijn hondje uit.
       'The boy walks his dog'
    bpreda(dut,[[goed],[grad],[[[anim],t,[zero]]]]).
    bpreda(dut,[[goed],[grad,eval],[[[inanim],t,[zero]]]]).
    bpreda(dut,[[slim],[grad,eval,comm],[[[anim],t,[zero]]]]).
    bpreda(dut,[[lang],[grad],[[[vert],t,[zero]]]]).
    bpreda(dut,[[klein],[grad],[[[vert],t,[zero]]]]).

    bpredv(dut,[[loop],[act,mve],[[[anim],t,[ag]]]]).
    bpredv(dut,[[beweeg],[act,mve],
        [[[anim],t,[ag]],[[concr],t,[pt]]]]).
    bpredv(dut,[[schop],[act],
        [[[anim],t,[ag]],[[concr],t,[pt]]]]).
    bpredv(dut,[[raak],[act],
        [[[anim],t,[ag]],[[concr],t,[pt]]]]).
    bpredv(dut,[[sterv],[proc],[[[anim],t,[proc]]]]).
    bpredv(dut,[[lach],[act],[[[anim],t,[ag]]]]).
    bpredv(dut,[[hoest],[act],[[[anim],t,[ag]]]]).
    bpredv(dut,[[slaa],[act],
        [[[anim],t,[ag]],[[concr],t,[pt]]]]).
    bpredv(dut,[[gaa],[act,mve,tel],
        [[[anim],t,[ag]],[[concr],t,[dir]]]]).
    bpredv(dut,[[kus],[act],
        [[[anim],t,[ag]],[[anim],t,[pt]]]]).
    bpredv(dut,[[geev],[act],
        [[[anim],t,[ag]],[[inanim],t,[pt]],[[anim],t,[rec]]]]).
    bpredv(dut,[[koop],[act],
        [[[anim],t,[ag]],[[inanim],t,[pt]],[[anim],t,[so]]]]).
    bpredv(dut,[[verkoop],[act],
        [[[anim],t,[ag]],[[inanim],t,[pt]],[[anim],t,[rec]]]]).
    bpredv(dut,[[speel,vals],[act],[[[anim],t,[ag]]]]).
    bpredv(dut,[[raak,aan],[act],
        [[[anim],t,[ag]],[[concr],t,[pt]]]]).

Dutch has a rather large category of "separable compound verbs", consisting of a verbal predicate and some kind of particle: [speel,vals] = 'cheat', [raak,aan] = 'touch', etc. Morpho-syntactically these compounds behave in complex ways, as can be seen from the following examples:
(3)
    a. Marie speelt altijd vals.
       Mary plays always false
       'Mary always cheats'
    b. Jan zegt dat Marie altijd valsspeelt.
       'John says that Mary always cheats'
    c. Marie heeft valsgespeeld.
       'Mary has cheated'
    d. Marie gaat valsspelen.
       'Mary is going to cheat'
    e. Marie is een valsspeelster.
       'Mary is a cheater' (lit. false-play-ster).
All these manifestations of the complex predicate [speel,vals] are accounted for by ProfGlot. This is also one of the reasons why all predicate forms are given as lists (usually with only one member). In the predicate formation component (see chapter 9) we shall meet other cases in which complex predicates are needed. Note that for English the notion complex predicate could be used for such verbs as [call,up], [give,away], [play,guitar], which have properties similar to the Dutch separable compounds (cf. Voogt-van Zutphen 1989). This has not yet been implemented in the English module.

    bpredvm(dut,[[betreur],[posi],[[[anim],t,[pos]],
        [[prop,fact],t,[pt]]]]).
    bpredvm(dut,[[wil],[posi],[[[anim],t,[pos]],
        [[extpred],t,[rf]]]]).
    bpredvm(dut,[[geloov],[posi],[[[anim],t,[pos]],
        [[prop],t,[pt]]]]).
    bpredvm(dut,[[verwacht],[posi],[[[anim],t,[pos]],
        [[prop],t,[pt]]]]).
    bpredvm(dut,[[verwacht],[posi],[[[anim],t,[pos]],
        [[extpred],t,[rf]]]]).

    degree(dut,erg).
    degree(dut,nogal).
    degree(dut,redelijk).
    degree(dut,verbazend).
8.1.2. Lexical satellites

The Dutch satellites in this subsection are point-by-point equivalent to the English and French ones.

    sat2temp(dut,[vandaag],[pres|Z]) :-
        gamble(sat2).
    sat2temp(dut,[gisteren],P2) :-
        gamble(sat2),
        (P2=[past|Z];P2=[pres,perf|Z]).
    sat3(dut,[waarschijnlijk],decl,_) :-
        gamble(sat3).
    sat4(dut,['eerlijk gezegd,'],_) :-
        gamble(sat4).
    sat4(dut,['eerlijk gezegd, eeh,'],_) :-
        gamble(sat4).
8.1.3. Meaning postulates

    mean(dut,
        [[kus],[act],[[[anim],X1,[ag]],[[anim],X2,[pt]]]],
        [_,[[raak,aan],[act],[[[anim],X1,[ag]],[[concr],X2,[pt]]]],
         [_,[[idiom],'de lippen',[instr]],_,_]]).
    mean(dut,
        [[loop],[act,mve],[[[anim],X1,[ag]]]],
        [_,[[beweeg],[act,mve],[[[anim],X1,[ag]],
            [[concr],[[def,same,same],[[ana,[ambi]]]],[pt]]]],
         [_,[[idiom],[te,voet],[]],_,_]]).
    mean(dut,
        [[schop],[act],[[[anim],X1,[ag]],[[concr],X2,[pt]]]],
        [_,[[raak],[act],[[[anim],X1,[ag]],[[concr],X2,[pt]]]],
         [_,[[idiom],[de,voet],[instr]],_,_]]).
8.1.4. Paradigms

    paradigm(dut,[stad,steden]).
    paradigm(dut,[kind,kinderen]).
    paradigm(dut,[zeeman,zeelieden]).
    paradigm(dut,[professor,professoren]).
    paradigm(dut,goed,[bete1r,best]).

It is difficult to formulate 'pseudo-phonological' rules for Dutch without at least recognizing the special status of the schwa. In order to distinguish the schwa (written "e") from the vowels /e/ (written "e" or "ee") and /ɛ/ (written "e") it has, in some occurrences where this is crucial, been written as "e1". This withdraws stems ending in -e1C (where C is some consonant) from rules which would otherwise apply to stems ending in -eC. All occurrences of "1" will be removed at a last run through the sentence, just before it is output.

    paradigm(dut,koop,[gekocht]).
    paradigm1(dut,past,_,koop,[kocht]).
    paradigm(dut,verkoop,[verkocht]).
    paradigm1(dut,past,_,verkoop,[verkocht]).
    paradigm(dut,slaa,[geslagen,slaan]).
    paradigm1(dut,pres,_,slaa,[[],sla,[],slaan]).
    paradigm1(dut,past,_,slaa,[sloeg]).
    paradigm(dut,gaa,[gegaan,gaan]).
    paradigm1(dut,pres,_,gaa,[[],ga,[],gaan]).
    paradigm1(dut,past,_,gaa,[ging]).
    paradigm(dut,beweeg,[bewogen]).
    paradigm1(dut,past,_,beweeg,[bewoog]).
    paradigm(dut,bedrieg,[bedrogen]).
    paradigm1(dut,past,_,bedrieg,[bedroog]).
    paradigm(dut,loop,[gelopen]).
    paradigm1(dut,past,_,loop,[liep]).
    paradigm(dut,sterv,[gestorven]).
    paradigm1(dut,past,_,sterv,[stierf,[],[],[],stierv]).
    paradigm(dut,zijn,[geweest,zijn]).
    paradigm1(dut,pres,_,zijn,[is,ben,bent,zijn]).
    paradigm1(dut,past,_,zijn,[was,was,was,[],war]).
    paradigm(dut,heb,[gehad]).
    paradigm1(dut,pres,_,heb,[heeft,heb]).
    paradigm1(dut,past,_,heb,[had,had,had]).
    paradigm(dut,word,[geworden,worden]).
    paradigm1(dut,past,_,word,[werd]).
    paradigm1(dut,pres,_,kun,[kan,kan,kunt]).
    paradigm1(dut,past,_,kun,[kon,kon,kon,[],kond]).
    paradigm1(dut,pres,_,zul,[zal,zal,zult]).
    paradigm1(dut,past,_,zul,[zou,zou,zou,[],zoud]).
    paradigm1(dut,pres,_,wil,[wil]).
    paradigm(dut,doe,[gedaan,doen]).
    paradigm1(dut,pres,_,doe,[[],[],[],doen]).
    paradigm1(dut,past,_,doe,[deed]).
    paradigm(dut,geev,[gegeven]).
    paradigm1(dut,past,_,geev,[gaf,gaf,gaf,[],gav]).
    paradigm(dut,lach,[gelachen]).

    paradigm(dut,[p1,sg,[],
        [ik,mij,me,[],mijn,mijzelf]]).
    paradigm(dut,[p1,pl,[],
        [wij,ons,[],[],ons,onszelf]]).
    paradigm(dut,[p2,sg,[],
        [jij,jou,je,[],jouw,jezelf]]).
    paradigm(dut,[p2,pl,[],
        [jullie,jullie,[],[],jullie,julliezelf]]).
    paradigm(dut,[p3,sg,[masc],
        [hij,hem,'m',[],zijn,zichzelf]]).
    paradigm(dut,[p3,sg,[fem],
        [zij,haar,'r',[],haar,zichzelf]]).
    paradigm(dut,[p3,sg,[inanim],
        [het,het,'t',[],[],zichzelf]]).
    paradigm(dut,[p3,pl,[inanim],
        [zij,ze,[],[],hun,zichzelf]]).
    paradigm(dut,[p3,pl,[],
        [zij,hen,ze,[],hun,zichzelf]]).
    paradigm(dut,[que,sg,[inanim],
        [[**,wat],[**,wat],[],[**,waar],[**,waarvan]]]).
    paradigm(dut,[que,sg,[hum],
        [[**,wie],[**,wie],[],[**,wie],[**,wiens]]]).
    paradigm(dut,[que,pl,[hum],
        [[**,wie],[**,wie],[],[**,wie],[**,'van wie']]]).
    paradigm(dut,[rel,sg,[neut],
        [[**,dat],[**,dat]]]).
    paradigm(dut,[rel,sg,[],
        [[**,die],[**,die],[],[**,wie],[**,wiens]]]).
    paradigm(dut,[rel,pl,[],
        [[**,die],[**,die],[],[**,wie]]]).
    paradigm2(dut,[[[],[**,wanneer]],[[],[**,waar]],
        [[],[**,hoe]],[[],[**,waarom]]]).
Chapter 9. UniGen: the universal generator
9.0. Introduction

Together with BasFac and one language-specific lexicon (e.g. EngLex), UniGen produces fully specified underlying clause structures in the language in question. Since underlying clause structures have a standardized syntax across the three languages, most rules of UniGen are generally valid for the three languages. These generally valid rules are marked with the variable language identifier "L", or have no language identifier at all. Only a few rules and principles are language-specific. These language-specific items have been intermingled with the general rules at the appropriate places in the program. They can be recognized by the language identifiers "eng", "fre", and "dut". Sometimes a language-specific rule, marked by a "cut", is followed by a general "default" rule, which then takes care of the other languages. In such cases the order of the rules is crucial: the specific rule must precede the default rule. Another way of defining a rule as relevant to only two of the languages is to provide the rule with the variable language identifier "L", and then specify the values of L disjunctively as, for example (L=eng;L=dut) = 'if the language is either English or Dutch'. In that case the order of the rules is of no influence. The "highest" predicate of this module is:
(1)
    fully_specified_clause(L,X).
where L can be specified as eng, fre, or dut. The language identifier will then "percolate" through all the rules, and the corresponding fully specified clause structures will be generated. All "lower" predicates of the module, however, can be questioned in this way as well, and thus any intermediate stage in the production process can be inspected. This is very useful, first of all to gain insight into the operation of the program, and secondly to see where something may have gone wrong if the program does not work as expected. In studying UniGen it is useful to refer to the Figures 1, 2, and 3 of chapter 3, through which the various stages in the production process can be traced step by step.
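The two ways of restricting a rule to particular languages mentioned in this introduction can be sketched as follows. The predicates adjective_position/2 and prenominal_adjective/1 are hypothetical, and the linguistic content is a deliberately simplified toy fact chosen only to show the pattern; neither predicate is part of UniGen.

```prolog
% Sketch only: hypothetical predicates with deliberately simplified
% toy content, used to show the two rule patterns.

% Pattern 1: a language-specific rule marked by a cut, followed by a
% general default rule. The specific rule must come first.
adjective_position(fre, postnominal) :- !.
adjective_position(_, prenominal).

% Pattern 2: a general rule whose language identifier L is restricted
% disjunctively. Here the order of rules plays no role.
prenominal_adjective(L) :-
    ( L = eng ; L = dut ).

% ?- adjective_position(dut, P).   P = prenominal.
% ?- adjective_position(fre, P).   P = postnominal.
```

The first pattern relies on the language identifier being given when the rule is called, as it is when L percolates down from the highest predicate. If the two clauses of adjective_position were swapped, a call for French with an unbound second argument would wrongly pick up the default.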
9.1. The module UniGen 9.1.1. Extending predicates by redundancy rules predv(L,X1) : bpredv(L.X), add_nedundant_features(X,X1).
70 The universal generator
The "type" of basic predicates as stored in the lexicon has only been specified for non-redundant features. Redundant features, however, such as "animate" which can be inferred from "human", are required at various points in the grammar. Therefore, before they can be used in the construction of clause structures, all predicates must be provided with the various redundant features which can be inferred from those for which they are specified. This is what happens here in the definition of verbal predicates: a verbal predicate is a basic verbal predicate to which redundant type features have been added. The actual addition of redundant features is defined below.

predvm(L,X1) :-
    bpredvm(L,X),
    add_redundant_features(X,X1).

The same for "verbal matrix predicate".

predn(L,X1) :-
    bpredn(L,X),
    add_redundant_features(X,X1).

The same for "nominal predicate".

preda(L,X1) :-
    bpreda(L,X),
    add_redundant_features(X,X1).

The same for "adjectival predicate". The procedure "add_redundant_features" is defined as follows:

extend(act,[dyn,contr]).
extend(posi,[contr]).
extend(hum,[anim,concr,vert]).
extend(grad,[state]).

These clauses define which features may be inferred from those already specified in the lexicon:
— an "action" is always "dynamic" and "controlled";
— a "position" is always "controlled";
— a "human" is always "animate", "concrete", and "vertical" (the latter feature is used for monitoring the selection of an adjective like tall);
— a "gradable" adjective always designates a "state".

add_redundant_features([S,T|Z],[S,T1|Z]) :-
    member(F,T),
    extend(F,F1),
    concat(T,F1,T1), !.
add_redundant_features(X,X).

This rule inspects the "type" T of a predicate frame for a feature F such that there is an extension rule which lists redundant features F1 for F. If so, the redundant features are concatenated with the type list T. If not, the predicate frame remains unchanged.
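The effect of the redundancy rules on a single lexical entry can be tried out with the following minimal sketch. It assumes a present-day Prolog such as SWI-Prolog, replaces "concat" by the standard append/3, and uses a hypothetical nominal entry for sailor that is not taken from the actual ProfGlot lexicon.

```prolog
% Sketch only: append/3 stands in for the book's concat/3.
extend(act,  [dyn, contr]).
extend(posi, [contr]).
extend(hum,  [anim, concr, vert]).
extend(grad, [state]).

add_redundant_features([S,T|Z], [S,T1|Z]) :-
    member(F, T),
    extend(F, F1),
    append(T, F1, T1),
    !.
add_redundant_features(X, X).

% Hypothetical lexical entry: [Form, Type].
bpredn(eng, [[sailor], [hum, masc]]).

predn(L, X1) :-
    bpredn(L, X),
    add_redundant_features(X, X1).

% ?- predn(eng, X).
% X = [[sailor], [hum, masc, anim, concr, vert]].
```

Because of the cut, only the first feature of the type list for which an extension rule exists is expanded in this sketch; a type containing two extendable features would receive the redundant features of the first one only.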
9.1.2. Choosing an arbitrary predicate

If we construe a clause structure around a predicate frame of a certain type, then Prolog will automatically select the first eligible predicate frame from the lexicon, then the second, and so on until the last one. If we were to use a "cut", then we would get the same predicate (the first one) again and again. In order to vary the output we would like to have something equivalent to "select an arbitrary predicate from the lexicon". This is achieved (in one possible way) in the following definition of "arbitrary predicate":

arb_pred(L,X) :-
    (choose_arb_verb(L,X); choose_arb_adj(L,X)).

We get an arbitrary predicate by choosing an arbitrary verb or an arbitrary adjective.

choose_arb_verb(L,X) :-
    findall(Y,predv(L,Y),LIST),
    choose_random(X,LIST).

We choose an arbitrary verb by creating a list of all the items Y such that Y is a "predv", and choosing a random member from that list. The predicate "findall" is a built-in Prolog procedure for collecting all items which have a certain property. predv(L,Y) was defined above.

choose_arb_verb(L,X) :-
    gamble(emb),
    findall(Y,predvm(L,Y),LIST),
    choose_random(X,LIST).

We can also find an arbitrary verbal predicate by gambling for "emb", and then following the same procedure with respect to instances of "predvm". Gambling was defined in section 5.1.1. It will yield no value when the setting for embedding is emb(0), and it gets more and more chance of success as the value of emb(N) approaches 100%.

choose_arb_adj(L,X) :-
    gamble(adj),
    findall(Y,preda(L,Y),LIST),
    choose_random(X,LIST).

An arbitrary adjective can be selected in the same way, now after gambling for adjectives.

choose_random(X,LIST) :-
    length(LIST,N),
    M1 is irand(N),
    not(M1=0),
    nth(M1,LIST,X), !.
We choose a random member from a list by first determining the length of the list (see 5.1.2.), say 23, then choosing a random non-null number M1 below 23, say 16, and then choosing the 16th member of the list.

In this way, we have defined the various basic predicates which may be used in predicative function. How derived predicates can be defined through predicate formation is discussed in the next section.
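The two probabilistic ingredients of this section, gambling and random selection from a list, can be approximated as follows. This is only a sketch under stated assumptions: it targets SWI-Prolog, where random_between/3 and nth1/3 take the place of "irand" and "nth", and the setting/2 facts are a hypothetical stand-in for the settings of section 5.1.1.

```prolog
:- use_module(library(random)).
:- use_module(library(lists)).

% Hypothetical settings: chance of success in percent.
setting(emb, 0).
setting(adj, 100).

% gamble(+Key): succeeds with the probability stored for Key.
gamble(Key) :-
    setting(Key, Percent),
    random_between(1, 100, R),
    R =< Percent.

% choose_random(-X, +List): pick one member of a non-empty list.
choose_random(X, List) :-
    length(List, N),
    N > 0,
    random_between(1, N, M),
    nth1(M, List, X).

% ?- choose_random(X, [walk, buy, give]).
% X is bound to one of the three atoms, chosen at random.
```

With a setting of 0 the gamble never succeeds, and with a setting of 100 it always does, which mirrors the behaviour described for emb(N). Unlike the original definition, this version draws directly from the range 1..N, so that no separate test against 0 is needed.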
9.1.3. Predicate formation

9.1.3.1. The status of predicate formation

Predicate formation rules are rules which productively derive new predicate frames from given predicate frames. In formulating these rules a choice has been made which has certain theoretical implications: all predicate formation rules have been cast in an abstract format, which does not contain any reference to language-specific material. For example, the rule for Agent noun formation does not add a specific morpheme (e.g., English -er) to the verbal input stem, but adds the abstract features [agent,masc] or [agent,fem] to that stem. These features will then have their impact on the expression of the derived noun in the expression rules. This abstract treatment of predicate formation has several advantages:
— The same predicate formation rules can be used for different languages. Only in the expression rules will the derived predicates be differentiated.
— Translation at the level of the underlying clause structure becomes easier, as can be seen by comparing:

        Dutch                         French
(2)     valsspeelster                 tricheuse
(3)     [[speel,vals],[agent,fem]]    [[trich],[agent,fem]]
If the forms of (2) formed part of the underlying clause structure a complex decomposition and reassemblage would be needed to arrive from the one at the other.1 The more abstract forms of (3), however, can much more easily be correlated, since [speel,vals] and [trich] are to be found in the lexicon anyway, and the other elements are identical. — A third possible advantage of this abstract analysis is that we can at this level productively derive predicates which, however, are not productively expressed in the language concerned. For example, English has no agent noun cooker 'person who cooks'; instead, it uses the "idiomatic" noun cook. In some sense, however, cook fills the "slot" of the agent noun of the verb cook. We could thus describe this situation in the following way:
(4) [1] take the predicate frame of the verb cook;
    [2] derive [[cook],[agent,masc]] (or fem) through the rule of Agent noun formation;
    [3] express this as cook;
    [4] block the rule which would otherwise result in cooker.

Obviously, this method of handling predicate formation presupposes that the Parser can get from (2) to the underlying representation (3). This, however, can be achieved through an inverted application of the expression rules, as will be shown in chapter 13.
9.1.3.2. Predicate formation rules

[i] Agent noun formation

prednn(L,X1,Sel) :-
    gamble(predform1),
    choose_arb_verb(L,[V,T,_]),
    member(act,T),
    X=[[V,[agent,GEN]],[hum,GEN]],
    (GEN=masc;GEN=fem),
    add_redundant_features(X,X1),
    X1=[F,TY],
    sublist(Sel,TY).

Masculine and feminine derived Agent nouns of type "Sel" are formed as follows: gamble on "predform1" and if the outcome is positive, choose an arbitrary verbal predicate with form V and type T such that "act" is a member of T (it is an "action verb"), then form a derived predicate of the form [V,[agent,masc]] or [V,[agent,fem]] and type [hum,masc] or [hum,fem], add redundant features to the latter type, and check whether "Sel" is a sublist of the resulting type TY. In other words, for any action verb the system knows the corresponding male and female agent nouns, although these are nowhere listed as such.

Even with the very restricted lexica used in the present program, the rules as formulated here overgenerate slightly. For example, they generate the forms English goer,2 French alleur, alleuse, Dutch gaander, gaanster. All these forms are of dubious acceptability. Several routes can be taken in order to remedy this kind of overgeneration:
[a] if it should turn out that the exclusion hinges on some semantic feature, this feature could be specified with the input predicate in the lexicon, and the rule could be made sensitive to it.
[b] if, on the other hand, the non-existence of such forms is a matter of "accidental gaps", we might resort to the device of "negative rule features" (cf. Lakoff 1970): we could put a marker on the verbal predicate go, signalling: "this predicate is not to undergo Agent noun formation".
[c] we could also use some such device in monitoring the expression rather than the creation of the derived predicate. That this might not be inappropriate can be seen from such facts as the following:

(5) a. The one who wins this game gets the first prize.
    b. The winner of this game gets the first prize.
(6) a. The one who goes will miss the best part.
    b. *The goer will miss the best part.
We might thus say that "one who Vs" is an alternative way of expressing Agent nouns, and formulate the rules accordingly.

[ii] Dutch diminutive nouns

A striking characteristic of Dutch is the ubiquitous usage of diminutive nouns. The Dutch drink a "cup-let" of tea, stir it with a "spoon-let", then put the "spoon-let" on the "saucer-let", and perhaps have a "biscuit-let". Almost any count noun can be productively turned into a diminutive noun, the formation of which is quite regular, although there are several alternating diminutive endings, monitored by complex phonological conditions. No matter whether the input noun is neuter or non-neuter, the output diminutive is always neuter:

(7) a. de man 'the man'      ->  het mannetje 'the little man'
    b. het kind 'the child'  ->  het kindje 'the little child'
The rule can be formulated as follows:

predndim(dut,X,Sel) :-
    gamble(predform8),
    choose_arb_noun(dut,[N,[Y,GEN|Z]],Sel),
    X=[[N,dim],[Y,neut|Z]].

In order to form a diminutive noun "predndim" of type "Sel", we gamble on "predform8", choose an arbitrary noun N of type "Sel", and create a form [N,dim] in the type of which "neut" replaces whatever Gender there was in the input noun. Note that derived diminutive nouns cannot be re-input to this rule. This is to avoid double diminutives such as *kindjetje. Derived agent nouns, however, can be diminutized:

predndim(dut,X,Sel) :-
    gamble(predform8),
    prednn(dut,[N,[Y,GEN|Z]],Sel),
    X=[[N,dim],[Y,neut|Z]].

This has the effect of creating the following derivational chains:
(8) action verb:  koop 'buy'
    agent noun:   koper 'buyer'
    diminutive:   kopertje 'little buyer'
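The derivational chain from action verb via agent noun to diminutive can be followed in a small deterministic sketch. The assumptions are considerable: the two verb entries are hypothetical, gambling is left out, sublist/2 is assumed to mean that every feature of the first list occurs in the second, and no_agent_noun/1 is a hypothetical "negative rule feature" of the kind suggested under remedy [b].

```prolog
% Hypothetical mini-lexicon: [Form, Type, Arguments].
bpredv(dut, [[koop], [act], []]).
bpredv(dut, [[ga],   [act, mve], []]).

% Hypothetical negative rule feature: 'ga' may not undergo the rule.
no_agent_noun([ga]).

extend(act, [dyn, contr]).
extend(hum, [anim, concr, vert]).

add_redundant_features([S,T|Z], [S,T1|Z]) :-
    member(F, T), extend(F, F1), append(T, F1, T1), !.
add_redundant_features(X, X).

% Assumed meaning of sublist/2: all members of the first list occur in the second.
sublist([], _).
sublist([X|Xs], L) :- memberchk(X, L), sublist(Xs, L).

% Agent noun formation, without gambling.
prednn(L, X1, Sel) :-
    bpredv(L, [V, T, _]),
    memberchk(act, T),
    \+ no_agent_noun(V),
    member(GEN, [masc, fem]),
    add_redundant_features([[V,[agent,GEN]], [hum,GEN]], X1),
    X1 = [_, TY],
    sublist(Sel, TY).

% Diminutive of a derived agent noun: the Gender is replaced by neut.
predndim(dut, [[N,dim], [Y,neut|Z]], Sel) :-
    prednn(dut, [N, [Y,_GEN|Z]], Sel).

% ?- predndim(dut, X, [anim]).
% X = [[[[koop],[agent,masc]],dim], [hum,neut,anim,concr,vert]] ;
% X = [[[[koop],[agent,fem]],dim], [hum,neut,anim,concr,vert]].
```

The verb ga yields no agent noun here, and hence no diminutive, because of the negative rule feature; removing that fact restores the overgeneration discussed above.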
[iii] Attributive participles

predaa(L,[[prp,V],T1,[[S,t,[zero]]]]) :-
    gamble(predform2),
    choose_arb_verb(L,[V,T1,[[S,t,FU]|Z]]).

This rule forms (arbitrary) derived participles [prp,V] from input verbs V. The selection "S" on the first argument of the input predicate is retained in the output, so that we automatically account for such facts as the following:

(9)  a. The man walks.
     b. the walking man
(10) a. *The book walks.
     b. *the walking book
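How the selection restriction is carried over from the verb to the participle can be seen in the following sketch, which assumes a hypothetical frame for walk (with selection [anim] and function [ag] on its first argument) and leaves out gambling and random choice.

```prolog
% Hypothetical verb frame: [Form, Type, Arguments].
bpredv(eng, [[walk], [act, mve], [[[anim], t, [ag]]]]).

% Derived attributive participle: the selection S of the first argument
% is copied, its semantic function is replaced by zero.
predaa(L, [[prp,V], T1, [[S, t, [zero]]]]) :-
    bpredv(L, [V, T1, [[S, t, _FU]|_Z]]).

% ?- predaa(eng, P).
% P = [[prp,[walk]], [act,mve], [[[anim],t,[zero]]]].
```

Since the single argument of the derived predicate still requires an animate term, the walking man can be built on this frame, whereas *the walking book cannot.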
The output predicate is tagged "predaa" rather than simply "preda", since we wish to restrict it to attributive usage. In a construction such as:

(11) The man is walking.

we have an expression of Progressive aspect rather than a predicative use of the participle walking. On the other hand, the attributive participle formed here and the present participle figuring in the expression of Progressive aspect will both have the form [prp,V] and be expressed by the same rule in the expression component. Thus, although the forms have different origins in the grammar, they share the same expression device.

[iv] Comparatives

The rules of comparative formation should produce the underlying structures for comparatives such as:

(12) a. John is taller than Mary.
     b. John is less tall than Mary.
     c. John is as tall as [equally tall as] Mary.

Further, provision must be made for the attributive use of comparatives, as in:

(13) a. the taller man
     b. the less tall man
     c. the equally tall man

while excluding such constructions as:

(14) a. *the taller than John man
     b. *the less tall than Peter girl
No provision has so far been made for constructions such as:

(15) the man taller than John

I have experimented with several ways of deriving comparatives. The first solution, which I later rejected, was to formulate the rule in such a way that it produces a two-place predicate on the basis of a one-place gradable adjective:

(16) tall_A (x1)_Ø  ->  taller_A (x1)_Ø (x2)_Standard
This method worked all right, but it had some disadvantages:
— "Standard" arguments can (in the three languages modeled here) not be questioned or relativized into:

(17) a. *Who is John taller than?
     b. *Than who(m) is John taller?
(18) a. *the boy who John is taller than
     b. *the boy than who(m) John is taller

Therefore, "Standard" arguments had to be explicitly excluded from questioning and relative clause formation.
— In Dutch constituent order, if Standards are treated as arguments and the comparative is treated as the non-finite part of the verb, we will get incorrect orders such as (19a) instead of (19b):

(19) a. *Jan is dan Marie langer.
        John is than Mary taller
     b. Jan is langer dan Marie.

Of course, this could be remedied, but only by again making an exceptional statement for "Standard" arguments.
— Thirdly, there is evidence in Dutch that langer dan Marie behaves as one unified constituent for the purposes of constituent ordering, in two respects:
[a] the constituent cannot be broken up to yield such orders as:

(20) a. *Dan Marie is Jan langer.
        than Mary is John longer
     b. *Langer is Jan dan Marie.
        longer is John than Mary

[b] the combination of comparative + standard can occur in clause-initial P1 position, a position which usually accepts only single constituents:

(21) Langer dan Marie is Jan niet.
     longer than Mary is John not
Therefore, the strategy illustrated in (16) requires many stipulations in the grammar concerning exceptional behaviour of "arguments" with the semantic function "Standard".
I then realized that all these stipulations would be superfluous if the Standard were generated as part of the derived predicate. The derivation then takes the following form:

(22) a. tall_A (x1)_Ø  ->  {taller_A (x2)_Stand} (x1)_Ø
     b. tall (John)    ->  {taller than Mary} (John)
On this analysis, implemented below, all exceptional stipulations concerning "Standards" could be removed from the grammar. This can be taken as a sign that the analysis in (22) is more adequate than that of (16). However, since constructions such as those in (15) have not yet been captured, I have restricted the output of the comparative formation rule to predicative usage, and formulated another rule which simply produces such comparative adjectives as taller (without Standard) for attributive use.

predaderpred(L,[[[S,POL],[SE,T,[stand]]],[state],[[SE,t,[zero]]]]) :-
    gamble(predform3),
    member(POL,[pos,neg]),
    choose_arb_adj(L,[[S],[grad|_],[[SE,t,[zero]]]]),
    term(L,T,SE).

This rule thus forms predicative positive/negative comparative predicates as illustrated in (22) from gradable adjectival predicates. The derived predicate takes the form [[S,POL],[SE,T,[stand]]], where "POL" is either "pos" or "neg", and "T" is a term of selection "SE", which selection is inherited from the input gradable adjective. This inheritance accounts for the following regularities:

(23) a. John is intelligent.
     b. John is more intelligent than Peter.
     c. *John is more intelligent than the table.
     d. *The table is more intelligent than the tree.
For this reason, the predicate good has been divided over two predicate frames, one for animate and one for inanimate entities. The following rule in a similar way creates comparatives of equality:

predaderpred(L,[[[S,equal],[SE,T,[eqstand]]],[state],[[SE,t,[zero]]]]) :-
    gamble(predform3),
    choose_arb_adj(L,[[S],[grad|_],[[SE,t,[zero]]]]),
    term(L,T,SE).

In this case the semantic function of the standard of comparison is "eqstand" rather than "stand". We thus differentiate between the semantic functions "Standard" and "Standard of Equality" so that the expression rules can correctly produce than or as. Note that in French, where we have que in both cases, such differentiation would not be needed.
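The inheritance of the selection restriction by the Standard, as in (23), can be illustrated with a deliberately simplified sketch. Terms are reduced here to hypothetical [Name,Type] pairs rather than full term structures, the adjective entry is invented for the purpose, and gambling and random choice are omitted.

```prolog
% Hypothetical gradable adjective: [Form, Type, Arguments].
preda(eng, [[intelligent], [grad, state], [[[anim], t, [zero]]]]).

% Hypothetical, much simplified terms: [Name, Type].
cand_term(eng, [peter, [hum, anim]]).
cand_term(eng, [table, [inanim]]).

% term(L, T, Sel): T is a term whose type contains every feature in Sel.
term(L, [N,TY], Sel) :-
    cand_term(L, [N,TY]),
    forall(member(F, Sel), memberchk(F, TY)).

% Comparative with the Standard generated inside the derived predicate.
predaderpred(L, [[[S,POL], [SE,T,[stand]]], [state], [[SE,t,[zero]]]]) :-
    member(POL, [pos, neg]),
    preda(L, [[S], [grad|_], [[SE,t,[zero]]]]),
    term(L, T, SE).

% ?- predaderpred(eng, P).
% P = [[[intelligent,pos], [[anim],[peter,[hum,anim]],[stand]]],
%      [state], [[[anim],t,[zero]]]] ;
% P = [[[intelligent,neg], [[anim],[peter,[hum,anim]],[stand]]],
%      [state], [[[anim],t,[zero]]]].
```

The inanimate term is never selected as Standard, because the selection [anim] of the adjective is passed on to term/3; the same selection remains on the open argument position, which later excludes an inanimate subject.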
predader(L,[[S,POL],[state],[[SE,t,[zero]]]]) :-
    gamble(predform3),
    on(POL,[pos,neg,equal]),
    choose_arb_adj(L,[[S],[grad|_],[[SE,t,[zero]]]]).

The same rules, now only producing the comparative adjective (without Standard), for attributive usage. Note that these derived predicates can also be used in predicative position, as in John is taller.

[v] Degree + adjective

predader1(L,[[deg,DEG,A],TY,AR]) :-
    gamble(predform4),
    choose_arb_adj(L,[A,TY,AR]),
    member(grad,TY),
    degree(L,DEG).

This rule creates derived adjectival predicates by combining gradable adjectives with a degree adverbial: very tall, surprisingly intelligent, etc. The degree adverbials are drawn from the relevant lexicon, guided by the language identifier L (= eng, fre, or dut). As said in the commentary on these lexica, some of these degree adverbials might in turn be created by productive rule.

[vi] Term predicates

predt(L,[X,[state],[ARG]]) :-
    gamble(predform5),
    member(Sel,[[anim],[inanim]]),
    term(L,X,Sel),
    X=[[indef,NUM,RF],[[S,[A|_]]|Z]],
    AA=[Sel,t,[zero]],
    perform(L,[ins,NUM,A],AA,ARG).

This rule is used to derive the structures underlying sentences such as:

(24) a. The man is a sailor.
     b. These men are good sailors.
     c. The intelligent professor is a lady.
The rule works as follows:
— gamble for "predform5".
— create an indefinite term X of selection "Sel" (how this is done is discussed below) underlying, for example, a sailor.
— this indefinite term is used as a predicate in a predicate frame which indicates a state (the state of "being a sailor"), with one argument position ARG which must be compatible with the Type and the Number of the predicate term:

(25) a. *These men is/are a good sailor.
     b. *The table is a woman.
— Into this argument position an appropriate term is inserted through the rule:

(26) perform(L,[ins,NUM,A],AA,ARG).

This special rule of term insertion will be defined below. It is rather strange for a predicate formation rule to have the argument term inserted right away. A better solution is to formulate a special term insertion rule along the following lines:

(27) When inserting a term into a predicate frame, check the nature of the predicate. If the predicate is a term predicate, the term to be inserted should take into account the Number and the Type of that predicate. Otherwise, only the selection restriction on the argument position where insertion takes place needs to be taken into account.
— The insertion of the argument term, however, does not block the usage of this type of derived predicate frame in the construction of clause schemas, since term insertion has been formulated in such a way that, if an argument position is already filled, it will be passed on unchanged. This was independently required for partially filled idiomatic predicate frames.

Note that all derived non-verbal predicates will be automatically provided with a copular verb through the rule of copula support.

[vii] Satellite-derived predicates

predsat(L,[Y,[state],[[[],t,[zero]]]]) :-
    gamble(predform6),
    (sat1b(L,A,_); sat2loc(L,A,_)),
    not(A=[*]), !,
    perform(L,ins,A,Y).
This rule creates derived adpositional predicates from beneficiary satellites (= sat1b) and local satellites (= sat2loc). Note that such satellites, if not empty (= [*]), have the form [[anim],t,[ben]] or [[place],t,[loc]]. The rule inserts an appropriate term into the "t" position. The derived predicate, Y, has one argument position, which can be filled by any type of term (the selection is [ ]). The end product can be used to generate such constructions as:

(28) a. The boy is for the professor.
     b. The book is for the professor.
(29) a. The boy is in the city.
     b. The book is in the house.

In Dutch this type of construction is also possible with directional satellites (= sat1d). Thus, in answering a question such as Waar is Jan? 'Where is John?' we can say:
(30) Jan is naar de bakker.
     John is to the baker
     'John is (has gone) to the baker's'

This is captured in the following rule, which is restricted to Dutch:

predsat(dut,[Y,[state],[[[anim],t,[zero]]]]) :-
    gamble(predform6),
    sat1d(dut,A,_),
    not(A=[*]),
    perform(dut,ins,A,Y).

This ends our discussion of predicate formation rules as programmed so far. Through these rules we have considerably extended the set of predicate frames which can be selected in the creation of predications. Apart from basic (lexical) predicate frames, we now have the following derived predicate frames available for this purpose:

arb_pred(L,X) :-
    (predader(L,X);        % john is taller.
     predaderpred(L,X);    % john is taller than peter.
     predader1(L,X);       % john is surprisingly tall.
     predt(L,X);           % john is a good sailor.
     predsat(L,X)).        % john is in the garden.
9.1.4. The core predication schema

Using the basic and derived predicate frames defined so far, we can now start building up the core predication. For this and the following sections, please refer to Figure 2 in chapter 3.

core_pred_schema(L,[P1,[S,T,A],S1]) :-
    arb_pred(L,[S,T,A]),
    predicate_operators(L,P1,T),
    sats1(L,S1,T).
A schema for the core predication has the form [P1,[S,T,A],S1], where [S,T,A] is an arbitrary predicate frame of the language (as defined in sections 9.1.1. through 9.1.3.), P1 is a predicate operator, and S1 are satellites of level 1. However, not every predicate operator or satellite is compatible with every type of nucleus. For example, a beneficiary satellite requires a nucleus of type "controlled":

(31) a. John cut the tree for me.
     b. John kept the money for me.
(32) a. *The tree fell down for me.
     b. *The grass is green for me.
This problem has been solved through defining predicate operators P1 and satellites S1 in relation to the "type" T of the nucleus. If the nucleus is not of the required type, these operators and satellites cannot be selected. This is more generally the solution for all those dependencies which are indicated by dotted lines in Figure 2. Predicate operators P1 are specified as follows:

predicate_operators(L,[],_).
predicate_operators(L,progr,T) :-
    gamble(pred1),
    member(act,T).

The predicate operator can always be [ ], but can, after gambling for "pred1", also be specified as "progressive", if the type of the nucleus is "action". The latter restriction is actually too strong, since not only actions, but also processes and positions can be combined with the progressive. But in those cases certain more specific constraints apply which have not yet been integrated into the program. Satellites S1 are defined as follows:

sats1(L,[Man,Instr,Dir,Ben],T) :-
    sat1m(L,Man,T),
    sat1i(L,Instr,T),
    sat1b(L,Ben,T),
    sat1d(L,Dir,T).

Satellites of level 1 consist of satellites of Manner, Instrument, Direction, and Beneficiary, all of which may be empty. Neither this satellite specification nor the others to follow are meant to be exhaustive. Note that all satellites may be empty: a satellite always provides an optional further specification of its domain. Empty satellites are represented as [*] so as to make it possible, by inspecting underlying clause structures, to keep track of which satellites have and which have not been filled.

sat1m(L,[*],_).
sat1m(L,[[eval],t,[manner]],T) :-
    gamble(sat1),
    member(contr,T).
sat1m(L,[[eval],ADJ,[manner]],T) :-
    gamble(sat1),
    member(act,T),
    preda(L,[ADJ,TY,AR]),
    member(eval,TY).
The Manner satellite may be empty, or it may be specified. One way of specifying a Manner satellite is by simply adding a term position with the function "manner" and the selection "evaluative". This position is used in the present program for inserting the questioned term which will lead to how?
More generally, as captured by the third rule, Manner satellites can be formed from adjectives with the type feature "evaluative". Manner satellites can only be associated with nuclear predicate frames of type "control" (i.e., with actions and positions; the constraint is actually a little too strong).

sat1i(L,[*],_).
sat1i(L,[[instrument],t,[instr]],T) :-
    gamble(sat1),
    member(contr,T).

Instrument satellites, if not empty, can be associated with controlled nuclear predications. Note that the satellite is an open term position which can be filled with appropriate terms. This, however, is not sufficient to capture the rather specific selection restrictions between nuclear predicate and instrument. That is why provisionally only "dummy" nouns such as English thingummy, French machin, and Dutch dinges have been admitted to this position (see 6.1.1.).

sat1b(L,[*],_).
sat1b(L,[[anim],t,[ben]],T) :-
    gamble(sat1),
    member(contr,T).

A Beneficiary satellite can be associated with any controlled nuclear predication.

sat1d(L,[*],_).
sat1d(L,[[place],t,[dir]],T) :-
    gamble(sat1),
    member(mve,T).

A directional satellite can be associated with those nuclear predicates which have the feature "movement".
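The dependency between the type of the nucleus and the available satellites can be inspected in isolation with the following sketch. It assumes SWI-Prolog and a gamble/1 that always succeeds, so that every admissible option shows up on backtracking; in the actual program the outcome depends on the probability settings.

```prolog
% Sketch: gambling always succeeds, so all options become visible.
gamble(_).

sat1b(_, [*], _).
sat1b(_, [[anim],t,[ben]], T) :-
    gamble(sat1),
    memberchk(contr, T).

sat1d(_, [*], _).
sat1d(_, [[place],t,[dir]], T) :-
    gamble(sat1),
    memberchk(mve, T).

% ?- findall(B, sat1b(eng, B, [act,dyn,contr]), Bs).
% Bs = [[*], [[anim],t,[ben]]].
% ?- findall(B, sat1b(eng, B, [state]), Bs).
% Bs = [[*]].
```

A controlled nucleus thus admits both the empty and the filled Beneficiary satellite, as in (31), while a state admits only the empty one, which rules out (32).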
9.1.5. The extended predication schema

ext_pred_schema(L,[P2,[P1,[S,T,A],S1],S2]) :-
    core_pred_schema(L,[P1,[S,T,A],S1]),
    predic_operators(L,P2),
    sats2(L,S2,T,P2).
Please look again at Figure 1 in chapter 3. We now build a schema for the extended predication onto the core predication schema, by extending this with predication operators P2 and predication satellites S2. Predication satellites S2 may be sensitive both to the type of the nucleus and to the values chosen for the predication operators P2. Predication operators P2 are defined as follows:
predic_operators(L,[Tense,Pol,Aspect,Rf]) :-
    tense(Tense),
    aspect(Aspect),
    polarity(L,Pol),
    event_referent(Rf).

tense(pres).
tense(past) :-
    gamble(pred2).
aspect([ ]).
aspect(perf) :-
    gamble(pred3).
polarity(L,[ ]).
polarity(L,pos) :-
    gamble(pred4).
polarity(L,neg) :-
    gamble(pred4).
event_referent([e,N]) :-
    N is irand(25).
Predication operators comprise operators for Tense, Aspect, and Polarity, and an event referent. The Tense can be present or, after gambling for "pred2", past. The Aspect operator is basically [ ], otherwise perfect. The Polarity is empty ([ ]), and can also be "positive" or "negative". "Positive" is here used in the sense of "emphatic positive", as in:

(33) John did (indeed) steal the books!
The event referent is a variable "e", with a random integer N as identifier.

sats2(L,[Loc,Temp,Pol,Cog],T,P2) :-
    sat2loc(L,Loc,T),
    sat2temp(L,Temp,P2),
    sat2pol(L,Pol),
    sat2cog(L,Cog).

Satellites of level 2 consist of those of Place (sat2loc), Time (sat2temp), Polarity (sat2pol) and "Cognition" (sat2cog = Reason, Concession, Condition), all of which may be empty.3

sat2loc(L,[*],_).
sat2loc(L,[[place],t,[loc]],T) :-
    gamble(sat2),
    member(contr,T).

A locative satellite can be associated with nuclear predications designating controlled SoAs. The restriction, which is empirically too strong, is meant to exclude output such as:

(34) *The book was good in the city.

sat2temp(L,[*],_).
sat2temp(L,[[time],t,[temp]],_) :-
    gamble(sat2).
A temporal satellite can be added to any core predication.

sat2pol(L,[*]).

The polarity satellite is always left empty in this grammar (see above).

sat2cog(L,[*]).
sat2cog(L,[[],t,[nomreason]]) :-
    gamble(sat2).

This cognitive satellite creates a position for non-clausal satellites of Reason such as because of John, because of the bad weather. The semantic function has been labelled "nomreason" so as to distinguish it formally from causal subordinators on clausal satellites: because of vs. because. Note, however, that in many languages we find similar or even identical expression on terms and subordinate clauses of reason.

sat2cog(L,[[prop],t,[reason]]) :-
    gamble(subsat).
sat2cog(L,[[extpred],t,[cond]]) :-
    gamble(subsat).
sat2cog(fre,[[prop,subjonc],t,[conc]]) :- !,
    gamble(subsat).
sat2cog(L,[[prop],t,[conc]]) :-
    gamble(subsat).

These rules are used to introduce satellites with the internal structure of predications and propositions and the semantic functions of Condition (if), Concession (although), and Reason (because). Conditional satellites have been given the internal status of extended predications, since they do not easily allow for modifiers of level 3:

(34) a. *If John probably comes...
     b. ??If John, cleverly, answers the question...

Concessive and reason satellites of this type have been given the internal structure of propositions, since they do allow modifiers of level 3. The rules introduce term positions of the form:

(35) a. [[prop],t,[reason]]
     b. [[extpred],t,[cond]]
in which the selection restrictions [prop] and [extpred] indicate that the term to be inserted into the "t" position should be a "propositional term" or a "predicational term", respectively. Such terms will be defined below. French concessives have been distinguished because bien que clauses are obligatorily expressed in the subjunctive mood. The feature "subjonc" will be inherited by the tense of the embedded proposition and from there trigger the appropriate expression rules. Remember that the danger of infinite recursion inherent in this construction is avoided by keeping the chance for "subsat" below 100%.
9.1.6. Creating the extended predication

The extended predication schema as so far developed is an "open" structure of the form:

(36) [P2,[P1,[Form,Type,Args],S1],S2]
in which Args, S1, and S2 contain positions which are to be filled with terms. Such a schema can thus be turned into a "closed" extended predication by inserting appropriate terms into all its argument and satellite positions:

ext_pred(L,PREDD) :-
    ext_pred_schema(L,PRED),
    do_all_terms(L,ins,PRED,PREDD).

The operation "do_all_terms" will be defined below. If we want to create an extended predication in which one term position is taken by a Q-word, the following rule can be applied:

qext_pred(L,PRED2) :-
    ext_pred_schema(L,PRED),
    do_one_term(L,que,PRED,PRED1),
    do_all_terms(L,ins,PRED1,PRED2).

We get a questioned extended predication by questioning one term position and inserting terms into all (the other) positions. Note the following points:
— Yes-No questions are obtained more directly as extended predications with illocutionary operator "interr" rather than "decl".
— The above rule does not account for "multiple" Q-word questions such as:

(37) Who gave what to whom?
Generating such multiple Q-word questions would require slight alterations in the above rule. In order to be able to apply term insertion to extended predication schemas we need a class of terms to be inserted. Such terms are defined in the next section.
9.1.7. Terms and term formation

For this section, please consult Figure 3 in chapter 3.
9.1.7.1. Basic terms

Terms can be basic (listed) or derived (productively formed). Personal and interrogative pronouns are here treated as basic terms. They are given an abstract representation (cf. De Groot and Limburg 1986) with the same syntactic organization as term structures in general. Consider the following example:

bterm([[def,sg,sp],[[p1,[hum,ambi]]]]).

This basic term structure defines the personal pronoun I. Just like any term structure, it consists of a position for term operators and a position for restrictors. The term operators are "def" (personal pronouns are intrinsically definite), "sg" (singular), and a referent identifier "sp" (the term refers to whoever is the speaker of the utterance in question). In the restrictor position there is a single restrictor, consisting of an abstract predicate "p1" ('first person') and a type, which is specified as "hum" 'human' and "ambi" 'of either gender/sex'. Note that this abstract term can be expressed in different ways: I, me, my, myself, mine. The correct form is selected from the relevant paradigm in the lexicon by the expression rules. The fact that this basic term has the same syntactic structure as other terms facilitates the formulation of rules of agreement and other rules triggered by elements of term structure. Note that, although this complicated term structure may seem strange at first, all the elements distinguished can be considered as being "packed into" the first person singular pronoun.

bterm([[def,sg,addr],[[p2,[hum,ambi]]]]).
bterm([[def,pl,sp],[[p1,[hum,ambi]]]]).
bterm([[def,pl,addr],[[p2,[hum,ambi]]]]).
bterm([[def,sg,RF],[[p3,[hum,masc]]]]) :- rf(RF).
bterm([[def,sg,RF],[[p3,[hum,fem]]]]) :- rf(RF).
bterm([[def,sg,RF],[[p3,[[],inanim]]]]) :- rf(RF).
bterm([[def,pl,RF],[[p3,[hum,masc]]]]) :- rf(RF).
bterm([[def,pl,RF],[[p3,[hum,fem]]]]) :- rf(RF).
bterm([[def,pl,RF],[[p3,[[],inanim]]]]) :- rf(RF).

These term structures likewise define the other personal pronouns. In the case of third person pronouns, the referent variable will be specified as [x,N], where N is a random positive integer.

Whenever we need a term, it must usually be a term with specific selectional properties. We want to be able to choose an "animate" or a "concrete" term, etc. Furthermore, the basic terms defined above must be provided with redundant features. These things happen in the following definition:
term(L,X1,Sel) :-
    gamble(pro),
    bterm(X),
    X=[OPS,[R]],
    add_redundant_features(R,R1),
    R1=[St,TY],
    sublist(Sel,TY),
    X1=[OPS,[R1]].

A pronominal term with selectional property Sel can be formed when gambling on "pro" succeeds, by adding redundant features to the basic pronominal term, and checking whether the resulting type TY contains Sel as a sublist.

For basic questioned terms I have again used the same syntax, somewhat more arbitrarily, since, for example, definiteness does not seem relevant here, and "que" seems a strange candidate for an "abstract predicate" ('the entity has the property of its identity being questioned'). Nevertheless, the following term structures express correctly that who, for example, can be used to indicate a single or multiple human entity whose identity is questioned.

bqterm([[*,NUM,RF],[[que,[hum,masc]]]]) :-
    num(NUM),
    rf(RF).
bqterm([[*,NUM,RF],[[que,[hum,fem]]]]) :-
    num(NUM),
    rf(RF).

The number NUM will later be specified as singular or plural, and the referent identifier RF as [x,N], where N is a random number under 25. Note that a masculine and a feminine form of who are distinguished. This is necessary in order to capture such differences as between:

(38) a. Who would kill himself for such a reason?
     b. Who would kill herself for such a reason?

bqterm([[*,sg,RF],[[que,[inanim,concr]]]]) :-
    rf(RF).

The corresponding basic term structure for inanimate what.

bqterm([[*,*,RF],[[que,TY]]]) :-
    rf(RF),
    member(TY,[[time],[place],[eval]]).

The corresponding term structures for questioning satellites by means of where, when, and how, where the Type of the questioned item is "time", "place", and "evaluative", respectively.
Q-terms are formed from basic Q-terms by adding the redundant features:

qterm(X1) :-
    bqterm(X),
    X=[OPS,[R]],
    add_redundant_features(R,R1),
    X1=[OPS,[R1]].

There is a special basic "anaphorical" term structure which can be freely inserted into term positions, and will later be equated with respect to number and referential index with a possible antecedent through the mapping "anaphora", which is part of the "specification". It may also be marked as "reflexive" or "equi" under the appropriate circumstances:

term(L,[[def,same,same],[[ana,[ambi]]]],_) :- gamble(pro).
These are the basic terms as recognized in the program. Note that they are identical for the three languages. Only their expression differs, as specified by the relevant paradigms in the lexicon.
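The selection mechanism for basic terms can be illustrated with a minimal sketch. It is not the program's own code: the definition of sublist/2 is an assumption (the text uses the predicate without showing it at this point), the two-argument term/2 is a simplified stand-in that skips gamble/1 and add_redundant_features/2, and the third person term is given a fixed referent index so that the result is reproducible.

```prolog
% Minimal sketch: selecting basic pronominal terms by selectional property.
% Assumptions (not from the program text): this definition of sublist/2,
% and the simplified term/2 without gamble/1 or add_redundant_features/2.

% Basic terms as in the text (speaker and addressee pronouns).
bterm([[def,sg,sp],[[p1,[hum,ambi]]]]).
bterm([[def,sg,addr],[[p2,[hum,ambi]]]]).
% Third person inanimate, with a fixed referent index [x,1].
bterm([[def,sg,[x,1]],[[p3,[[],inanim]]]]).

% sublist(Sub, List): every element of Sub occurs in List (assumed reading).
sublist([], _).
sublist([X|Xs], List) :- memberchk(X, List), sublist(Xs, List).

% term(Term, Sel): Term is a basic term whose type contains Sel.
term(Term, Sel) :-
    bterm(Term),
    Term = [_Ops, [[_Pred, Type]]],
    sublist(Sel, Type).
```

Under these assumptions, the query term(T,[hum]) enumerates the first and second person terms, while term(T,[inanim]) yields only the third person inanimate term.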
9.1.7.2. Derived terms

All other terms are productively formed by means of the following rules:

term(L,[[def,sg,D],[R1]],Sel) :-
    arb_noun(L,R1,[proper]),
    R1=[F,TY],
    sublist(Sel,TY),
    rf(D).

This rule produces term structures of type Sel for proper names from arbitrary proper noun predicates of type Sel. Note that such term structures are intrinsically definite and singular (as far as this program goes), and cannot have further restrictors.

term(L,[[A,C,D],[R1,R2,R3,R4]],Sel) :-
    dem(A,C), num(C), rf(D),
    arb_noun(L,R1,Sel),
    R1=[F,TY],
    not(member(proper,TY)),
    (R2=[]; restr2(L,R2,R1)),
    (R3=[]; (gamble(modf), restr3(L,R3))),
    (R4=[]; (gamble(rel), relative(L,R4,R1,C))).

This rule produces all other productively derivable terms of type Sel. It has three operator positions for "dem", "num", and "rf", to be defined below, and four restrictor positions R1, R2, R3, R4, specified as follows:
- R1 must be an arbitrary nominal predicate of type Sel, the type TY of which does not contain the feature "proper".
- R2 may be empty ([ ]), or a "restrictor 2" in relation to R1. A "restrictor 2" is an adjectival restrictor. The relation to R1 is established in order to ensure that only adjectival predicates which are appropriate to the head noun R1 will be chosen in this position.
- R3 may be empty ([ ]) or, if gambling on "modf" succeeds, it can be a "restrictor 3".
- R4 may be empty ([ ]) or, if gambling on "rel" succeeds, it may be a "relative" in relation to the head noun R1 and the number specification C of the term (these relations are again established in order to guarantee selection of "appropriate" relative clause structures).

We now proceed to the definition of the different term operators and term restrictors.

dem(def,_).
dem(indef,_) :- gamble(term1).

The "demonstrative" operator may have the values "definite" or "indefinite", the latter only when gambling on "term1" succeeds.

dem([prox,def],_) :- gamble(term2).
dem([rem,def],_) :- gamble(term2).

It may also have the values "proximate, definite" (--> this, these) or "remote, definite" (--> that, those).

dem([univ,def],_) :- gamble(term3).
dem([high,indef],pl) :- gamble(term3).
dem([low,indef],pl) :- gamble(term3).

Finally, it may have the values "universal, definite" (for all and every, depending on the number specification), and if the term is plural it may have the values "high, indefinite" (--> many) and "low, indefinite" (--> few).

num(sg).
num(pl) :- gamble(term4).

The number operator may have the values "singular" or (after successful gambling on "term4") "plural".

rf([x,N]) :- N is irand(25).
The referential index is [x,N], where N is a random integer below 25 (the latter ceiling being arbitrary). Note that the program as developed so far accounts for only a subset of possible term operator specifications.

arb_noun(L,X,Sel) :-
    findall(Y,(predn(L,Y),Y=[F,TY],sublist(Sel,TY)),List),
    choose_random(X,List).

We get an arbitrary noun of type Sel by collecting all the nominal predicates "predn" which consist of a form F and a type TY such that Sel is a sublist of TY in a List, and then choosing a random member of that List. "choose_random" was defined above, in section 9.1.2.

arb_noun(L,X,Sel) :-
    (prednn(L,X,Sel); predndim(L,X,Sel)).

We also get an arbitrary noun of type Sel by creating an agent noun "prednn" or a diminutive noun "predndim" of type Sel in the predicate formation component. The latter option is only available for Dutch (see section 9.1.3.2. above).

restr2(L,[X,Y],[N,TY]) :-
    arb_attr(L,[X,Y,[[SE,t,F]|Z]]),
    sublist(SE,TY).

The second restrictor may be an arbitrary attribute such that the selection restriction on the (first) argument of the attribute is a sublist of the "type" of the head noun. This ensures that, if the adjectival predicate can only take animate arguments, for instance, it can only be second restrictor to a noun of type "animate". In this way we avoid output such as: *intelligent ball, *more intelligent bucket, *laughing garden.

arb_attr(L,X) :- choose_arb_adj(L,X).

We get an arbitrary attribute by choosing an arbitrary adjective. The rule "choose_arb_adj" was already defined in 9.1.2. It is also used in choosing adjectives for predicative usage.

arb_attr(L,X) :-
    (predaa(L,X); predader(L,X); predaden(L,X)).

We also get an arbitrary attribute through the predicate formation component by creating a participle "predaa" (the laughing man), a comparative (the taller man, the less tall man, the equally tall man), or a combination of "degree + adjective" "predaderl" (the very tall man, the remarkably tall man).
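How the type check filters both nouns and attributes can be sketched on a toy lexicon. Everything lexical here is hypothetical: the predn/2 and adj/3 facts, the predicate compatible_attr/3, and the definition of sublist/2 are assumptions made for illustration, and SWI-Prolog's random_member/2 is used in place of the program's own choose_random.

```prolog
% Minimal sketch of arb_noun/3 and the selection check behind restr2/3.
% Assumptions: the toy lexicon, the sublist/2 definition, and the use of
% SWI-Prolog's random_member/2 instead of the program's choose_random/2.
:- use_module(library(random)).

% Toy nominal predicates: predn(Language, [Form, Type]).
predn(eng, [man,    [hum,anim,concr]]).
predn(eng, [ball,   [inanim,concr]]).
predn(eng, [garden, [inanim,place]]).

% Toy adjectives: adj(Language, Form, SelectionOnArgument).
adj(eng, intelligent, [anim]).
adj(eng, red,         [concr]).

sublist([], _).
sublist([X|Xs], List) :- memberchk(X, List), sublist(Xs, List).

% Collect every noun whose type contains Sel, then pick one at random.
arb_noun(L, Noun, Sel) :-
    findall([F,TY], (predn(L,[F,TY]), sublist(Sel,TY)), List),
    List \== [],
    random_member(Noun, List).

% An adjective may restrict a noun only if its selection restriction
% is a sublist of the noun's type.
compatible_attr(L, Adj, [_Form,TY]) :-
    adj(L, Adj, SE),
    sublist(SE, TY).
```

With this lexicon, compatible_attr(eng, intelligent, [ball,[inanim,concr]]) fails, which corresponds to blocking *intelligent ball, while red is accepted for both man and ball.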
The third restrictor position can be taken by attributive adpositional terms:
restr3(L,Y) :-
    term(L,T,[concr]),
    Y = [[],T,[poss]].

Such a term may be formed by placing a concrete term T into a possessor frame with semantic function "poss" (= 'possessor'). This results in restrictors such as this man's or of this man.

restr3(L,Y) :-
    term(L,T,[place]),
    Y = [[place],T,[loc]].

Idem, but now for locative restrictors such as in the garden, in the city.

restr3(L,Y) :-
    term(L,T,[inanim]),
    Y = [[inanim],T,[ass]].

Idem, but now for "associative" restrictors such as with the book, with the book of the boy in the garden, etc. In French, the "associative" function will be expressed by à: le garçon au ballon, la dame aux camélias, etc. How the danger of infinite recursion through "restrictors-3" can be avoided has been discussed in 4.3.2. Restrictors-4, underlying relative clauses, can be formed as follows:4

relative(L,R42,R1,NUM) :-
    ext_pred_schema(L,PRED),
    do_one_term(L,[rel,R1,NUM],PRED,R4),
    do_all_terms(L,ins,R4,R42).

Restrictive relative clauses have been given the internal structure of extended predications rather than propositions or clauses, since within them it is hardly possible to specify operators or satellites of levels 3 and 4:

(39) a. ?the man who probably/cleverly left...
     b. *the man who frankly left...
The relative structure is formed by taking an extended predication schema, relativizing one term position appropriate to the head noun R1, and inserting terms into all the remaining term positions. Note the parallelism with Q-word questioning as defined in section 9.1.6.
9.1.8. Operations on term positions

At several places in the construction of a "closed" extended predication we must be able to apply certain operations to all term positions or to "just one" of the term positions in the extended predication schema. These term positions may be argument positions, satellite-1 positions, or satellite-2 positions. For this purpose, the following general rules have been formulated:
do_all_terms(L,OP,PRED,PRED1) :-
    PRED = [P2,[P1,[S,T,A],S1],S2],
    list_perform(L,OP,A,A1),
    list_perform(L,OP,S1,S11),
    list_perform(L,OP,S2,S22),
    PRED1 = [P2,[P1,[S,T,A1],S11],S22].

We do a certain operation "OP" on all terms by "list-performing" OP on the arguments, the satellites-1, and the satellites-2.

list_perform(L,OP,[],[]).
list_perform(L,OP,[A|Z],[A1|Z1]) :-
    perform(L,OP,A,A1),
    list_perform(L,OP,Z,Z1).
We "list-perform" OP on a list of term positions by performing OP on the first term position, and "list-performing" OP on the rest of the term positions.

do_one_term(L,OP,PRED,PRED1) :-
    PRED = [P2,[P1,[S,T,A],S1],S2],
    do_one(L,OP,[A,A1]),
    PRED1 = [P2,[P1,[S,T,A1],S1],S2].
do_one_term(L,OP,PRED,PRED1) :-
    PRED = [P2,[P1,[S,T,A],S1],S2],
    do_one(L,OP,[S1,S11]),
    PRED1 = [P2,[P1,[S,T,A],S11],S2].
do_one_term(L,OP,PRED,PRED1) :-
    PRED = [P2,[P1,[S,T,A],S1],S2],
    do_one(L,OP,[S2,S22]),
    PRED1 = [P2,[P1,[S,T,A],S1],S22].

We apply an operation OP to just one of the term positions by "doing one OP" to the arguments, the satellites-1, or the satellites-2.

do_one(L,OP,[A,A1]) :-
    member(AA,A),
    perform(L,OP,AA,AA1),
    subst(AA,A,AA1,A1).

We do one operation to a group of terms by performing the operation on a member AA of that group, turning AA into AA1, and substituting AA1 for AA in the group. We now consider the actual instantiations of the operation "OP":

perform(L,que,[SE,t,F],[SE,Q,F]) :-
    qterm(Q),
    Q = [OPS,[[ST,TY]|Z]],
    sublist(SE,TY).
We perform questioned term insertion by taking a questioned term which conforms to the selectional conditions on the term position, and placing it in the term position marked by "t".
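The difference between "doing one" and "list-performing" can be sketched on a flat list of term positions. This is an illustration only: the language argument is dropped, subst/4 is an assumed definition (the text uses it without showing it here) that replaces the first matching element, and the operation fill with its result atom filled is hypothetical.

```prolog
% Minimal sketch of the "do one" and "list-perform" patterns.
% Assumptions: subst/4 replaces the first occurrence only, and the toy
% operation fill turns any open position (marked t) into filled.

% subst(Old, List, New, NewList): replace the first Old in List by New.
subst(Old, [H|T], New, [New|T]) :- H == Old, !.
subst(Old, [H|T], New, [H|T1]) :- subst(Old, T, New, T1).

% Toy operation: only open positions [Sel,t,Fun] can be operated on.
perform(fill, [SE,t,F], [SE,filled,F]).

% Apply Op to exactly one member of Positions; backtracking tries the others.
do_one(Op, Positions, Positions1) :-
    member(P, Positions),
    perform(Op, P, P1),
    subst(P, Positions, P1, Positions1).

% Apply Op to every member of Positions.
list_perform(_, [], []).
list_perform(Op, [P|Ps], [P1|Ps1]) :-
    perform(Op, P, P1),
    list_perform(Op, Ps, Ps1).
```

For two open positions, do_one/3 yields two alternative results on backtracking, one per position, whereas list_perform/3 yields a single result with both positions changed. With this first-occurrence subst/4, two identical open positions would give the same result twice, so the sketch assumes the positions are distinct.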
perform(L,[rel,R1,NUM],[SE,t,F],A1) :-
    R1 = [X,[CAT,GEN|RR]],
    sublist(SE,[CAT,GEN|RR]),
    A1 = [SE,[[*,NUM,*],[[rel,[GEN,CAT]]]],F].

Relativization of some term position [SE,t,F] within a predication must be formulated in relation to the R1 (the head noun) of the term and the number NUM of the term as a whole, if we want to avoid such output as:

(40) a. the ball which laughs [the first argument of laugh requires an animate term; the head noun is inanimate]
     b. the boys who laughs [the relative pronoun triggers singular agreement; but the term as a whole is plural]
Therefore, the relative element must be informed on certain Type features of the head noun R1; the relevant features differ from language to language, but they typically include certain Gender features. For English we need the information on "human" vs. "non-human", for Dutch we need "neuter" vs. "non-neuter" and "masculine" vs. "feminine", and for French we need "masculine" vs. "feminine". Further, we need information on the number of the term as a whole. In its turn, the relative element must trigger correct agreement on its own predicate. In order to fulfil these requirements, the relative element which is introduced here has been given the same syntax as any other term:

(41) [[*,NUM,*],[[rel,[GEN,CAT]]]]
where NUM is the number inherited from the term as a whole, and GEN and CAT represent the relevant Gender features from the type of the head noun. Due to this internal structure, the relative term will automatically trigger correct agreement on its own predicate. Insertion of the relative term into positions for satellites-1 will result in such constructions as:

(42) a. the boy for whom john bought the book
     b. the city to which the girl went

Insertion into positions for satellites-2 may result in:

(43) a. the city in which john kissed the girl
     b. the day on which peter met mary
After Q-insertion and Rel-insertion we can now consider the other types of term insertion. First, there is the special kind of term insertion required for constructions with term predicates (see above, section 9.1.3.2):
perform(L,[ins,NUM,A],[Sel,t,[zero]],[Sel,T,[zero]]) :-
    term(L,T,Sel),
    T = [[_,NUM,_],[[_,[A|_]]|_]].

In inserting a term into such a frame as: ...is a sailor we have to take into account both the Number and the Type of the predicate term, in order to avoid such productions as (25) above.

perform(L,ins,[SE,t,F],[SE,TE,F]) :- term(L,TE,SE).
perform(L,ins,X,X) :- not(X=[SE,t,F]), !.

Finally, the normal rule of term insertion can be formulated in this simple way. We do term insertion to a term position by creating a term TE which is compatible with the selection SE of the term position, and substituting that term for the "t" element. If the term position is not "open", i.e. does not contain a "t", it remains unchanged. This can happen in two ways: (a) the predicate frame contains an idiomatic ready-made term, (b) the term position has already been questioned or relativized. The net result of these insertion procedures is that all the available term positions of the extended predication schema have been filled with appropriate terms: the extended predication schemas have been turned into extended predications.
9.1.9. Building the extended predication into the proposition

Through the procedures defined so far we have created "closed" extended predications which can now be built into propositions:

prop(L,X) :-
    ext_pred(L,PRED),
    PRED = [P2,[P1,[S,T,A],S1],S2],
    X = [P3,PRED,S3],
    prop_operators(P3),
    sat3(L,S3,ILL,T).

We obtain a proposition by taking an extended predication and providing it with proposition operators P3 and propositional satellites S3. The satellites-3 have been made sensitive to both the illocution of the clause as a whole (which at this level has not yet been specified), and the type of the nuclear predication. The sensitivity to the illocution of the whole clause is clear from such facts as:

(44) *Does John probably go to the city?
Since the illocutionary variable ILL is not resolved at the level of the propositional schema, the definition of these satellites will be repeated at the next layer.

prop_operators([Att,Rf]) :-
    att_operator(Att),
    prop_referent(Rf).

att_operator([]).
att_operator(predict) :- gamble(pred5).
att_operator(poss) :- gamble(pred5).

prop_referent([xx,N]) :- N is irand(25).
The propositional operators consist of an attitudinal operator and a propositional referent. The attitudinal operator is [ ] in the basic mode of the program, and can get the values "predictive" or "possible" when the corresponding chances result in successful gambles. Predictive mood rather than future tense is used in this program for clauses which take will in English. The propositional referent consists of a propositional variable followed by a random integer under 25. Like all satellites, satellites-3 may be empty:

sat3(L,[*],_,_).

Non-empty satellites-3 such as probably can be selected from the lexicon. Other satellites of level 3 are defined as follows:

sat3(L,[[eval],ADJ,[opinion]],decl,T) :-
    gamble(sat3),
    member(act,T),
    preda(L,[ADJ,TY,AR]),
    member(comm,TY).
This rule accounts for "commentative" propositional satellites of level 3, as in:

(45) John cleverly went to the city.
     = 'It was clever of John to go to the city'

The illocution for this type of satellite must be declarative, in order to avoid output like (46), and the adjective must have the feature "commentative" in order to avoid output like (47):

(46) *Did John cleverly go to the city?
(47) *John tall-ly answered the question.
Both the formal expression and the characteristic order of these propositional satellites will be taken care of by the expression rules. Note that satellites of this type cannot simply be treated as satellites of Manner operating at level 3, since:
- they take a different set of positions in the clause,
- they have another kind of expression than Manner satellites in certain languages (e.g. Dutch slim genoeg 'clever(ly) enough'),
- they have no alternative expression of the form "in an ADJ manner/way", and
- they cannot be formed around precisely the same set of adjectives as Manner satellites.
9.1.10. Creating predicational and propositional terms

Recall that extended predication schemas with complement-taking matrix verbs require the insertion of propositional or predicational terms. Such terms are also required for filling "cognitive" satellites-2 of Reason, Condition, and Concession. In order for this to work properly we define "predicational term" and "propositional term" in such a way that they will be inserted, where appropriate, by the regular term insertion procedure:

term(L,[[*,sg,*],[[B,[inanim,masc]]]],[extpred]) :- ext_pred(L,B).
term(L,[[*,sg,*],[[B,[inanim,masc]]]],[prop,fact]) :- prop(L,B).
term(L,[[*,sg,*],[[B,[inanim,masc]]]],[prop]) :- prop(L,B).

Predicational and propositional terms have been given the same syntax as any term, with an extended predication or a proposition in the predicate position B. The term operators only contain "sg", which is important for verb agreement in a sentence such as:

(48) It is believed by John that Mary is intelligent.

The Gender feature "masc" is only needed for French agreement in constructions such as:

(49) Il a été cru (not: crue) par Jean que Marie soit intelligente.
A better approach would probably be to only mark "feminine" in the French grammar, and leave "masculine" to a default rule. This will require some modifications in the system. Note that a predicational term, in which an extended predication acts as the "predicate" B, is defined as being of type [extpred]; a propositional term can be either of type [prop] or of type [prop,fact]. It is through these type features that predicational and propositional terms are admitted to the relevant argument and satellite positions by the regular rule of term insertion.
A special problem is created by the fact that in certain conditions the verb in a French predicational/propositional term is expressed in the subjunctive rather than the indicative:

(50) a. Marie regrette que Jean soit (subj.) malade.
     b. *Marie regrette que Jean est (ind.) malade.
This has been coded in the selection restriction on the relevant argument and satellite positions, which carry the marker "subjonc" (see section 7.1.1. for this type of selection in French matrix verbs, and 9.1.5. for French concessive satellites). The following rules specify how we can get "subjunctive" predicational and propositional terms, fit for insertion into a subjunctive slot. The rules are parallel to those creating other predicational and propositional terms, except that the subjunctive mood marker is added to the Tense specification of the embedded extended predication or proposition:

term(L,[[*,sg,*],[[B1,[inanim,masc]]]],[extpred,subjonc]) :-
    ext_pred(L,B),
    mood_add(B,B1).
term(L,[[*,sg,*],[[B1,[inanim,masc]]]],[prop,fact,subjonc]) :-
    prop(L,B),
    mood_add(B,B1).
term(L,[[*,sg,*],[[B1,[inanim,masc]]]],[prop,subjonc]) :-
    prop(L,B),
    mood_add(B,B1).

mood_add(B,B1) :-
    B = [P3,[[TE|Z],[S,T,A],S2],S3],
    B1 = [P3,[[[subjonc,TE]|Z],[S,T,A],S2],S3], !.
mood_add(B,B1) :-
    B = [[TE|Z],CORE,S2],
    B1 = [[[subjonc,TE]|Z],CORE,S2], !.

Thus the Tense TE of the predication/proposition is turned into [subjonc,TE]; for example, past --> [subjonc,past]. The complex operator [subjonc,past] will now trigger the subjunctive form of the verb in the expression rules. Note that this is only a partial analysis of the complex problem of the subjunctive: we know how to get a subjunctive form when it is needed, but the analysis itself says little about the preliminary question of when the subjunctive is needed. In a more principled analysis of these phenomena, we should like to be able to infer the marker "subjonc" from semantic properties and pragmatic properties of the clause schema, rather than stipulating it as such in the predicate frame of matrix verbs.
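The effect of adding the mood marker can be traced on toy structures. The sketch below restates the two clauses of mood_add/2 as described in the text; the placeholder atoms (p1ops, nuc, s1, s2, s3) are assumptions standing in for real operators, nuclei and satellites, and they are chosen as atoms so that the first clause cannot match a bare extended predication.

```prolog
% Sketch: mood_add/2 on toy structures.
% Assumption: p1ops, nuc, s1, s2, s3 are placeholder atoms; only the
% tense slot at the head of the predication operators matters here.

% Case 1: a proposition [P3, ExtendedPredication, S3].
mood_add(B, B1) :-
    B  = [P3, [[TE|Z], [S,T,A], S2], S3],
    B1 = [P3, [[[subjonc,TE]|Z], [S,T,A], S2], S3], !.
% Case 2: an extended predication [[Tense|OtherOperators], Core, S2].
mood_add(B, B1) :-
    B  = [[TE|Z], CORE, S2],
    B1 = [[[subjonc,TE]|Z], CORE, S2], !.

% A toy extended predication with tense past and polarity pos.
toy_ext_pred([[past,pos],[p1ops,nuc,s1],s2]).

% A toy proposition built around it, with empty attitude and referent [xx,3].
toy_prop([[[],[xx,3]],EP,s3]) :- toy_ext_pred(EP).
```

In both cases only the tense slot changes: past becomes [subjonc,past] and everything else is carried over unchanged, whether the input is the toy extended predication or the toy proposition embedding it.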
9.1.11. Building up the clause structure

We can now define the clause structure around the proposition:

clause_structure(L,CL) :-
    ext_pred(L,PRED),
    PRED = [P2,[P1,[S,T,A],S1],S2],
    CL = [[ILL,R],[P3,PRED,S3],S4],
    illo_operators([ILL,R]),
    prop_operators(P3),
    sat3(L,S3,ILL,T),
    sat4(L,S4,T).

Here the full clause structure is defined on top of the propositional schema through the specification of illocutionary operators P4 = [ILL,R] (where R is an illocutionary referent), and illocutionary satellites S4. The latter are again made sensitive to the "type" of the nucleus. The same is done in the following rule for obtaining questioned clause structures:

clause_structure(L,CL) :-
    gamble(que),
    qext_pred(L,PRED),
    PRED = [P2,[P1,[S,T,A],S1],S2],
    CL = [[interr,R],[P3,PRED,S3],S4],
    illo_referent(R),
    prop_operators(P3),
    sat3(L,S3,interr,T),
    sat4(L,S4,T).

The illocutionary operators and satellites are defined as follows:

illo_operators([ILL,R]) :-
    illo(ILL),
    illo_referent(R).

illo(decl).
illo(interr) :- gamble(que).
The illocutionary operators consist of an illocution and an illocutionary referent. The illocution can be interrogative when the gambling on "que" has been successful. Otherwise it is declarative. Imperative clauses have so far not been defined in the program.

illo_referent([ee,N]) :- N is irand(25).

The illocutionary referent consists of a speech act variable "ee", followed by a random integer under 25 (the choice of this ceiling being arbitrary).

sat4(L,[*],_).

The illocutionary satellite can be empty, or it can be chosen from the lexicon (frankly, ...). Further types of illocutionary satellites have not yet been elaborated.
9.1.12. Creating the fully specified clause

The clause structure can now be turned into a fully specified clause through applying the "specification" to the extended predication which is contained in it (see again Figure 1 in chapter 3).

fully_specified_clause(L,X1) :-
    clause_structure(L,X),
    X = [P4,[P3,EP,S3],S4],
    specification(L,EP,EP1),
    X1 = [P4,[P3,EP1,S3],S4].

The specification must not only be applied to the clause structure as a whole, but also to all propositional and predicational structures embedded in it. Potentially, these are: [a] propositional and predicational terms embedded in both argument and satellite positions; [b] predicational structures used as restrictors of type 4, underlying relative clauses. Therefore, each term within the clause structure must be inspected for whether it contains a structure which must undergo the specification:

specification(L,[],[]) :- !.

The specification has no effect on empty predications/propositions.

specification(L,EP1,EP3) :-
    do_all_terms(L,tspec,EP1,EP2),
    clause_specify(L,EP2,EP3).

When applied to an extended predication, the specification does term-specification "tspec" on all term positions, and then specification on the whole extended predication.

perform(L,tspec,[],[]) :- !.
perform(L,tspec,[*],[*]) :- !.

An empty term position [ ] or an empty satellite [*] remains untouched by term-specification.

perform(L,tspec,[S,T,Q],[S,T,Q]) :-
    T = [[*,NUM,*],[[rel|Z]]], !.

A relative term likewise remains untouched.

perform(L,tspec,[S,T,Q],[S,T1,Q]) :-
    T = [OPS,[R1,R2,R3,R4]], !,
    perform(L,tspec,R3,R33),
    specification(L,R4,R44),
    T1 = [OPS,[R1,R2,R33,R44]].
If a term contains a non-empty restrictor-3 (an attributive adpositional term), term-specification is applied to this term (since it may contain a relative clause structure), and when it contains a non-empty restrictor-4, specification is
applied to that restrictor. Note that the specification is recursive, because a restrictor-4 may contain terms which are again subject to specification.

perform(L,tspec,[S,T,Q],[S,T1,Q]) :-
    T = [[*,sg,*],[[R1|Z]]],
    R1 = [[ATT,RF],B,S3], !,
    specification(L,B,B1),
    R2 = [[ATT,RF],B1,S3],
    T1 = [[*,sg,*],[[R2|Z]]].

perform(L,tspec,[S,T,Q],[S,T1,Q]) :-
    T = [[*,sg,*],[[R1|Z]]], !,
    specification(L,R1,R2),
    T1 = [[*,sg,*],[[R2|Z]]].

If the term is a propositional or predicational term, the specification must be applied to the extended predication contained in it. Again, the procedure is recursive.

perform(L,tspec,T,T).

If none of these conditions apply, a term remains unchanged under specification. When all embedded propositional and predicational structures have been specified, specification can be applied to the whole clause:

clause_specify(L,R1,R7) :-
    subj_obj_assignment(L,R1,R2),
    verb_agreement(L,R2,R3),
    anaphora(R3,R4),
    reflexive(R4,R5),
    equi(R5,R6),
    copula_support(L,R6,R7).

As shown earlier, the specification consists of a "conveyor belt" on which a number of operations are sequentially applied to the input structure; if the structure does not fulfil the conditions for the operation, it is passed on unchanged to the next operation. The operations in the specification concern such matters as how the State of Affairs as designated by the predication is going to be presented, what types of coreferentiality can or must be established between the terms in the clause structure, and how the clause structure can be prepared for entering the expression component. It may be the case that we have obtained a clause structure from some other source than the generator: from the parser, the translator, or the logical component. If we then want to specify that clause structure, we can use the following rule:

specify_clause(L,X,X1) :-
    X = [P4,[P3,EP,S3],S4],
    specification(L,EP,EP1),
    X1 = [P4,[P3,EP1,S3],S4].
The separate procedures which together constitute the specification will now be defined in the following sections.
9.1.12.1. Subject and Object assignment

subj_obj_assignment(L,PRED,PRED1) :-
    PRED = [P2,[P1,[S,T,ARG],S1],S2],
    soassign(L,[S,T,ARG],NUC1),
    PRED1 = [P2,[P1,NUC1,S1],S2].

Subject-Object assignment is done to an extended predication by doing "soassign" to the nucleus of that predication. Languages differ as to their possibilities of Subject-Object assignment. This has been accounted for here by defining a number of potential assignments across languages, and stipulating which of these assignments the three languages allow for.

soassign(eng,X,Y) :-
    (so1(X,Y);so2(X,Y);so3(X,Y);so4(X,Y);so5(X,Y);so6(X,Y),!).
soassign(dut,X,Y) :-
    (so1(X,Y);so2(X,Y);so3(X,Y);so4(X,Y);so5(X,Y),!).
soassign(fre,X,Y) :-
    (so1(X,Y);so2(X,Y);so3(X,Y);so4(X,Y),!).

English accepts Subject-Object assignments 1 through 6, Dutch accepts 1 through 5, and French accepts 1 through 4.

so1([S,T,ARG],NUC1) :-
    ARG = [AA],
    AA = [A,B,C],
    NUC1 = [[[act],S],T,[[A,B,[subj|C]]]].

Subj-Obj assignment 1 adds the subj function to the semantic function of the argument position of a one-place nucleus: "ARG=[AA]" = 'there is only one argument, AA, in the argument position.' At the same time, the predicate receives the information [act] "active". This will later inform the expression rules about voice expression.

so2([S,T,ARG],NUC1) :-
    ARG = [AA,EE],
    AA = [A,B,C],
    EE = [Aa,Bb,Cc],
    not(Cc=[pt]),
    NUC1 = [[[act],S],T,[[A,B,[subj|C]],EE]].

Subj-Obj assignment 2 assigns subj to the first argument of a more-place nucleus if the second argument does not have the semantic function patient. Again, [act] is assigned to the predicate.
so3([S,T,ARG],NUC1) :-
    ARG = [AA,EE|Z],
    AA = [A,B,C],
    EE = [Aa,Bb,[pt]],
    NUC1 = [[[act],S],T,[[A,B,[subj|C]],[Aa,Bb,[obj|[pt]]]|Z]].

Assignment 3 assigns subj to the first, and obj to the second argument of a more-place nucleus with a patient second argument. The marker [act] is assigned to the predicate.

so4([S,T,ARG],NUC1) :-
    ARG = [AA,EE|Z],
    AA = [X,Y,FUN],
    EE = [A,B,[pt]],
    NUC1 = [[[pass],S],T,[AA,[A,B,[subj|[pt]]]|Z]].

Assignment 4 gives subj to the patient argument of a more-place predicate. This time the information [pass] "passive" is transmitted to the predicate.

so5([S,T,ARG],NUC1) :-
    ARG = [AA,EE,OO],
    AA = [A,B,C],
    OO = [Aa,Bb,[rec]],
    NUC1 = [[[act],S],T,[[A,B,[subj|C]],EE,[Aa,Bb,[obj|[rec]]]]].

Assignment 5 assigns subj to the first, and obj to the third argument of a three-place verb with recipient third argument function. The marker [act] is assigned to the predicate.

so6([S,T,ARG],NUC1) :-
    ARG = [AA,EE,OO],
    OO = [A,B,[rec]],
    NUC1 = [[[pass],S],T,[AA,EE,[A,B,[subj|[rec]]]]].

Assignment 6 gives subj function to the recipient argument of a three-place nucleus, with concomitant marker [pass] on the predicate. Assignments 1-6 will eventually lead to the following expressions:

(51) so1. John walked.
     so2. John went to the station.
     so3. John kicked the boy. John gave the book to the girl.
     so4. The boy was kicked by John. The book was given to the girl by John.
     so5. John gave the girl the book.
     so6. The girl was given the book by John.
Note that so6 is ungrammatical in Dutch, while so5 and so6 are ungrammatical in French.
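The way alternative assignments yield active and passive presentations of one nucleus can be sketched with three of the six rules. This is a restatement for illustration, not the program's own code: the rules are written with head unification, the language-specific soassign/3 is reduced to the three rules shown, and the nuclei with the term placeholders john and boy are simplified assumptions.

```prolog
% Sketch of three of the six assignments, run on toy nuclei.
% Assumptions: the nucleus [Pred,Type,Args] and the term placeholders
% (john, boy) are simplified stand-ins for full term structures.

% Assignment 1: one-place nucleus, the single argument becomes subject.
so1([S,T,[[A,B,C]]], [[[act],S],T,[[A,B,[subj|C]]]]).

% Assignment 3: first argument subject, patient second argument object.
so3([S,T,[[A,B,C],[Aa,Bb,[pt]]|Z]],
    [[[act],S],T,[[A,B,[subj|C]],[Aa,Bb,[obj,pt]]|Z]]).

% Assignment 4: patient second argument becomes subject (passive).
so4([S,T,[AA,[A,B,[pt]]|Z]],
    [[[pass],S],T,[AA,[A,B,[subj,pt]]|Z]]).

% Choice among the assignments (reduced to the three above).
soassign(_, X, Y) :- ( so1(X,Y) ; so3(X,Y) ; so4(X,Y) ).

% Toy two-place nucleus for "kick" with agent john and patient boy.
toy_nucleus([kick,[action],[[[anim],john,[ag]],[[],boy,[pt]]]]).
```

For the toy nucleus, backtracking through soassign/3 first gives the active assignment (underlying John kicked the boy) and then the passive one (underlying The boy was kicked by John); so1 does not apply because the nucleus has two arguments.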
9.1.12.2. Verb agreement
The expression of the verb may be sensitive to Person, Number, and (in French) Gender of the subject term. There are different ways in which such agreement could be captured. One way is to leave it to the expression rules to find the relevant agreement parameters in the clause structure. Another way, implemented here, is to inform the predicate (more particularly, the tense operator on the predication) of the relevant features. The expression rules will then find the relevant information "locally".

(a) Agreement in Person, Number, and Gender in French

verb_agreement(fre,PRED,PRED1) :- !,
    PRED = [[TE,POL,AS,R],[P1,[SS,TY,AA],S1],S2],
    member(AR,AA),
    AR = [_,[[_,NUM,_],[[X,TYY]|_]],[subj|RR]],
    person_agreement(PERS,X),
    number_agreement(NUM,NUMM),
    gender(fre,GEN),
    member(GEN,TYY),
    T1 = [TE,NUMM,PERS,GEN],
    PRED1 = [[T1,POL,AS,R],[P1,[SS,TY,AA],S1],S2].

(b) Agreement in Person and Number in Dutch and English

verb_agreement(L,PRED,PRED1) :-
    PRED = [[TE,POL,AS,R],[P1,[SS,TY,AA],S1],S2],
    member(AR,AA),
    AR = [_,[[_,NUM,_],[[X,TYY]|_]],[subj|RR]],
    person_agreement(PERS,X),
    number_agreement(NUM,NUMM),
    T1 = [TE,NUMM,PERS,[]],
    PRED1 = [[T1,POL,AS,R],[P1,[SS,TY,AA],S1],S2].

These rules determine the number NUM, the person PERS (and, in the case of French, the Gender GEN) of the subject term, and add these properties to the tense operator of the predication. The Person feature is determined through the following rules, which say: "a first person term is first person, a second person term is second person, and any other term is third person":

person_agreement(p1,p1) :- !.
person_agreement(p2,p2) :- !.
person_agreement(p3,_).

The required agreement marker for Number is determined by the following rules:

number_agreement(same,sg) :- !.
number_agreement(N,N).
Normally the agreement marker N is identical to the Number operator on the subject term. But we may have a so-called "anaphorical term" (see 9.1.7.1. above) in subject position. Such a term provisionally gets singular agreement (see below under "anaphora" for an explanation). The relevant Gender distinctions for French are the following:

gender(L,masc).
gender(L,fem).
gender(L,ambi).

Note that the Gender feature is relevant to French in the sense that the expression of a predicative adjective or participle is sensitive to the Gender of the subject. When the structure for "John laughed" is input to the agreement rule, its tense operator past will be turned into:

(52) [past,sg,p3,[]]
This complex of features will then feed the relevant tense expression rule.
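The agreement step for "John laughed" can be reproduced with a reduced version of the Dutch/English rule. The sketch drops the language argument, and the input structure is a simplified assumption: pos, asp, r and p1ops are placeholder atoms for the remaining operators, and the subject term for John carries an arbitrary referent index [x,7].

```prolog
% Sketch of the English/Dutch agreement rule on a toy input.
% Assumptions: the placeholder atoms (pos, asp, r, p1ops) and the
% subject term for "John" are simplified; the shape follows the text.

person_agreement(p1, p1) :- !.
person_agreement(p2, p2) :- !.
person_agreement(p3, _).

number_agreement(same, sg) :- !.
number_agreement(N, N).

verb_agreement(PRED, PRED1) :-
    PRED = [[TE,POL,AS,R],[P1,[SS,TY,AA],S1],S2],
    member(AR, AA),
    % AR must be the argument carrying the subj function.
    AR = [_,[[_,NUM,_],[[X,_TYY]|_]],[subj|_]],
    person_agreement(PERS, X),
    number_agreement(NUM, NUMM),
    T1 = [TE,NUMM,PERS,[]],
    PRED1 = [[T1,POL,AS,R],[P1,[SS,TY,AA],S1],S2].

% Toy extended predication underlying "John laughed".
toy_pred([[past,pos,asp,r],
          [p1ops,
           [laugh,[action],
            [[[anim],[[def,sg,[x,7]],[[john,[hum,masc]]]],[subj,ag]]]],
           []],
          []]).
```

Given toy_pred/1 as input, the tense slot past is replaced by the complex [past,sg,p3,[]], matching (52): the subject is singular, and since john is neither p1 nor p2 it counts as third person.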
9.1.12.3. Anaphora resolution

The treatment of anaphora is rather rudimentary in the present program. The following rule looks at the structure of the extended predication (as so far developed) and checks whether there are two non-identical terms, A and B, such that A has subject function and is not a first or second person pronoun term (which have fixed referential indices), and B is a "free" anaphorical term in a term position of which the selection matches the type of the subject term. If so, the anaphorical term inherits the Number, the Type, and the referential index from the subject term. If these conditions do not hold, the input predication remains unchanged. The last point implies that there will be "unresolved" anaphorical terms in many underlying structures. I have left this as it is, since an anaphorical term which has no antecedent in the same clause may have one in some preceding clause in the discourse. Resolution of such discourse anaphora is not yet possible within this program. In chapter 10 we will see how such unresolved anaphorical terms are provisionally expressed.
    anaphora(P,PP) :-
        P = [P2,[P1,[SS,TY,ARG],S1],S2],
        member(A,ARG),
        member(B,ARG),
        A = [SEL,[[D,NUM,RR],[[S,T]|_]],[subj|RRR]],
        not(S=p1),
        not(S=p2),
        B = [SELL,[Op,[[ana,TT]]],F],
        sublist(SELL,T),
        not(A=B),
        B1 = [SELL,[[def,NUM,RR],[[p3,T]]],F],
        subst(B,ARG,B1,ARG1),
        PP = [P2,[P1,[SS,TY,ARG1],S1],S2], !.
    anaphora(P,P).
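To see the anaphora rule at work, here is a minimal runnable sketch. The rule itself follows the text (with not/1 written as \+), but the helpers sublist/2 and subst/4 are not defined in this section, so the definitions below are assumptions about their intended behaviour. The sample predication built by example/1 (the verb wash, the index r1, the functions ag and go) is likewise invented for illustration.

```prolog
:- use_module(library(lists)).

% Assumed helper: every element of the first list occurs in the second.
sublist([],_).
sublist([X|Xs],Ys) :- memberchk(X,Ys), sublist(Xs,Ys).

% Assumed helper: subst(Old,List,New,NewList) replaces Old by New in List.
subst(_,[],_,[]).
subst(Old,[X|Xs],New,[New|Ys]) :- X == Old, !, subst(Old,Xs,New,Ys).
subst(Old,[X|Xs],New,[X|Ys]) :- subst(Old,Xs,New,Ys).

% The anaphora rule from the text.
anaphora(P,PP) :-
    P = [P2,[P1,[SS,TY,ARG],S1],S2],
    member(A,ARG),
    member(B,ARG),
    A = [_SEL,[[_D,NUM,RR],[[S,T]|_]],[subj|_]],
    \+ S = p1,
    \+ S = p2,
    B = [SELL,[_Op,[[ana,_TT]]],F],
    sublist(SELL,T),
    \+ A = B,
    B1 = [SELL,[[def,NUM,RR],[[p3,T]]],F],
    subst(B,ARG,B1,ARG1),
    PP = [P2,[P1,[SS,TY,ARG1],S1],S2],
    !.
anaphora(P,P).

% Hypothetical predication: a plural subject plus a free anaphorical object.
example(P) :-
    Subj = [[anim],[[def,pl,r1],[[man,[anim,masc]]]],[subj,ag]],
    Ana  = [[anim],[[def,same,same],[[ana,[ambi]]]],[obj,go]],
    P = [[],[[],[[v,wash],action,[Subj,Ana]],[]],[]].

% ?- example(P), anaphora(P,[_,[_,[_,_,[_,Obj]],_],_]).
% Obj = [[anim],[[def,pl,r1],[[p3,[anim,masc]]]],[obj,go]].
```

The anaphorical object has taken over the Number (pl), the referential index (r1), and the Type of the subject, and is now coded as a third person pronoun.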
9.1.12.4. Marking reflexive arguments

Anaphorical terms, which have inherited the Number and Gender features of a subject antecedent, may (and often will) occur in a position where they must be expressed in reflexive form. In the following rule such anaphorical terms are marked "refl". In the present program the formulation is rudimentary and could have been integrated with "anaphora". In a more developed system, however, I think it is expedient to divide the anaphora problem over different sub-problems:

(53) [1] Does a term position TP stand in an anaphorical relation to some other term position inside or outside the clause? If so,
     [2] is it going to be expressed
         [2.1] through a zero element,
         [2.2] through a reflexive pronoun,
         [2.3] through a weak (clitic) personal pronoun,
         [2.4] through a strong personal pronoun?

The reflexive rule checks the arguments of the nucleus for two non-identical terms A and B such that A has subject function and B is a personal pronoun, and B agrees with A in Number, Gender, and referential index. In that condition it adds "refl" to the person marker p1, p2, or p3 of the pronominal term. This will lead to reflexive expression through the expression rules.
    reflexive(P,PP) :-
        P = [P2,[P1,[SS,TY,ARG],S1],S2],
        member(A,ARG),
        member(B,ARG),
        A = [SEL,[[D,NUM,RR],[[S,T]|_]],[subj|RRR]],
        B = [SELL,[[DD,NUM,RR],[[PERS,TT]]],F],
        not(A=B),
        perspro(PERS),
        same_gender(T,TT),
        B1 = [SELL,[[DD,NUM,RR],[[[refl,PERS],TT]]],F],
        subst(B,ARG,B1,ARG1),
        PP = [P2,[P1,[SS,TY,ARG1],S1],S2], !.
    reflexive(P,P).

    perspro(p1).
    perspro(p2).
    perspro(p3).

Whether the anaphorical term has the same gender as the antecedent is checked through the procedure "same_gender", which is defined as follows:

    same_gender(T,TT) :-
        gender(GEN),
        member(GEN,T),
        member(GEN,TT), !.
    same_gender(X,X).
9.1.12.5. Marking equi arguments

In certain conditions anaphorical terms must or may be expressed as "zero". This is the case, for example, in:

(53) John_i wanted Ø_i to go to the city.
This is taken care of by marking the relevant anaphorical term as "equi" through the following rules:

    equi(P,PP) :-
        P = [P2,[P1,[[V,S],T,[A1,A2]],S1],S2],
        predvm(L,[S,_,_]),
        equi_subst([A1,A2],[A1,A22]),
        PP = [P2,[P1,[[V,S],T,[A1,A22]],S1],S2],
        !.
An argument pair [A1,A2] must be turned into [A1,A22] (where A22 is marked "equi"), when [A1,A2] are arguments of a higher matrix predicate, and "equi-substitution" holds between the argument pairs.
    equi_subst([A1,A2],[A1,A22]) :-
        A1 = [S,[[D,N,R],RES],F],
        A2 = [SS,[[DD,NN,RR],[[PR,[inanim,masc]]]],FF],
        PR = [P22,[P11,[SSS,TTT,ARG],S11],S22],
        member(PRO,ARG),
        PRO = [S5,[[DDD,N,R],[[PERS|Z]]],[subj|ZZ]],
        perspro(PERS),
        PRO1 = [S5,[[DDD,N,R],[[[equi,PERS]|Z]]],[subj|ZZ]],
        subst(PRO,ARG,PRO1,ARG1),
        PR1 = [P22,[P11,[SSS,TTT,ARG1],S11],S22],
        A22 = [SS,[[DD,NN,RR],[[PR1,[inanim,masc]]]],FF].

    equi(P,P).

Equi-substitution holds between two argument pairs if the second argument A2 is a predicational argument with a pronominal subject PRO which agrees in Number and referential index with the first argument A1. In that case, "equi" is added to the predicate of the pronominal term. If the conditions do not hold, equi leaves the input proposition untouched. In the expression rules, anaphorical terms marked "equi" will be expressed by zero. At the same time, the expression of the embedded predication will be turned into an infinitival construction.
9.1.12.6. Copula support

The following rules introduce a copular verb where this is needed for correct expression.

    copula_support(L,C,C) :-
        C = [P2,[P1,[[V,S],T,[AA|Z]],S1],S2],
        (bpredv(L,[S,_,_]);bpredvm(L,[S,_,_]);predvder(L,[S,_,_])), !.

Copula support leaves the input predication untouched when its nuclear predicate is a basic or derived verbal predicate.

    copula_support(L,C,C1) :-
        C = [P2,[P1,[[V,S],T,[AA|Z]],S1],S2],
        C1 = [P2,[P1,[[V,[COP,S]],T,[AA|Z]],S1],S2],
        copula(L,COP).

Otherwise, copula support adds the copular verb of the language in question to the (non-verbal) predicate. This applies when the predicate is an adjective (basic or derived), a term predicate, or an adpositional predicate. The form of the copula is defined in the expression component.
9.2. Conclusion

We have now defined the set of "fully specified clauses" for the three languages. The fully specified clause can now either be input to the expression rules, or be used for other purposes: it could be translated into any of the other languages, or be used as a point of departure for logical inferencing. The next three chapters describe the systems of expression rules.
Chapter 10. ParSel, UniExp and EngExp: universal and English expression

10.0. Introduction

The task of the expression component is to map fully specified clauses in L onto one or more sentence forms through which they can be expressed in L. This is done by defining what form and what order the constituents of the underlying clause structure must or may take, given their structural and functional properties within that structure. The expression component is organized in such a way that first almost all the rules which determine the form of constituents are applied, then these constituents are placed in a correct order, and finally (mainly in French expression) a number of low-level formal adjustments are made, which are collectively referred to with the term sandhi.

As will be seen presently, many expression rules can be formulated in a language-independent way, or in a way which applies to two out of the three languages treated here. Computationally, the best organization of the expression component would have been to interlace the language-specific and the general rules, so as to profit optimally from rule order differences which would allow for "default" formulations once the more specific cases had been treated. On the other hand, it is interesting to see which rules are of general validity and which are language-specific, and it has certain advantages to be able to replace the English-specific expression module by the corresponding French or Dutch module, the rest of the system remaining the same. After some experimentation I have organized the expression component as follows:

(1)              ParSel
                   |
       {EngExp / FreExp / DutExp}
                   |
                 UniExp
ParSel contains the rules for selecting irregular forms from paradigms in the lexicon. Although the irregular forms are, by definition, fully language-specific, the rules for lifting them out of their paradigms could be given a general formulation, since the irregular forms have been listed in standard order in their paradigms. EngExp, FreExp, and DutExp then contain the expression rules specific to the three languages, and UniExp contains the expression rules common to the
three languages. In doing English expression we can now load ParSel, EngExp, and UniExp, while for expression in the other languages we only have to change EngExp for FreExp or DutExp. A disadvantage for the reader would be, however, that one procedure, say "Express the Tense of the Verb", would now be divided over the three components ParSel, EngExp/FreExp/DutExp, and UniExp. For expository reasons, therefore, I have in this chapter re-integrated ParSel, EngExp, and UniExp so that all sub-clauses of one procedure are found in the same spot. FreExp and DutExp will then be discussed in chapters 11 and 12. In those chapters only those points which have required special attention in connection with language-dependent differences will be commented on.

The expression of a fully specified clause has been formulated "decompositionally", in the sense that the rules say, for example, that in order to express a fully specified clause of the form:

(2) [A,[B,[C,D],E],[F,G]]
you have to formally express the constituents of the clause, and to order these constituents. In order to formally express the constituents of the clause, you have to formally express A, [B,[C,D],E], and [F,G]; in order to formally express constituent [B,[C,D],E] you must express B, express [C,D], and express E, and so on, until you arrive at the "ultimate constituents" for which the expression is immediately given in the expression rules. In the presentation of the expression component, however, we follow a "bottom-up" strategy, first defining how the ultimate constituents can be expressed and then gradually building up the full compositional hierarchy from there. I believe this makes for better understanding than a "top-down" presentation.

The highest rule in this module, which makes the expression component run, is the following:

    go(L) :-
        repeat,
        ex_clause(L,C,X),
        adjust_spelling(L,X,X1),
        counter(N),
        tab(1), write('['), write(N), write(']'), nl,
        writelist(X1), nl, nl,
        fail.

in which "L" is a variable ranging over the language indices "eng", "fre", and "dut". When, for example, the order "go(eng)" is given, the module will produce a sequence of English sentences, each preceded by a running counter [1], [2], [3], etc.
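The control structure of "go" is a failure-driven loop: each solution is printed and then "fail" forces Prolog to look for the next one. The sketch below shows that pattern in isolation. It is a simplification under stated assumptions: ex_clause_demo/2 is a hypothetical stand-in that merely lists two ready-made word lists, the counter is implemented with a dynamic fact seen/1 (the text does not show how counter/1 is defined), and "repeat" is left out so that the loop stops once the solutions are exhausted.

```prolog
:- dynamic seen/1.
seen(0).

% Running counter: each call returns the next integer (assumed definition).
counter(N) :-
    retract(seen(N0)),
    N is N0 + 1,
    assertz(seen(N)).

% Hypothetical stand-in for the clause generator: two ready-made word lists.
ex_clause_demo(eng,[john,laughed]).
ex_clause_demo(eng,[mary,cheated]).

% Failure-driven loop: print each solution, then fail to force the next one.
go_demo(L) :-
    ex_clause_demo(L,Words),
    counter(N),
    atomic_list_concat(Words,' ',Sentence),
    format("[~w] ~w~n",[N,Sentence]),
    fail.
go_demo(_).

% ?- go_demo(eng).
% [1] john laughed
% [2] mary cheated
```

The final catch-all clause makes the call succeed after all sentences have been written, which is convenient when the loop is called from other code.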
10.1. The expression rules

10.1.1. Expressing terms

    ex_noun(eng,[X,[agent,_]],Y) :-
        derivational_stem1(eng,X,ST),
        concat(ST,er,Y), !.
    ex_noun(eng,[[N],dim],N1) :-
        concat('little-',N,N1), !.
    ex_noun(eng,[X],X).

The expression rule "ex_noun" is used to give the correct expression to agent nouns derived through predicate formation. The input might be [give,[agent,masc]]. In order to get the output giver we first compute the "derivational stem-1" giv, and then affix er to this derivational stem. The notion "derivational stem" is defined below, in 10.1.10. The assumption here is that giver can be used for both male and female agent nouns. The second noun-expression rule expresses a diminutive which is not even created for English! The reason is the following: in the translator a Dutch diminutive must be translatable into English. Temporarily, all diminutives translated from Dutch are translated by prefixing little-. Thus, Dutch mannetje, the diminutive of man 'man', will be translated as little-man, etc. For all other nouns only the brackets are removed.

    ex_number(L,[pl,A],B) :-
        paradigm(L,[A,B]), !.
    ex_number(eng,[pl,X],Y) :-
        derivational_stem2(eng,[X],X1),
        concat(X1,s,Y).
    ex_number(L,[sg,X],X).

We express the plural of a noun A as B by selecting B from the paradigm of A, in case there is one. The "cut" in that case prevents the application of the next rule. Because of the uniform coding of paradigms, this rule, although it selects very language-specific forms, can be formulated in a language-independent way. If there is no paradigm form, the English plural is formed through affixing s to the "derivational stem-2" of the noun. The derivational stem-2 (see 10.1.10. below) of a noun such as city is defined as citie. Thus, the plurals of city, lady, buddy are automatically specified as cities, ladies, buddies rather than as *citys etc. Note that this is more a matter of orthography than of phonology. The singular of a noun is formed through leaving the stem unchanged. This rule is, again, identical for the three languages.

    ex_det(eng,[[high,indef],pl,_],many) :- !.
    ex_det(eng,[[low,indef],pl,_],few) :- !.
    ex_det(eng,[[univ,def],pl,_],all) :- !.
    ex_det(eng,[[univ,def],sg,_],every) :- !.
    ex_det(eng,[[prox,def],pl,_],these) :- !.
    ex_det(eng,[[prox,def],sg,_],this) :- !.
    ex_det(eng,[[rem,def],pl,_],those) :- !.
    ex_det(eng,[[rem,def],sg,_],that) :- !.
    ex_det(eng,[def,_,_],the) :- !.
    ex_det(eng,[indef,sg,_],a) :- !.
    ex_det(eng,[indef,pl,_],[]) :- !.

The clauses for "ex_det" express the various term operator combinations. Note that the combination "indefinite plural" leads to the empty list [ ]. This is the Prolog equivalent of "zero expression". One point concerning these rules requires comment. Note that the input to every rule ends in an empty position marked by the arbitrary variable "_". The presence of this position here is solely due to the fact that the system is also used for French and Dutch, in which Gender distinctions co-determine determiner expression. In English, these Gender distinctions have no effect. One might say here that we have complicated the English expression component because of the fact that it is used in a multilingual environment. The strategy has here been to give the rules in question the most general form, as in: (3)
The expression of term operators may depend on Number and Gender.
and then to cancel the categories not relevant to a given language through using the arbitrary variable. Thus, though the rules are more complicated than is needed for English, they generalize more easily to other languages.

    ex_restr2(L,_,[],[]) :- !.
    ex_restr2(eng,_,[A,_],A1) :-
        ex_adj(eng,A,A1).

A second restrictor is expressed through expressing the adjectival predicate in that restrictor (which also contains the "type" of the adjective). An empty restrictor 2 remains as it is. Again, these rules contain an empty place for parameters of Definiteness, Number, and Gender which may be relevant to the expression of second restrictors in Dutch and French.

    ex_adj(eng,[A,pos],B) :-
        paradigm(eng,A,[B|_]), !.
    ex_adj(eng,[A,pos],A1) :-
        concat(A,er,A1), !.

A positive comparative adjective (created through predicate formation) is expressed through lifting the relevant form out of its paradigm (if there is one, as in the case of better); otherwise by affixing er to the adjectival stem. The conditions for the choice between er and more have not yet been captured.
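Both "ex_number" (earlier in this section) and the two "ex_adj" clauses just given follow the same pattern: look for an irregular form in a paradigm first and, only if none is found, fall back on regular affixation. The following sketch makes that pattern runnable. Several parts are assumptions: concat/3 is defined here via the standard atom_concat/3, the two paradigm facts are invented lexicon fragments, and derivational_stem2/3 is a guess at the intended behaviour (the book defines it in 10.1.10, which is not reproduced here).

```prolog
% concat/3 as used in the text, defined here via the standard atom_concat/3.
concat(A,B,C) :- atom_concat(A,B,C).

% Hypothetical lexicon fragments (irregular forms only).
paradigm(eng,[man,men]).             % noun plural
paradigm(eng,good,[better,best]).    % adjective comparison

% Assumed definition of "derivational stem-2": consonant + y becomes ie.
derivational_stem2(eng,[X],X1) :-
    atom_concat(Base,y,X),
    sub_atom(Base,_,1,0,Last),
    \+ memberchk(Last,[a,e,i,o,u]),
    !,
    atom_concat(Base,ie,X1).
derivational_stem2(eng,[X],X).

% Number expression: paradigm form if available, otherwise stem-2 plus s.
ex_number(L,[pl,A],B) :- paradigm(L,[A,B]), !.
ex_number(eng,[pl,X],Y) :- derivational_stem2(eng,[X],X1), concat(X1,s,Y).
ex_number(_,[sg,X],X).

% Comparative expression: paradigm form if available, otherwise stem plus er.
ex_adj(eng,[A,pos],B) :- paradigm(eng,A,[B|_]), !.
ex_adj(eng,[A,pos],A1) :- concat(A,er,A1), !.

% ?- ex_number(eng,[pl,man],Y).   Y = men.
% ?- ex_number(eng,[pl,city],Y).  Y = cities.
% ?- ex_adj(eng,[good,pos],B).    B = better.
% ?- ex_adj(eng,[tall,pos],B).    B = taller.
```

The cut after the paradigm lookup is what prevents the regular rule from also producing *mans or *gooder on backtracking.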
    ex_adj(eng,[A,POL],[POL1,A]) :-
        ex_pol(eng,POL,POL1), !.

    ex_pol(eng,pos,more) :- !.
    ex_pol(eng,neg,less) :- !.
    ex_pol(eng,equal,equally) :- !.

Otherwise, a comparative of an adjective A is expressed by more A, less A, equally A (however, more cannot be selected by the present rules, awaiting the formulation of conditions for the choice between -er and more).

    ex_adj(eng,[prp,V],A) :-
        ex_prp(eng,V,A), !.

A derived "participial" adjective is expressed by participializing the underlying verb (see below, section 10.1.5).

    ex_adj(eng,[deg,DEG,A],[DEG,A]) :- !.

A derived adjective such as [deg,very,good] is expressed as [very,good]. Note that the element "deg" only serves to help the expression rules recognize this type of predicate.

    ex_adj(eng,[A],A).

If none of the conditions specified so far apply, only the brackets around the adjectival predicate are removed.

    ex_fun(L,[subj|R],[subj]).
    ex_fun(L,[ssubj|R],[subj]).
    ex_fun(L,[obj|R],[obj]).

The functions "subject", "ssubject" (where this comes from will be clarified below), and "object" are expressed as "subj" and "obj"; the semantic functions which follow in the Tail R of the function slot remain unexpressed. The retention of "subj" and "obj" is actually a kind of trick in order to let the placement rules do their work properly. Once the relevant constituents have been placed in their proper position, these markers will be left out.

    ex_fun(eng,[poss],of).
    ex_fun(eng,[gen],'*s').
    ex_fun(eng,[pt],[]).
    ex_fun(eng,[ag],by).
    ex_fun(eng,[pos],by).
    ex_fun(eng,[zero],[]).
    ex_fun(eng,[rec],to).
    ex_fun(eng,[ben],for).
    ex_fun(eng,[loc],in).
    ex_fun(eng,[temp],on).
    ex_fun(eng,[superloc],on).
    ex_fun(eng,[dir],to).
    ex_fun(eng,[so],from).
    ex_fun(eng,[instr],with).
    ex_fun(eng,[ass],with).
    ex_fun(eng,[causee],[]).
    ex_fun(eng,[rf],[]).
    ex_fun(eng,[stand],than).
    ex_fun(eng,[eqstand],as).
    ex_fun(eng,[cond],if).
    ex_fun(eng,[reason],because).
    ex_fun(eng,[nomreason],[because,of]).
    ex_fun(eng,[conc],although).
    ex_fun(eng,[conc],though).
    ex_fun(eng,[],[]).

Otherwise (i.e., when no subj or obj function has been assigned) all the other semantic functions get their proper expression, which is sometimes zero [ ]. Note that sometimes different semantic functions are mapped onto the same preposition. Note further that conjunctions such as if, because, although, though come out as the expressions of semantic functions on predicational/propositional satellites. They are thus, in a sense, "prepositions on predications/propositions". We now have the elements through which simple terms can be expressed. The following rules serve this purpose.

    ex_term(L,[[def,sg,RF],[[PROPER,T]]],PROPER1) :-
        member(proper,T),
        ex_noun(L,PROPER,PROPER1), !.
A proper term is expressed by applying "ex_noun" on the proper noun contained in that term. Usually the output form is identical to the lexical proper noun. But in the Dutch module, diminutive formation also applies to proper nouns: marie 'Mary', marietje 'little-Mary'. So far, at the level of term structure (in contrast to the level of clause structure) the specification of the form and the specification of the order of constituents have not been separated: they are taken care of in one go. Therefore, in expressing non-proper nominal terms, several cases must be distinguished, because of different ordering possibilities in the three languages. Splitting up the tasks of determining the form and establishing the order might lead to simplification of the rules. The first term expression rule below determines when the possessive restrictor is to be realized as a possessive pronoun, as in his book, their book, etc. Note that in this case the possessor term can only be realized as a premodifier. Further, this is only possible if the whole term is definite, and no determiner is expressed in this case.
    ex_term(L,
            [[def,NUM,R],[[St,[_,GEN|_]|_],R2,[[],T,[poss]],R4]],
            [X,Y,Z,Z1]) :-
        T = [[def,N,RR],[[PP,TY]]],
        pronoun(PP),
        ex_poss_pro(L,[PP,N,GEN,NUM],X),
        ex_noun(L,St,St1),
        ex_number(L,[NUM,St1],Z),
        ex_restr2(L,[def,NUM,GEN],R2,Y),
        !,
        ex_emb_prop(L,[[[]|_],R4,_],Z1).

In such circumstances we can express a term as [X,Y,Z,Z1], where X is the pronominal expression of the possessor, Y expresses a potential adjectival restrictor, Z is the result of expressing number on the result of expressing the noun, and Z1 is the expression of a potential relative clause structure (through expressing the embedded proposition, to be defined later). In this rule GEN indicates the (relevant) Gender distinction in the type of the head noun. Note that the expression of the possessive pronoun may be sensitive to the Gender of the head noun (in French, but not in English and Dutch), and that adjective expression may be sensitive to Definiteness, Number, and Gender (in French and Dutch but again, not in English). In the English version of these expression rules the features which do not apply are represented by the anonymous variable "_".

    ex_poss_pro(L,[PP,N,GEN,_],E) :-
        (L=eng;L=dut),
        paradigm(L,[PP,N,GEN,[A,B,C,D,E|_]]),
        not(E=[]), !.

In English and Dutch the possessive pronoun is lifted out of the relevant paradigm in the lexicon. The second non-proper term expression rule concerns the case in which there is a non-pronominal possessive restrictor 3, which in English and Dutch may be realized as either a premodifier or a postmodifier of the noun (but not under precisely the same circumstances in the two languages):

(4) a. John's book              Jans boek
    b. the book of John         het boek van Jan
(5) a. the boy's book           *de jongen's boek
    b. the book of the boy      het boek van de jongen
The prenominal expression of such terms is defined by the following rule:
    ex_term(L,
            [[def,NUM,R],[[St,[_,GEN|_]|_],R2,[[],T,[poss]],R4]],
            [X,Y,Z,Z1]) :-
        (L=eng;L=dut),
        T = [Ops,[[Stem,Type]|ZZ]],
        preposing_parameter(L,XX),
        sublist(XX,Type),
        ex_full_term(L,[[],T,[gen]],X),
        ex_noun(L,St,St1),
        ex_number(L,[NUM,St1],Z),
        ex_restr2(L,[def,NUM,GEN],R2,Y),
        ex_emb_prop(L,[[[]|_],R4,_],Z1).

The preposing is conditioned by a "preposing parameter", which is sensitive to the type of the possessive noun. This parameter is defined as follows:

    preposing_parameter(eng,anim).
    preposing_parameter(dut,proper).

Note that this parameter, when loaded into the above rule, will account for the difference between English and Dutch as specified in (4)-(5). Premodifier expression of non-pronominal possessive terms is not provided with a "cut", since the premodifier may also be realized as a postmodifier. The more general case of term expression for English and Dutch is defined in the following rule:

    ex_term(L,
            [[A,C,D],[[X,[_,GEN|_]|_],R2,R3,R4]],
            [P,Q,R,S,T]) :-
        (L=eng;L=dut),
        ex_det(L,[A,C,GEN],P),
        ex_noun(L,X,X1),
        ex_number(L,[C,X1],R),
        ex_restr2(L,[A,C,GEN],R2,Q),
        ex_full_term(L,R3,S),
        ex_emb_prop(L,[[[]|_],R4,_],T).

Here, the potential restrictor 3 is realized as a postmodifier S:

(6) a. the professor of the girl
    b. the book of the lady
    c. the man in the garden
    d. the lady with the roses
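The effect of the preposing parameter can be illustrated in isolation. In the sketch below, possessor_position/3 is a hypothetical predicate that stands in for the two term expression rules: it only reports whether a possessor of a given type may be realized as a premodifier, a postmodifier, or both. The type lists used in the queries are assumed codings, and memberchk/2 replaces the book's sublist test for simplicity.

```prolog
:- use_module(library(lists)).

% The parameter as given in the text.
preposing_parameter(eng,anim).
preposing_parameter(dut,proper).

% possessor_position(+Lang,+Type,-Position): simplified stand-in for the
% premodifier and postmodifier rules; Type is the possessor noun's type list.
% As in the text, the premodifier clause has no cut, so the postmodifier
% realization remains available on backtracking.
possessor_position(L,Type,pre) :-
    preposing_parameter(L,X),
    memberchk(X,Type).
possessor_position(_,_,post).

% ?- findall(P, possessor_position(eng,[anim,masc],P), Ps).
% Ps = [pre,post].        % the boy's book / the book of the boy
% ?- findall(P, possessor_position(dut,[anim,masc],P), Ps).
% Ps = [post].            % only: het boek van de jongen
% ?- findall(P, possessor_position(dut,[proper,anim,masc],P), Ps).
% Ps = [pre,post].        % Jans boek / het boek van Jan
```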
Finally, we need an expression rule for unresolved anaphorical terms:

    ex_term(L,[[def,same,same],[[ana,[ambi]]]],nn) :- !.

The formulation of anaphora in 9.1.12.3. was such that not all occurrences of the "free" anaphorical term will be "resolved" by matching with a possible antecedent. Nevertheless, such an anaphorical term could be coreferential to some term in a preceding sentence:

(7) John greeted some men_i. Then the girl greeted them_i.
Because it does not yet handle sequences of clauses, the present program cannot resolve such cases of cross-clausal anaphora. The "unresolved" anaphorical term is therefore expressed as "nn", awaiting further development of the program.
10.1.2. Full term expression

A term always occurs in a term position [Sel,Term,Fun], with selection Sel and function Fun. Sel will go unexpressed, but Fun will be expressed through adpositions and/or case marking. At the next compositional level, then, we have to consider how a "full term" consisting of these elements is expressed. First, we define recursively what it means to say that a whole series of full terms (e.g., the list of arguments in the nucleus, or a list of satellites) is expressed:

    ex_full_terms(L,[],[]).
    ex_full_terms(L,[TT|TA],[TT1|TA1]) :-
        ex_full_term(L,TT,TT1),
        ex_full_terms(L,TA,TA1).
In order to do full terms expression on a list of "full terms", we express the first full term and then do full terms expression on the rest of the list.

    ex_full_term(L,[],[]).

A zero full term will be expressed by zero. This is needed here since the third restrictor in a term (R3), which is expressed as a full term, may be empty. We first get the expression rules for pronominal "full terms". These apply to personal, interrogative, and relative pronouns:

    pronoun(P) :-
        member(P,[p1,p2,p3,que,rel]).

The following rules lift the relevant pronominal forms out of their paradigm:

    ex_full_term(L,[S,[[D,N,R],[[P,TY]]],[subj|Z]],[[subj],A]) :-
        pronoun(P),
        paradigm(L,[P,N,GEN,[A|_]]),
        sublist(GEN,TY), !.
The expression for subject pronouns is to be found in the first position of the relevant pronominal paradigm.
    ex_full_term(L,[S,[[D,N,R],[[P,TY]]],[obj|Z]],[[obj],B]) :-
        pronoun(P),
        paradigm(L,[P,N,GEN,[A,B|_]]),
        sublist(GEN,TY), !.
Idem for Object pronouns: these are found in the second paradigm position.

    ex_full_term(L,[S,[[D,N,R],[[P,TY]]],F],[X,B]) :-
        pronoun(P),
        paradigm(L,[P,N,GEN,[_,_,_,B|_]]),
        not(B=[]),
        sublist(GEN,TY),
        ex_fun(L,F,X), !.
    ex_full_term(L,[S,[[D,N,R],[[P,TY]]],F],[X,B]) :-
        pronoun(P),
        paradigm(L,[P,N,GEN,[A,B|_]]),
        sublist(GEN,TY),
        ex_fun(L,F,X), !.
The "oblique" form of a pronoun (= the form used after prepositions) is found in the fourth position in the paradigm or, when it is identical to the object form, it can be lifted from second position in the paradigm.

    ex_full_term(L,[S,[[DD,N,R],[[[refl,P],TY]]],F],[X,D]) :-
        pronoun(P),
        paradigm(L,[P,N,GEN,[_,_,_,_,_,D|_]]),
        sublist(GEN,TY),
        ex_fun(L,F,X), !.
If a pronominal term has been marked "refl" its expression is found in the sixth position of the relevant pronominal paradigm.

    ex_full_term(L,[S,[[DD,NUM,R],[[[equi,P],TY]]],F],[]) :- !.

If a pronominal term has been marked "equi" it will be expressed by zero, as in:

(8) John wants [] to go home.
    ex_full_term(L,[S,[[DD,N,R],[[P,TY]]],[gen,subj|RR]],[[subj],C]) :-
        pronoun(P),
        paradigm(L,[P,N,GEN,[_,_,_,_,C|_]]),
        sublist(GEN,TY), !.

When a subject pronoun has been marked "genitive" (which may be the case in nominalization, see below), its expression is found in the fifth position of the relevant pronominal paradigm:

(9) John deplored my kissing Mary.
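The pronoun rules above all index into one paradigm list by position. The sketch below collects those positions in a single hypothetical lookup predicate, pron_form/6, which is not part of ProfGlot. The paradigm entry is an assumed coding: slot 3 is left empty because its content is not described in this section, and slot 4 is [] to model the case where the oblique form coincides with the object form. sublist/2 is again an assumed helper.

```prolog
:- use_module(library(lists)).

% Hypothetical paradigm entry:
% [subject, object, (slot 3, unspecified here), oblique, genitive, reflexive]
paradigm(eng,[p3,sg,[masc],[he,him,[],[],his,himself]]).

% Assumed helper: every element of the first list occurs in the second.
sublist([],_).
sublist([X|Xs],Ys) :- memberchk(X,Ys), sublist(Xs,Ys).

% Paradigm positions as described in the text.
slot(subject,1).
slot(object,2).
slot(genitive,5).
slot(reflexive,6).

% pron_form(+Lang,+Pers,+Num,+Type,+Role,-Form)
% Oblique: use slot 4 unless it is empty, in which case fall back on slot 2.
pron_form(L,P,N,TY,oblique,Form) :- !,
    paradigm(L,[P,N,GEN,Forms]),
    sublist(GEN,TY),
    (   nth1(4,Forms,F4), F4 \== []
    ->  Form = F4
    ;   nth1(2,Forms,Form)
    ).
pron_form(L,P,N,TY,Role,Form) :-
    slot(Role,I),
    paradigm(L,[P,N,GEN,Forms]),
    sublist(GEN,TY),
    nth1(I,Forms,Form).

% ?- pron_form(eng,p3,sg,[anim,masc],subject,F).    F = he.
% ?- pron_form(eng,p3,sg,[anim,masc],oblique,F).    F = him.
% ?- pron_form(eng,p3,sg,[anim,masc],reflexive,F).  F = himself.
```

Keeping all forms in a fixed order is what allows the lookup rules to be stated once for all three languages, even though the forms themselves are language-specific.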
    ex_full_term(L,[[],T,[gen]],[TT,F]) :-
        (L=eng;L=dut),
        ex_term(L,T,TT),
        ex_fun(L,[gen],F), !.

If a non-pronominal full term contains a "genitive" term, the expression of the genitive function (the element "*s" in English) is added to the expression of the term.

    ex_full_term(eng,[S,T,[gen,subj|R]],[[subj],C,'*s']) :-
        ex_term(eng,T,C), !.

When a non-pronominal subject term has been marked "genitive", it is expressed in English with "*s".

    ex_full_term(L,[[idiom],T,F],[X,T]) :-
        ex_fun(L,F,X), !.
When a term is marked "idiom" in the lexicon it is left as it is, and the function expression is added to it.
10.1.3. Expressing embedded propositions and predications

Full terms which have the internal structure of predications or propositions require special expression rules:

    ex_full_term(L,
                 [[prop,fact|_],[[*,sg,*],[[PR,TY]]],F],[Q,FACT,Y]) :-
        ex_emb_prop(L,PR,Y),
        ex_fun(L,F,Q),
        factive_marker(L,FACT).

    factive_marker(eng,[the,fact,that]).
These clauses express factive embedded propositions in the following way: (10)
John deplored the fact that Mary cheated.
    ex_full_term(L,[S,[[*,sg,*],[[PR,TY]]],F],[Q,SUB,Y]) :-
        ex_emb_prop(L,PR,Y),
        ex_fun(L,F,Q),
        sub1(L,SUB).

    sub1(eng,that).
These clauses express an embedded proposition or an embedded extended predication along the pattern of: (11)
John deplored that Mary cheated.
    ex_full_term(eng,
                 [[fact,prop|_],[[*,sg,*],[[PR,TY]]],F],[Q,Y]) :-
        nominalisation(eng,PR,Y),
        ex_fun(eng,F,Q).
In English we can also express an embedded proposition by nominalizing it (this is also true for Dutch, but this has not yet been programmed).

    nominalisation(eng,PR,Y1) :-
        PR = [P3,[P2,[P1,[S,T,AR],S1],S2],S3],
        member(SUBJ,AR),
        SUBJ = [SE,TE,[subj|R]],
        SUBJ1 = [SE,TE,[gen,subj|R]],
        subst(SUBJ,AR,SUBJ1,AR1),
        Y = [P3,[P2,[P1,[S,T,AR1],S1],S2],S3],
        ex_nom_clause(eng,Y,Y1).

We nominalize a proposition in English by marking the subject term with the marker "gen" ('genitive') and then doing "nominalized clause expression" to the proposition. This will be defined below. This will produce:

(12) a. john deplored mary's cheating.
     b. mary's cheating was deplored by john.
    ex_full_term(eng,
                 [[extpred|_],[[*,sg,*],[[PR,TY]]],F],[Q,Y]) :-
        PR = [P2,[P1,[S,T,AR],S1],S2],
        member(SUBJ,AR),
        SUBJ = [SE,TE,[subj|R]],
        SUBJ1 = [SE,TE,[ssubj|R]],
        subst(SUBJ,AR,SUBJ1,AR1),
        PR1 = [P2,[P1,[S,T,AR1],S1],S2], !,
        infinitival_expression(eng,PR1,Y),
        ex_fun(eng,F,Q).
In English the complement of a verb like want must be expressed in the form of an "accusativus cum infinitivo", as in (13a); a finite subordinate clause, as in (13b), gives bad results:

(13) a. John wanted Mary to cheat.
     b. *John wanted that Mary cheated.
This pattern is achieved by marking the subject as "ssubj" (which will yield the object form of the term in question) and doing infinitival expression to the embedded predication. Note that I have provisionally solved this problem in the expression component, without doing any "deeper" Raising operation. This is a matter for further consideration.

    ex_full_term(L,
                 [[extpred|_],[[*,sg,*],[[PR,TY]]],F],[Q,Y]) :-
        (L=dut;L=fre),
        PR = [P2,[P1,[S,T,AR],S1],S2],
        member(SUBJ,AR),
        SUBJ = [SE,TE,[subj|R]],
        TE = [OPS,[[[equi,P],TYP]]],
        infinitival_expression(L,PR,Y),
        ex_fun(L,F,Q), !.
In Dutch and French this type of complement only gets infinitival expression when the subject term has been marked "equi".

    ex_full_term(L,
                 [[extpred|_],[[*,sg,*],[[PR,TY]]],F],[Q,SUB,PRED]) :-
        (L=fre;L=dut),
        sub1(L,SUB),
        ex_fun(L,F,Q),
        ex_emb_prop(L,[[[],_],PR,S3],PRED).
Otherwise, this type of clause gets finite subordinate clause expression in French and Dutch. Compare:

(14) a. Jan wil dat Marie lacht.
        John wants that Mary laughs
     b. *Jan wil Marie (te) lachen.
        John wants Mary (to) laugh
(15) a. *Jan_i wil dat Jan_i lacht.
     b. Jan wil lachen.
        John wants laugh
        'John wants to laugh'
    ex_full_term(L,[S,T,F],[X,Y]) :-
        ex_term(L,T,Y),
        ex_fun(L,F,X).
In all other cases, expressing the full term means expressing the function and expressing the term, in that order.
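A minimal sketch of this general case follows. It combines the final ex_full_term rule with a handful of the ex_fun facts and a deliberately reduced ex_term that only handles a determiner plus a singular noun; that reduced ex_term and the sample term (boy, index r1) are assumptions for illustration, not the full rule set given earlier.

```prolog
:- use_module(library(lists)).

% Two of the determiner rules from the text.
ex_det(eng,[def,_,_],the) :- !.
ex_det(eng,[indef,sg,_],a) :- !.

% Reduced term expression (assumed): determiner plus unmodified noun.
ex_term(eng,[[Def,Num,_],[[Noun,_Type]]],[Det,Noun]) :-
    ex_det(eng,[Def,Num,_],Det).

% A few of the ex_fun facts from the text.
ex_fun(eng,[rec],to).
ex_fun(eng,[ben],for).
ex_fun(eng,[instr],with).

% The general full-term rule: the output lists the function expression
% first and the term expression second.
ex_full_term(L,[_S,T,F],[X,Y]) :-
    ex_term(L,T,Y),
    ex_fun(L,F,X).

% ?- ex_full_term(eng,[[anim],[[def,sg,r1],[[boy,[anim,masc]]]],[ben]],E),
%    flatten(E,Words).
% E = [for,[the,boy]],
% Words = [for,the,boy].
```

The nested output list reflects the compositional build-up; flattening it gives the surface word string.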
10.1.4. Expressing satellite terms

    ex_sats1(L,[M,I,D,B],[Ma,In,Di,Be]) :-
        ex_sat1m(L,M,Ma),
        ex_sat1(L,I,In),
        ex_sat1(L,B,Be),
        ex_sat1(L,D,Di).
We express the series of satellites of level 1 (Manner, Instrument, Direction, Beneficiary) by expressing each of these satellites separately.

    ex_sat1m(L,[*],[]) :- !.
    ex_sat1m(L,[S,[OPS,[[que,TY]]],F],C) :-
        paradigm2(L,[A,B,C|_]),
        not(C=[]), !.
    ex_sat1m(eng,[[idiom],X,[manner]],X) :- !.
    ex_sat1m(eng,[[eval],[ADJ],[manner]],[in,a,ADJ,manner]).
    ex_sat1m(eng,[[eval],[ADJ],[manner]],[in,a,ADJ,way]).
    ex_sat1m(eng,[[eval],[good],[manner]],well) :- !.
    ex_sat1m(eng,[[eval],[ADJ],[manner]],ADV) :-
        concat(ADJ,ly,ADV), !.
    ex_sat1m(L,X,X1) :- !,
        ex_full_term(L,X,X1).

An empty Manner satellite [ * ] is expressed by zero [ ]. A questioned Manner satellite is expressed by lifting how from the relevant paradigm. An idiomatic Manner satellite is expressed in the way it is coded in the predicate frame. Manner satellites based on adjectives can be expressed by the circumlocutions "in an ADJ manner" or "in an ADJ way". Alternatively, such Manner satellites can be expressed by Manner adverbs. If the underlying adjective is good, the adverb is idiosyncratically specified as well; otherwise it is formed by affixing -ly. Thus we get the alternative realizations:

(16) a. in a clever manner
     b. in a clever way
     c. cleverly

    ex_sat1(L,[*],[]) :- !.
    ex_sat1(L,X,X1) :-
        ex_full_term(L,X,X1), !.
    ex_sat1(L,X,X).

All other non-empty satellites of level 1 (e.g. with the thingummy, to the city, for the boy with the blue eyes) are expressed through full term expression. If full term expression does not apply to them (as in the case of "lexical satellites"), they are simply expressed as they are specified in the lexicon. This last type of rule can also be used as a check on the expression component: if the earlier rules have been incorrectly formulated, the last rule will output a piece of underlying clause structure.

    ex_sats2(L,[Loc,Temp,Pol,Cog],[A,B,C,D]) :-
        ex_sat2loc(L,Loc,A),
        ex_sat2temp(L,Temp,B),
        ex_sat2pol(L,Pol,C),
        ex_sat2cog(L,Cog,D).
The satellites of level 2 are expressed by expressing each of the satellites separately.

    ex_sat2loc(L,[*],[]) :- !.
    ex_sat2loc(L,[S,[OPS,[[que,TY]]],F],B) :-
        paradigm2(L,[A,B|_]), !.
    ex_sat2loc(L,X,X1) :-
        ex_full_term(L,X,X1), !.
    ex_sat2loc(L,X,X).
Empty locative satellites are expressed by [ ]. A questioned locative satellite is expressed by choosing where from its paradigm. Other locative satellites are expressed through full term expression (e.g. in the city), or left as they are (if they are lexical satellites).
The expression rules
    ex_sat2temp(L,[*],[]) :- !.
    ex_sat2temp(L,[S,[OPS,[[que,TY]]],F],A) :- paradigm2(L,[A|_]), !.
    ex_sat2temp(L,X,X1) :- ex_full_term(L,X,X1), !.
    ex_sat2temp(L,X,X).
The same, mutatis mutandis, for temporal satellites.

    ex_sat2pol(L,[*],[]) :- !.

An empty "polarity satellite" is expressed by zero. Recall that this position is never used in the present grammar.

    ex_sat2cog(L,[*],[]) :- !.
    ex_sat2cog(L,[S,[OPS,[[que,TY]]],F],D) :-
        paradigm2(L,[A,B,C,D|_]), !.
    ex_sat2cog(L,[[prop|_],[OPS,[[PR,TY]]],FUN],[X,Y]) :-
        ex_fun(L,FUN,X),
        ex_emb_prop(L,PR,Y), !.
    ex_sat2cog(L,[[extpred|_],[OPS,[[PR,TY]]],FUN],[X,Y]) :-
        ex_fun(L,FUN,X),
        ex_emb_prop(L,[[[]|_],PR,_],Y), !.
    ex_sat2cog(L,X,X1) :- ex_full_term(L,X,X1), !.
    ex_sat2cog(L,X,X).

Cognitive satellites of level 2 are (in the present program) satellites of Reason, Condition, and Concession. The Reason satellite may either be nominal (e.g. because of the rain) or propositional (e.g. because it was raining). The other cognitive satellites have the internal structure of predications or propositions. The above rules say that empty cognitive satellites are expressed by zero; that a questioned cognitive satellite (of Reason) may be expressed by lifting why from its paradigm; and that predicational/propositional cognitive satellites may be expressed through expressing the function (e.g. by means of because), and expressing the embedded predication or proposition.

    ex_sat3(L,[*],[]) :- !.
    ex_sat3(eng,[[eval],[ADJ],[opinion]],ADV) :- concat(ADJ,ly,ADV), !.
    ex_sat3(L,X,X).
Satellites of level 3 are expressed through adverbializing the adjective (e.g., cleverly), or they are left as they are in lexical form (e.g., probably).

    ex_sat4(L,[*],[]) :- !.
    ex_sat4(L,X,X).
Satellites of level 4 are expressed in the form they have as lexical satellites. No other types of illocutionary satellites have so far been implemented.
We have now defined the expression rules for all the term types (arguments and satellites) which may occur in underlying clause structure. We now turn to the expression of the verbal complex.
10.1.5. Expressing the verbal complex

The expression of the verbal complex is achieved through a "conveyor belt" which successively defines the effect of different operators on the verbal predicate, in the order: Voice, Progressive, Aspect (= Perfect), Attitude (= Predictive, Possibility), Polarity, Illocution, and Tense (remember that Tense also carries the agreement features of Number, Person, and Gender). When any of these operators is irrelevant, the input is handed over unchanged to the next partner on the conveyor belt. The order of the operators has been chosen in such a way that, starting from the verbal stem, we can work "from the inside to the outside". The last worker at the conveyor belt, Tense expression, places the finite verb at the Head of the verbal complex. The final output is "flattened" in order to get rid of unnecessary bracketing.

    ex_verbal_complex(L,[TE1,[ILL,AR],POL,ATT,ASP2,ASP1,[VCE,SS],TY],A) :-
        ex_voice(L,TE1,[VCE,SS],E),
        ex_progr(L,[ASP1,VCE,E],E1),
        ex_aspect(L,[ASP2,VCE,E1,TY],E2),
        ex_attitude(L,[ATT,E2],E2a),
        ex_polarity(L,[POL,E2a],E3),
        ex_illo(L,[[ILL,AR],E3],E4),
        ex_tense(L,[TE1,ASP1,E4],E5),
        flatten(E5,A).

Note the following points:
- the tense marker TE1 includes the agreement features which have been added to the Tense through verb agreement.
- the expression of "Aspect-1" (= Progressive, but in other languages also Perfective/Imperfective) is co-dependent on Voice, at least in Dutch (see chapter 11).
- the expression of "Aspect-2" (Perfect) is co-dependent on Voice in Dutch, and co-dependent on the Type of the predicate in Dutch and French (see chapters 11 and 12).
- the effect of the illocution on the verbal complex is only relevant to English (do-support), and is in that language co-dependent on the nature of the subject-term; therefore, the arguments "AR" have been taken along as a parameter.
- the expression of Tense + agreement features may be co-sensitive to "Aspect-1" in those languages which have different past tense expression for Perfective and Imperfective predications. This parameter is not further used in the present grammar.

For the verbal complex within a nominalization we need a reduced version of the conveyor belt in order to produce such output as:

(17)
Mary deplored John's not having been kicked by Peter.
This is achieved in the following rule:

    ex_nom_verbal_complex(L,[POL,ASP2,[VCE,SS],TY],A) :-
        ex_voice(L,_,[VCE,SS],A1),
        ex_aspect(L,[ASP2,VCE,A1,TY],A2),
        ex_nom(L,[nom,A2],A3),
        ex_nom_polarity(L,[POL,A3],A).
The same applies to the infinitival complex, as in: (18)
John wanted Mary not to have been kissed by Peter.
    ex_inf_complex(L,[POL,ASP2,[VCE,SS],TY],A) :-
        ex_voice(L,_,[VCE,SS],A1),
        ex_aspect(L,[ASP2,VCE,A1,TY],A2),
        ex_inf(L,A2,A3),
        ex_nom_polarity(L,[POL,A3],A).
We now consider the individual partners at the conveyor belt:

    ex_voice(L,_,[[act],[COP,Y]],[COP,Y1]) :-
        copula(L,COP),
        ex_term(L,Y,Y1), !.

    copula(eng,be).
In copular constructions with a term predicate, active voice simply leads to expressing the copula and the term in the predicate. In English, the copula is be. The result of this rule is, e.g.: [be,[a,good,boy]].

    ex_voice(L,_,[[act],[COP,Y]],[COP,Y1]) :-
        copula(L,COP),
        ex_full_term(L,Y,Y1), !.
The same holds for copular constructions with a satellite (or adpositional) predicate. Result: [be,[in,the,city]].

    ex_voice(L,[T,N,P,G],[[act],[COP,[Y,ST]]],[COP,Y1,Z]) :-
        ex_predadj(L,[_,N,G],[Y,_],Y1),
        ex_full_term(L,ST,Z), !.

Copular constructions with a comparative predicate, which incorporates the "Standard" term, are expressed by expressing the copula, expressing the predicative adjective, and expressing the Standard term. Result, e.g.: [be,taller,[than,[the,professor]]].

    ex_voice(L,[T,N,P,G],[[act],[COP,Y]],[COP,Z]) :-
        copula(L,COP),
        ex_predadj(L,[_,N,G],[Y,_],Z), !.
In copular constructions with only a predicative adjective we express the copula and the predicative adjective. Result, e.g.: [be,clever]. The predicative adjective in English is expressed by expressing the adjective (without any modification):

    ex_predadj(eng,_,[A,_],A1) :- ex_adj(eng,A,A1), !.

The expression of the predicative adjective in French will turn out to be more complicated, since it has to agree in Number and Gender with the subject term.

    ex_voice(L,_,[[act],X],X).

Active Voice has no effect on verbal predicates.

    ex_voice(eng,_,[[pass],[X|Z]],[be,Y]) :-
        ex_pap(eng,[X|Z],Y).

Passive voice introduces the auxiliary be and turns the verbal predicate into a past participle through past participle expression "ex_pap".

    ex_pap(L,[X|Z],[A|Z]) :-
        (L=eng;L=dut),
        paradigm(L,X,[A|_]),
        not(A=[]), !.

The past participle is expressed by lifting it out of the first position of the paradigm for non-finite forms.

    ex_pap(eng,[X|Z],[Y|Z]) :-
        derivational_stem1(eng,[X],X1),
        concat(X1,ed,Y).

Otherwise, it is expressed by taking the "derivational stem-1" and affixing -ed to that stem. The derivational stem-1 is defined in 10.1.10. below. The output of Voice expression is now handed over to Progressive expression:

    ex_progr(L,[[],_,[X|Z]],[X|Z]).
    ex_progr(eng,[progr,_,[X|Z]],[be,X1|Z]) :-
        ex_prp(eng,[X],X1).

If the PI operator is empty ([ ]), this has no effect on the verbal complex. The operator "progr" triggers the introduction of the auxiliary be and turns the first verb in the input list into a present participle.

    ex_prp(L,[X],C) :-
        paradigm(L,X,[A,B,C|_]), !.
    ex_prp(eng,[X1|Z],[Y|Z]) :-
        derivational_stem1(eng,[X1],X2),
        concat(X2,ing,Y).

The present participle is either lifted out of the third position of the relevant paradigm, or formed through affixing -ing to the "derivational stem-1".
Note that, depending on the prehistory of the input, the rules for progressive expression have such effects as the following (leaving out irrelevant bracketing):

(19) input            output
     [be,polite]      [be,being,polite]
     [walk]           [be,walking]
     [be,kicked]      [be,being,kicked]
In a sense, our description of the verbal complex works like a "push-down" mechanism: all elements already expressed are pushed down into the Tail of the list, and only the Head of the list is available for further processing. The output of Progressive expression is now input to Aspect (Perfect) expression:

    ex_aspect(L,[[],_,[X|Z],_],[X|Z]).
    ex_aspect(eng,[perf,_,[X1|Z],_],[have,Y|Z]) :-
        ex_pap(eng,[X1],Y).
Empty aspect ([ ]) has no effect. The operator "perf" triggers the introduction of the auxiliary have and turns the initial verb of the input list into a past participle. This has such effects as the following:

(20) input                 output
     [be,polite]           [have,been,polite]
     [walk]                [have,walked]
     [be,walking]          [have,been,walking]
     [be,being,kissed]     [have,been,being,kissed]
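The push-down behaviour shown in (19) and (20) can be reproduced with a few stand-alone clauses. This is a simplified sketch rather than the ProfGlot rules themselves: it assumes SWI-Prolog, uses the built-in atom_concat/3 in place of concat/3, omits the derivational stem-1 (so it only handles stems that take -ed and -ing without adjustment), and the demo_ predicates and the paradigm fact for be are hypothetical.

```prolog
% Hypothetical non-finite paradigm:
% [PastParticiple, Infinitive, PresentParticiple].
demo_paradigm(be, [been, be, being]).

% Past participle: first paradigm slot, otherwise stem + ed.
demo_pap(V, P) :- demo_paradigm(V, [P|_]), !.
demo_pap(V, P) :- atom_concat(V, ed, P).

% Present participle: third paradigm slot, otherwise stem + ing.
demo_prp(V, P) :- demo_paradigm(V, [_, _, P|_]), !.
demo_prp(V, P) :- atom_concat(V, ing, P).

% Progressive: only the head of the list is processed; the rest
% is pushed down unchanged behind the new auxiliary.
demo_progr([], Vs, Vs).
demo_progr(progr, [V|Rest], [be, P|Rest]) :- demo_prp(V, P).

% Perfect: same mechanism with the auxiliary have.
demo_aspect([], Vs, Vs).
demo_aspect(perf, [V|Rest], [have, P|Rest]) :- demo_pap(V, P).
```

Chaining the two steps, demo_progr(progr, [be,kissed], E1), demo_aspect(perf, E1, E2) binds E2 to [have,been,being,kissed], the last output line of (20).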
The next step concerns the expression of attitudinal operators:

    ex_attitude(L,[[],[X|Z]],[X|Z]) :- !.
    ex_attitude(eng,[predict,[X|Z]],[will,X|Z]) :- !.
    ex_attitude(eng,[poss,[X|Z]],[may,X|Z]) :- !.

The empty attitudinal operator has no effect. The operator "predict" triggers will, and the operator "poss" triggers may. Then follows the expression of polarity:

    ex_polarity(L,[[],[X|Z]],[X|Z]) :- !.
    ex_polarity(eng,[neg,[X]],[do,not|X]) :- !.
    ex_polarity(eng,[neg,[X|Z]],[X,not|Z]).
    ex_polarity(eng,[pos,[X]],[do,indeed|X]) :- !.
    ex_polarity(eng,[pos,[X|Z]],[X,indeed|Z]).

Empty polarity (= non-emphatic positive) has no effect. Negation produces [do,not,X] when X, at this moment, is still the only element in the input list; otherwise, not is placed after the initial verb in the input list. Consider:
(21) input            output
     [walk]           [do,not,walk]
     [be,walking]     [be,not,walking]
     [have,walked]    [have,not,walked]
     [be,polite]      [be,not,polite]
Emphatic positive polarity introduces indeed plus do-support in the same way. Note that without indeed some of the output would be correct as well, as in: (22)
John did walk.
In other cases, the finite verb would need to be stressed to get the emphatic positive reading: (23)
John HAS walked.
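The interaction of polarity and do-support can be checked with a small stand-alone sketch. It assumes SWI-Prolog and flat lists of atoms as in (21); demo_polarity is a hypothetical name, and unlike the rule given above the sketch always builds a proper list, so no later flattening is needed.

```prolog
% Empty polarity: no effect.
demo_polarity([], Vs, Vs) :- !.
% A single verb needs do-support.
demo_polarity(neg, [V], [do, not, V]) :- !.
% Otherwise the particle follows the first (auxiliary) verb.
demo_polarity(neg, [V|Rest], [V, not|Rest]).
% Emphatic positive polarity works the same way with indeed.
demo_polarity(pos, [V], [do, indeed, V]) :- !.
demo_polarity(pos, [V|Rest], [V, indeed|Rest]).
```

The order of the clauses matters: the one-element case must be tried before the general case, because [V|Rest] also matches a one-element list.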
We now define the effect of the illocution on the verbal complex:

    ex_illo(eng,[[interr,AR],[X]],[X]) :-
        member(Q,AR),
        Q=[_,[_,[[que,_]|_]],[subj|RR]], !.
    ex_illo(eng,[[interr,AR],[X]],[do,X]) :- !.
    ex_illo(eng,[[ILL,AR],[X|Z]],[X|Z]).

The illocution influences the verbal complex only in English, not in French and Dutch. The English rule is that we get do-support in all interrogative clauses except when the subject term is questioned, as in: (24)
Who killed the dog?
This case is exempted from do-support by the first clause, which says that interrogative illocution leaves the input unaffected when there is a questioned subject among the arguments. Otherwise, interrogative illocution triggers do. Declarative illocution has no effect.

Finally, as a last step in expressing the verbal complex, we come to the rules for Tense expression. Remember that through verb agreement, the Tense has been informed on Person, Number, and (in French) Gender of the subject term. First, we have the rules which select the correct irregular tense forms from the relevant paradigms:

    ex_tense(L,[[TE,sg,p3,_],ASP,[X|Z]],[A|Z]) :-
        paradigm1(L,TE,ASP,X,[A|_]), not(A=[]), !.
    ex_tense(L,[[TE,sg,p1,_],ASP,[X|Z]],[B|Z]) :-
        paradigm1(L,TE,ASP,X,[A,B|_]), not(B=[]), !.
    ex_tense(eng,[[TE,_,_,_],ASP,[X|Z]],[C|Z]) :-
        paradigm1(eng,TE,ASP,X,[A,B,C|_]), not(C=[]), !.
    ex_tense(L,[[TE,sg,p2,_],ASP,[X|Z]],[C|Z]) :-
        paradigm1(L,TE,ASP,X,[A,B,C|_]), not(C=[]), !.
    ex_tense(dut,[[past,sg,_,_],ASP,[X|Z]],[A|Z]) :-
        paradigm1(dut,past,ASP,X,[A|_]), not(A=[]), !.
    ex_tense(dut,[[TE,pl,_,_],ASP,[X|Z]],[D|Z]) :-
        paradigm1(dut,TE,ASP,X,[A,B,C,D|_]), not(D=[]), !.
    ex_tense(L,[[TE,pl,p3,_],ASP,[X|Z]],[D|Z]) :-
        paradigm1(L,TE,ASP,X,[A,B,C,D|_]), not(D=[]), !.
    ex_tense(dut,[[TE,pl,_,_],ASP,[X|Z]],[E1|Z]) :-
        paradigm1(dut,TE,ASP,X,[A,B,C,D,E|_]), not(E=[]),
        prevocalic_stem(E,EE), concat(EE,en,E1), !.
    ex_tense(dut,[[past,pl,_,_],ASP,[X|Z]],[A1|Z]) :-
        paradigm1(dut,past,ASP,X,[A|_]), not(A=[]),
        prevocalic_stem(A,AA), concat(AA,en,A1), !.
    ex_tense(L,[[TE,pl,p1,_],ASP,[X|Z]],[E|Z]) :-
        paradigm1(L,TE,ASP,X,[A,B,C,D,E|_]), not(E=[]), !.
    ex_tense(L,[[TE,pl,p2,_],ASP,[X|Z]],[F|Z]) :-
        paradigm1(L,TE,ASP,X,[A,B,C,D,E,F|_]), not(F=[]), !.

Some language-specific selection rules have been intermingled with the general ones: there are some special rules for Dutch, since the Dutch past tense plural form can often be regularly derived from the past tense singular form. Similarly, there is a rule for English stipulating that for any tense the plural form is identical to the singular form. The other rules simply lift the irregular verbal forms out of their paradigms. If the paradigm selection rules do not apply, the regular tense expression rules take over:

    ex_tense(eng,[[pres,sg,p3,_],_,[X|Z]],[Y|Z]) :-
        sibilant(S), concat(_,S,X), concat(X,es,Y), !.
    ex_tense(eng,[[pres,sg,p3,_],_,[X|Z]],[Y|Z]) :-
        concat(X,s,Y), !.
    ex_tense(eng,[[pres,_,_,_],_,[X|Z]],[X|Z]).
The present tense third person singular is formed by affixing -es (if the stem ends in a sibilant: s, z, or ch, as defined below, in 10.1.10), otherwise -s; for other persons and numbers, present tense has no effect.

    ex_tense(eng,[[past,_,_,_],_,[X|Z]],[Y|Z]) :-
        derivational_stem1(eng,[X],X1),
        concat(X1,ed,Y).
The past tense affixes -ed to the "derivational stem-1" of the stem.
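The regular English tense rules can be exercised on their own as follows. This is a simplified sketch, assuming SWI-Prolog: atom_concat/3 replaces concat/3, the derivational stem-1 is taken to be identical to the stem, the agreement features are passed as separate arguments, and demo_sibilant and demo_tense are hypothetical names.

```prolog
% Sibilant endings as listed in the text: s, z, ch.
demo_sibilant(s).
demo_sibilant(z).
demo_sibilant(ch).

% Third person singular present after a sibilant: stem + es.
demo_tense(pres, sg, p3, [V|Rest], [F|Rest]) :-
    demo_sibilant(S),
    atom_concat(_, S, V), !,      % the stem ends in S
    atom_concat(V, es, F).
% Third person singular present elsewhere: stem + s.
demo_tense(pres, sg, p3, [V|Rest], [F|Rest]) :- !,
    atom_concat(V, s, F).
% Other persons and numbers: present tense has no effect.
demo_tense(pres, _, _, Vs, Vs).
% Past tense: stem + ed (derivational stem-1 not modelled).
demo_tense(past, _, _, [V|Rest], [F|Rest]) :-
    atom_concat(V, ed, F).
```

Only the head of the list is inflected, so demo_tense(pres, sg, p3, [have,walked], X) would yield the regular but incorrect form haves; this is why the paradigm selection rules have to be tried before the regular rules.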
This ends the expression of the verbal complex for finite verb constructions. For nominalizations and infinitival constructions we need some special rules:

    ex_nom(L,[nom,[H|T]],[H1|T]) :-
        ex_prp(L,[H],H1).

The nominalized verb is given participial expression.

    ex_inf(L,[X|Y],[to,B|Y]) :-
        paradigm(L,X,[A,B|_]), not(B=[]), !.
    ex_inf(eng,[X|Z],[to,X|Z]).

The infinitive is selected from second position in the non-finite paradigm and provided with the marker to. Otherwise, the infinitive equals the stem plus to.

    ex_nom_polarity(L,[[],[X|Z]],[X|Z]) :- !.
    ex_nom_polarity(eng,[neg,[X|Z]],[not|[X,Z]]) :- !.
    ex_nom_polarity(eng,[pos,[X|Z]],[indeed|[X,Z]]) :- !.

Polarity inside nominalizations requires special rules in English, since in that condition there is no do-support. The polarity items not and indeed are simply added at the beginning of the verbal complex:

(25) a. Peter deplored John's not buying the car.
     b. Peter deplored John's indeed buying the car.
10.1.6. Formally expressing the clause

We formally express a clause CL in L by expressing the verbal complex, expressing all the full terms in argument positions, and expressing all the satellites.

    formally_ex_clause(L,CL,[ILL,[A,B,C,D,E,F]]) :-
        CL=[[ILL,R],[[ATT,REF],[[TE1,POL,ASP2,R1],
            [ASP1,[SS,TY,AR],S1],S2],S3],S4],
        ex_verbal_complex(L,
            [TE1,[ILL,AR],POL,ATT,ASP2,ASP1,SS,TY],A),
        ex_full_terms(L,AR,B),
        ex_sats1(L,S1,C),
        ex_sats2(L,S2,D),
        ex_sat3(L,S3,E),
        ex_sat4(L,S4,F).

The same must be formulated for embedded propositions, because the formal expression of embedded clauses may differ from that of main clauses:
    formally_ex_emb_prop(L,CL,[A,B,C,D,E,F]) :-
        CL=[[ATT,REF],[[TE1,POL,ASP2,R1],
            [ASP1,[SS,TY,AR],S1],S2],S3],
        ex_verbal_complex(L,
            [TE1,[decl,AR],POL,ATT,ASP2,ASP1,SS,TY],A),
        ex_full_terms(L,AR,B),
        ex_sats1(L,S1,C),
        ex_sats2(L,S2,D),
        ex_sat3(L,S3,E),
        ex_sat4(L,S4,F).

Likewise, we need a definition of formally expressing a nominalized clause:

    formally_ex_nom_clause(L,CL,[A,B,C,D,E,F]) :-
        CL=[[ATT,REF],[[TE1,POL,ASP2,R1],
            [ASP1,[SS,TY,AR],S1],S2],S3],
        ex_nom_verbal_complex(L,[POL,ASP2,SS,TY],A),
        ex_full_terms(L,AR,B),
        ex_sats1(L,S1,C),
        ex_sats2(L,S2,D),
        ex_sat3(L,S3,E),
        ex_sat4(L,S4,F).

And an infinitival construction:

    formally_ex_inf(L,CL,[A,B,C,D,[],[]]) :-
        CL=[[TE1,POL,ASP2,R1],[ASP1,[SS,TY,AR],S1],S2],
        ex_inf_complex(L,[POL,ASP2,SS,TY],A),
        ex_full_terms(L,AR,B),
        ex_sats1(L,S1,C),
        ex_sats2(L,S2,D).
10.1.7. Placement rules

After formal expression, the main clause has the following form:

(26) [ILL,[[Vf|Vi],B,C,D,E,F]]

where:
     ILL = illocutionary operators
     Vf  = the finite verb
     Vi  = the rest of the verbal complex (including non-verbal predicates)
     B   = the arguments
     C   = S1 satellites
     D   = S2 satellites
     E   = S3 satellite
     F   = S4 satellite
Embedded propositions have the same form, except for the absence of ILL and F; embedded predications also lack E. These output structures of the formal expression rules serve as input to the placement rules, which bring the different constituents of the fully specified clause, after they have been formally expressed, to their relative positions in the final linear sequence of the linguistic expression.

Two elements are important here. First, placement rules work with a schema or "template" consisting of a series of initially empty positions in which constituents of given types may or must be placed.1 Secondly, there are a number of "placement rules" or "placements", each of which takes care of certain types of constituents. The placements cooperate on a conveyor belt, such that each placement hands over its output to the next placement rule. When a given placement rule is not applicable (because the relevant constituent is not present in the input), the input is handed over unchanged. Any position which has not been filled along the line remains empty ([ ]). With some simplification we can illustrate this procedure as follows (A = still to be ordered, P = placement rule, O = already ordered): (27)
     A0: [interr,[[has,kissed],[john,mary],[why]]]
     O0: [[],[],[],[],[]]

     P1: placement of Vf and Vi
     A1: [interr,[[john,mary],[why]]]
     O1: [[],has,[],kissed,[]]

     P2: placement of Q-words
     A2: [interr,[[john,mary]]]
     O2: [why,has,[],kissed,[]]

     P3: placement of subject
     A3: [interr,[[mary]]]
     O3: [why,has,john,kissed,[]]

     P4: placement of object
     A4: [interr]
     O4: [why,has,john,kissed,mary]

At the beginning of the procedure, A is full and O is empty, while at the end O is full and A is empty (except for the illocution). Note that each placement can be defined in terms of its input and its output: (28)
     Pn = (An-1, On-1, An, On)
However, the first placement rule needs no information on the input O, since it is always the same empty template, and the last placement rule needs no information on the output A, since it is always a list in which there are no more elements to be placed. The full placement procedure thus consists of a series of separate placement rules, followed by "punctuation" and "sandhi":

    full_place(L,A,F) :-
        place1(L,A,A1,O1),
        place2(L,A1,O1,A2,O2),
        place3(L,A2,O2,A3,O3),
        place4(L,A3,O3,A4,O4),
        place5(L,A4,O4,A5,O5),
        place6(L,A5,O5,A6,O6),
        place7(L,A6,O6,A7,O7),
        place8(L,A7,O7,A8,O8),
        place9(L,A8,O8,A9,O9),
        place10(L,A9,O9,O10),
        punctuation(L,O10,O11),
        flatten(O11,O12),
        sandhi_list(L,O12,F).

Full placement applies to main clauses. "Punctuation" expresses the illocution in a punctuation mark:

    punctuation([decl,A],[A,'.']).
    punctuation([interr,A],[A,'?']).

Then, the whole output, which may contain many bracketings and empty lists, is "flattened" into a flat list. To that flat list, sandhi rules may apply. These effect low-level adjustments of the forms which have already been linearly ordered. We start by recursively defining the effect of applying sandhi to a list:

    sandhi_list(_,[X],[X]).
    sandhi_list(fre,[X,L,Z|R],[New|R1]) :-
        on(L,[le,la]),
        sandhi(fre,X,L,Z,New), !,
        sandhi_list(fre,R,R1).
    sandhi_list(L,[X,Y|R],[X1|R1]) :-
        sandhi(L,X,Y,X1), !,
        sandhi_list(L,R,R1).
    sandhi_list(L,[Y|R],[Y|R1]) :-
        sandhi_list(L,R,R1), !.
    sandhi_list(L,X,X).
Sandhi applies "across the board" to the output of flattening. In principle, we "sandhi_list" the whole list by applying "sandhi" to the first pair of the list [X,Y] and then doing sandhi_list to the rest of the list. In French, as we shall see in the next chapter, sandhi turns a sequence such as (29a) into (29b):

(29) a. [à,le,fils,de,le,professeur]
     b. [au,fils,du,professeur]
The simple method sketched above would yield wrong results, however, on a sequence such as (30a), which would come out as (30b) instead of (30c):

(30) a. [à,le,enfant,de,le,ouvrier]
     b. *[au,enfant,du,ouvrier]
     c. [à,l'enfant,de,l'ouvrier]
For this reason, the first sandhi rule looks at three rather than two initial elements, and if it finds a second and a third element which qualify for sandhi (such as le and enfant in (30a)), it applies sandhi to these two rather than to the first two elements. The detailed sandhi rules for French will be discussed in chapter 11. In English, sandhi turns e.g. john *s into john's:2

    sandhi(eng,X,'*s',X1) :-
        concat(X,'''s',X1).

Note that we cannot achieve this result by immediately affixing 's onto the stem when "genitive" is expressed, since at that moment we do not know what the last constituent of the term phrase in question is going to be. It is only when placement has applied that we can attach 's to the preceding word, as in:

(31) a. john's house
     b. the man's house
     c. the friend of my sister's house
     d. the man in the garden's house
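How the genitive marker finds its host only after linearization can be illustrated with a reduced, English-only version of the list traversal. This is a sketch under stated assumptions: SWI-Prolog, atom_concat/3 in place of concat/3, no French three-element rule, and the hypothetical names demo_sandhi and demo_sandhi_list.

```prolog
% English sandhi: attach the genitive marker to the preceding word.
demo_sandhi(eng, X, '*s', X1) :-
    atom_concat(X, '''s', X1).

% Walk through the flat list; wherever a pair qualifies for sandhi,
% replace it by the merged form, otherwise copy the first element.
demo_sandhi_list(_, [], []).
demo_sandhi_list(L, [X, Y|R], [X1|R1]) :-
    demo_sandhi(L, X, Y, X1), !,
    demo_sandhi_list(L, R, R1).
demo_sandhi_list(L, [Y|R], [Y|R1]) :-
    demo_sandhi_list(L, R, R1).
```

Applied to [the,friend,of,my,sister,'*s',house], the marker ends up on sister, the last word of the possessor phrase, which corresponds to (31c).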
For the ordering of constituents in embedded clauses we need a separate placement procedure, since first of all there is no final punctuation, and secondly embedded clause ordering may be different from main clause ordering (as is the case in Dutch):

    emb_place(L,A,F) :-
        place1(L,[emb,A],[emb,A1],O1),
        place2(L,[ILL,A1],O1,[ILL,A2],O2),
        place3(L,[decl,A2],O2,[decl,A3],O3),
        place4(L,[ILL,A3],O3,[ILL,A4],O4),
        place5(L,[emb,A4],O4,[emb,A5],O5),
        place6(L,[emb,A5],O5,[emb,A6],O6),
        place7(L,[interr,A6],O6,[interr,A7],O7),
        place8(L,[ILL,A7],O7,[ILL,A8],O8),
        place9(L,[emb,A8],O8,[emb,A9],O9),
        place10(L,[ILL,A9],O9,[ILL,F]).

Note that in formulating embedded placement we have exploited various placements which are also used for other purposes. For example, the formulation of placement 5 in terms of "interrogative" means that in embedded clauses the relevant constituents (P1-constituents such as relative pronouns) are placed in the same way as in interrogative sentences.
In formulating the individual placement rules, we use a single, uniform template for all three languages. Some positions of the template will not be used in some of the languages, and thus remain empty all along for those languages. This slightly complicates the individual grammars in favour of general applicability of the rules. The full template contains 16 potential positions. These 16 positions accommodate the following constituent types:

     position   constituent type
     1          satellite-4 (e.g., frankly)
     2          clause-initial P1-constituents
     3          preverbal subject
     4          possible satellite-3 (e.g. probably)
     5          clitics in French
     6          finite verb
     7          postverbal subject
     8          possible satellite-3
     9          (rest of) verbal complex
     10         object
     11         (remaining) arguments
     12         (remaining) satellites-1
     13         (remaining) satellites-2
     14         (rest of) verbal complex
     15         extraposed complements
     16         postposed clausal satellites-2
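The idea of a fixed template that is filled step by step and then flattened can be tried out with the following stand-alone sketch. It is not the ProfGlot placement procedure: it assumes SWI-Prolog (length/2, maplist/2 and flatten/2 from its standard libraries), covers only the verb and the subject in an English clause, and all demo_ names are hypothetical.

```prolog
% Replace the element at 1-based position N of a list by X.
demo_fill(1, [_|T], X, [X|T]) :- !.
demo_fill(N, [H|T], X, [H|T1]) :-
    N > 1,
    N1 is N - 1,
    demo_fill(N1, T, X, T1).

% An empty template: 16 positions, each holding [].
demo_template(T) :-
    length(T, 16),
    maplist(=([]), T).

% Finite verb to position 6, rest of the verbal complex to position 9.
demo_place_verb([Vf|Vi], T0, T) :-
    demo_fill(6, T0, Vf, T1),
    demo_fill(9, T1, Vi, T).

% Subject: position 3 in declaratives, position 7 in interrogatives.
demo_place_subject(decl, S, T0, T) :- demo_fill(3, T0, S, T).
demo_place_subject(interr, S, T0, T) :- demo_fill(7, T0, S, T).

% Fill the template, then flatten it; empty positions disappear.
demo_linearize(Ill, Subj, Verbs, Words) :-
    demo_template(T0),
    demo_place_verb(Verbs, T0, T1),
    demo_place_subject(Ill, Subj, T1, T2),
    flatten(T2, Words).
```

With the verbal complex [has,walked] and the subject john, the declarative comes out as [john,has,walked] and the interrogative as [has,john,walked]; the only difference between the two is the template position chosen for the subject.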
The placement rules are intended to produce only correct orders. They sometimes define alternative possible orders. But they do not yet define all possible orders of constituents. We now consider the individual placements. Note that, in order to make clear what exactly is done by the placement rules, I have here collected all the rules for the three languages.

Through placement 1, satellite-4 is placed in position 1 and Vf and Vi are placed in their proper positions (depending on the language and the clause type).

    place1(L,
        [emb,[[Vf|Vi],B,C,D,E,F]],
        [emb,[B,C,D,E]],
        [F,[],[],[],[],[],[],[],[],[],[],[],[],[Vi|Vf],[],[]]) :-
        (L=dut;L=ger), !.
In Dutch embedded clauses the whole verbal complex goes to position 14. The order [Vi|Vf] is not the only possible order, but it is chosen here because it is the least constrained order. Note that this also applies to German ("ger").
This fact is not relevant in the present context, but I have left it as it stands in order to show how the placement rules can be extended to other languages.

    place1(L,
        [ILL,[[Vf|Vi],B,C,D,E,F]],
        [ILL,[B,C,D,E]],
        [F,[],[],[],[],Vf,[],[],[],[],[],[],[],Vi,[],[]]) :-
        (L=dut;L=ger), !.
In Dutch (and again, German) main clauses the Vf is placed in position 6 (positions 4 and 5 are not used in Dutch placement rules), and Vi goes to position 14.

    place1(L,
        [ILL,[[Vf|Vi],B,C,D,E,F]],
        [ILL,[B,C,D,E]],
        [F,[],[],[],[],Vf,[],[],Vi,[],[],[],[],[],[],[]]).

In English and French (main and subordinate) clauses the Vf is placed in position 6, and the Vi in position 9. Placement 2 assigns a position to P1-constituents, and at the same time places satellites-1 (here represented by "C") in position 12.

    place2(L,
        [ILL,[B,C,D,E]],
        [F,[],[],[],[],Vf,[],[],Vi,[],[],[],[],V2,[],[]],
        [ILL,[B1,D,E]],
        [F,TER,[],[],[],Vf,[],[],Vi,[],[],C,[],V2,[],[]]) :-
        member(AR,B), AR=[X,[**,TER]],
        (X=[subj],!;X=[obj],!),
        subst(AR,B,[],B1).

Placement 2 first checks whether among the arguments B there is a term AR which is marked by "**" and by [subj] or [obj]. If so, it places the term in position 2 ("P1-position" in terms of Functional Grammar), the initial position of the clause proper. Note that all constituents which must go to P1 have been marked "**" by the formal expression rules. This symbol is used to retrieve these constituents, and then left unexpressed.

    place2(L,
        [ILL,[B,C,D,E]],
        [F,[],[],[],[],Vf,[],[],Vi,[],[],[],[],V2,[],[]],
        [ILL,[B1,D,E]],
        [F,[X|TER],[],[],[],Vf,[],[],Vi,[],[],C,[],V2,[],[]]) :-
        member(AR,B), AR=[X,[**,TER]],
        subst(AR,B,[],B1), !.
It may also be the case that among the arguments B there is a term AR marked "**", but not [subj] or [obj]. This will then always be a term with a preposition X, for example, [to,[**,whom]]. In such a case the combination of preposition + term is placed in position 2.

    place2(L,
        [ILL,[B,C,D,E]],
        [F,[],[],[],[],Vf,[],[],Vi,[],[],[],[],V2,[],[]],
        [ILL,[B,D,E]],
        [F,[X,TER],[],[],[],Vf,[],[],Vi,[],[],C1,[],V2,[],[]]) :-
        member(AR,C), AR=[X,[**,TER]],
        subst(AR,C,[],C1), !.

There may be a similar term among the satellites-1, e.g. [for,[**,whom]]. If that is the case then only the rest of the satellites-1 (C1) are placed in 12.

    place2(L,
        [ILL,[B,C,D,E]],
        [F,[],[],[],[],Vf,[],[],Vi,[],[],[],[],V2,[],[]],
        [ILL,[B,D1,E]],
        [F,[X,TER],[],[],[],Vf,[],[],Vi,[],[],C,[],V2,[],[]]) :-
        member(AR,D), AR=[X,[**,TER]],
        subst(AR,D,[],D1), !.
The same, mutatis mutandis, for satellites-2.

    place2(L,
        [ILL,[B,C,D,E]],
        [F,[],[],[],[],Vf,[],[],Vi,[],[],[],[],V2,[],[]],
        [ILL,[B,D,E]],
        [F,[],[],[],[],Vf,[],[],Vi,[],[],C,[],V2,[],[]]).

If there are no P1-constituents, only the satellites-1 are placed in position 12. In French, if after placement 2 position 2 is still empty and the clause is interrogative (and must therefore be a Yes-No interrogative), the expression est-ce que is placed in P1 position by placement 3:

    place3(fre,
        [interr,[B,D,E]],
        [F,[],[],[],[],Vf,[],[],Vi,[],[],C,[],V2,[],[]],
        [interr,[B,D,E]],
        [F,['est-ce',que],[],[],[],Vf,[],[],Vi,[],[],C,[],V2,[],[]]) :- !.
    place3(L,X,O,X,O).

If these conditions do not hold, placement 3 has no effect. Placement 4 now looks at satellites of level 3 (e.g., probably, cleverly):

    place4(L,[ILL,[B,D,[]]],O,[ILL,[B,D]],O) :- !.
If there are no such constituents (E = [ ]), placement 4 has no effect.

    place4(L,
        [ILL,[B,D,E]],
        [F,[],[],[],[],Vf,[],[],Vi,[],[],C,[],V2,[],[]],
        [ILL,[B,D]],
        [F,E,[],[],[],Vf,[],[],Vi,[],[],C,[],V2,[],[]]).
If at this point position 2 is still empty ([ ]), the satellite-3 may be placed in that position.

    place4(eng,
        [ILL,[B,D,E]],
        [F,P1,[],[],[],Vf,[],[],Vi,[],[],C,[],V2,[],[]],
        [ILL,[B,D]],
        [F,P1,[],E,[],Vf,[],[],Vi,[],[],C,[],V2,[],[]]) :-
        (Vi=[];Vi=[indeed|_],!;Vi=[not|_]), !.

In English, when there is no Vi or when the Vi contains indeed or not, the satellite-3 is placed in position 4.

    place4(L,
        [ILL,[B,D,E]],
        [F,P1,[],[],[],Vf,[],[],Vi,[],[],C,[],V2,[],[]],
        [ILL,[B,D]],
        [F,P1,[],[],[],Vf,[],E,Vi,[],[],C,[],V2,[],[]]).

Or it can be placed in position 8. Placement 4 accounts for such facts as the following:

(33) a. Probably John walks.
     b. Probably John does not walk.
(34) a. John probably walks.
     b. John probably does not walk.
     c. John probably does indeed walk.
(35) a. John has probably walked.
     b. John will probably walk.
Note that in (33) the satellite-3 is in P1 position; this can always be done, if P1 is not occupied by some other constituent. In (34) probably precedes the Vf (= is placed in position 4), whereas in (35) it follows the finite verb (= is placed in position 8).

Placement 5 is (so far) only used in the Danish version of ProfGlot, and thus not relevant to the three languages discussed here. I simply give the "identity" version:

    place5(L,[ILL,[B,[Lo,Te,[],Co]]],OO,
           [ILL,[B,[Lo,Te,Co]]],OO) :- !.

Placement 6 now takes care of locative, temporal, and "cognitive" satellites of level 2.

    place6(L,
        [ILL,[B,[Lo,Te,[]]]],
        [F,P1,[],E,N,Vf,[],E1,Vi,[],[],C,[],V2,[],[]],
        [ILL,B],
        [F,P1,[],E,N,Vf,[],E1,Vi,[],[],C,[Lo,Te],V2,[],[]]) :- !.
If there is no cognitive satellite in the input of placement 6, the only effect this rule has is to place the locative and temporal satellites (Lo and Te) in position 13.

    place6(L,
        [emb,[B,[Lo,Te,Co]]],
        [F,P1,[],E,N,Vf,[],E1,Vi,[],[],C,[],V2,[],[]],
        [emb,B],
        [F,P1,[],E,N,Vf,[],E1,Vi,[],[],C,[Lo,Te],V2,[],Co]) :- !.

If there is an overt cognitive satellite in an embedded domain, it is obligatorily placed in position 16. In the present version of placement 6 this applies both to "nominal" and to "verbal" cognitive satellites, as in (36a) and (36b), respectively:

(36) a. Peter deplored that John went to New York because of Mary.
     b. Peter deplored that John went to New York because Mary wanted him to come.
    place6(L,
        [ILL,[B,[Lo,Te,Co]]],
        [F,[],[],E,N,Vf,[],E1,Vi,[],[],C,[],V2,[],[]],
        [ILL,B],
        [F,Co,[],E,N,Vf,[],E1,Vi,[],[],C,[Lo,Te],V2,[],[]]).

In a main clause, if P1-position is still open, the cognitive satellite can be placed in P1 (position 2):

(37) a. Because of Mary John went to New York.
     b. Because Mary wanted him to come John went to New York.
    place6(L,
        [ILL,[B,[Lo,Te,Co]]],
        [F,P1,[],E,N,Vf,[],E1,Vi,[],[],C,[],V2,[],[]],
        [ILL,B],
        [F,P1,[],E,N,Vf,[],E1,Vi,[],[],C,[Lo,Te],V2,[],Co]) :- !.

Alternatively, the cognitive satellite can, in that condition too, be placed in position 16.

Placement 7 now treats subordinate clauses with subject function. These must be extraposed to position 15 in interrogative clauses, while at the same time the "expletive" element of the language in question is substituted for the subject in the arguments B (later to be placed in subject position):

(38) Was it deplored by John that Peter kissed Mary?
The rule says: if, among the arguments B, there is a subject term starting with a subordinator-1 (e.g. that), substitute the expletive element for the subject in B, and place the subject clause itself in extraposition site 15. This is obligatory in interrogative clauses:
    place7(L,
        [interr,B],
        [F,P1,[],E,N,Vf,[],E1,Vi,[],[],C,D,V2,[],Co],
        [interr,B1],
        [F,P1,[],E,N,Vf,[],E1,Vi,[],[],C,D,V2,[SUB1|Z],Co]) :-
        member(AR,B), AR=[[subj],SUB1|Z],
        sub1(L,SUB1),
        subst(AR,B,[[subj]|EXPL],B1),
        expl(L,EXPL), !.
Note that the expletive remains in B to be placed in Subject position later. In declaratives, extraposition is optional:

(39) a. That Peter kissed Mary was deplored by John.
     b. It was deplored by John that Peter kissed Mary.

This is accounted for by the following rule:

place7(L, [ILL,B],
    [F,P1,[],E,N,Vf,[],E1,Vi,[],[],C,D,V2,[],Co],
    [ILL,B1],
    [F,P1,[],E,N,Vf,[],E1,Vi,[],[],C,D,V2,[SUB1|Z],Co]) :-
    member(AR,B),
    AR=[[subj],SUB1|Z],
    sub1(L,SUB1),
    subst(AR,B,[[subj]|EXPL],B1),
    expl(L,EXPL).

In a non-interrogative clause, the extraposition is optional (no "cut"). This means that the clausal Subject, if not extraposed, may remain in B, to be placed in Subject position later.

place7(L,X,O,X,O).

If there is no clausal Subject, the input remains unchanged.

expl(eng,it).

The English expletive is it.

Placement 8 likewise places Object complements in position 15. In this case, no expletive is used, and the rule applies obligatorily to all clause types:

place8(L, [ILL,B],
    [F,P1,[],E,N,Vf,[],E1,Vi,[],[],C,D,V2,[],Co],
    [ILL,B1],
    [F,P1,[],E,N,Vf,[],E1,Vi,[],[],C,D,V2,[SUB1|Z],Co]) :-
    member(AR,B),
    AR=[MARK,SUB1|Z],
    sub1(L,SUB1),
    (MARK=[obj];MARK=[]),
    subst(AR,B,[],B1), !.

This rule implies, for example, that if it is yesterday that John deplored that Mary left, we will get (40a) rather than (40b):

(40) a. John deplored yesterday that Mary left.
     b. John deplored that Mary left yesterday.

place8(L,X,O,X,O).

If there is no clausal Object, the input remains unchanged.

Placement 9 now carries the Subject from B to the available Subject position(s). In French interrogatives the Subject is placed as in declaratives:

place9(fre,[interr,X],O8,[interr,X1],O9) :-
    place9(fre,[decl,X],O8,[decl,X1],O9), !.

In English and Dutch interrogatives, the Subject is placed in position 7, after the Vf:

place9(L, [interr,B],
    [F,P1,[],E,N,Vf,[],E1,Vi,[],[],C,D,V2,Ex,Co],
    [ILL,B1],
    [F,P1,[],E,N,Vf,TER,E1,Vi,[],[],C,D,V2,Ex,Co]) :-
    member(SUBJ,B),
    SUBJ=[[subj]|TER],
    subst(SUBJ,B,[],B1), !.

In Dutch declaratives, if P1 is still empty, the Subject is placed in P1. Otherwise, the Dutch Subject is placed in position 7, after the Vf:

place9(dut, [decl,B],
    [F,[],[],E,N,Vf,[],E1,Vi,[],[],C,D,V2,Ex,Co],
    [decl,B1],
    [F,TER,[],E,N,Vf,[],E1,Vi,[],[],C,D,V2,Ex,Co]) :-
    member(SUBJ,B),
    SUBJ=[[subj]|TER],
    subst(SUBJ,B,[],B1), !.

place9(dut, [decl,B],
    [F,P1,[],E,N,Vf,[],E1,Vi,[],[],C,D,V2,Ex,Co],
    [decl,B1],
    [F,P1,[],E,N,Vf,TER,E1,Vi,[],[],C,D,V2,Ex,Co]) :-
    member(SUBJ,B),
    SUBJ=[[subj]|TER],
    subst(SUBJ,B,[],B1), !.

In English and French declaratives the Subject is placed in position 3:

place9(L, [ILL,B],
    [F,P1,[],E,N,Vf,[],E1,Vi,[],[],C,D,V2,Ex,Co],
    [ILL,B1],
    [F,P1,TER,E,N,Vf,[],E1,Vi,[],[],C,D,V2,Ex,Co]) :-
    member(SUBJ,B),
    SUBJ=[[subj]|TER],
    subst(SUBJ,B,[],B1), !.

If none of these conditions apply, the input remains unchanged:

place9(L,X,O,X,O).

Placement 10, finally, takes care of Object terms and other arguments remaining in B.

place10(fre, [ILL,B],
    [F,P1,T,E,N,Vf,T1,E1,Vi,[],[],C,D,V2,Ex,Co],
    [ILL,[F,P1,T,E,TE,Vf,T1,E1,Vi,[],B1,C,D,V2,Ex,Co]]) :-
    member(OB,B),
    OB=[[obj],[[*,TE]|_]|_],
    subst(OB,B,[],B1), !.

In French, if there is a clitic Object (marked "*"), it is placed in preverbal position 5. The remaining arguments are placed in position 11. At the same time, the illocution is added to the final output, so as to trigger the punctuation.

place10(L, [ILL,B],
    [F,P1,T,E,N,Vf,T1,E1,Vi,[],[],C,D,V2,Ex,Co],
    [ILL,[F,P1,T,E,N,Vf,T1,E1,Vi,TE,B1,C,D,V2,Ex,Co]]) :-
    member(OB,B),
    OB=[[obj]|TE],
    subst(OB,B,[],B1), !.

Otherwise, Objects are placed in position 10, and the rest of the arguments in position 11.

place10(dut, [ILL,B],
    [F,P1,T,E,N,Vf,T1,E1,Vi,[],[],C,D,V2,Ex,Co],
    [ILL,[F,P1,T,E,N,Vf,T1,E1,Vi,[],B,C,D,V2,Ex,Co]]) :- !.

If there is no Object term in B, the remaining arguments in B are simply placed in position 11.

place10(L, [ILL,B],
    [F,P1,T,E,N,Vf,T1,E1,Vi,[],[],C,D,V2,Ex,Co],
    [ILL,[F,P1,T,E,N,Vf,T1,E1,Vi,[],B1,C,D,V2,Ex,Co]]) :-
    reverse(B,B1), !.

In English and French, these remaining arguments are reversed and then placed in position 11. The reason for this difference is that the most natural order in constructions of the following type is not the same in Dutch as it is in English and French:

(41) a. This car has been bought from Peter (Source) by John (Agent).
     b. Cette voiture a été achetée de Pierre (So) par Jean (Ag).
     c. Deze auto is door Jan (Ag) van Piet (So) gekocht.

If none of these conditions apply, only the illocution is added to the final output:

place10(L,[ILL,B],O,[ILL,O]).

Note that, apart from the reversal in English and French, remaining arguments in B and remaining satellites in C and D keep the positions which were already assigned to them by the rules creating the underlying clause structure. In many cases this results in acceptable orders, but sometimes the resulting order is not quite what it should be. Further measures will have to be taken to prevent this. The underlying idea, however, is that if nothing special happens to a constituent (subject or object placement, P1 placement, extraposition), it ends up by default in a position which directly reflects its position in underlying clause structure.
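
The mechanics of a placement rule can be tried out in isolation. The following is a minimal sketch, not part of ProfGlot itself: it copies the English/French declarative clause of placement 9 together with its fallback, and supplies a definition of subst/4 that is an assumption (the actual definition is given elsewhere in the program). The sample input terms are likewise invented for illustration.

```prolog
:- use_module(library(lists)).

% subst(Old, List, New, NewList): ASSUMED helper, replaces the first
% occurrence of Old in List by New.
subst(Old, [Old|T], New, [New|T]) :- !.
subst(Old, [H|T], New, [H|T1]) :- subst(Old, T, New, T1).

% Placement 9 for English and French declaratives: the Subject term TER
% is taken out of the arguments B and put in position 3 of the template.
place9(_L, [ILL,B],
       [F,P1,[],E,N,Vf,[],E1,Vi,[],[],C,D,V2,Ex,Co],
       [ILL,B1],
       [F,P1,TER,E,N,Vf,[],E1,Vi,[],[],C,D,V2,Ex,Co]) :-
    member(SUBJ,B),
    SUBJ=[[subj]|TER],
    subst(SUBJ,B,[],B1), !.
% Fallback: without a Subject in B the input is passed on unchanged.
place9(_L, X, O, X, O).
```

With a toy input such as place9(eng,[decl,[[[subj],john],[[obj],mary]]],T0,[decl,B1],T1), where T0 is a 16-position template that has only position 6 filled, position 3 of T1 comes out as [john] and the Subject slot in B1 is emptied, while the Object stays in B1 for placement 10.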
10.1.8. Full clause expression

We can now define what it means to express a fully specified clause. If CL is a fully specified clause we express it by first doing formal expression and then doing full placement:

ex_clause(L,CL,F) :-
    fully_specified_clause(L,CL),
    formally_ex_clause(L,CL,A),
    full_place(L,A,F).

It may also be the case that we already have a fully specified clause (e.g., as output of the parser, the logical component, or the translator). For expressing such clauses we define:

ex2_clause(L,CL,F) :-
    formally_ex_clause(L,CL,A),
    full_place(L,A,F).

For the expression of embedded propositions, nominalizations, and infinitival constructions, we have the following rules:

ex_emb_prop(L,[],[]).
ex_emb_prop(L,[_,[],_],[]).

If the relevant positions are empty, they are expressed by zero.

ex_emb_prop(L,CL,F) :-
    formally_ex_emb_prop(L,CL,A),
    emb_place(L,A,F).

Otherwise, an embedded proposition is expressed by formally expressing it, and then doing embedded placement on the output.

ex_nom_clause(L,CL,F) :-
    formally_ex_nom_clause(L,CL,A),
    emb_place(L,A,F), !.

Idem for nominalized embedded constructions.

infinitival_expression(L,PR,Y) :-
    formally_ex_inf(L,PR,A),
    emb_place(L,A,Y), !.

Idem for infinitival expression of embedded constructions.

10.1.9. Go!

The following instruction can be used to start the generator:

go(L) :-
    repeat,
    ex_clause(L,C,X),
    adjust_spelling(L,X,X1),
    counter(N), nl,
    tab(1), write('['), write(N), write(']'),
    writelist(X1), nl, nl,
    fail.

"go(L)", that is, "go(eng)", "go(fre)", or "go(dut)", forms the highest clause in the Generator. This instruction causes the system to create a fully specified clause, express it, write the expression on the screen with a [N] integer as identifier, and continue repeating this procedure until no more sentences can be formed.

adjust_spelling(L,X,X) :-
    (L=eng;L=fre), !.

"Adjust spelling" has no effect in English and French. Its application to Dutch will be specified in the Dutch expression component.

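
The control structure of "go" is a failure-driven loop: each solution is printed as a side effect, after which "fail" forces backtracking into the generator for the next one. The sketch below illustrates only this control structure. The predicates toy_clause/2 and go_once/1 are hypothetical, the definitions of counter/1 and writelist/1 are assumptions about helpers defined elsewhere in the program, and "repeat" has been left out so that the toy loop stops once its two solutions are exhausted.

```prolog
:- dynamic count/1.
count(0).

% counter(N): ASSUMED helper, hands out the next sentence number.
counter(N) :- retract(count(N0)), N is N0 + 1, assertz(count(N)).

% writelist(Ws): ASSUMED helper, writes each word preceded by a blank.
writelist([]).
writelist([W|Ws]) :- write(' '), write(W), writelist(Ws).

% Hypothetical stand-in for ex_clause/3 plus adjust_spelling/3:
% two ready-made expressions.
toy_clause(eng, [john,walks,'.']).
toy_clause(eng, [mary,walks,'.']).

% Failure-driven loop: print a solution, fail, backtrack for the next.
go_once(L) :-
    toy_clause(L, X),
    counter(N), nl,
    tab(1), write('['), write(N), write(']'),
    writelist(X), nl,
    fail.
go_once(_).   % succeed quietly when all solutions have been printed
```

Calling go_once(eng) prints the two toy sentences with the identifiers [1] and [2].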
10.1.10. Pseudo-phonology

In this section I have collected some facts and rules of a phonological/orthographical nature, relevant for capturing certain regularities of expression in general and in the English module. Because of their nonprincipled character, these clauses are referred to as "pseudo-phonology". They hold no claim about the proper treatment of the relevant phenomena. What they do show is that phonological rules can in principle be formulated in Prolog.

vowel(V) :-
    member(V,[a,e,i,o,u,à,â,é,è,ê,ë]).

V is a vowel if V is a member of the list [a,e,i,...]. In the same way, the following qualities are assigned:

consonant(C) :-
    member(C,[b,c,d,f,g,h,j,k,l,m,n,p,q,r,s,t,v,w,x,z]).
liquid(C) :-
    member(C,[l,r]).
nasal(C) :-
    member(C,[n,m]).
sonant(G) :-
    member(G,[n,m,l,r]).
voiceless(C) :-
    member(C,[p,t,k,f,s,h]).
obstruent(C) :-
    member(C,[p,t,k,f,s,h,b,d,g]).
sibilant(S) :-
    member(S,[s,z,ch]).

The following rules serve to define the different "derivational stems":

derivational_stem1(eng,[hit],hitt) :- !.
derivational_stem1(eng,[A],A1) :-
    concat(A1,e,A), !.
derivational_stem1(eng,[X],X).

The notion "derivational stem-1" is needed in English for getting the stem into shape when it occurs before vowels. Without measures taken, we get expressions such as:

(42) a. hit     *hiting, *hited
     b. give    *giveing

The derivational stem-1 of hit has been stipulated as hitt (for this, a rule must be formulated later on). Otherwise, if the stem ends in e, this element is chopped off.

derivational_stem2(eng,[A],A1) :-
    concat(X,y,A),
    consonant(C),
    concat(_,C,X),
    concat(X,ie,A1), !.
derivational_stem2(eng,[X],X).

"Derivational stem-2" is used in English to account for the relations between singular and plural nouns as in:

(43) lady    ladies
     city    cities

Derivational stem-2 turns lady into ladie before plural -s is affixed. Note that this is a matter of orthography rather than phonology.
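
The two stem rules can be run on their own. The sketch below assumes that concat/3 is plain atom concatenation (here mapped onto the built-in atom_concat/3, which is an assumption about the program's own concat). The wrappers ing_form/2 and plural/2 are hypothetical and serve only to show the stems at work.

```prolog
:- use_module(library(lists)).

% ASSUMPTION: concat/3 behaves like atom concatenation.
concat(A,B,C) :- atom_concat(A,B,C).

consonant(C) :-
    member(C,[b,c,d,f,g,h,j,k,l,m,n,p,q,r,s,t,v,w,x,z]).

% Stem used before vowel-initial affixes: hit -> hitt, give -> giv.
derivational_stem1(eng,[hit],hitt) :- !.
derivational_stem1(eng,[A],A1) :- concat(A1,e,A), !.
derivational_stem1(eng,[X],X).

% Stem used before plural -s: consonant + y -> consonant + ie.
derivational_stem2(eng,[A],A1) :-
    concat(X,y,A), consonant(C), concat(_,C,X), concat(X,ie,A1), !.
derivational_stem2(eng,[X],X).

% Hypothetical wrappers, for illustration only.
ing_form(V,F) :- derivational_stem1(eng,[V],S), concat(S,ing,F).
plural(N,P)   :- derivational_stem2(eng,[N],S), concat(S,s,P).
```

Under these assumptions ing_form(give,F) yields giving and ing_form(hit,F) yields hitting, while plural(lady,P) yields ladies and plural(boy,P) yields boys, since the y of boy is not preceded by a consonant.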
Chapter 11. FreExp: French expression
11.0. Introduction

In combination with ParSel and UniExp, FreExp does the same work for French as EngExp does for English. Remember that in the actual program the ordering is ParSel, FreExp, UniExp. There are therefore "universal" rules which precede, and universal rules which follow the French expression rules. These universal rules are contained in the preceding chapter and are not repeated here. In the description of the French-specific expression rules in this chapter I will only comment on those points which might not be immediately clear to someone who has carefully read the preceding chapter.
11.1. The module FreExp

ex_noun(fre,[[V],[agent,masc]],X) :-
    concat(V,eur,X), !.
ex_noun(fre,[[V],[agent,fem]],X) :-
    concat(V,euse,X), !.

Unlike English, French has separate expression for male and female agent nouns.

ex_noun(fre,[[N],dim],N) :- !.

At the moment a Dutch diminutive noun is translated into a French nondiminutive.

ex_noun(fre,[X],X).

ex_number(fre,[pl,X],Y) :-
    concat(X,s,Y).

ex_det(fre,[[high,indef],pl,_],[[beaucoup,de],[]]) :- !.
ex_det(fre,[[low,indef],pl,_],[[peu,de],[]]) :- !.
ex_det(fre,[[univ,def],pl,masc],[[tous,les],[]]) :- !.
ex_det(fre,[[univ,def],pl,fem],[[toutes,les],[]]) :- !.
ex_det(fre,[[univ,def],sg,_],[chaque,[]]) :- !.
ex_det(fre,[[prox,def],pl,_],[ces,[]]) :- !.
ex_det(fre,[[prox,def],sg,masc],[ce,[]]) :- !.
ex_det(fre,[[prox,def],sg,fem],[cette,[]]) :- !.
ex_det(fre,[[rem,def],pl,_],[ces,'-là']) :- !.
ex_det(fre,[[rem,def],sg,masc],[ce,'-là']) :- !.
ex_det(fre,[[rem,def],sg,fem],[cette,'-là']) :- !.
ex_det(fre,[def,sg,masc],[le,[]]) :- !.
ex_det(fre,[def,sg,fem],[la,[]]) :- !.
ex_det(fre,[def,pl,_],[les,[]]) :- !.
ex_det(fre,[indef,sg,masc],[un,[]]) :- !.
ex_det(fre,[indef,sg,fem],[une,[]]) :- !.
ex_det(fre,[indef,pl,_],[des,[]]) :- !.

In French the Gender of the head noun may co-determine the expression of the determiner. Note that one possible expression of determiners is the discontinuous ce...là, as in ce garçon-là, cette dame-là, etc. The two elements of such realizations of the demonstrative are introduced here; they will be placed around the noun in the rule for expressing terms. This is also the reason why all the expressions without -là have an empty second position. This can probably be formulated more elegantly.

ex_predadj(fre,[_,NUM,GEN],[A,_],A1) :-
    ex_adj(fre,[NUM,GEN],A,A1).
ex_restr2(fre,[_,NUM,GEN],[A,_],A1) :-
    ex_adj(fre,[NUM,GEN],A,A1).

ex_adj(fre,[NUM,GEN],[A,pos],A1) :-
    paradigm(L,A,[B|_]),
    ex_adj(fre,[NUM,GEN],[B],A1), !.
ex_adj(fre,[NUM,GEN],[A,POL],[POL1,A1]) :-
    ex_pol(fre,POL,POL1),
    ex_adj(fre,[NUM,GEN],[A],A1), !.

ex_pol(fre,pos,plus) :- !.
ex_pol(fre,neg,moins) :- !.
ex_pol(fre,equal,aussi) :- !.

ex_adj(fre,[NUM,GEN],[prp,[V]],A1) :-
    ex_prp(fre,[V],[V1]),
    ex_adj(fre,[NUM,GEN],[V1],A1), !.
ex_adj(fre,[NUM,GEN],[deg,DEG,A],[DEG,A1]) :-
    ex_adj(fre,[NUM,GEN],[A],A1), !.
ex_adj(fre,[NUM,GEN],[A],A1) :-
    ex_adj1(fre,GEN,A,AA),
    ex_adj2(fre,NUM,AA,A1).

ex_adj1(fre,fem,A,A1) :-
    paradigm(fre,[A,[_,_,AA]]),
    concat(AA,e,A1), !.
ex_adj1(fre,fem,A,A1) :-
    concat(A,e,A1), !.
ex_adj1(fre,_,A,A).

ex_adj2(fre,pl,A,A1) :-
    concat(A,s,A1), !.
ex_adj2(fre,_,A,A).

Adjective expression in French is sensitive to Number and Gender in both attributive and predicative position. Number and Gender can easily be distinguished in the form (at least the written form) of the adjective, as in:

(1) stem:     intelligent
    masc sg:  intelligent
    fem sg:   intelligent-e
    masc pl:  intelligent-s
    fem pl:   intelligent-e-s

For this reason ex_adj has been split up into ex_adj1, which takes care of the Gender effect, and ex_adj2 (which works on the output of ex_adj1), which takes care of the Number effect. Note that forms taken from paradigms are also inflected for Gender and Number: it is the stem rather than the whole word which is irregular in such cases. For example, we must know that the positive comparative stem of bon is meilleur. Once we know this, the latter form can be regularly inflected. Note, finally, that attributive (derived) participles are likewise inflected for Gender and Number.

ex_fun(fre,[poss],de).
ex_fun(fre,[pt|_],[]).
ex_fun(fre,[ag],par).
ex_fun(fre,[pos],par).
ex_fun(fre,[zero],[]).
ex_fun(fre,[rec],à).
ex_fun(fre,[ben],pour).
ex_fun(fre,[loc],dans).
ex_fun(fre,[superloc],sur).
ex_fun(fre,[dir],à).
ex_fun(fre,[so],de).
ex_fun(fre,[instr],avec).
ex_fun(fre,[ass],a).
ex_fun(fre,[causee],[]).
ex_fun(fre,[rf|_],[]).
ex_fun(fre,[stand],que).
ex_fun(fre,[eqstand],que).
ex_fun(fre,[cond],si).
ex_fun(fre,[reason],[parce,que]).
ex_fun(fre,[nomreason],[à,cause,de]).
ex_fun(fre,[conc|_],[bien,que]).
ex_fun(L,[],[]).

Note that for French it would not have been necessary to distinguish the semantic functions "stand" and "eqstand". For English and Dutch this is necessary, because they get different expression.

ex_term(fre,[[A,C,D],[[X,[_,GEN|_]|_],R2,R3,R4]],[P,Q,R,PP,S,T]) :-
    ex_det(fre,[A,C,GEN],[P,PP]),
    ex_noun(fre,X,X1),
    ex_number(fre,[C,X1],R),
    R2=[ADJ,TY],
    (ADJ=[bon];ADJ=[grand];ADJ=[petit]),
    ex_restr2(fre,[A,C,GEN],R2,Q),
    ex_full_term(fre,R3,S),
    ex_emb_prop(fre,[_,R4,_],T), !.

This term-expression rule takes care of the following phenomena: the elements of the potentially discontinuous demonstrative (P,PP) are placed before the prenominal adjective position and after the noun position.
The prenominal adjective position can be used for a selected set of adjectives, which have here been enumerated. The rule is more coercive than it should be: it always places the relevant adjectives in prenominal position. French has no equivalent to English and Dutch premodifiers such as the man's, John's, and Jans. Possessive attributive phrases of this kind will always be placed in the postmodifier position S, provided with the preposition de.

ex_term(fre,[[A,C,D],[[X,[_,GEN|_]|_],R2,R3,R4]],[P,R,PP,Q,S,T]) :-
    ex_det(fre,[A,C,GEN],[P,PP]),
    ex_noun(fre,X,X1),
    ex_number(fre,[C,X1],R),
    ex_restr2(fre,[A,C,GEN],R2,Q),
    ex_full_term(fre,R3,S),
    ex_emb_prop(fre,[_,R4,_],T), !.

This is the alternative rule which places adjectives not treated so far in the postnominal position. Thus we get (2a), and not (2b):

(2) a. la dame intelligente
    b. ?la intelligente dame

Care has further been taken to avoid output such as:

(3) cette dame-là intelligente

which appears to be impossible in French.

ex_poss_pro(fre,[PERS,N,masc,sg],A) :-
    poss_paradigm(fre,[PERS,N,[A|_]]), !.
ex_poss_pro(fre,[PERS,N,fem,sg],B) :-
    poss_paradigm(fre,[PERS,N,[A,B|_]]), !.
ex_poss_pro(fre,[PERS,N,_,pl],C) :-
    poss_paradigm(fre,[PERS,N,[A,B,C|_]]), !.

In these rules the appropriate possessive pronoun is selected from the possessive paradigms in the French lexicon. Note that these paradigms are more complex than those of English and Dutch, since French possessive pronouns are not only sensitive to the Person and Number of the Possessor, but also to the Gender and Number of the Possessed:

    Possessor           Possessed
    Person   Number     Gender   Number
    1st      sg         masc     sg        > mon
    1st      sg         fem      sg        > ma
    ...
    2d       sg         masc     sg        > ton
    ...
    2d       pl         -        pl        > vos
    etc.

For this reason, four parameters have been specified in the rules for possessive pronoun selection.
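
How the four parameters select a form can be made concrete with a few paradigm facts. The lexicon entries themselves are not shown in this chapter, so the poss_paradigm/2 facts below are hypothetical: they merely assume the layout that the selection rules presuppose, namely a three-place list [masc sg, fem sg, pl] per Person and Number of the Possessor. The person labels p1 and p2 are likewise assumed.

```prolog
% HYPOTHETICAL lexicon entries in the layout presupposed by ex_poss_pro/3:
% poss_paradigm(Lang, [Person, PossessorNumber, [MascSg, FemSg, Pl]]).
poss_paradigm(fre,[p1,sg,[mon,ma,mes]]).
poss_paradigm(fre,[p2,sg,[ton,ta,tes]]).
poss_paradigm(fre,[p2,pl,[votre,votre,vos]]).

% Selection rules as in the text (unused variables written as _).
ex_poss_pro(fre,[PERS,N,masc,sg],A) :-
    poss_paradigm(fre,[PERS,N,[A|_]]), !.
ex_poss_pro(fre,[PERS,N,fem,sg],B) :-
    poss_paradigm(fre,[PERS,N,[_,B|_]]), !.
ex_poss_pro(fre,[PERS,N,_,pl],C) :-
    poss_paradigm(fre,[PERS,N,[_,_,C|_]]), !.
```

With these facts, ex_poss_pro(fre,[p1,sg,fem,sg],X) gives ma, and ex_poss_pro(fre,[p2,pl,masc,pl],X) gives vos: in the plural the Gender of the Possessed plays no role.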

ex_sat1m(fre,[[eval],[ADJ],[manner]],[de,une,manière,/,façon,ADJJ]) :-
    concat(ADJ,e,ADJJ).
ex_sat1m(fre,[[eval],[bon],[manner]],bien) :- !.
ex_sat1m(fre,[[eval],[ADJ],[manner]],ADV) :-
    concat(ADJ,ement,ADV), !.

ex_sat3(fre,[[eval],[ADJ],[opinion]],ADV) :-
    concat(ADJ,ement,ADV), !.

ex_voice(fre,[_,NUM,_,GEN],[[act],[et,Y]],[et,Y1]) :-
    ex_adj(fre,[NUM,GEN],Y,Y1), !.
ex_voice(fre,_,[[act],[X,Y]],[X,Y]) :- !.
ex_voice(fre,[_,NUM,_,GEN],[[pass],[X]],[et,Y]) :-
    ex_pap(fre,[NUM,GEN],[X],Y).

ex_progr(fre,[progr,_,[X|Z]],[et,en,train,de,X1|Z]) :-
    ex_inf(fre,[X],X1).
ex_progr(fre,[progr,_,[X|Z]],[X|Z]).

I have here taken a specific stance with respect to the status of the Progressive operator in French. This operator is admitted to French underlying clause structures, and expressed in two ways:

(5) a. Jean est en train de marcher.
    b. Jean marche.

We will therefore also get these two translations for English John is walking. The point is that expressions with "être en train de" are much more marked in French than the progressive is in English.

ex_aspect(fre,[perf,[act],[X|Z],TY],[et,Y|Z]) :-
    (member(proc,TY);member(tel,TY)),
    ex_pap(fre,[X],Y), !.
ex_aspect(fre,[perf,_,[X|Z],_],[av,Y|Z]) :-
    ex_pap(fre,[X],Y).

The Perfect in French can take different auxiliaries, depending on certain semantic parameters of the nuclear predication:

(6) Jean est allé au jardin.
    John is gone to-the garden
(7) Jean est tombé.
    John is fallen
(8) Jean a embrassé Marie.
    John has kissed Mary

The rules have provisionally been formulated in such a way that être is chosen when the nuclear predicate designates a process or a telic movement. This must be improved.

ex_attitude(fre,[predict,[X|Z]],[all,X1|Z]) :-
    ex_inf(fre,[X],X1), !.
ex_attitude(fre,[poss,[X|Z]],[pouv,X1|Z]) :-
    ex_inf(fre,[X],X1), !.

ex_polarity(fre,[neg,[X|Z]],[ne,X,pas|Z]).
ex_polarity(fre,[pos,[X|Z]],[X,vraiment|Z]).

The elements of the discontinuous negation are directly placed around the finite verb.

ex_illo(fre,[[ILL,AR],[X|Z]],[X|Z]).

ex_tense(fre,[AGR,[ne,X|Z]],[ne,Y|Z]) :-
    ex_tense(fre,[AGR,[X|Z]],[Y|Z]), !.

The first element of the negation must be disregarded in the further expression of the verbal complex.

ex_tense(fre,[[pres,sg,p2,_],_,[X|Z]],[Y|Z]) :-
    concat(X,es,Y), !.
ex_tense(fre,[[pres,sg,_,_],_,[X|Z]],[Y|Z]) :-
    concat(X,e,Y).
ex_tense(fre,[[pres,pl,p1,_],_,[X|Z]],[Y|Z]) :-
    concat(X,ons,Y), !.
ex_tense(fre,[[pres,pl,p2,_],_,[X|Z]],[Y|Z]) :-
    concat(X,ez,Y), !.
ex_tense(fre,[[pres,pl,_,_],_,[X|Z]],[Y|Z]) :-
    concat(X,ent,Y).
ex_tense(fre,[[past,sg,p3,_],_,[X1|Z]],[Y|Z]) :-
    concat(X1,ait,Y), !.
ex_tense(fre,[[past,sg,_,_],_,[X1|Z]],[Y|Z]) :-
    concat(X1,ais,Y), !.
ex_tense(fre,[[past,pl,p1,_],_,[X1|Z]],[Y|Z]) :-
    concat(X1,ions,Y), !.
ex_tense(fre,[[past,pl,p2,_],_,[X1|Z]],[Y|Z]) :-
    concat(X1,iez,Y), !.
ex_tense(fre,[[past,pl,_,_],_,[X1|Z]],[Y|Z]) :-
    concat(X1,aient,Y).
ex_tense(fre,[[[subjonc,past],sg,p3,_],_,[X1|Z]],[Y|Z]) :-
    concat(X1,at,Y), !.
ex_tense(fre,[[[subjonc,TE],pl,p1,_],_,[X|Z]],[Y1|Z]) :-
    subj_stem(fre,TE,X,Y),
    concat(Y,ions,Y1), !.
ex_tense(fre,[[[subjonc,TE],pl,p2,_],_,[X|Z]],[Y1|Z]) :-
    subj_stem(fre,TE,X,Y),
    concat(Y,iez,Y1), !.
ex_tense(fre,[[[subjonc,TE],N,P,G],ASP,[X|Z]],[Y1|Z]) :-
    subj_stem(fre,TE,X,Y),
    ex_tense(fre,[[TE,N,P,G],ASP,[Y|Z]],[Y1|Z]), !.

subj_stem(fre,past,V,St) :-
    concat(V,ass,St), !.
subj_stem(fre,TE,X,X).

ex_prp(fre,[X|Z],[Y|Z]) :-
    concat(X,ant,Y).

ex_pap(fre,[NUM,GEN],[X|Z],[Y|Z]) :-
    ex_pap(fre,[X|Z],[X1|Z]),
    ex_adj(fre,[NUM,GEN],[X1],Y).
ex_pap(fre,[X|Z],[A|Z]) :-
    paradigm(fre,X,[A|_]),
    not(A=[]), !.
ex_pap(fre,[X|Z],[Y|Z]) :-
    concat(X,é,Y).

The past participle is entered into the rules for adjectival expression in order to account for the fact that it is inflected for Gender and Number.

ex_inf(fre,[X|Z],[X1|Z]) :-
    concat(X,er,X1).

A specific problem in the French expression component is how to account for the various sandhi or "liaison" phenomena which may affect the form of a word depending on the properties of the word by which it is followed, or even "fuse" two words into one so-called portmanteau form. Since these modifications depend on word order, they can only be taken care of after the word order has been established. But since the forms of constituents have already been determined before the order was fixed, we are forced (given our organization of the expression component) to treat sandhi in terms of transformation of already established phonological material. This could only be otherwise if first the order, and then the form of constituents were determined. But then we would get other difficulties with cases in which the order appears to co-depend on the form. In the present program the sandhi rules have been implemented in the following way:1 the sandhi rules make a last, recursive run through the prefinal form of the sentence, which has already been cast into a "flat" Prolog list with no occurrences of [ ]: (9)
    [ce,enfant,a,donné,le,ballon,à,le,garçon,.]

"sandhi" looks successively at pairs of atoms in this list: [atom1,atom2],[atom3,atom4],... checking at each step whether something must be modified. If not, it advances to the next pair without modification.

sandhi(fre,X,_,Y,New) :-
    init_vowel(Y),
    concat(X,' ',X1),
    concat(X1,'l''',X2),
    concat(X2,Y,New).

As discussed in 10.1.7, the sandhi rules have been formulated in such a way that a sequence such as [de,le,ouvrier] is turned into [de,l'ouvrier] rather than *[du,ouvrier]:

sandhi(fre,X,'-là',X1) :- !,
    concat(X,'-là',X1).

The second part of the discontinuous demonstrative ce...là is glued onto the preceding constituent.

sandhi(fre,à,le,au) :- !.
sandhi(fre,à,les,aux) :- !.
sandhi(fre,de,le,du) :- !.
sandhi(fre,de,les,des) :- !.

The prepositions à and de fuse with the following article.

sandhi(fre,X,Y,X1) :-
    member(X,[le,la,je,me,te,ne,de,que,ce]),
    vowel_initial(Y),
    sandhi2(fre,X,Y,X1).

A number of items, as mentioned in the rule, are modified before a vowel-initial constituent by "sandhi2":

sandhi2(fre,ce,Y,Y1) :-
    concat('cet ',Y,Y1).

The demonstrative ce is changed into cet before a vowel: ce professeur vs. cet ouvrier.

sandhi2(fre,X,Y,X1) :-
    vowel(Vow),
    concat(Red,Vow,X),
    concat(Red,'''',Ap),
    concat(Ap,Y,X1).

The other items lose their final vowel, get an apostrophe (le --> l'), and are glued onto the following vowel-initial word: l'ouvrier.

vowel_initial(X) :-
    vowel(fre,V),
    concat(V,_,X).
vowel(fre,V) :-
    member(V,[ha,he,hé,hi,ho,hu]).
vowel(fre,V) :-
    vowel(V).

A constituent is vowel-initial if it starts with a "French" vowel, which is either a normal vowel, or a vowel preceded by h.

factive_marker(fre,[le,fait,que]).
sub1(fre,que).
expl(fre,il).
copula(fre,et).
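
The interplay of fusion and elision can be sketched as one pass over the flat word list. The driver sandhi_run/2 and the helper names fuse/3, elidable/1, and elide/3 below are hypothetical; they are not the ProfGlot predicates, concat/3 is replaced by the built-in atom_concat/3, and only unaccented vowels and the preposition de are covered. The first clause looks one word ahead, so that elision of le before a vowel-initial word takes precedence over fusion with de.

```prolog
:- use_module(library(lists)).

vowel(V) :- member(V,[a,e,i,o,u]).
vowel(fre,V) :- member(V,[ha,he,hi,ho,hu]).
vowel(fre,V) :- vowel(V).
vowel_initial(X) :- vowel(fre,V), atom_concat(V,_,X), !.

% Fusion of preposition and article (only de is covered in this sketch).
fuse(de,le,du).
fuse(de,les,des).

elidable(X) :- member(X,[le,la,je,me,te,ne,de,que,ce]).

% ce -> cet; other items lose their final vowel and get an apostrophe.
elide(ce,Y,XY) :- !, atom_concat('cet ',Y,XY).
elide(X,Y,XY) :-
    vowel(V), atom_concat(Red,V,X), !,
    atom_concat(Red,'''',Ap),
    atom_concat(Ap,Y,XY).

% sandhi_run(Words, Result): hypothetical driver, one pass left to right.
sandhi_run([], []).
sandhi_run([P,le,Y|T], [P|Out]) :-          % de le ouvrier: elision wins
    fuse(P,le,_), vowel_initial(Y), !,
    elide(le,Y,LY),
    sandhi_run([LY|T], Out).
sandhi_run([X,Y|T], Out) :-                 % de le ballon --> du ballon
    fuse(X,Y,XY), !,
    sandhi_run([XY|T], Out).
sandhi_run([X,Y|T], Out) :-                 % le ouvrier --> l'ouvrier
    elidable(X), vowel_initial(Y), !,
    elide(X,Y,XY),
    sandhi_run([XY|T], Out).
sandhi_run([X|T], [X|Out]) :-               % nothing to modify
    sandhi_run(T, Out).
```

In this sketch sandhi_run([de,le,ouvrier],R) gives R = [de,'l''ouvrier'], whereas sandhi_run([de,le,ballon],R) gives R = [du,ballon].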
Chapter 12. DutExp: Dutch expression
12.0. Introduction

In combination with ParSel and UniExp, DutExp does the same work for Dutch as EngExp does for English and FreExp for French. Again, I will only comment on those points which might not be immediately clear from the commentary so far.
12.1. The module DutExp

ex_noun(dut,[[slaa],[agent,masc]],slaander) :- !.
ex_noun(dut,[[gaa],[agent,masc]],gaander) :- !.
ex_noun(dut,[[slaa],[agent,fem]],slaanster) :- !.
ex_noun(dut,[[gaa],[agent,fem]],gaanster) :- !.

Without the above stipulations, the agent noun expression rule for Dutch would yield *slaaer, *gaaer, *slaaster, and *gaaster. Note, however, that even the above forms are not very acceptable.

ex_noun(dut,[X,[agent,masc]],Y) :- !,
    derivational_stem1(dut,X,ST),
    concat(ST,er,Y).
ex_noun(dut,[X,[agent,fem]],Y) :- !,
    derivational_stem2(dut,X,ST),
    concat(ST,ster,Y).

Derivational stem-1 and -2 are defined at the end of this chapter. The following rules serve to express the stems of diminutive nouns:

ex_noun(dut,[[meisje],dim],meisje) :- !.
ex_noun(dut,[[jongen],dim],jongetje) :- !.

The form meisje 'girl' is a special case. It is not felt to be regularly related to meid 'maid, big girl', and it collocates with non-diminutives, as in:

(1) Is het een jongen of een meisje?
    is it a boy or a girl?

On the other hand, the noun meisje does not have the expected diminutive *meisjetje. The present rule says that meisje does double duty as a base noun and as a diminutive.

ex_noun(dut,[[N],dim],X) :-
    concat(N1,'Σ',N),
    prevocalic_stem(N1,N2),
    concat(N2,etje,X), !.

When the noun ends in Σ, the diminutive affix is -etje: manΣ --> mannetje. As explained before, in 9.1.3.2., this is a temporary substitute for properly defining the phonological conditions for the occurrence of this suffix.

ex_noun(dut,[[N],dim],X) :-
    concat(_,m,N),
    concat(N,pje,X), !.
ex_noun(dut,[[N],dim],X) :-
    concat(N1,g,N),
    concat(N2,n,N1),
    concat(N1,kje,X), !.
ex_noun(dut,[[N],dim],X) :-
    obstruent(O),
    concat(_,O,N),
    concat(N,je,X), !.
ex_noun(dut,[[N],dim],X) :-
    concat(N,tje,X).

Otherwise, when the stem ends in -m, the suffix is -pje: boom 'tree' --> boompje; when the stem ends in -ng, the suffix is -kje: koning 'king' --> koninkje; when the stem ends in an obstruent, the suffix is -je: kat 'cat' --> katje; otherwise the suffix is -tje: tafel 'table' --> tafeltje.

ex_noun(dut,[N],X) :-
    concat(X,'Σ',N), !.

When none of these conditions apply, the Σ is removed: it serves no further purpose.

ex_noun(dut,[X],X).

When none of the above conditions apply, only the brackets around the noun stem are removed.

ex_number(dut,[pl,X],Y) :-
    concat(_,e,X),
    concat(X,s,Y), !.
ex_number(dut,[pl,X],Y) :-
    sonant(G),
    concat(X1,G,X),
    concat(_,e,X1),
    concat(X,s,Y), !.
ex_number(dut,[pl,X],Y) :-
    prevocalic_stem(X,X1),
    concat(X1,en,Y).

Dutch has two productive plural endings for nouns. The present rules say that if the last letter of the stem is e or if the stem ends in el, er, en, or em, the ending is s; otherwise it is en. Cases not captured by these rules have been placed in paradigms. "Prevocalic stem" is defined below.

ex_det(dut,[[high,indef],pl,_],veel) :- !.
ex_det(dut,[[low,indef],pl,_],weinig) :- !.
ex_det(dut,[[univ,def],pl,_],alle) :- !.
ex_det(dut,[[univ,def],sg,neut],elk) :- !.
ex_det(dut,[[univ,def],sg,_],elke) :- !.
ex_det(dut,[[prox,def],sg,neut],dit) :- !.
ex_det(dut,[[prox,def],_,_],deze) :- !.
ex_det(dut,[[rem,def],sg,neut],dat) :- !.
ex_det(dut,[[rem,def],_,_],die) :- !.
ex_det(dut,[def,sg,neut],het) :- !.
ex_det(dut,[def,_,_],de) :- !.
ex_det(dut,[indef,sg,_],een) :- !.
ex_det(dut,[indef,pl,_],[]) :- !.

Note that whether or not a noun has Gender "neuter" influences the form of most of the determiners.

ex_predadj(dut,_,[A,_],A1) :-
    ex_adj1(dut,A,A1).

The predicative adjective always equals the stem.

ex_restr2(dut,[DEF,NUM,GEN],[A,_],A2) :-
    ex_adj(dut,[DEF,NUM,GEN],[A,_],A2).
ex_adj(dut,[DEF,NUM,GEN],[A,_],A2) :-
    ex_adj1(dut,A,A1),
    adj_affix(dut,[DEF,NUM,GEN],A1,A2).

ex_adj1(dut,[prp,[V]],X) :-
    ex_prp(dut,[V],X), !.
ex_adj1(dut,[A,pos],B) :-
    paradigm(dut,A,[B|_]), !.
ex_adj1(dut,[A,pos],A2) :-
    prevocalic_stem(A,A1),
    concat(A1,er,A2), !.
ex_adj1(dut,[A,POL],[POL1,A]) :-
    ex_pol(dut,POL,POL1), !.

ex_pol(dut,pos,meer).
ex_pol(dut,neg,minder).
ex_pol(dut,equal,even).

ex_adj1(dut,[deg,DEG,[A]],[DEG,A]) :- !.
ex_adj1(dut,[A],A).

adj_affix(dut,FEAT,[X,A],[X,A1]) :-
    adj_affix(dut,FEAT,A,A1), !.
adj_affix(dut,[indef,sg,neut],A,A) :- !.
adj_affix(dut,_,A,A1) :-
    prevocalic_stem(A,AA),
    concat(AA,e,A1).

The Gender of the noun ("neuter" vs. "non-neuter") influences the form of the attributive adjective in the following way:

(2) a. een slim kind
       a clever child
    b. het slimme kind
       the clever child
    c. slimme kinderen
       clever children

The rule is that the attributive adjective remains unchanged when the noun is "neuter", and the term indefinite and singular. In all other cases the attributive adjective gets the affix -e (which obviously requires the "prevocalic stem"). This also applies to derived adjectival predicates and attributive present participles.

ex_fun(dut,[poss],van).
ex_fun(eng,[gen],'''s').
ex_fun(dut,[pt],[]).
ex_fun(dut,[ag],door).
ex_fun(dut,[pos],door).
ex_fun(dut,[zero],[]).
ex_fun(dut,[rec],aan).
ex_fun(dut,[ben],voor).
ex_fun(dut,[loc],in).
ex_fun(dut,[superloc],op).
ex_fun(dut,[dir],naar).
ex_fun(dut,[so],van).
ex_fun(dut,[instr],met).
ex_fun(dut,[ass],met).
ex_fun(dut,[temp],op).
ex_fun(dut,[causee],[]).
ex_fun(dut,[rf],[]).
ex_fun(dut,[stand],dan).
ex_fun(dut,[eqstand],als).
ex_fun(dut,[cond],als).
ex_fun(dut,[reason],omdat).
ex_fun(dut,[nomreason],vanwege).
ex_fun(dut,[conc],hoewel).
ex_fun(L,[],[]).

ex_full_term(dut,[_,[_,[[p3,TY]]],F],X1) :-
    not(F=[subj|_]),
    not(F=[obj|_]),
    member(inanim,TY),
    ex_fun1(dut,F,X),
    concat(er,X,X1), !.
ex_full_term(dut,[_,[_,[[RQ,TY]]],F],[[],[**,X1]]) :-
    (RQ=rel;RQ=que),
    not(F=[subj|_]),
    not(F=[obj|_]),
    (member(inanim,TY);member(neut,TY)),
    ex_fun1(dut,F,X),
    concat(waar,X,X1), !.

ex_fun1(dut,[instr],mee) :- !.
ex_fun1(dut,[dir],heen) :- !.
ex_fun1(dut,X,Y) :-
    ex_fun(dut,X,Y).

For Dutch we need a special rule for the expression of inanimate or neuter pronominal, questioned, and relativized terms. This special rule must precede the other rules for expressing pronominal full terms because of the following facts:

(3) a. Jan speelt met hem.
       John plays with him
    b. *Jan speelt met het.
       John plays with it
    c. Jan speelt ermee.
       John plays there-with
       'John plays with it'

(4) a. Met wie praat Jan?
       with whom talks John
       'With whom does John talk?'
    b. *Met wat schrijft Jan?
       with what writes John
       'What does John write with?'
    c. Waarmee schrijft Jan?
       where-with writes John
       'What does John write with?'

(5) a. de man met wie Jan praat
       the man with whom John talks
    b. *de pen met wat Jan schrijft
       the pen with which John writes
    c. de pen waarmee Jan schrijft
       the pen where-with John writes
       'the pen with which John writes'

Note the following properties of inanimate pronominal/questioned/relative terms:
- they are postpositional rather than prepositional;
- they contain er (literally 'there') or waar (literally 'where') rather than wat ('what');
- the postpositional element is written as one word with er/waar;
- the postpositional element is not always identical to the corresponding preposition:

(6) Instrument:  met X    but  er-mee, waar-mee
    Direction:   naar X   but  er-heen, waar-heen

In other instances postposition and preposition are identical:

(7) Cause:   door X   and  er-door, waar-door
    Source:  uit X    and  er-uit, waar-uit
    etc.
These facts are taken care of by the above rules, which "bleed" the prepositional expression of pronominal, questioned and relativized inanimate or neuter oblique terms in Dutch.

ex_sat1m(dut,[[eval],[ADJ],[manner]],ADJ).
ex_sat1m(dut,[[eval],ADJ,[manner]],[op,een,ADJJ,manier]) :-
    derivational_stem1(dut,ADJ,ADJ1),
    concat(ADJ1,e,ADJJ), !.

Manner adverbs in Dutch can be expressed by the bare adjectival stem, as in (8a), or through a paraphrase similar to English "in a ADJ manner", as in (8b):

(8) a. Jan beantwoordde de vraag slim.
       John answered the question cleverly
    b. Jan beantwoordde de vraag op een slimme manier.
       John answered the question on a clever manner
ex_sat3(dut,[[eval],[ADJ],[opinion]],[ADJ,genoeg]) :- !.

Attitudinal satellites of level 3, based on adjectives, are expressed as "adjective + genoeg" ('enough'). E.g. cleverly, in this function, corresponds to slim genoeg.

ex_voice(dut,_,[[pass],[X|Z]],[word,Y]) :-
    ex_pap(dut,[X|Z],Y).
The passive non-perfect auxiliary is worden 'become' rather than zijn 'be'.

ex_progr(dut,[progr,[act],[X|Z]],[zijn,aan,het,X1]) :-
    ex_inf(dut,[X|Z],X1).
ex_progr(dut,[progr,[act],[X|Z]],[X|Z]).
ex_progr(dut,[progr,[pass],[X|Z]],[X|Z]) :- !.

Just as in French, the expression of the Progressive in Dutch is less productive than in English: (9a) may be used, but the simple present as in (9b) is more usual:

(9) a. Jan is aan het fietsen.
       John is at the cycle-inf
       'John is a-cycling'
    b. Jan fietst
       = 'John cycles', 'John is cycling'

The construction with aan het + infinitive cannot be used in the passive:

(10) a. *Jan is aan het geslagen worden.
     b. Jan wordt geslagen.
        'John is being beaten'
ex_aspect(dut,[perf,[act],[zijn|Z],_],[zijn,Z,Y]) :-
    ex_pap(dut,[zijn],Y), !.

If the verb is the copula zijn 'be', the perfect auxiliary is also zijn (rather than hebben 'have', as in English and, mutatis mutandis, in French):

(11) Jan is slim geweest.
     John is clever been
     'John has been clever'
ex_aspect(dut,[perf,[act],[X|Z],TY],[zijn,Y]) :-
    (member(proc,TY);member(tel,TY)),
    ex_pap(dut,[X|Z],Y), !.

The perfect auxiliary is also zijn if the nucleus designates a process or a telic movement. The conditions are similar to those for French être, and have to be specified more adequately than is the case here.

ex_aspect(dut,[perf,[act],[X|Z],_],[heb,Y]) :-
    ex_pap(dut,[X|Z],Y).
Otherwise, the perfect active auxiliary is hebben 'have'.

ex_aspect(dut,[perf,[pass],[X|Z],_],[zijn,Z]).

The passive perfect is peculiar in that the auxiliary, zijn, suppresses the past participle of the passive auxiliary worden:

(12) a. Jan wordt geslagen.
        'John is (being) hit'
     b. *Jan is geslagen geworden.
        'John has been (being) hit'
     c. Jan is geslagen.
        'John has been (being) hit'
Because in some sense the auxiliary worden is "missing" in the perfect passive, I have taken the liberty of actually suppressing it in the rule above: it is the element "X" which does not return in the output. This is not quite in the spirit of Functional Grammar, which avoids deletion of surface elements already specified. In another version of the Dutch expression component, I have obtained the correct results without this trick, but this requires some reorganisation of the expression rules in such a way that passive voice and perfect aspect are simultaneously expressed.

ex_attitude(dut,[predict,[X|Z]],[zul,X1]) :-
    ex_inf(dut,[X|Z],X1), !.
ex_attitude(dut,[predict,[X|Z]],[zul,Z,X1]) :-
    ex_inf(dut,[X],X1), !.
ex_attitude(dut,[poss,[X|Z]],[kun,X1]) :-
    ex_inf(dut,[X|Z],X1), !.
ex_attitude(dut,[poss,[X|Z]],[kun,Z,X1]) :-
    ex_inf(dut,[X],X1), !.
For the expression of the attitudinal operators "predict" and "poss" two rules each are needed in view of the following differences:

(13) a. Jan kan valsspelen.
        'John may cheat'
     b. Jan kan Marie kussen.
        'John may kiss Mary'

Note that in both cases the infinitive comes at the end. But in separable compound verbs the second element vals is affixed to the infinitive, whereas in all other cases the postverbal element remains unattached. Note that these rules also account for such constituent orders as:

(14) a. Jan kan een slimme jongen zijn.
        'John may a clever boy be'
     b. Jan kan een slimme jongen geweest zijn.
        John may a clever boy been be
        = 'John may have been a clever boy'.
ex_polarity(dut,[neg,[X|Z]],[X,niet|Z]).
ex_polarity(dut,[pos,[X|Z]],[X,wel|Z]).

Dutch has no equivalent to English do-support in the expression of negative and (emphatic) positive polarity.

ex_illo(dut,[[ILL,AR],[X|Z]],[X|Z]).

The illocution does not influence the expression of the verbal complex.

ex_tense(dut,[AGR,ASP,[[V,Particle]|Z]],[V1,Particle,Z]) :-
    ex_tense(dut,[AGR,ASP,[V|Z]],[V1|Z]), !.

If the verbal predicate is a separable compound (and thus a verb + particle combination), it is the verb which is the target of Tense expression.

ex_tense(dut,[[pres,sg,p1,_],_,[X|Z]],[X1|Z]) :-
    final_devoicing(X,X1), !.
ex_tense(dut,[[pres,sg,PS,_],_,[X|Z]],[X|Z]) :-
    (PS=p3;PS=p2),
    concat(_,t,X), !.
ex_tense(dut,[[pres,sg,PS,_],_,[X|Z]],[Y|Z]) :-
    (PS=p3;PS=p2),
    final_devoicing(X,X1),
    concat(X1,t,Y), !.

The second and third person present tense is expressed by affixing -t. This is not written when the last letter of the stem is also t.

ex_tense(dut,[[pres,pl,_,_],_,[X|Z]],[Y|Z]) :-
    prevocalic_stem(X,X1),
    concat(X1,en,Y).
ex_tense(dut,[[past,sg,_,_],_,[X|Z]],[Y|Z]) :-
    voiceless(C), concat(_,C,X),
    concat(X,te,Y), !.
ex_tense(dut,[[past,sg,_,_],_,[X|Z]],[Y|Z]) :-
    concat(X,de,Y), !.
ex_tense(dut,[[past,pl,_,_],_,[X|Z]],[Y|Z]) :-
    voiceless(C), concat(_,C,X),
    concat(X,ten,Y), !.
ex_tense(dut,[[past,pl,_,_],_,[X|Z]],[Y|Z]) :-
    concat(X,den,Y), !.

The past tense ending is singular -te, plural -ten when the stem ends in a voiceless consonant, otherwise -de, -den.

ex_pap(dut,[V,Particle],V1) :-
    predv(dut,[[V,Particle]|_]),
    ex_pap(dut,[V],VV),
    concat(Particle,VV,V1), !.

The past participle of separable compounds is created by forming the past participle of the verb, and prefixing the particle to it:

(15) [speel,vals]
     past participle of speel: gespeeld
     past participle of [speel,vals]: valsgespeeld.
ex_pap(dut,[X],Y) :-
    pap_stem(dut,X,X1),
    pap_affix(dut,X1,Y).

For past participle formation we form a past participle stem and do past participle affixing to this stem.

pap_stem(dut,X,X) :-
    member(Pre,[be,ge,ver]),
    concat(Pre,_,X), !.
pap_stem(dut,X,X1) :-
    concat(ge,X,X1).

Verbs starting with the formatives be-, ge-, or ver- have no past participle prefix; otherwise, the prefix is ge-.

pap_affix(dut,X,X) :-
    concat(_,t,X), !.
pap_affix(dut,X,Y) :-
    voiceless(L), concat(_,L,X),
    concat(X,t,Y), !.
pap_affix(dut,X,Y) :-
    final_devoicing(X,X1),
    concat(X1,d,Y), !.
The past participle affix is zero when the stem ends in -t, it is -t when the stem ends in a voiceless consonant, and otherwise it is -d.

ex_inf(dut,[V,Part],V1) :-
    predv(dut,[[V,Part]|_]),
    concat(Part,V,VV),
    ex_inf(dut,[VV],V1), !.
ex_inf(dut,[X|Z],[Y|Z]) :-
    prevocalic_stem(X,X1),
    concat(X1,en,Y).

The infinitive of separable compounds is created by forming the infinitive of the verb, and prefixing the particle to it. Otherwise, the infinitive is formed by taking the prevocalic stem and affixing -en to it.

ex_prp(dut,[V,Part],Z) :-
    predv(dut,[[V,Part]|_]),
    concat(Part,V,VV),
    ex_prp(dut,[VV],Z), !.

The present participle of separable compounds is created by prefixing the particle to the stem and participializing the resulting combination.

ex_prp(dut,[X|Z],[Y|Z]) :-
    prevocalic_stem(X,X1),
    vowel(V), concat(_,V,X1),
    concat(X1,nd,Y), !.
ex_prp(dut,[X],Y) :-
    prevocalic_stem(X,X1),
    concat(X1,end,Y).
ex_nom_polarity(dut,[neg,[X|Z]],[niet|[X,Z]]) :- !.
ex_nom_polarity(dut,[pos,[X|Z]],[wel|[X,Z]]) :- !.
derivational_stem1(dut,[X|Y],Z1) :-
    stem_fusion([X|Y],[Z]),
    prevocalic_stem(Z,Z1), !.

In Dutch expression, provision had to be made for the "fusion" of separable compound predicates, and for the following orthographical relations:

(16) stem           derived forms
     loop 'walk'    loop-t    lop-er, lop-en, lop-end
     kus 'kiss'     kus-t     kuss-er, kuss-en, kuss-end

A stem with a tense or "long" vowel written VV (as in loop) is written with a single vowel when followed by a vowel-initial affix (as in loper); a stem with a lax or "short" vowel (as in kus) is written with a geminated final consonant in that condition (as in kusser).
At the same time, "derivational stem-1" takes care of the "stem fusion" which is required for separable compound predicates, as in:

(17) stem                    derived form
     [speel,vals] 'cheat'    [valsspeler] 'cheater'

derivational_stem1(L,[X],X).

If none of the above conditions hold in any of the languages, the derivational stem-1 is equal to the (lexical) stem.

derivational_stem2(dut,X,Z1) :-
    stem_fusion(X,[Z]),
    final_devoicing(Z,Z1), !.
derivational_stem2(L,[A],A).

In Dutch we also need a derivational stem which only involves stem fusion for separable compounds (this is the stem which is used preceding consonants):

(18) stem                    derived form
     [speel,vals] 'cheat'    [valsspeelster] 'cheat-ster'
If the stem is not a separable compound, the derivational stem-2 equals the stem.

stem_fusion([X,Y],[Z]) :- !,
    concat(Y,X,Z).
stem_fusion(X,X).

Stem fusion takes the second element of a separable compound and prefixes it to the first:

(19) [speel,vals] -> [valsspeel]

In the case of non-compound stems, stem fusion applies vacuously.

prevocalic_stem(X,X1) :-
    consonant(C), concat(Y,C,X),
    vowel(V), concat(Y1,V,Y),
    consonant(C1), concat(Y2,C1,Y1),
    concat(X,C,X1), !.
prevocalic_stem(X,X1) :-
    consonant(C), concat(Y,C,X),
    vowel(V), concat(Y1,V,Y),
    concat(Y2,V,Y1),
    concat(Y1,C,X1), !.
prevocalic_stem(X,X).
This definition of "prevocalic stem" is used only in the Dutch expression component. The first rule geminates the last consonant of stems which end in -CVC, the second rule deletes one vowel character from a stem which ends in -VVC. This is to account for the orthographical regularities illustrated in (16) above. If none of the relevant conditions apply, the prevocalic stem is identical to the lexical stem.

final_devoicing(X,Y) :-
    concat(St,z,X), !,
    concat(St,s,Y).
final_devoicing(X,Y) :-
    concat(St,v,X), !,
    concat(St,f,Y).
final_devoicing(X,X).

These rules for "final devoicing" turn a stem-final z into s, and v into f. The rules do not convert b into p nor d into t (which would be correct from a phonological point of view), because of the inconsistency of Dutch spelling conventions. Compare:

(20) written form          phonological form
     kaas 'cheese'         /kas/
     kazen 'cheeses'       /kazen/
     raaf 'raven'          /raf/
     raven 'raven pl'      /raven/

But on the other hand:

(21) hand 'hand'           /hant/
     handen 'hands'        /handen/
     web 'web'             /wep/
     webben 'webs'         /weben/
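The two orthographic rules above can be tried out in isolation. The following is a minimal sketch, not the ProfGlot source: it assumes that concat/3 behaves like the standard atom_concat/3, and the vowel/1 and consonant/1 tables and the infinitive/2 helper are illustrative assumptions added here so that the fragment runs on its own (tested mentally against SWI-Prolog conventions only).

```prolog
% Assumption: concat/3 in the text behaves like atom_concat/3.
concat(A, B, C) :- atom_concat(A, B, C).

% Assumed letter classes (single letters only; illustrative).
vowel(V) :- member(V, [a, e, i, o, u]).
consonant(C) :- member(C, [b,c,d,f,g,h,j,k,l,m,n,p,q,r,s,t,v,w,x,z]).

% -CVC stems: geminate the final consonant (kus -> kuss).
prevocalic_stem(X, X1) :-
    consonant(C), concat(Y, C, X),
    vowel(V), concat(Y1, V, Y),
    consonant(C1), concat(_, C1, Y1),
    concat(X, C, X1), !.
% -VVC stems: drop one vowel letter (loop -> lop).
prevocalic_stem(X, X1) :-
    consonant(C), concat(Y, C, X),
    vowel(V), concat(Y1, V, Y),
    concat(_, V, Y1),
    concat(Y1, C, X1), !.
% Otherwise the prevocalic stem is the lexical stem.
prevocalic_stem(X, X).

% Stem-final z -> s and v -> f; everything else is left alone.
final_devoicing(X, Y) :- concat(St, z, X), !, concat(St, s, Y).
final_devoicing(X, Y) :- concat(St, v, X), !, concat(St, f, Y).
final_devoicing(X, X).

% Hypothetical helper: infinitive = prevocalic stem + en.
infinitive(Stem, Inf) :-
    prevocalic_stem(Stem, S),
    concat(S, en, Inf).
```

With these definitions, the query infinitive(loop, X) yields X = lopen and infinitive(kus, X) yields X = kussen, matching the forms in (16); final_devoicing(reiz, X), with reiz as a hypothetical lexical stem, yields X = reis.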
adjust_spelling(dut,[],[]).
adjust_spelling(dut,[X|Z],[Y|Z1]) :-
    stringof(NN,X),
    convert(NN,NN1),
    stringof(NN1,Y),
    adjust_spelling(dut,Z,Z1), !.
convert([],[]).
convert(NN,NN2) :-
    delete("1",NN,NN1),
    convert(NN1,NN2), !.
convert(NN,NN).

Spelling adjustment is used to tidy up the occurrences of e1 (= schwa) wherever they occur in the prefinal output list. For this purpose, each word is inspected recursively for whether it contains an occurrence of "1". If so, this is deleted; if not, the next word is inspected.
factive_marker(dut,[het,feit,dat]).
subl(dut,dat).
expl(dut,het).
copula(dut,zijn).
Chapter 13. UniPar: a universal parser
13.0. Introduction

A parser is a device which takes a sentence as input and delivers a structure for that sentence as output. One way to characterize a parser, then, is to consider the nature of the input and the output structures, and then the strategy through which the gap between input and output is bridged:

(1) input -> parsing strategy -> output
[a] Input to the parser

The input to the present parser must in principle be a sentence in L of which the underlying structure can be generated by the L generator. There is, however, a class of sentences which the present parser can handle although, in this particular form, they cannot (yet) be generated by the generator. For example, the generator does not yet know all the placement possibilities for terms. Thus, (2a) cannot as yet be generated, but it will nevertheless be accepted by the parser. When we then ask the parser to return the sentence from the reconstructed underlying structure, it will return (2b):

(2) a. In the garden the boy kisses the girl.
    b. The boy kisses the girl in the garden.
Obviously, (2b) will also be accepted by the parser. In this respect, then, the comprehensive capacity of the parser exceeds its productive capacity. In order to be manageable to the parser the input sentence will be converted into a Prolog list by a program for reading in sentences. Versions of such a program can be found in Bratko (1986) and Clocksin and Mellish (1987). The effect of this program is as follows:

(3) input sentence:  John gave the book to the girl.
    converted into:  [john,gave,the,book,to,the,girl,.]
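The conversion in (3) can be sketched as follows. This is not the program from Bratko or from Clocksin and Mellish, and not the reader used in ProfGlot; it is a minimal illustration that works on an atom rather than on keyboard input, and the predicate names sentence_to_list/2, words/2, word_codes/3 and lower/2 are assumptions introduced here.

```prolog
% sentence_to_list(+Sentence, -Words): Sentence is an atom; Words is a
% list of lower-case atoms, with '.' and '?' kept as separate items.
sentence_to_list(Sentence, Words) :-
    atom_codes(Sentence, Codes),
    words(Codes, Words).

words([], []).
words([32|Cs], Ws) :- !,                 % 32 = space: skip it
    words(Cs, Ws).
words([C|Cs], [P|Ws]) :-
    memberchk(C, [46, 63]), !,           % 46 = '.', 63 = '?'
    atom_codes(P, [C]),
    words(Cs, Ws).
words(Cs, [W|Ws]) :-
    word_codes(Cs, WCs, Rest),
    WCs \== [],
    atom_codes(W, WCs),
    words(Rest, Ws).

% Collect character codes up to the next space or punctuation mark.
word_codes([C|Cs], [L|Ls], Rest) :-
    C \== 32, \+ memberchk(C, [46, 63]), !,
    lower(C, L),
    word_codes(Cs, Ls, Rest).
word_codes(Rest, [], Rest).

% Map 'A'..'Z' onto 'a'..'z'; leave other codes unchanged.
lower(C, L) :- C >= 65, C =< 90, !, L is C + 32.
lower(C, C).
```

The query sentence_to_list('John gave the book to the girl.', L) binds L to [john,gave,the,book,to,the,girl,'.'], the list shown in (3); a Prolog system prints the final full stop as a quoted atom.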
The list representation is the structure that the parser is going to work on in reconstructing the underlying structure of the sentence.

[b] Output of the parser

The output of the parser is a more controversial point. Usually, current parsers reconstruct some kind of syntactic tree such as can be produced by phrase structure rules. The nodes of the tree will be labelled by category labels, and sometimes other pieces of information may be added to these nodes. To the extent that semantic interpretation is called for, this is considered as constituting a second phase, in which the syntactic tree is mapped onto some kind of semantic representation.
In Functional Grammar, syntactic trees of this format are not recognized as a significant level of representation. Nor is the idea adopted that the parser should work according to a two-phase design, first reconstructing some kind of syntactic tree, then mapping this tree onto some kind of semantic representation. Within a Functional Grammar context the most interesting parser is one which immediately reconstructs the underlying structure of the sentence. Note that this structure can be considered a syntactic object from one point of view, since it clearly has a standardized "syntactic" organization (as defined by the generator). From another point of view, however, it can also be considered a semantic object, since it is rich in semantically relevant material and can be used as input to a logical inferencing machine (see chapter 15). Our program for developing a parser can thus be sketched as follows:

(4) Input Sentence
          |
    Sentence in Prolog List Format
          |
    Parser
          |
    Underlying Clause Structure
In implementing this program I have distinguished two output targets and correspondingly two parsing strategies. "True Parse" reconstructs the actual fully specified clause underlying the input sentence. Since the fully specified clause is characterized for Subject and Object, this parsing strategy preserves the Voice of the input sentence: active sentences will be returned as active sentences, and passives as passive sentences. The other strategy, "Deep Parse", reconstructs the "clause structure" rather than the "fully specified clause" underlying the sentence (see Figure 1 in chapter 3). Since the clause structure is not specified for Subject and Object, this means, for example, that when we ask the parser to return a sentence through this strategy, we may get such results as:
(5) input:   john kissed mary.
    output:  john kissed mary.
             mary was kissed by john.

(6) input:   john was given the book by mary.
    output:  mary gave the book to john.
             mary gave john the book.
             the book was given to john by mary.
             john was given the book by mary.
Note that as far as "understanding" the sentence is concerned, this is not a bad result: the clause structure before specification does indeed contain the gist of the semantic information as contained in the sentence. The difference between "True Parse" and "Deep Parse" may simulate some of the phenomena which have been observed by psycholinguists in relation to the interpretation and comprehension of active and passive sentences: sometimes it seems as if hearers "listen for the gist" rather than the form; they perfectly remember the content, but not whether it was transmitted to them in active or passive format (Deep Parse). In other circumstances, however, they do seem to remember the form along with the content (True Parse). Corresponding to the difference between "true-parsing" and "deep-parsing" I have implemented two different translation strategies ("True Translate" and "Deep Translate") in the translator to be discussed in chapter 14.

[c] Parsing strategy

The task of the parser in its deep-parsing mode is to bridge the gap between, for example, the input structure (7) and the output structure (8):

(7) [john,kissed,the,girl,.]

(8)
P4:    [[decl,[ee,14]],
P3:     [[[],[xx,9]],
P2:      [[past,[],[],[e,23]],
P1:       [[],
NUC:       [[kiss],[act,dyn,contr],
            [[[anim],
              [[def,sg,[x,7]],[[john],
                [hum,masc,proper,anim,concr,vert]]],
              [ag]],
             [[concr],
              [[def,sg,[x,19]],[[girl],
                [hum,fem,anim,concr,vert]]],
              [pt]]]],
S1:        [[*],[*],[*],[*]]],
S2:       [[*],[*],[*],[*]]],
S3:      [*]],
S4:     [*]]
Since all underlying clause structures have basically the same schema (the schema exemplified in (8)), the parser can start with a skeleton representation of that schema, in which all possible specific values are represented by variables. This skeleton schema will look as follows:

(9)
P4:    [[ILL,[ee,R1]],
P3:     [[MOOD,[xx,R2]],
P2:      [[TENSE,POL,ASP,[e,R3]],
P1:       [PROGR,
NUC:       [PRED,TYPE,ARGS],
S1:        [[*],[*],[*],[*]]],
S2:       [[*],[*],[*],[*]]],
S3:      [*]],
S4:     [*]]
The task of the parser is now to exploit the information contained in the sentence plus all other relevant knowledge sources (more specifically, the lexicon and the expression rules of the language in question) in order to assign appropriate values to the variables ILL, MOOD, TENSE, POL, ASP, PROGR, PRED, TYPE, ARGS in the clause structure schema, and to fill in the satellite positions marked [*] wherever possible. For a very simple example, we can take the following rule:

(10) The variable ILL in the clause structure schema may be assigned the value "decl" if the last item in the input list is "."; it may be assigned the value "interr" if that last item is "?".

The parsing task is completed if (a) the parser has "used up" all the material in the input list, and (b) it has arrived at a well-formed underlying clause structure which contains no more variables. The correctness test for the parse is that it should be possible to regenerate the input sentence (plus possible other sentences which are semantically equivalent to it, as illustrated in (5) and (6) above) from the reconstructed underlying clause structure. In implementing this program I have experimented with two rather distinct retrieval strategies:

Strategy 1 looks at the actual morphemes and grammatical items of the input sentence, and contains rules of the following form:

(11) If, in an English input sentence, you find a form N (e.g. books) which is such that it can be decomposed into N1 and s (e.g. book and s) such that N1 is the form of a basic nominal predicate of English, then you can reconstruct:
     [pl,[[N1],TY]] (e.g. [pl,[[book],[inanim,concr]]])
     where the form [N1] and the type TY are retrieved from the lexicon, possibly extended through redundancy rules. In other words: you have found a potential plural noun of English.
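Rules (10) and (11) translate almost directly into Prolog. The sketch below is only an illustration of their form, not code from the parser described in this chapter: the predicate names illocution/2, last_item/2 and find_plural_noun/3, the lexicon predicate bpredn/2, and its single toy entry are all assumptions made for the example.

```prolog
% Rule (10): the last item of the input list determines ILL.
illocution(Input, decl)   :- last_item(Input, '.').
illocution(Input, interr) :- last_item(Input, '?').

last_item([X], X) :- !.
last_item([_|T], X) :- last_item(T, X).

% Toy lexicon entry (hypothetical format): a basic nominal
% predicate of English together with its type.
bpredn(eng, [[book], [inanim, concr]]).

% Rule (11), a Strategy 1 rule: a form N that decomposes into N1 + s,
% where N1 is a basic noun, is a potential plural noun.
find_plural_noun(eng, N, [pl, [[N1], TY]]) :-
    atom_concat(N1, s, N),
    bpredn(eng, [[N1], TY]).
```

Here illocution([john,kissed,mary,'.'], ILL) gives ILL = decl, and find_plural_noun(eng, books, S) gives S = [pl,[[book],[inanim,concr]]]. The second rule mentions the English affix s explicitly, which is what makes this style of rule form-dependent.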
Note that this strategy is strongly form-dependent and language-specific: we need different rules for different forms of pluralization both within and across languages.

Strategy 2 is based on an inverted application of the expression rules. It works as follows:

(12) You may reconstruct an underlying structure UC for a form F found in the input sentence if there is a rule of the expression component which generates F from UC.

For example, you may reconstruct [pl,book] from books since the latter can be generated from the former by the expression rule:

ex_num(eng,[pl,X],Y) :- concat(X,s,Y).

This strategy has two great advantages: (a) it needs far fewer rules than Strategy 1; (b) it can be formulated in a completely language-independent way, in the sense that no material of the actual object language is mentioned in the parser rules. This means that, when another language is added to the system, the parser need not be changed, let alone replaced by a new parser for that language. 1 These latter two properties make Strategy 2 much more interesting in a theoretical sense than Strategy 1 with its many ad hoc rules. For that reason, the parser described here is based on Strategy 2. The term "universal parser" is thus well justified here. The aim has thus been to develop a goal-directed parser with a high degree of cross-linguistic validity. In implementing this aim I have again experimented with different alternatives. It will be clear that of all the elements in the input sentence it is the predicate complex which provides the richest information concerning the values to be filled into the clause structure schema. For example, if we can interpret the predicate complex was being kicked of the input sentence:

(13) The boy was being kicked by the professor.
then we know immediately that the value for Tense is Past, for Voice: Passive, for P1 Aspect: Progressive, for Polarity: [ ], for P2 Aspect: [ ], and for P3 attitude: [ ]. Moreover, if we can reconstruct the stem kick we can retrieve the corresponding predicate frame from the lexicon:

(14) [[kick],[act,dyn,contr],
      [[[anim],t,[ag]],[[concr],t,[pt]]]]

This gives us the basic structure of the nucleus into which the term structures underlying the boy and the professor can be inserted rather easily. For this reason I experimented with a strategy which first identifies and parses the predicate complex, and then goes on to consider the rest of the input sentence. This strategy, although it can be made to work properly, is less attractive from a psychological point of view, since it is known from the psycholinguistic literature that listeners start interpreting right from the beginning of the sentence, even in cases in which the predicate complex comes at the very end, as in SOV clauses (cf. Hesp 1990). Therefore, the parser to be discussed below does indeed start interpreting right from the first constituent in the input sentence, integrating the partial structures which it has built up with the structure derived from the predicate complex when this has been reached in the input string. Constituents (especially argument terms) which cannot be properly integrated before the predicate complex has been reached are placed in a "waiting room", so to speak, to find their proper place when the argument structure of the predicate has been retrieved through the predicate frame. Compare:

(15) Yesterday the old man died.
                           ... was kicked by the boy.
                           ... was given the book by John.
When we start interpreting from the beginning of the sentence, yesterday can be safely interpreted as a temporal satellite. Then, the old man can be interpreted as a term, and its term structure can be reconstructed; but its proper role in the argument structure of the predicate can only be determined when the predicate complex (died; was kicked; was given) has been interpreted. Therefore, the term structure reconstructed for the old man will be placed in a "term buffer", to be integrated after the predicate complex has been interpreted. Another principle which has been implemented in the present parser is: "integrate as soon as you can". What this means can be illustrated with the following example from an SOV sequence such as might occur in a Dutch subordinate clause: (16)
     (dat) de jongen het meisje in de tuin gekust heeft.
     that the boy the girl in the garden kissed has
     'that the boy has kissed the girl in the garden'
Consider the following steps:

(17) a. de jongen ...
        This can be interpreted as a term, but it is not clear what role that term is going to play in the argument structure of the predicate, which is yet to come. Therefore the term structure is placed in the term buffer.
     b. de jongen het meisje ...
        The same applies to the term het meisje.
     c. de jongen het meisje in de tuin ...
        Here there are two possibilities:
        [1] in de tuin is interpreted as a local satellite and placed in the appropriate place in the clause schema;
        [2] in de tuin is interpreted as an attributive adpositional modifier of het meisje (a restrictor-3); in that case it is integrated into the term structure of the preceding term, and placed in the buffer along with it.
     d. de jongen het meisje in de tuin gekust heeft ...
        Now the predicate frame can be retrieved from the lexicon, all the relevant P-operators can be specified, and the terms from the term buffer can be given their proper place in the argument structure of the predicate.
The principle "integrate as soon as you can" also implies that argument terms which occur before the predicate complex are treated in a different way from those which come after it. The former terms are placed in the term buffer, from which they are integrated into the argument structure of the predicate as soon as it is available. Those terms which come after the predicate complex, on the other hand, will be immediately integrated into the argument structure. A final word about discontinuous constituents. Consider the following example: (18)
Has the clever boy laughed?
Little can be done with has at the beginning of the sentence, except that we can identify it as an auxiliary (marginally in English, but quite normally in the corresponding Dutch sentence, it could also be an instance of the lexical verb "have"). Therefore, if an auxiliary has been identified which is not followed by a sequence which can be interpreted as the rest of a predicate complex, it is kept in memory while an attempt is made to assign an interpretation to the next sequence the clever boy laughed? It turns out that the clever boy can be interpreted as a term; therefore, it is so interpreted and lifted out of the sentence. The element has is now concatenated back onto the remainder, yielding has laughed? It turns out that has laughed can be interpreted as a predicate complex, and so the reconstruction of the clause structure can be achieved. This process will be iterated in cases in which there is more material between the auxiliary and the rest of the predicate complex, as in a Dutch construction such as: (19)
     de jongen heeft gisteren in de stad waarschijnlijk een fiets gestolen.
     the boy has yesterday in the city probably a bicycle stolen
     'The boy has probably stolen a bicycle in the city yesterday'
After the interpretation of each of the constituents gisteren, in de stad, waarschijnlijk, een fiets, the auxiliary is concatenated back onto the remaining clause material, yielding a sequence of the following form:

(20) a. heeft gisteren in de stad waarschijnlijk een fiets gestolen
     b. heeft in de stad waarschijnlijk een fiets gestolen
     c. heeft waarschijnlijk een fiets gestolen
     d. heeft een fiets gestolen
     e. heeft gestolen
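The stepwise reduction in (20) can be imitated with a small recursive predicate. The following is a toy sketch under strong assumptions and not the actual UniPar code: constituent/2 and predicate_complex/2 are hypothetical stand-ins for the real "find" rules, hard-wired to the words of example (19), and bridge/4 is a name invented for this illustration.

```prolog
% Toy constituent recognizers (hypothetical stand-ins): each removes
% one constituent from the front of the list and returns the rest.
constituent([gisteren|R], R).
constituent([in, de, stad|R], R).
constituent([waarschijnlijk|R], R).
constituent([een, fiets|R], R).

% Toy predicate complex: auxiliary directly followed by the participle.
predicate_complex([heeft, gestolen|R], R).

% bridge(+Aux, +Material, -Skipped, -Rest): keep Aux in memory and try
% Aux + Material as a predicate complex; if that fails, lift one
% constituent out of Material and try again on the remainder.
bridge(Aux, Material, [], Rest) :-
    predicate_complex([Aux|Material], Rest), !.
bridge(Aux, Material, [C|Cs], Rest) :-
    constituent(Material, Material1),
    append(C, Material1, Material),
    bridge(Aux, Material1, Cs, Rest).
```

The query bridge(heeft, [gisteren,in,de,stad,waarschijnlijk,een,fiets,gestolen], Skipped, Rest) goes through the stages (20a) to (20e), succeeding only at the last one, with Skipped = [[gisteren],[in,de,stad],[waarschijnlijk],[een,fiets]] and Rest = []. In the real parser the lifted constituents would be interpreted and integrated rather than merely collected.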
At each point in the sequence an attempt is made to interpret heeft plus the following material as a predicate complex. In the case of (20), this will succeed only at (20e). After that point, the interpretation and integration process can proceed. Further details of the parsing strategies used will be clarified in the commentary on the program.

As for the power of the present parser: it can handle a variety of simplex active and passive sentences and a limited set of complex sentences with embedded predicational/propositional terms. Within the limits of the simplex sentence, it has considerable power, as can be seen from the following survey of what items can be correctly parsed:

(21) a. Nouns
        - all simple nouns as listed in the lexicon;
        - derived nouns such as walker, kisser, etc.;
        - all plural forms of all nouns.
     b. Adjectival modifiers
        - all simple adjectives from the lexicon;
        - participles such as walking, kissing, etc.;
        - comparative adjectives such as taller, etc.
     c. Determiners
        - all determiners (including demonstratives and quantifiers) such as can be generated by the generator.
     d. Adpositional modifiers
        - all such adpositional modifiers as can be generated by the generator, and therefore complex terms such as the boy in the city, the ladies with the books of the professor.
     e. Predicates
        - one-, two-, and three-place active and passive verbs in present and past tense, progressive aspect, perfect aspect, as in gives, is giving, has given, has been giving, has been given and their past tense equivalents, plus their negated counterparts.
     f. Satellites
        - satellites of Instrument, Direction, and Beneficiary
        - satellites of Time and Location
        - attitudinal satellites such as probably
     g. Sentence types
        - declaratives and (yes-no) interrogatives. Q-word interrogatives have not yet been handled.
Thus, the following sentence can be correctly parsed by the present parser: (22)
The clever boys of the professor have probably not been given these roses by those kissing walkers in the garden on Sunday.
13.1. The module UniPar

13.1.1. Introduction

I discuss the parser "bottom-up", since this makes its operation more transparent. In actual fact it works top-down from the instructions:

(23) a. true_parse(L,S,FSC).
        'true-parse the sentence S in language L onto the fully specified clause FSC'
     b. deep_parse(L,S,CS).
        'deep-parse the sentence S in language L onto the clause structure CS'

Starting from these instructions, the parser works its way down to the interpretation of the smallest items in the sentence. In our presentation we start with these smallest items.
13.1.2. Finding and interpreting terms

find_determiner(L,[D|Z],Z,[DEF,NUM,GEN]) :-
    ex_det(L,[DEF,NUM,GEN],D).
The parser finds the values DEF, NUM, and GEN when it finds an element D in the input list (e.g. the, these, all) such that D can be generated from these values by the corresponding expression rule. Note that all "find" rules are formulated as a difference between the input string (here: [D|Z]) and the rest of the string (here: Z) which is left after the initial material has been interpreted. In the case of this rule, for example, we could have the following situation:

(24) L             = eng
     [D|Z]         = [this|[man,walks,.]]
     Z             = [man,walks,.]
     [DEF,NUM,GEN] = [[prox,def],sg,_]
By having the rest Z available, we know where to go on with the interpretation of the input string. Note that in this particular application of the rule, the gender GEN is not retrieved, since it plays no role in the English expression rule.

find_determiner(fre,[D|Z],Z,[DEF,NUM,GEN]) :-
    ex_det(L,[DEF,NUM,GEN],[D|RR]).

We need this separate rule for French since we gave French determiners a two-part output for such cases as ce...la etc. This could be remedied by changing the expression component.

find_adj(L,[AA|Z],Z,[[DEF,NUM,GEN],[[AA1],TY1]]) :-
    bpreda(L,[[AA1],TY,ARG]),
    match2(AA,AA1),
    ex_restr2(L,[DEF,NUM,GEN],[[AA1],_],AA),
    add_redundant_features([[AA1],TY,ARG],[[AA1],TY1,ARG]).

The parser finds the underlying adjective AA1 plus its type TY1 and potential values for DEF, NUM, and GEN if it finds a form AA in the input string such that there is a basic adjectival predicate AA1 in the lexicon from which AA can be generated through the corresponding expression rule. This rule retrieves the structure of basic adjectives such as tall, good, and clever. Note the following points about this rule:

— For English it would be sufficient to simply lift the adjectival predicate frame from the lexicon, since the English adjective is invariable. But this would not do for French and Dutch, since these have inflected adjectives. For example, we want to recognize a form such as intelligentes (AA) as the feminine plural form of the basic adjective intelligent (AA1). Therefore, we must check the expression rule to see if AA can be expressed from AA1.

— In checking this, we do not want to try the expression rule on any odd adjective, starting with the first one listed in the lexicon, and stopping only when intelligent has been reached. In order to avoid this, the condition "match2" has been added. This is defined as follows:

match2(W,W1) :-
    name(W,[L1,L2|_]),
    name(W1,[L1,L2|_]).

Two forms W and W1 "match2" if they have the first two letters in common. Thus, every adjective is immediately discarded if it does not fulfil this condition. This is a first, rather primitive attempt to speed up the retrieval of the correct lexical form.
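The effect of the "match2" pre-filter can be tried out in isolation. The following is a minimal sketch, not ProfGlot code: it restates match2 with atom_codes/2 instead of name/2 (an assumption about the Prolog system in use), and the predicates basic_adjective/1 and candidates/2, as well as the three lexicon entries, are hypothetical and introduced only for illustration.

```prolog
% match2/2 as described in the text: two forms match if they
% share their first two letters.
match2(W, W1) :-
    atom_codes(W,  [L1,L2|_]),
    atom_codes(W1, [L1,L2|_]).

% Hypothetical mini-lexicon of basic adjectives (illustration only).
basic_adjective(tall).
basic_adjective(intelligent).
basic_adjective(interesting).

% candidates(+Form, -Cs): the lexical adjectives that survive the
% cheap two-letter filter and would be handed on to the (much more
% expensive) expression-rule check.
candidates(Form, Cs) :-
    findall(A, (basic_adjective(A), match2(Form, A)), Cs).
```

With this toy lexicon, a query such as candidates(intelligentes,Cs) leaves only intelligent and interesting to be checked against the expression rule; tall is discarded at once. The sketch also shows the limits of the filter: it narrows the search, but does not by itself select the right lemma.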
— Since we have retrieved the basic adjective, we must add the redundant features to its type TY to yield the type TY1 which can be used in further reconstructing the underlying structure of the term.

find_adj(L,[AA|Z],Z,[[DEF,NUM,GEN],[[AA1,pos],[state]]]) :-
    decompose(AA,AA2,AA3),
    paradigm(L,AA1,[AA2|_]),
    ex_restr2(L,[DEF,NUM,GEN],[[AA1,pos],_],AA).

We find the positive comparative [AA1,pos] of a basic adjective AA1 when we find a form AA in the input string which can be decomposed into two parts, AA2 and AA3, such that AA2 is listed in the first position of the paradigm of an adjective AA1, and AA can be derived from AA1 through the relevant expression rule. This rule identifies a form such as better as [good,pos]. Again, the rule is more complicated than is needed for English, since we must reckon with the possibility that (in French and Dutch) the comparative adjective may have been inflected. For example, we may find the French form meilleures "better, feminine plural", of which only the stem, meilleur, is listed in the paradigm. Therefore, we must check whether the initial part of meilleures (in this case, meilleur) occurs in an adjectival paradigm. For this we use "decompose", which divides any word into two nonempty parts:²

decompose(X,Y,Z) :-
    name(X,XN),
    conc(YN,ZN,XN),
    name(Y,YN),
    name(Z,ZN).

Note that we have left out "match2" in the rule for finding irregular comparatives, since the comparative form may be suppletive (as in good - better, bon - meilleur). But this is no problem in this case, since we retrieve the lemma via the paradigm anyway.

find_adj(L,[AA|Z],Z,[[DEF,NUM,GEN],[[AA1,pos],[state]]]) :-
    bpreda(L,[[AA1],TY,ARG]),
    match2(AA,AA1),
    ex_restr2(L,[DEF,NUM,GEN],[[AA1,pos],_],AA).
This rule similarly identifies a productively formed positive comparative such as taller as [tall,pos]. Note that two-word comparatives such as less tall, equally tall, more interesting are not yet recognized by the parser.

find_adj(L,[AA|Z],Z,[[DEF,NUM,GEN],[[prp,[V|Q]],TY]]) :-
    bpredv(L,[[V|Q]|QQ]),
    match2(V,AA),
    ex_restr2(L,[DEF,NUM,GEN],[[prp,[V|Q]],_],AA),
    add_redundant_features([[V|Q]|QQ],[[V|Q],TY,ARG]).
This rule reconstructs a participle such as walking as [prp,walk] when it occurs in attributive position.
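The "decompose" step used in the rule for irregular comparatives can be illustrated separately. What follows is a minimal sketch under stated assumptions: decompose/3 is rewritten with the standard atom_concat/3 rather than with name/2 and conc/3, the two paradigm/3 entries are hypothetical stand-ins for the real lexicon, and comparative_lemma/3 is an illustrative helper that is not part of ProfGlot.

```prolog
% decompose(+Word, ?Front, ?Back): split Word into two nonempty parts.
decompose(Word, Front, Back) :-
    atom_concat(Front, Back, Word),
    Front \== '',
    Back  \== ''.

% Hypothetical paradigm entries: paradigm(Language, Lemma, Forms).
paradigm(fre, bon,     [meilleur]).
paradigm(fre, mauvais, [pire]).

% comparative_lemma(+L, +Form, -Lemma): Form starts with the stem
% listed first in the paradigm of Lemma, followed by some ending.
comparative_lemma(L, Form, Lemma) :-
    paradigm(L, Lemma, [Stem|_]),
    decompose(Form, Stem, _Ending).
```

Under these assumptions, comparative_lemma(fre,meilleures,X) retrieves bon via the stem meilleur. In the parser itself this lookup is only the first half of the test: the candidate must still be confirmed by regenerating the input form through the expression rule.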
find_noun(L,[N|Z],Z,[pl,[[NN],TY]]) :-
    paradigm(L,[NN,N]),
    bpredn(L,[[NN],TY1]),
    add_redundant_features([[NN],TY1],[[NN],TY]).

We identify a noun N as the plural form of a basic noun NN when N is listed as such in the paradigm of NN. This rule retrieves a basic noun such as child from the plural form children, or foot from feet.

find_noun(L,[N|Z],Z,[NUM,[[NN],TY]]) :-
    bpredn(L,[[NN],TY1]),
    match2(NN,N),
    ex_number(L,[NUM,NN],N),
    add_redundant_features([[NN],TY1],[[NN],TY]).

Otherwise, a productively formed singular or plural noun can be interpreted if it can be derived from a basic noun NN through number expression.

find_noun(L,[N|Z],Z,[NUM,[X,Y]]) :-
    (bpredv(L,[[V|Q]|QQ]);bpredvm(L,[[V|Q]|QQ])),
    (match2(V,N);(not(Q=[]),match2(Q,N))),
    (GEN=masc;GEN=fem;GEN=ambl),
    ex_noun(L,[[V|Q],[agent,GEN]],N1),
    ex_number(L,[NUM,N1],N),
    add_redundant_features([[[V|Q],[agent,GEN]],[hum,GEN]],[X,Y]).

Through this rule the parser can reconstruct the underlying structure of such nouns as walker, kisser. The rule is more complicated than is needed for these "simple" derived nouns, since we also want to be able to identify a Dutch form such as valsspeler "cheater" as the expression of underlying [[speel,vals],[agent,masc]]. In that case, it is the element vals rather than speel which should "match2" with the input noun. What the rule does is try to re-generate the derived agent noun from a basic verbal predicate.

Now that we know how to find and interpret determiners, adjectives, and nouns in a string we can start defining the conditions under which we find a term. A feature of the term-identifying rules to follow is that they may be marked for "true", "deep", or "deep/true" (DT), according to whether they are to function in the procedure for "deep-parsing" or "true-parsing" the sentence. In the case of terms, this difference may be relevant if the term contains a predicational/propositional substructure. Such structures get different analyses in the "deep" and the "true" parsing mode.

find_term(L,DT,Z,Z1,X) :-
    find_bterm(L,Z,Z1,X);
    find_mterm(L,Z,Z1,X);
    find_pterm(L,DT,Z,Z1,X).

The parser finds a term structure X in language L in mode DT (= "deep" or "true") in a string Z (with rest Z1) if it finds a basic "bterm", a modified "mterm" or a propositional "pterm" in the initial substring of Z.
find_bterm(L,[T|Z1],Z1,[[def,sg,[x]],[[[T],TY1]]]) :-
    bpredn(L,[[T],TY]),
    member(proper,TY),
    add_redundant_features([[T],TY],[[T],TY1]).
A "bterm" plus its Type and the values "definite" and "singular" are found if Z starts with a proper name. E.g.:

(25) Z  = [John,walks,.]
     T  = John
     Z1 = [walks,.]

find_bterm(L,Z,Z2,[[DEF,NUM,[x]],[[N1,TY1],[],[],[]]]) :-
    find_determiner(L,Z,Z1,[DEF,NUM,GEN]),
    find_noun(L,Z1,Z2,[NUM,[N1,TY1]]).
A "bterm" is also found and its structure is reconstrued if a determiner and a noun are found, in this order. For example:

(26) Z = [the,boy,walks,.]
     determiner found, leaving Z1 = [boy,walks,.]
     noun found, leaving Z2 = [walks,.]

find_bterm(L,Z,Z3,
        [[DEF,NUM,[x]],[[N1,TY1],[AA,TY2],[],[]]]) :-
    find_determiner(L,Z,Z1,[DEF,NUM,GEN]),
    find_adj(L,Z1,Z2,[[DEF,NUM,GEN],[AA,TY2]]),
    find_noun(L,Z2,Z3,[NUM,[N1,TY1]]).

The same for a sequence of determiner, adjective, and noun.

find_bterm(fre,Z,Z3,
        [[DEF,NUM,[x]],[[N1,TY1],[AA,TY2],[],[]]]) :-
    find_determiner(fre,Z,Z1,[DEF,NUM,GEN]),
    find_noun(fre,Z1,Z2,[NUM,[N1,TY1]]),
    member(GEN,TY1),
    find_adj(fre,Z2,Z3,[[DEF,NUM,GEN],[AA,TY2]]).

The same in French for a sequence of determiner, noun, and adjective.

find_bterm(L,Z,Z1,[[indef,pl,[x]],[[N1,TY1],[],[],[]]]) :-
    not(find_determiner(L,Z,Z1,[DEF,NUM,GEN])),
    find_noun(L,Z,Z1,[pl,[N1,TY1]]).
The parser finds an indefinite plural term in a sequence such as [children,are,crying] since, although it does not find a determiner, it does find a plural noun.

find_bterm(L,Z,Z2,
        [[indef,pl,[x]],[[N1,TY1],[AA,TY2],[],[]]]) :-
    not(find_determiner(L,Z,Z1,[DEF,NUM,GEN])),
    find_adj(L,Z,Z1,[[DEF,pl,GEN],[AA,TY2]]),
    find_noun(L,Z1,Z2,[pl,[N1,TY1]]).
The same for interpreting a sequence such as [poor,children,are,crying]. The above rules need not be adapted to French, since indefinite plural terms in French are marked by the overt determiner des: des enfants pauvres will be recognized by the regular term-identifying rule.

find_pterm(L,DT,Z,Z2,
        [[*,sg,*],[[PROP,[prop,fact,subjonc,inanim,masc]]]]) :-
    find_sub(L,Z,Z1),
    find_prop(L,DT,Z1,Z2,PROP).

A "pterm" (propositional term) is found and reconstrued in mode Deep or True when at the beginning of Z a subordinator is found, and the rest Z1 can be interpreted as a proposition in the same mode. How a proposition is interpreted will be defined below. Some features have been added to the type of the propositional term in order to secure proper term "matching". For example, propositional terms of this type can only be inserted into argument (or satellite) positions marked by selection [prop], [prop,fact], or (in French) [prop,fact,subjonc].

find_sub(L,[SUB|Z],Z) :-
    subl(L,SUB).
The parser finds a subordinator when it finds an element SUB which is defined (in the generator) as a "subl" of the language. These are English that, French que, and Dutch dat. Note that the above rule for identifying propositional terms is only a first step towards parsing embedded constructions.

find_full_term(L,DT,Z,Z2,[TERM,FUN]) :-
    find_prep(L,Z,Z1,FUN),
    find_term(L,DT,Z1,Z2,TERM).
A full term is found in Z when Z starts with a preposition and the rest of Z, Z1, starts with a term.

find_prep(L,[PREP|Z],Z,FUN) :-
    ex_fun(L,FUN,PREP).

A function FUN is retrieved from an input sequence which starts with a preposition PREP when PREP can be generated from FUN through the relevant expression rule. For example, for is reconstrued as [ben] (Beneficiary).

find_full_term(L,DT,Z,Z1,[[idiom],ID,FUN]) :-
    bpredv(L,[S,T,ARG]),
    member(AA,ARG),
    AA=[[idiom],ID,FUN],
    conc(ID,Z1,Z).
The parser also finds a full term when it encounters a sequence such as [the,bucket,...] of which the initial part is identical to a term marked "idiom" in some predicate frame of the language. Note that an idiomatic term of this type can also be reconstrued as a productively formed term, so that the ambiguity between the literal and the idiomatic readings of an expression such as:

(26) John kicked the bucket.
is resolved in the parsing process.

find_mterm(L,Z,Z2,[OPS,[R1,R2,[[],T,F],[]]]) :-
    find_bterm(L,Z,Z1,[OPS,[R1,R2,[],[]]]),
    find_full_term(L,DT,Z1,Z2,[T,F]).
An "mterm" (modified term) is found in a situation such as:

(27) Z  = [the,boy,in,the,garden,.]
          [the,boy] is interpreted as a "bterm"
     Z1 = [in,the,garden,.]
          [in,the,garden] is interpreted as a "full term" and entered into the R3 position of the bterm
     Z2 = [.]

Since the rule for "mterm" is formulated recursively, it will also correctly identify and reconstruct the following strings as possible "mterms":

(28) a. [the,boy,of,the,professor,in,the,city]
     b. [the,book,of,the,lady,with,the,roses,for,the,painter]
Note that such a sequence can also be interpreted as a term followed by a satellite, so that we get two analyses for a sentence such as:

(29) John kicked the boy in the garden.

and multiple analyses for:

(30) John kicked the boy in the garden in the city.
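The way the "find" rules hand on the rest of the string, and the way the modified-term rule recurses through the full term, can be seen in a much reduced model. This is a sketch under loud assumptions: the toy lexicon (det/2, noun/1, prep/2) and the simplified output terms term/3 and full/2 are hypothetical and do not reproduce ProfGlot's term structures; the language and mode arguments are left out; and the punctuation token is quoted as '.', as standard Prolog requires.

```prolog
% Hypothetical toy lexicon (illustration only).
det(the, def).
det(a,   indef).
noun(boy).
noun(garden).
noun(city).
prep(in, loc).

% Each rule relates an input list to the rest left after the
% material it has interpreted.
find_determiner([D|Z], Z, Def) :- det(D, Def).
find_noun([N|Z], Z, N)         :- noun(N).
find_prep([P|Z], Z, Fun)       :- prep(P, Fun).

% Basic term: determiner + noun, no modifier yet.
find_bterm(Z, Z2, term(Def, N, [])) :-
    find_determiner(Z, Z1, Def),
    find_noun(Z1, Z2, N).

% A term is a basic term, or a basic term modified by a full term.
find_term(Z, Z1, T) :-
    find_bterm(Z, Z1, T).
find_term(Z, Z2, term(Def, N, [Mod])) :-
    find_bterm(Z, Z1, term(Def, N, [])),
    find_full_term(Z1, Z2, Mod).

% Full term: preposition + term (this is where the recursion arises).
find_full_term(Z, Z2, full(Fun, T)) :-
    find_prep(Z, Z1, Fun),
    find_term(Z1, Z2, T).
```

In this model the string [the,boy,in,the,garden,'.'] yields two solutions for find_term/3, one leaving [in,the,garden,'.'] as the rest and one leaving ['.']. These correspond to the two readings of (29): the prepositional phrase is either left over for a later satellite interpretation or absorbed into the term.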
13.1.3. Finding and interpreting satellites

We can now proceed to defining the conditions under which satellites are found. In recognizing productively formed satellites we get a lot of value out of the preposition; lexical satellites are simply matched with their lexical entries.

find_sat1i(L,Z,Z1,[[instrument],T,[instr]]) :-
    find_full_term(L,Z,Z1,[T,[instr]]).

The parser finds a (potential) instrumental satellite if it finds a full term, the preposition of which (e.g., with) can be interpreted as signalling the function [instr].
find_sat1d(L,Z,Z1,[[place],T,[dir]]) :-
    find_full_term(L,Z,Z1,[T,[dir]]).

Idem for directional satellites.

find_sat1b(L,Z,Z1,[[anim],T,[ben]]) :-
    find_full_term(L,Z,Z1,[T,[ben]]).

Idem for beneficiary satellites.

find_sat2loc(L,Z,Z1,[[place],T,[loc]]) :-
    find_full_term(L,Z,Z1,[T,[loc]]).

Idem for locative satellites of level 2.

find_sat2temp(L,Z,Z1,[[time],T,[temp]]) :-
    find_full_term(L,Z,Z1,[T,[temp]]).

Idem for temporal satellites of level 2. For recognizing the lexical satellites we must make sure that the chances for these satellites are set at 100%.

find_sat2temp(L,[T|Z1],Z1,[T]) :-
    sat2temp(L,[T],_).

The parser also finds a temporal satellite if it finds a form T (e.g. yesterday) which has been defined as such a satellite in the lexicon.

find_sat3(L,[T|Z1],Z1,[T]) :-
    sat3(L,[T],_,_).
A satellite of level 3 is found when a form Τ (e.g. probably) has been found which is defined as such in the lexicon.
13.1.4. Finding and interpreting predicate complexes

We can now proceed to see what information can be derived from the predicate complex. We use the predicate complex for retrieving the predicate frame from the lexicon and for reconstructing most of the values for the relevant P-operators (except the illocution). We start by defining the conditions under which the various bits and pieces (auxiliaries, present and past participles) are found which may form part of the predicate complex.

find_aux(L,[A|Z],Z,[[TE,NUM,PERS,GEN],X]) :-
    auxiliary(L,X,AUX),
    ex_tense(L,[[TE,NUM,PERS,GEN],_,[AUX|Z]],[A|Z]).

The parser finds values for Tense, Number, Gender, and "X" (where X = perf, progr, pass, neg, etc.), when it finds an element A which can be generated from an element AUX which is defined as the AUX for "X" below:
auxiliary(eng,perf,have).
auxiliary(eng,progr,be).
auxiliary(eng,pass,be).
auxiliary(eng,neg,do).
auxiliary(fre,perf,av).
auxiliary(fre,[perf,intr],et).
auxiliary(fre,pass,et).
auxiliary(dut,[perf,pass],zijn).
auxiliary(dut,perf,heb).
auxiliary(dut,[perf,intr],zijn).
auxiliary(dut,pass,word).

Note that this is one of the few places in the parser where language-specific information is entered as such, solely to aid the parsing process. Though it is possible to retrieve the relevant AUX through inverted application of the expression rules, this became rather complicated, so that I decided to follow this more direct course here. Some further work on the expression rules is required to adapt them in such a way that here, too, the language-independent nature of the parser can be maintained.

find_inf(L,[INF|Z],Z,[[INF|Q],TY,A]) :-
    (bpredv(L,[[INF|Q]|QQ]);
     bpredvm(L,[[INF|Q]|QQ])),
    add_redundant_features([[INF|Q]|QQ],[[INF|Q],TY,A]).

The parser finds an infinitive when it finds a form such as kick which as such is the form of a basic verbal predicate in the lexicon. For the moment, this rule only works for English, where it is used for interpreting sequences such as [does,not,walk] etc.

find_prp(L,[PRP|Z],Z,progr) :-
    auxiliary(L,progr,AUX),
    ex_prp(L,[AUX|Q],[PRP|Q]), !.

This rule finds the present participles of auxiliaries: being and having. Again, it is only used in parsing English sequences such as [is,being,kissed].

find_prp(L,[PRP|Z],Z,[[S|Q],TY,A]) :-
    (bpredv(L,[[S|Q]|QQ]);
     bpredvm(L,[[S|Q]|QQ])),
    match2(S,PRP),
    ex_prp(L,[S|Q],[PRP|Q]),
    add_redundant_features([[S|Q]|QQ],[[S|Q],TY,A]).

This rule finds the present participle of basic lexical verbs, such as walking, kissing, deploring. Again it is as yet only used in parsing English sequences such as [is,walking], [has,been,walking], etc.

find_pap(L,[PAP|Z],Z,XX) :-
    auxiliary(L,XX,AUX),
    ex_pap(L,[AUX|Q],[PAP|Q]).

This rule finds the past participles of auxiliaries (e.g. Eng. been, Fr. été, Dut. geweest).
find_pap(L,[PAP|Z],Z,[[S|Q],TY,A]) :-
    paradigm(L,S,[PAP|_]),
    (bpredv(L,[[S|Q]|QQ]);
     bpredvm(L,[[S|Q]|QQ])),
    add_redundant_features([[S|Q]|QQ],[[S|Q],TY,A]).

An irregular past participle of a basic lexical verb (e.g. gone) is retrieved through the paradigm of the predicate.

find_pap(L,[PAP|Z],Z,[[S|Q],TY,A]) :-
    (bpredv(L,[[S|Q]|QQ]);
     bpredvm(L,[[S|Q]|QQ])),
    match2(S,PAP),
    ex_pap(L,[S|Q],[PAP|Q]),
    add_redundant_features([[S|Q]|QQ],[[S|Q],TY,A]).

The productive past participle of a lexical verb is identified through regenerating it from the verbal stem.

find_pap(dut,[PAP|Z],Z,[[S|Q],TY,A]) :-
    decompose(PAP,ge,PAP1),
    (bpredv(dut,[[S|Q]|QQ]);
     bpredvm(dut,[[S|Q]|QQ])),
    match2(S,PAP1),
    ex_pap(dut,[S|Q],[PAP|Q]),
    add_redundant_features([[S|Q]|QQ],[[S|Q],TY,A]).

For Dutch we need a slightly adapted rule, since most productively formed past participles have the prefix ge- (e.g. maken "make" --> gemaakt "made"). Therefore, the prefix is chopped off before the match with a potential lexical source item is made.

find_pap(fre,[PAP|Z],Z,[[S|Q],TY,A]) :-
    (bpredv(fre,[[S|Q]|QQ]);
     bpredvm(fre,[[S|Q]|QQ])),
    match2(S,PAP),
    num(NUM),
    gender(GEN),
    ex_pap(fre,[NUM,GEN],[S|Q],[PAP|Q]),
    add_redundant_features([[S|Q]|QQ],[[S|Q],TY,A]).

For French past participles I have had to formulate this separate rule since past participles are potentially sensitive to Number and Gender, and the expression rules for past participle expression have not yet been regularized across the three languages (unlike what was done in the case of attributive adjectives).

Now that we have formulated the rules for identifying the potential components of the verbal complex, we can proceed to the interpretation of the predicate complex as a whole. In interpreting the verbal complex the parser tries to extract all elements of information contained in it, and places these in their proper position within a structure which is identical to the structure of the fully specified clause minus the satellites. In this way, a considerable part of the underlying clause structure is reconstrued through the verbal complex. We start with the case in which the verbal complex consists of only the (finite) lexical verb:

find_pred(L,[V|Z1],Z1,
        [[ILLO,[ee]],[[[],[xx]],[[[T,N,P,G],POL,[],[e]],
        [[],[[[act],[S|Q]],TY,A]]]]]) :-
    identify_lemma(L,V,[[S|Q]|QQ]),
    ex_tense(L,[[T,N,P,G],_,[S|Q]],[V|Q]),
    add_redundant_features([[S|Q]|QQ],[[S|Q],TY,A]).

When a form such as walks is found in the input list it follows that:
— the predicate frame for walk can be retrieved from the lexicon through "identify-lemma";
— the Voice must be [act] (active);
— the clause is non-progressive (P1 = []);
— the clause is non-perfect (the position for "perf" = []);
— there is no modal operator (attitudinal operator = []);
— the tense T must be pres;
— the number N of the subject must be singular;
— the person P of the subject must be third person;
— the gender G of the subject can be reconstrued if it plays a role in the tense expression rule of the language.

All this information is entered into the output structure at the appropriate positions.

identify_lemma(L,V,[[S|Q]|QQ]) :-
    paradigml(L,_,_,S,LIST),
    member(V,LIST),
    not(V=[]),
    (bpredv(L,[[S|Q]|QQ]);
     bpredvm(L,[[S|Q]|QQ])).
The lemma of irregular forms (e.g., goes) is retrieved through the paradigm.

identify_lemma(L,V,[[S|Q]|QQ]) :-
    (bpredv(L,[[S|Q]|QQ]);
     bpredvm(L,[[S|Q]|QQ])),
    match2(S,V).

The lemma of productive forms is retrieved through matching with basic verbal predicates.

find_pred(L,Z,Z2,
        [[ILLO,[ee]],[[[],[xx]],[[[T,N,P,G],POL,[],[e]],
        [progr,[[[act],[S|Q]],TY,A]]]]]) :-
    find_aux(L,Z,Z1,[[T,N,P,G],progr]),
    find_prp(L,Z1,Z2,[[S|Q],TY,A]).

The parser finds and reconstructs a verbal complex when it finds a progressive auxiliary and a present participle, in this order.

find_pred(L,Z,Z2,
        [[ILLO,[ee]],[[[],[xx]],[[[T,N,P,G],POL,[],[e]],
        [[],[[[act],[S|Q]],TY,A]]]]]) :-
    find_aux(L,Z,Z1,[[T,N,P,G],neg]),
    find_inf(L,Z1,Z2,[[S|Q],TY,A]).

Idem, for when it finds a sequence such as [does,walk] such that the first item can be interpreted as a negative auxiliary and the second as an infinitive. "Emphatic" auxiliary might be a better term here, since this rule is in fact used for interpreting both negative and interrogative sentences in English, such as:

(31) a. Does the sailor walk?
     b. The sailor does not walk.
     c. Does the sailor not walk?
How the parser encounters the sequence [does,walk] in interpreting these sentences will be clarified below.

find_pred(dut,Z,Z2,
        [[ILLO,[ee]],[[[],[xx]],[[[T,N,P,G],POL,perf,[e]],
        [[],[[[pass],[S|Q]],TY,A]]]]]) :-
    find_aux(L,Z,Z1,[[T,N,P,G],[perf,pass]]),
    find_pap(L,Z1,Z2,[[S|Q],TY,A]), !.

This rule identifies perfect passive constructions in Dutch when it finds the "perf,pass" auxiliary zijn "be" and a past participle, in this order. A separate rule for Dutch is needed since this auxiliary combines the expression of the two operators:

(32) Jan is geschopt.
     John is kicked
     'John has been kicked'

find_pred(L,[AUX,PAP|Z2],Z2,
        [[ILLO,[ee]],[[[],[xx]],[[[T,N,P,G],POL,perf,[e]],
        [[],[[[act],[S|Q]],TY,A]]]]]) :-
    (L=dut;L=fre),
    find_pap(L,[PAP|Z2],Z2,[[S|Q],TY,A]),
    member(tel,TY),
    find_aux(L,[AUX|Z1],Z1,[[T,N,P,G],[perf,intr]]).

In Dutch and French an active perfect construction may have the auxiliary zijn/être "be" if the type of the predicate underlying the past participle contains the feature "tel" (= 'telic').

find_pred(L,Z,Z2,
        [[ILLO,[ee]],[[[],[xx]],[[[T,N,P,G],POL,perf,[e]],
        [[],[[[act],[S|Q]],TY,A]]]]]) :-
    find_aux(L,Z,Z1,[[T,N,P,G],perf]),
    find_pap(L,Z1,Z2,[[S|Q],TY,A]).
Otherwise, a perfect active predicate complex is found when a perfect auxiliary is found (this will be have/heb/av), followed by a past participle.

find_pred(L,Z,Z2,
        [[ILLO,[ee]],[[[],[xx]],[[[T,N,P,G],POL,[],[e]],
        [[],[[[pass],[S|Q]],TY,A]]]]]) :-
    find_aux(L,Z,Z1,[[T,N,P,G],pass]),
    find_pap(L,Z1,Z2,[[S|Q],TY,A]).

A passive verbal complex is found when a passive auxiliary is followed by a past participle.

find_pred(L,Z,Z3,
        [[ILLO,[ee]],[[[],[xx]],[[[T,N,P,G],POL,perf,[e]],
        [[],[[[pass],[S|Q]],TY,A]]]]]) :-
    find_aux(L,Z,Z1,[[T,N,P,G],perf]),
    find_pap(L,Z1,Z2,pass),
    find_pap(L,Z2,Z3,[[S|Q],TY,A]).

A perfect passive verbal complex is found when a perfect auxiliary is followed by the past participle of a passive auxiliary, followed by the past participle of a lexical verb, as in [has,been,kissed].

find_pred(L,Z,Z3,
        [[ILLO,[ee]],[[[],[xx]],[[[T,N,P,G],POL,perf,[e]],
        [progr,[[[act],[S|Q]],TY,A]]]]]) :-
    find_aux(L,Z,Z1,[[T,N,P,G],perf]),
    find_pap(L,Z1,Z2,progr),
    find_prp(L,Z2,Z3,[[S|Q],TY,A]).

Idem, for sequences such as [has,been,kissing].

Finally, we define what conclusion can be drawn from the final punctuation of the clause:

find_illo(L,[.],[[ILLO|Z]|ZZ],[[decl|Z]|ZZ]).
find_illo(L,[?],[[ILLO|Z]|ZZ],[[interr|Z]|ZZ]).

This punctuation allows the parser to reconstruct the illocution of the underlying clause structure.
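How the different shapes of the English verbal complex are mapped onto operator values can be illustrated with a drastically simplified model. The following sketch rests on assumptions throughout: the surface forms are listed as hypothetical facts (finite/2, pap/2, prp/2) rather than regenerated through the expression rules, agreement and polarity are ignored, and the result is a flat hypothetical term complex(Perf,Progr,Voice,Stem) instead of the nested structure used by find_pred.

```prolog
% Hypothetical surface forms (illustration only).
finite(has, have).
finite(is,  be).
pap(been,   be).
pap(kissed, kiss).
prp(kissing, kiss).

% find_pred(+Input, -Rest, -complex(Perf, Progr, Voice, Stem))
% perfect passive: has been kissed
find_pred([A,B,C|Z], Z, complex(perf, [], pass, S)) :-
    finite(A, have), pap(B, be), pap(C, S), S \== be.
% perfect progressive: has been kissing
find_pred([A,B,C|Z], Z, complex(perf, progr, act, S)) :-
    finite(A, have), pap(B, be), prp(C, S).
% perfect active: has kissed
find_pred([A,B|Z], Z, complex(perf, [], act, S)) :-
    finite(A, have), pap(B, S), S \== be.
% progressive active: is kissing
find_pred([A,B|Z], Z, complex([], progr, act, S)) :-
    finite(A, be), prp(B, S).
% simple passive: is kissed
find_pred([A,B|Z], Z, complex([], [], pass, S)) :-
    finite(A, be), pap(B, S), S \== be.
```

In this toy model, [has,been,kissed,'.'] receives the single analysis complex(perf,[],pass,kiss) with rest ['.']. The condition S \== be is a shortcut of the sketch; in the rules above the corresponding distinction follows from the separate find_pap rules for auxiliaries and for lexical verbs.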
13.1.5. Integrating underlying structures

We have now defined which conclusions can be drawn on the basis of various types of subsequences found in the input string. Our next task is to define how all this information can be integrated in such a way that a full input sequence yields a well-formed underlying clause structure as output. This could be done by the same method as was used in compositionally building up terms, full terms, and predicate complexes from the component parts. For example, we could formulate a rule to the effect that:
(33) You find a clause structure if:
     you find a term;
     you find a predicate complex;
     you find a term;
     you find a satellite;
     you find an illocution;
     in this order.

This algorithm would be fine for all sentences of the general form:

(34) The sailor kissed the lady in the garden.
However, the algorithm relies on the prestated composition and ordering of a particular sentence type. For other sentence types within and across languages we would need other rules. Given the freedom of occurrence of constituents at the clause level, the number of rules would multiply, and the rule system would break down on the first sentence of which composition and order had not been declared from the beginning. Therefore, I have implemented a much more flexible parsing strategy at the clause level. This parsing strategy relies on the following considerations: as far as the parsing capacity of the present parser goes, every parsable sentence can be exhaustively divided into:

(35) terms
     full terms
     predicate complex
     satellites
     negation
     illocution

Let us now define a "pass" on an arbitrary input string Z as the successful interpretation of the initial part of Z as any of the items listed in (35), leaving a rest Z1. Consider the following example:

(36) input string:                      [the,sailor,kissed,the,lady,.]
     pass through "term", rest:         [kissed,the,lady,.]
     pass through "predicate", rest:    [the,lady,.]
     pass through "term", rest:         [.]
     pass through "illocution", rest:   []
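The series of passes in (36) can be imitated with a few lines of Prolog. This is a minimal sketch of the idea only, under stated assumptions: the recognisers term/3 and pred/3 and their tiny lexicon are hypothetical, the collected units are simply listed rather than integrated into a clause schema, the language and mode arguments are omitted, and the punctuation tokens are quoted as '.' and '?', as standard Prolog requires.

```prolog
% Hypothetical toy recognisers (illustration only).
noun(sailor).
noun(lady).
term([the,N|Z], Z, term(N))   :- noun(N).
pred([kissed|Z], Z, pred(kiss)).

% One pass: interpret the initial part of the input as any known unit.
pass(Z, Z1, X) :- term(Z, Z1, X).
pass(Z, Z1, X) :- pred(Z, Z1, X).

% pass_list(+Input, -Punctuation, -Units): keep passing until only
% the final punctuation mark is left.
pass_list(['.'], ['.'], []) :- !.
pass_list(['?'], ['?'], []) :- !.
pass_list(Z, Rest, [X|Xs]) :-
    pass(Z, Z1, X),
    pass_list(Z1, Rest, Xs).
```

Here pass_list([the,sailor,kissed,the,lady,'.'],R,Units) succeeds with R = ['.'] and Units = [term(sailor),pred(kiss),term(lady)]. Since no order of units is stated anywhere, the input [kissed,the,sailor,the,lady,'.'] is accepted by the same clauses.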
We can thus define the parsing process as a successful series of "passes" through the input sequence. Each "pass" reconstructs the structure which can be retrieved from the initial part of that sequence, and defines which action can be taken towards integrating that structure into the underlying schema of the clause. Then, it hands over the rest of the input sequence to the next "pass" trial. A continuous series of passes is successful if it "eats its way" through to the final elements [.] or [?], which are then used to identify the illocution. A disadvantage of this strategy is that at each transition or trial point a lot of unsuccessful trials will be made before a possible "pass" is found. This disadvantage is counterbalanced by some rather formidable advantages:

[a] since the order of the units listed in (35) need not be declared in advance, we need many fewer rules than would otherwise be necessary.

[b] for the same reason, the same rules can be used for sentences with quite different ordering patterns. For example, all the permutations of the following units will receive the same interpretation:

(37)    [the lady] [walked] [to the station]
     a. the lady walked to the station
     b. the lady to the station walked
     c. walked the lady to the station
     d. walked to the station the lady
     e. to the station the lady walked
     f. to the station walked the lady
[c] this also implies, as can be seen in (37), that sequences ungrammatical in L can nevertheless get a correct parse in L. Unless the parser is used to check the correctness of the input sequence, this is an advantage rather than a disadvantage, since ordinary speech may contain such ungrammatical sequences without understanding being necessarily impaired.

[d] moreover, the sequences ungrammatical in L may be grammatical in L1. For example, the equivalent of (37f) is grammatical in Dutch main clauses; the equivalent of (37b) is grammatical in Dutch subordinate clauses; and the equivalent of (37c) is grammatical in Dutch questions. None of these variations need be declared beforehand for the present parser to work. Nor need the number of units from (35) to be encountered in a sentence be declared beforehand.

deep_parse(L,Z,CS3) :-
    read_in(Z),
    CS=[P4,[P3,[[T,[],A,R],[P1,[[[],S],TY,[]],
        [[*],[*],[*],[*]]],[[*],[*],[*],[*]]],[*]],[*]],
    pass_list(L,deep,Z,[Z1],CS,CS1),
    find_illo(L,[Z1],CS1,CS2),
    remove_voice(CS2,CS3).
In order to "deep-parse" a sentence in L and obtain its clause structure CS3 the parser reads in the sentence from the screen as the input list Z, sets ready an empty schema CS for the clause structure to be reconstrued, tries to pass through Z in the "deep" mode until it reaches the final element [Z1] (i.e., the punctuation mark), reconstructing the clause structure CS1 as it goes along, then retrieves the illocution from [Z1], obtaining clause structure CS2, and finally removes the Voice from CS2 to get CS3. The Voice ([act] or [pass]) is removed because it is irrelevant to the level of "deep" clause structure; it is removed at the end because it is relevant to the reconstruction process.

true_parse(L,Z,CS2) :-
    read_in(Z),
    CS=[P4,[P3,[[T,[],A,R],[P1,[[[],S],TY,[]],
        [[*],[*],[*],[*]]],[[*],[*],[*],[*]]],[*]],[*]],
    pass_list(L,true,Z,[Z1],CS,CS1),
    find_illo(L,[Z1],CS1,CS2).

In order to "true-parse" an input sentence the same steps are taken, except that the parser now works its way through the input list Z in the "true" mode, and the Voice is not removed. The output in this case is a "fully specified clause" rather than a "deep" clause structure.

find_prop(L,deep,Z,Z1,PROP) :-
    CS=[P4,[P3,[[T,[],A,R],[P1,[[[],S],TY,[]],
        [[*],[*],[*],[*]]],[[*],[*],[*],[*]]],[*]],[*]],
    pass_list(L,deep,Z,Z1,CS,CS1),
    remove_voice(CS1,CS2),
    CS2=[P4,PROP,S4].

find_prop(L,true,Z,Z1,PROP) :-
    CS=[P4,[P3,[[T,[],A,R],[P1,[[[],S],TY,[]],
        [[*],[*],[*],[*]]],[[*],[*],[*],[*]]],[*]],[*]],
    pass_list(L,true,Z,Z1,CS,CS1),
    CS1=[P4,PROP,S4].

Finding a proposition analogously works through a part of the input sequence to see whether it can reconstruct the propositional part of an underlying clause structure ("deep") or of an underlying "fully specified clause" ("true"). These procedures are used to interpret propositions that are part of propositional terms, such as [that,mary,cheated] in a sequence such as:

(38) [John,deplored,that,mary,cheated,.]
Passing through a list is defined recursively in the following clauses. Stop conditions are [] (no more material in the list), or [.] or [?] (end of the list reached):

pass_list(L,_,[],[],CS,CS) :- !.
pass_list(L,_,[.],[.],CS,CS) :- !.
pass_list(L,_,[?],[?],CS,CS) :- !.
pass_list(L,DT,Z,Z2,CS,CS2) :-
    pass(L,DT,Z,Z1,CS,CS1),
    pass_list(L,DT,Z1,Z2,CS1,CS2).

Otherwise, we can pass through a list in mode DT (deep/true) by doing a pass on the initial subsequence of the list, and then list-passing the rest of the list (until one of the stop conditions is reached). In the following clauses, the individual possible passes are defined. Each pass retrieves the information found at that particular point in the sequence, and defines which action can be taken towards integrating that information into the underlying structure.

pass(L,DT,Z,Z1,CS,CS1) :-
    CS=[P4,[P3,[P2,[P1,[[[],S],TY,AR],S1],S2],S3],S4],
    (find_full_term(L,DT,Z,Z1,T1);
     find_term(L,DT,Z,Z1,T1)),
    concat(AR,[T1],AR1),
    CS1=[P4,[P3,[P2,[P1,[[[],S],TY,AR1],S1],S2],S3],S4].

In any mode, when a term or a full term is found before the predicate is reached (= the Voice slot on the predicate is still empty), the term structure T1 is concatenated to the argument slot AR. Initially, this argument slot is empty. It is here used as a "term buffer" for those terms which cannot yet be properly integrated into the argument structure of the predicate, which is yet to come. In SVO patterns such as are common in English and French, this strategy will only be used for Subject terms, as in:

(39) a. In the city the tall lady ... kissed the sailor.
     b. In the city the tall lady ... was kissed by John.
A first pass will interpret in the city as a locative satellite. A second pass will interpret the tall lady as a term, which might plausibly be interpreted as the Subject, but cannot be further integrated, as the example shows: in (39a) the term will have to be interpreted as the Agent argument, in (39b) as the Patient argument of the predicate. Therefore, the term structure is parked in the buffer to wait for the predicate to be interpreted. In SOV sequences such as can be found in Dutch, the buffer may be filled with several terms before the predicate allows these terms to find their proper place:

(40) a. Gisteren heeft Jan [1] het boek [2] waarschijnlijk aan Marie [3] gegeven
        yesterday has John the book probably to Mary given
     b. Piet gelooft dat Jan [1] het boek [2] gisteren aan Marie [3] gegeven heeft.
        Peter believes that John the book yesterday to Mary given has
194 A universal parser
These terms will thus be lined up in the buffer until their status can be determined. Note that such satellites as gisteren and waarschijnlijk will be properly interpreted as soon as they have been identified.

    pass(L,deep,Z,Z1,CS,CS1) :-
        CS=[P4,[P3,[[_,POL|_],[P1,[[[],S],TY,TT],S1],S2],S3],S4],
        find_pred(L,Z,Z1,PR),
        PR=[P41,[P31,[[[T|_],_,ASP,[e]],[PROGR,[[VCE,SS],TY1,AR]]]]],
        match_terms(VCE,deep,TT,AR,AR1),
        CS1=[P41,[P31,[[T,POL,ASP,[e]],[PROGR,[[VCE,SS],TY1,AR1],S1],S2],S3],S4].

A pass in the "deep" mode can be made at a predicate complex by integrating the material found through the predicate complex into the clause structure, and integrating the terms which have accumulated in the term buffer (TT) with the argument structure (AR) of the predicate frame through the "deep" version of "match terms", monitored by the Voice of the predicate. "Matching terms" is defined below.

    pass(L,true,Z,Z1,CS,CS1) :-
        CS=[P4,[P3,[[_,POL|_],[P1,[[[],S],TY,TT],S1],S2],S3],S4],
        find_pred(L,Z,Z1,PR),
        PR=[P41,[P31,[[T,_,ASP,[e]],[PROGR,[[VCE,SS],TY1,AR]]]]],
        match_terms(VCE,true,TT,AR,AR1),
        CS1=[P41,[P31,[[T,POL,ASP,[e]],[PROGR,[[VCE,SS],TY1,AR1],S1],S2],S3],S4].

Idem for the "true" mode, with the only difference that now the full agreement features ([T,N,P,G] = Tense, Number, Person, and Gender) are integrated, and "match terms" is done in the "true" mode. What happens in the case of discontinuous verbal complexes as in [has,the,lady,laughed,?]? This is defined in the following pass:

    pass(L,DT,[A|Z],Z2,CS,CS1) :-
        find_aux(L,[A|Z],Z,_),
        not(find_rest(L,Z,_,_)),
        pass(L,DT,Z,Z1,CS,CS1),
        concat([A],Z1,Z2).

    find_rest(L,Z,Z1,X) :-
        find_pap(L,Z,Z1,X);
        find_prp(L,Z,Z1,X);
        find_inf(L,Z,Z1,X).
When an auxiliary is found in such a situation that the following material cannot be interpreted as the rest of a predicate complex (i.e. no infinitive or present/past participle is found), a pass is tried on the material following the auxiliary, and when that succeeds the material is taken out of the input sequence and the auxiliary is concatenated back onto the remainder of the sequence. The following example may illustrate this:
(41) Z = [has,the,lady,laughed,?]
     an auxiliary is found: AUX = [has], Z1 = [the,lady,laughed,?]
     but no appropriate "rest" is found in Z1; therefore, other passes are tried on Z1, and a term is found: [the,lady], leaving: Z11 = [laughed,?]
     now, the AUX [has] is concatenated back onto Z11, yielding Z2 = [has,laughed,?]
     which can now be interpreted as a predicate complex.
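The put-back strategy of (41) can be tried out in isolation. The following is a minimal sketch for SWI-Prolog, not the ProfGlot code: all `demo_` predicates and the four-word lexicon are assumptions made for illustration, and the clause structure is replaced by a plain list of the constituents found.

```prolog
% Toy lexicon (hypothetical entries).
demo_aux(has).
demo_participle(laughed).
demo_det(the).
demo_noun(lady).

% demo_term(+Words, -Rest, -Term): a determiner followed by a noun.
demo_term([D,N|Rest], Rest, term(D,N)) :-
    demo_det(D), demo_noun(N).

% demo_pred(+Words, -Rest, -Pred): an auxiliary directly followed by a participle.
demo_pred([A,P|Rest], Rest, pred(A,P)) :-
    demo_aux(A), demo_participle(P).

% demo_pass(+Words, -Rest, -Found): one pass over the front of the sequence.
demo_pass(Words, Rest, Pred) :-
    demo_pred(Words, Rest, Pred).
demo_pass(Words, Rest, Term) :-
    demo_term(Words, Rest, Term).
demo_pass([A|Z], Z2, Found) :-
    demo_aux(A),
    \+ ( Z = [P|_], demo_participle(P) ),  % no "rest" right after the auxiliary
    demo_pass(Z, Z1, Found),               % pass on the material behind it
    Z2 = [A|Z1].                           % put the auxiliary back in front

% demo_pass_list(+Words, -Found): pass until only the end mark is left.
demo_pass_list(['?'], []) :- !.
demo_pass_list(Words, [F|Fs]) :-
    demo_pass(Words, Rest, F),
    demo_pass_list(Rest, Fs).
```

With this sketch, the query `demo_pass_list([has,the,lady,laughed,'?'], F)` first finds the term and then the reunited predicate complex: `F = [term(the,lady), pred(has,laughed)]`.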
Note that this is an effective procedure for interpreting discontinuous elements at an arbitrary distance from their partners. We now define what happens when a term or full term is found after the predicate complex has been identified:

    pass(L,DT,Z,Z1,CS,CS1) :-
        CS=[P4,[P3,[P2,[P1,[[VCE,SS],TY,AR],S1],S2],S3],S4],
        not(VCE=[]),
        (find_full_term(L,Z,Z1,T1); find_term(L,DT,Z,Z1,T1)),
        match_terms(VCE,DT,[T1],AR,AR1),
        CS1=[P4,[P3,[P2,[P1,[[VCE,SS],TY,AR1],S1],S2],S3],S4].

In this situation the term structure will be immediately integrated into the argument structure of the predicate through the procedure "match terms", to be defined below.

    pass(L,DT,[NEG|Z1],Z1,CS,CS1) :-
        CS=[P4,[P3,[[T,POL,A,R],CORE,S2],S3],S4],
        negation(L,NEG),
        CS1=[P4,[P3,[[T,neg,A,R],CORE,S2],S3],S4].

    negation(eng,not).
    negation(dut,niet).

When the negator of the language in question is passed, the value "neg" is entered into the polarity slot of the clause structure under (re)construction. This rule has not yet been adapted to French. The following rules define passes on satellites:

    pass(L,_,Z,Z1,CS,CS1) :-
        CS=[P4,[P3,[P2,[P1,NUC,[M,I,D,B]],S2],S3],S4],
        find_sat1i(L,Z,Z1,S),
        CS1=[P4,[P3,[P2,[P1,NUC,[M,S,D,B]],S2],S3],S4].

In any mode, when an instrumental satellite S is found, this can be inserted into the appropriate place (marked "I") of the clause structure schema.

    pass(L,_,Z,Z1,CS,CS1) :-
        CS=[P4,[P3,[P2,[P1,NUC,[M,I,D,B]],S2],S3],S4],
        find_sat1d(L,Z,Z1,S),
        CS1=[P4,[P3,[P2,[P1,NUC,[M,I,S,B]],S2],S3],S4].
Idem for when a directional satellite is found.

    pass(L,_,Z,Z1,CS,CS1) :-
        CS=[P4,[P3,[P2,[P1,NUC,[M,I,D,B]],S2],S3],S4],
        find_sat1b(L,Z,Z1,S),
        CS1=[P4,[P3,[P2,[P1,NUC,[M,I,D,S]],S2],S3],S4].

Idem for when a beneficiary satellite is found.

    pass(L,_,Z,Z1,CS,CS1) :-
        CS=[P4,[P3,[P2,CORE,[[*]|RR]],S3],S4],
        find_sat2loc(L,Z,Z1,Sloc),
        CS1=[P4,[P3,[P2,CORE,[Sloc|RR]],S3],S4].

Idem for when a locative satellite is found.

    pass(L,_,Z,Z1,CS,CS1) :-
        CS=[P4,[P3,[P2,CORE,[Sloc,[*]|RR]],S3],S4],
        find_sat2temp(L,Z,Z1,Stemp),
        CS1=[P4,[P3,[P2,CORE,[Sloc,Stemp|RR]],S3],S4].

Idem for when a temporal satellite is found.

    pass(L,_,Z,Z1,CS,CS1) :-
        CS=[P4,[P3,EXTPRED,S3],S4],
        find_sat3(L,Z,Z1,S),
        CS1=[P4,[P3,EXTPRED,S],S4].

Idem for when a satellite-3 (attitudinal satellite) is found.

    remove_voice(CS,CS1) :-
        CS=[P4,[P3,[P2,[P1,[[VCE,S],TY,TT],S1],S2],S3],S4],
        CS1=[P4,[P3,[P2,[P1,[S,TY,TT],S1],S2],S3],S4].

At the last step in the "deep-parsing" procedure the Voice is removed, since this is not specified at the level of clause structure.
13.1.6. Matching terms with argument positions

The procedures for matching terms with argument positions will have to reckon with different conditions:
— when the predicate is identified, "match terms" must be able to integrate the whole series of terms collected in the term buffer into the argument structure of the predicate.
— when a term is found after the predicate has been identified, this single term must be integrated into the argument structure.
— in the "deep" mode, inserting the correct terms into the correct argument positions is sufficient; in the "true" mode, it must also be determined which terms carry Subject and Object function.
— the Voice of the verb must be used in order to arrive at a correct match between terms and argument structure.

The following clauses take care of these different conditions.

    match_terms(_,_,[],A,A) :- !.

When the term buffer is empty, "match terms" has no effect on the argument structure.

    match_terms(VCE,DT,TT,AR,AR1) :-
        member(IDT,TT),
        IDT=[[idiom],T,F],
        member(IDT,AR),
        delete(IDT,TT,TT1), !,
        match_terms(VCE,DT,TT1,AR,AR1).

In any mode and Voice, when an idiomatic term has been found this can be matched with arguments AR which contain the same term. For example, the full term [[idiom],[the,bucket],[]] is matched with the identical full term in the predicate frame.

    match_terms(VCE,DT,TT,AR,AR2) :-
        member([T,F],TT),
        T=[OPS,[[N,TY]|RR]],
        member(AA,AR),
        AA=[S,t,F],
        sublist(S,TY),
        AA1=[S,T,F],
        subst(AA,AR,AA1,AR1),
        delete([T,F],TT,TT1), !,
        match_terms(VCE,DT,TT1,AR1,AR2).

In any mode and any Voice, when a full term has been found with a semantic function F, this term is inserted into the corresponding argument position with function F. For example, the term structure underlying by john will always be placed in the Agent argument slot, and the structure underlying to the lady will be placed in the Recipient slot (unless it is interpreted as a directional satellite). Possible further terms in the term buffer will be matched according to the same mode and Voice. We are now left with only terms which are not overtly marked for their semantic function. The following cases may be distinguished:

(42)    Active
     a. [John] walked.
     b. [John] kissed [Mary].
     c. [John] gave [the book] to Mary.
     d. [John] gave [Mary] [the book].
        Passive
     e. [John] was kissed by Mary.
     f. [John] was given [the book] by Mary.
In the passive condition, if the term structure to be inserted is the structure underlying John, we first check whether there is a Recipient slot which is still empty. If so, our term structure is inserted into that slot:

    match_terms([pass],deep,TT,AR,AR2) :-
        member(T,TT),
        T=[OPS,[[N,TY]|RR]],
        member(AA,AR),
        AA=[S,t,[rec]],
        sublist(S,TY),
        AA1=[S,T,[rec]],
        subst(AA,AR,AA1,AR1),
        delete(T,TT,TT1), !,
        match_terms([act],deep,TT1,AR1,AR2).

When parsing in the "true" mode these same conditions also allow Subject function to be assigned to the inserted term:

    match_terms([pass],true,TT,AR,AR2) :-
        member(T,TT),
        T=[OPS,[[N,TY]|RR]],
        member(AA,AR),
        AA=[S,t,[rec]],
        sublist(S,TY),
        AA1=[S,T,[subj,rec]],
        subst(AA,AR,AA1,AR1),
        delete(T,TT,TT1), !,
        match_terms([act],deep,TT1,AR1,AR2).

When there is no open Recipient position available in the passive condition, we check whether there is an open Patient position. If so, our term is inserted into that position:

    match_terms([pass],deep,TT,AR,AR2) :-
        member(T,TT),
        T=[OPS,[[N,TY]|RR]],
        member(AA,AR),
        AA=[S,t,[pt]],
        sublist(S,TY),
        AA1=[S,T,[pt]], !,
        subst(AA,AR,AA1,AR1),
        delete(T,TT,TT1),
        match_terms([act],deep,TT1,AR1,AR2).

In the "true" mode this Patient term is, again, assigned Subject function:

    match_terms([pass],true,TT,AR,AR2) :-
        member(T,TT),
        T=[OPS,[[N,TY]|RR]],
        member(AA,AR),
        AA=[S,t,[pt]],
        sublist(S,TY),
        AA1=[S,T,[subj,pt]], !,
        subst(AA,AR,AA1,AR1),
        delete(T,TT,TT1),
        match_terms([act],true,TT1,AR1,AR2).
In the active Voice, we first check the first argument position:

    match_terms([act],true,TT,AR,AR2) :-
        member(T,TT),
        T=[OPS,[[N,TY]|RR]],
        AR=[[S,t,[F]]|Z],
        sublist(S,TY),
        AR1=[[S,T,[subj,F]]|Z],
        delete(T,TT,TT1), !,
        match_terms([act],true,TT1,AR1,AR2).

Then we check for a possible Recipient position (to which in the relevant case Object function is assigned):

    match_terms([act],true,TT,AR,AR2) :-
        member(T,TT),
        T=[OPS,[[N,TY]|RR]],
        AR=[AA1,AA2,[S,t,[rec]]],
        sublist(S,TY),
        AR1=[AA1,AA2,[S,T,[obj,rec]]],
        delete(T,TT,TT1), !,
        match_terms([act],deep,TT1,AR1,AR2).

Then, we check for an open Patient position:

    match_terms([act],true,TT,AR,AR2) :-
        member(T,TT),
        T=[OPS,[[N,TY]|RR]],
        AR=[AA1,[S,t,[pt]]|Z],
        sublist(S,TY),
        not(Z=[[_,_,[obj,rec]]]),
        AR1=[AA1,[S,T,[obj,pt]]|Z],
        delete(T,TT,TT1), !,
        match_terms([act],deep,TT1,AR1,AR2).

If none of the above conditions hold we match the term(s) one by one with the first argument position which is still open:

    match_terms([act],DT,TT,AR,AR2) :-
        member(T,TT),
        T=[OPS,[[N,TY]|RR]],
        member(AA,AR),
        AA=[S,t,F],
        sublist(S,TY),
        AA1=[S,T,F],
        subst(AA,AR,AA1,AR1),
        delete(T,TT,TT1), !,
        match_terms([act],DT,TT1,AR1,AR2).

This matching algorithm is successful in most cases, including difficult cases such as:

(43) The lady was given these books by John.
One situation the algorithm cannot handle is Patient-Agent sequences such as in:

(44) This lady John does not like.
which will be interpreted as 'This lady does not like John'. This will have to be remedied by taking possible constituent orders of English into account. Further, it will be easier to handle when pragmatic functions have been integrated into the program.
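The core of the matching strategy (overtly marked terms first, then a Voice-dependent choice for the first unmarked term, then left-to-right filling) can be sketched as follows. This is a simplified illustration under stated assumptions, not the ProfGlot code: the `demo_` predicates, the `arg(Function,Filler)` frame format, and the `marked(Function,Term)` wrapper are invented here, and selection restrictions as well as Subject/Object assignment are left out.

```prolog
% demo_match(+Voice, +Buffer, +Frame0, -Frame)
% Frame is a list of arg(Function, Filler); the atom t marks an open position.
demo_match(_, [], Frame, Frame) :- !.
% A term overtly marked for its function (e.g. "by john") goes to that slot,
% wherever it stands in the buffer.
demo_match(Voice, Buffer, Frame0, Frame) :-
    select(marked(F,T), Buffer, Buffer1),
    demo_fill(F, T, Frame0, Frame1), !,
    demo_match(Voice, Buffer1, Frame1, Frame).
% Passive: the first unmarked term prefers an open Recipient, then an open Patient.
demo_match(pass, [T|Ts], Frame0, Frame) :-
    T \= marked(_,_),
    (   demo_fill(rec, T, Frame0, Frame1)
    ;   demo_fill(pt, T, Frame0, Frame1)
    ), !,
    demo_match(act, Ts, Frame1, Frame).
% Otherwise: the first position which is still open, from left to right.
demo_match(act, [T|Ts], Frame0, Frame) :-
    T \= marked(_,_),
    demo_fill(_, T, Frame0, Frame1), !,
    demo_match(act, Ts, Frame1, Frame).

% demo_fill(?Function, +Term, +Frame0, -Frame): fill the first open slot
% carrying that function.
demo_fill(F, T, [arg(F,t)|Rest], [arg(F,T)|Rest]).
demo_fill(F, T, [A|Rest0], [A|Rest]) :-
    demo_fill(F, T, Rest0, Rest).
```

For a buffer corresponding to (43), the query `demo_match(pass, [the_lady, these_books, marked(ag,john)], [arg(ag,t),arg(pt,t),arg(rec,t)], F)` yields `F = [arg(ag,john), arg(pt,these_books), arg(rec,the_lady)]`. Because unmarked terms are otherwise assigned strictly from left to right, a fronted Patient as in (44) ends up in the first open position, which is the misreading described above.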
13.1.7. Returning sentences

Now that we have the possibility of "true-parsing" a sentence onto its fully specified clause, or "deep-parsing" it onto its underlying clause structure, we can formulate rules to "return" the structure so obtained by re-expressing it as one or more sentences which can be generated from these underlying structures:

    true_return_clause(L,CL,F) :-
        ex2_clause(L,CL,F), nl, nl,
        writelist(F), nl, nl.

In order to "true-return" a clause we simply re-express the fully specified clause. Note that this may yield different variants when one fully specified clause can be expressed in different ways.

    true_return(L,X,Y,Z) :-
        true_parse(L,X,Y),
        true_return_clause(L,Y,Z).

This instruction waits for a sentence to be typed on the screen, then "true-parses" it and "true-returns" the fully specified clause so reconstructed.

    deep_return_clause(L,CL,F) :-
        specify_clause(L,CL,CL1),
        ex2_clause(L,CL1,F), nl, nl,
        writelist(F), nl, nl.

For "deep-returning" a clause structure it must first be (re)specified and then the resulting fully specified clause may be returned. This will give higher-level variants if different specifications of the same clause structure (e.g. alternative subject-object assignments) are possible.

    deep_return(L,X,Y,Z) :-
        deep_parse(L,X,Y),
        deep_return_clause(L,Y,Z).
We can thus also "deep-parse" an input sentence and then "deep-return" it.
An interesting feature of this parser (which was not consciously intended but accidentally discovered) is that it can parse sentences with "missing arguments". For one example, consider:

(45) The sailor was kicked.

Although this sentence is not produced by the generator, it is parsed by the parser in a way corresponding to:

(46) [t]Ag kicked [the sailor]PatSubj

where "t" is the symbol indicating an empty term position. In order to bring out this kind of parse I formulated a special expression rule which will express an empty term with function F (e.g. Ag) as [empty,ag]:

    ex_full_term(L,[X,t,Y],[empty,Y]).
13.2. Some performance features

The parser as it stands has some interesting performance features which in some respects make it more powerful than the generator on which it relies ("comprehension exceeds production"), but which in some conditions may also produce incorrect parses (alongside correct ones). Consider the following points:

— The parsing strategy is not strictly tied to a fixed order of constituents, except in the following cases:

[1] Auxiliaries must precede (though not necessarily immediately precede) the other auxiliaries and the lexical predicate which they govern. As it stands, the parser accepts the orders in Dutch (47a) and (47b), but not the equally acceptable order (47c):

(47) a. Jan heeft Marie gekust.
        John has Mary kissed
     b. ... dat Jan Marie heeft gekust.
        ... that John Mary has kissed
     c. ... dat Jan Marie gekust heeft.
        ... that John Mary kissed has

This can easily be made more flexible by adapting the rules for finding predicates.

[2] Determiners, adjectives, nouns, and nominal modifiers must have a fixed order in the input sentence. But this order is in fact largely fixed in the three languages, as well as in most others.

[3] The order of terms which are not overtly marked by a preposition must be fixed, as in:

(48) a. John (Subj-Ag) kissed Mary (Obj-Pat).
     b. John (Subj-Ag) gave the book (Obj-Pat) to Peter.
     c. John (Subj-Rec) was given the book (Pat) by Peter.
In actual fact, this order is indeed fixed in most cases. But the parser cannot correctly handle certain alternative orders which are possible especially in Dutch, such as:

(49)    DEZE jongen heeft Jan geslagen.
        THIS boy has John hit
     a. 'THIS boy has hit John' (correctly parsed)
     b. 'It is THIS boy that John has hit' (not parsed)
— The parser can handle certain structural ambiguities, especially those which involve the difference between attributive adnominal modifiers versus adverbial satellites. Thus, the parser delivers two parses for:

(50)    John hit the man in the garden.
     a. 'John hit the man, and he did it in the garden'
     b. 'It was the man in the garden that John hit'

In some cases multiple ambiguities of this kind are correctly parsed:

(51)    John hit the boy of the professor in the garden in the city.
     a. [the boy of the professor] [in the garden in the city]
     b. [the boy of the professor in the garden] [in the city]
     c. [the boy of the professor in the garden in the city]
— In certain cases the parser gives incorrect parses. Consider the following example:

(52)    The boy in the city kissed the lady.
     a. [the boy in the city] [kissed] [the lady] (correct)
     b. [the boy] [in the city] [kissed] [the lady]
        i.e., 'The boy kissed the lady in the city' (incorrect)

This can of course be remedied, but I have left it as it is for the moment, since certain similar sequences in Dutch are genuinely ambiguous between (52a) and (52b):

(53)    ... dat de jongen in de stad de dame kuste
     a. '... that the boy in the city kissed the lady'
     b. '... that the boy kissed the lady in the city'
At this point we thus require a further differentiation between English and Dutch.

— Certain correct parses are not delivered. Consider the following example:

(54)    John deplored that Mary cheated yesterday.
     a. [John] [deplored] [that mary cheated yesterday] (correctly parsed)
     b. [john] [deplored] [that mary cheated] [yesterday]
        (i.e. he deplored it yesterday; not parsed).
Obviously, these various features require further development of the parser.

— The parser can recognize and correctly interpret idiomatic expressions both at the deep and at the true level. Moreover, it resolves the ambiguity of such expressions as:

(55)    John kicked the bucket.
     a. 'John hit the bucket with the foot'
     b. 'John died'

by reconstructing both the idiomatic and the non-idiomatic underlying structure. When we ask the parser to "true-return" (55), we get the following output:

(56) a. john kicked the bucket. (i.e., the non-idiomatic reading)
     b. john kicked the bucket. (i.e., the idiomatic reading)

When we ask the parser to "deep-return" (55), we get the following:

(57) a. john kicked the bucket. (non-idiomatic, active)
     b. the bucket was kicked by john. (idem, passive)
     c. john kicked the bucket. (idiomatic)
No passive will be returned for the idiomatic reading, since the empty function [ ] of the idiomatic term does not qualify for subject assignment.
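How backtracking delivers both readings of a sentence like (50) can be shown with a small definite clause grammar. This is only a sketch under assumptions: the grammar, the `demo_` names, and the tree format are invented for illustration and are far simpler than the pass-based strategy of UniPar; a prepositional phrase may attach either to the preceding noun or to the clause as a satellite.

```prolog
% Toy lexicon (hypothetical entries).
demo_verb(hit).
demo_noun(man).
demo_noun(garden).
demo_prep(in).

% Sentence: subject term, verb, object term, clause-level satellites.
demo_s(s(Subj, V, Obj, Sats)) -->
    demo_np(Subj), [V], { demo_verb(V) }, demo_np(Obj), demo_pps(Sats).

% Term: a head followed by zero or more adnominal modifiers.
demo_np(np(N, Mods)) --> demo_head(N), demo_pps(Mods).

demo_head(john) --> [john].
demo_head(N)    --> [the, N], { demo_noun(N) }.

% Zero or more prepositional phrases.
demo_pps([])       --> [].
demo_pps([PP|PPs]) --> demo_pp(PP), demo_pps(PPs).

demo_pp(pp(P, NP)) --> [P], { demo_prep(P) }, demo_np(NP).

% demo_parses(+Words, -Parses): collect every analysis found on backtracking.
demo_parses(Words, Parses) :-
    findall(T, phrase(demo_s(T), Words), Parses).
```

The query `demo_parses([john,hit,the,man,in,the,garden], Ps)` returns two trees: one with `pp(in, np(garden,[]))` as a clause-level satellite, corresponding to (50a), and one with the same phrase inside the object term, corresponding to (50b).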
Chapter 14. UniTra: a trilingual translator
14.0. Introduction¹

The lay-out of the translator is based on the fact that it is much easier to translate underlying structures from one language to another than to translate the sentences through which these can be expressed. This is because of the standardized nature of underlying structure, and because of the fact that at the underlying level many language-specific features of formal expression have been neutralized. We can thus use the underlying clause structures as an "interlingua" for translation purposes. We already know that we can obtain underlying structures either through generating them or through reconstructing them by means of the parser. Therefore, if we can build a "bridge" between two languages at the level of underlying structure, we can, across that bridge, enter from the one language into the other. By "underlying structure" we may understand the fully specified clause underlying a sentence, or the more abstract "clause structure" from which the fully specified clause is derived through specification (see Figure 1 in chapter 3 and Figure 4 in chapter 4). Just as the parsing strategies "True Parse" and "Deep Parse" were distinguished in the preceding chapter, so I have implemented two distinct translation strategies: "True Translate" and "Deep Translate". The relationships between these two strategies can be represented as in Figure 5:
[Figure 5. Deep and true translation. Schema for the languages L1 and L2, pairing CS1 with CS2, FSC1 with FSC2, and S1 with S2.]
If the "bridges" at the deep and the true level can be built, the overall lay-out of ProfGlot allows for different "routes" through the schema of Figure 5. For example:

(1) 1. Generate CS1
    2. Deep-translate CS1 to CS2
    3. Specify CS2 to FSC2
    4. Express FSC2 as S2

(2) 1. Generate CS1
    2. Specify CS1 to FSC1
    3. True-translate FSC1 to FSC2
    4. Express FSC2 as S2

(3) 1. Input S1
    2. True-parse S1 to FSC1
    3. True-translate FSC1 to FSC2
    4. Express FSC2 as S2

(4) 1. Input S1
    2. Deep-parse S1 to CS1
    3. Deep-translate CS1 to CS2
    4. Specify CS2 as FSC2
    5. Express FSC2 as S2
Since the fully specified clause is specified for Subject and Object, translation at that level will be true to the Voice of the input sentence:

(5) L1: The professor has been hit by the boy.
    L2: Le professeur a été frappé par le garçon.

Deep translation, on the other hand, will "forget" the particular Subject and Object assignments, and may thus yield alternative translations which, though true to the nuclear meaning, need not be true to the particular form of presentation:

(6) L1: The professor has been hit by the boy.
    L2: a. Le garçon a frappé le professeur.
        b. Le professeur a été frappé par le garçon.

The system outlined in Fig. 5 will yield two types of alternative translations: alternatives of type 1 consist of alternative expressions from one and the same FSC; alternatives of type 2 consist of alternative specifications of CS onto FSC:

(8) John regrets that Mary is ill.
Between CS and FSC this may be specified as either active or passive; between FSC and S both the active and the passive specified structure may be realized in different forms. Deep-translation into French will thus yield the following alternatives:

(9)    Active
    1. Jean regrette le fait que Marie soit malade.
    2. Jean regrette que Marie soit malade.
       Passive
    3. Le fait que Marie soit malade est regretté par Jean.
    4. Que Marie soit malade est regretté par Jean.
    5. Il est regretté par Jean que Marie soit malade.
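The difference between the deep and the true route can be made concrete with a toy interlingua. The sketch below is an assumption-laden illustration, not UniTra: the terms `cs/3` and `fsc/4`, the `demo_` predicates, and the three lexical equivalences (with `garcon` written without cedilla) are invented here, and the final expression step is left out.

```prolog
% Hypothetical equivalence lexicon.
demo_eq(eng, fre, hit, frapper).
demo_eq(eng, fre, boy, garcon).
demo_eq(eng, fre, professor, professeur).

% Deep-translate: relexify each constituent of cs(Predicate, Agent, Patient).
demo_deep_translate(L1, L2, cs(P,A,G), cs(P1,A1,G1)) :-
    demo_eq(L1, L2, P, P1),
    demo_eq(L1, L2, A, A1),
    demo_eq(L1, L2, G, G1).

% Specify: a clause structure can be specified as active or as passive.
demo_specify(cs(P,A,G), fsc(act,P,A,G)).
demo_specify(cs(P,A,G), fsc(pass,P,A,G)).

% True-translate: relexify a fully specified clause, keeping its Voice.
demo_true_translate(L1, L2, fsc(V,P,A,G), fsc(V,P1,A1,G1)) :-
    demo_deep_translate(L1, L2, cs(P,A,G), cs(P1,A1,G1)).

% Deep route: CS1 -> CS2 -> FSC2 (steps 3 and 4 of route (4)).
demo_deep_route(L1, L2, CS1, FSC2) :-
    demo_deep_translate(L1, L2, CS1, CS2),
    demo_specify(CS2, FSC2).
```

On backtracking, `demo_deep_route(eng, fre, cs(hit,boy,professor), F)` gives both `fsc(act,frapper,garcon,professeur)` and `fsc(pass,frapper,garcon,professeur)`, mirroring (6), whereas `demo_true_translate(eng, fre, fsc(pass,hit,boy,professor), F)` gives only the passive structure, mirroring (5).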
The actual translator UniTra thus takes the form of a system which, for an input CS or FSC in L1, can construe an equivalent CS or FSC in L2. The relation "equivalent clause structure L1-L2" is compositionally defined, finally arriving at the ultimate constituents of the fully specified clause for which L1-L2 equivalences have been defined. These are equivalences between lexical predicates of L1 and L2, or between other types of "ultimate constituents". If we simply "relexify" an L1 source structure into an L2 target structure, the L2 structure may be such that it cannot be expressed in L2. In that case, the target structure must be adjusted to the well-formedness requirements of L2. The present program can deal with the following types of adjustment:

[1] Relative adjustment

In Dutch the relative pronoun is sensitive to the "neuter" or "non-neuter" character of the head noun. Therefore, if we translate towards Dutch and the target head noun has the feature "neut", then this feature must be added to the features of the relative element. In translating from Dutch towards English or French, no relative adjustment is required.

[2] Gender adjustment

A nominal predicate may have another "Type" in the target language than in the source language. For example, in translating English book into Dutch boek, we need the information that the latter is "neuter"; in translating it into French livre we must know that the latter is masculine.

[3] Subject-Object adjustment

In translating from L1 to L2 we must take care that we do not end up with a distribution of Subj and Obj functions which is inexpressible (or ungrammatical) in L2. Rather, we should like to adjust to the closest alternative in L2. Compare:

(10) a. John has been given the book by Mary.
     b. *Jean a été donné le livre par Marie.
     c. Le livre a été donné à Jean par Marie.
Without adjustment, (10a) would yield (10b); therefore, the clause structure underlying (10a) is adjusted to that underlying (10c) in the process of crossing the bridge to French.

[4] Agreement adjustment

The agreement features on the Tense of the finite verb maximally contain Number, Person, and Gender. Gender, however, is only relevant to French, and not present in the relevant slot in English and Dutch underlying structures. This Gender information must thus be re-established in French target structures, so as to avoid incorrect output.

[5] Mood adjustment

Without adjustment, translating (11a) to French would yield (11b); we therefore have to add the "subjunctive" to the target underlying structure:

(11) a. John deplores that Mary is ill.
     b. *Jean regrette que Marie est malade.
     c. Jean regrette que Marie soit malade.
In the other direction we will have to leave out the subjunctive element in order to secure proper expression in English and Dutch. These adjustments are indeed effected as part of the definition of "equivalent clause L1-L2". I have placed the adjustments at the lowest possible level in the compositional definition of "equivalent clause". For example, since Gender adjustment only concerns the unit [Noun,Type] (the "restrictor-1"), it is indeed taken care of at that level. Thus, at various points in the translation of underlying structures, the relevant adjustments are made as an integral part of the equivalence defined at that point. In the presentation below, however, I have placed all adjustments together in section 14.1.2., so as to make their effects more transparent and to show clearly where the three languages contrast at the level of underlying structure. For UniTra to work properly we need the three lexica for English, French, and Dutch side by side with the trilingual equivalence lexicon contained in UniTra itself. The capacity of the translator is such that it can correctly translate all structures generated by the generators of the three languages in all six directions, both at the "deep" level and at the "true" level. However, the translation of idiomatic expressions has only been taken care of to a limited extent (see below, under "equivalent proposition").
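Gender adjustment at the level of the restrictor-1 can be illustrated with a toy lexicon. The sketch assumes a single table `demo_entry(Concept, Language, Form, Type)`; this table, the `demo_` names, and the feature lists are hypothetical and stand in for the three lexica and the equivalence lexicon mentioned above.

```prolog
% Hypothetical lexical entries for one concept in the three languages.
demo_entry(book, eng, book,  [inanimate]).
demo_entry(book, dut, boek,  [inanimate, neut]).
demo_entry(book, fre, livre, [inanimate, masc]).

% demo_eq_restr1(+L1, +L2, +[N1,TY1], -[N2,TY2])
% The noun is relexified via the shared concept; the source Type TY1 is
% discarded and the Type of the target noun is read off the target lexicon.
demo_eq_restr1(L1, L2, [N1,_TY1], [N2,TY2]) :-
    demo_entry(Concept, L1, N1, _),
    demo_entry(Concept, L2, N2, TY2).
```

Under these assumptions, `demo_eq_restr1(eng, dut, [book,[inanimate]], R)` gives `R = [boek,[inanimate,neut]]`, and the same query with `fre` as target gives `R = [livre,[inanimate,masc]]`.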
14.1. The module UniTra

14.1.1. The translator

I first give the highest clauses of the translator, and then more or less "bottom-up" the way in which the relation "equivalent clause L1-L2" is defined.

    translate1(L1,L2) :-
        repeat,
        fully_specified_clause(L1,X),
        write(X), nl, nl,
        eq_clause(L1,L2,X,Y),
        write(Y), nl, nl,
        write('Possible expressions of source structure:'), nl,
        give_all_expr(L1,X), nl, nl,
        write('Possible expressions of target structure:'), nl,
        give_all_expr(L2,Y), nl, nl,
        fail.

This instruction first creates a fully specified clause in L1, writes it on the screen, then finds the equivalent fully specified clause in L2, writes it on the screen as well, and then gives all possible expressions for both the source and the target clause structure.

    give_all_expr(L,X) :-
        findall(X1,ex2_clause(L,X,X1),LIST),
        write_items(LIST).

Giving all possible expressions of a clause means making a list of these expressions and writing the items of the list on the screen. Note that there may be different alternatives for the expression of both the source and the target structure, when the same fully specified clause can be expressed in different ways (see schema (7)).
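The two control patterns used here, a failure-driven loop and collection with findall/3, can be compared in a minimal sketch for SWI-Prolog. The facts `demo_expr/2` and the other `demo_` predicates are assumptions for illustration only; `atomic_list_concat/3` and `writeln/1` are SWI-Prolog built-ins.

```prolog
% Hypothetical alternative expressions of one clause.
demo_expr(regret, [jean, regrette, que, marie, soit, malade]).
demo_expr(regret, [jean, regrette, le, fait, que, marie, soit, malade]).

% Collect all expressions in a list, as give_all_expr does with findall.
demo_all_expr(Clause, List) :-
    findall(E, demo_expr(Clause, E), List).

% Write each item of the list on its own line.
demo_write_items([]).
demo_write_items([Words|Rest]) :-
    atomic_list_concat(Words, ' ', Line),
    writeln(Line),
    demo_write_items(Rest).

% Failure-driven alternative: print every solution, then succeed.
demo_show_all(Clause) :-
    demo_expr(Clause, Words),
    atomic_list_concat(Words, ' ', Line),
    writeln(Line),
    fail.
demo_show_all(_).
```

Both `demo_all_expr(regret, L), demo_write_items(L)` and `demo_show_all(regret)` print the two sentences; the findall version additionally keeps them available as a list. Unlike this sketch, a loop that starts with repeat and ends in fail never terminates by itself, which suits a generator that keeps producing new clauses.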
    translate2(L1,L2) :-
        repeat,
        clause_structure(L1,X),
        write('Underlying clause structure in '),
        write(L1), write(':'), nl,
        write(X), nl, nl,
        eq_clause(L1,L2,X,Y),
        write('Equivalent UCS in '),
        write(L2), write(':'), nl,
        write(Y), nl, nl,
        specify_clause(L2,Y,Z),
        write('Fully specified underlying clause structure in '),
        write(L2), write(':'), nl,
        write(Z), nl, nl,
        write('Possible expressions of this FSUCS:'), nl,
        give_all_expr(L2,Z), nl, nl,
        fail.
This instruction creates an underlying clause structure in L1, writes it on the screen, finds an equivalent clause structure in L2, writes that one on the screen as well, then specifies that clause structure to a fully specified clause in L2, again writing it on the screen, and then gives all possible expressions of that fully specified clause.

    true_translate(L1,L2) :-
        true_parse(L1,Y),
        write(Y), nl, nl,
        eq_clause(L1,L2,Y,Z),
        ex2_clause(L2,Z,Z1),
        write(Z), nl, nl,
        writelist(Z1), nl, nl.

This alternative procedure of "true-translate" first "true-parses" an input sentence onto its fully specified clause in L1, finds the equivalent clause in L2, and expresses this as Z1.

    deep_translate(L1,L2) :-
        deep_parse(L1,Y),
        eq_clause(L1,L2,Y,Z),
        specify_clause(L2,Z,Z1),
        ex2_clause(L2,Z1,Z2),
        writelist(Z2), nl, nl.

Analogous procedure for deep-parsing + deep-translating. The relation "equivalent fully specified clause" is compositionally defined in the following rules. First, we consider when two term structures are equivalent:
    eq_term(L1,L2,[OPS,[[PRO|Z]]],[OPS,[[PRO|Z]]]) :-
        (PRO=ana; pronoun(PRO)), !.

Two pronominal terms (including anaphorical, relative, and interrogative terms) are L1-L2-equivalent if they are identical.

    eq_term(L1,L2,[[*,sg,*],[[PR|Z]]],[[*,sg,*],[[PR1|Z]]]) :-
        (eq_prop(L1,L2,PR,PR1); eq_ext_pred(L1,L2,PR,PR1)), !.

Two propositional/predicational terms are equivalent if the propositions or extended predications they contain are equivalent.

    eq_term(L1,L2,[OPS,[R1]],[OPS,[R11]]) :-
        eq_restr1(L1,L2,R1,R11).

Two proper name terms are equivalent if the proper names are equivalent.

    eq_term(L1,L2,[OPS,[R1,R2,R3,R4]],[OPS,[R11,R22,R33,R44]]) :-
        eq_restr1(L1,L2,R1,R11),
        eq_restr2(L1,L2,R2,R22),
        eq_restr3(L1,L2,R3,R33),
        eq_restr4(L1,L2,R4,R44).

Two terms are equivalent if they have the same operators and all the restrictors are pairwise equivalent. We now define when restrictors are pairwise equivalent.

    eq_restr1(L1,L2,[N1,TY1],[N2,TY2]) :-
        eq(L1,L2,N1,N2),
        adjust_type(L2,N2,TY2).

Two restrictors-1 are equivalent if their forms are defined as equivalent, and if the source language type TY1 is adjusted to the target language type TY2.

    eq_restr2(L1,L2,[],[]).
    eq_restr2(L1,L2,[S,T],[S1,T]) :-
        eq_adj(L1,L2,S,S1), !.

Two second restrictors are equivalent if they are both [] or if the adjectives they contain are equivalent.

    eq_adj(L1,L2,[deg,DEG,X],[deg,DEG1,X1]) :-
        eq(L1,L2,[X],[X1]),
        eq(L1,L2,DEG,DEG1), !.

Two derived adjectives of the form [deg,very,tall] and [deg,très,grand] are equivalent if both the adjectives and the degrees are equivalent.

    eq_adj(L1,L2,[X,Y],[X1,Y]) :-
        eq(L1,L2,[X],[X1]), !.

Two comparative adjectival predicates (e.g. [tall,pos] and [grand,pos]) are equivalent if the adjectives are equivalent and the comparative operators are identical.
    eq_adj(L1,L2,[X],[Y]) :-
        eq(L1,L2,[X],[Y]).

Otherwise, adjectives are equivalent if they are defined as equivalent.

    eq_restr3(L1,L2,[],[]).
    eq_restr3(L1,L2,X,Y) :-
        eq_full_term(L1,L2,X,Y).

Two restrictors-3 are equivalent if they are both [] or if the full terms they contain are equivalent.

    eq_restr4(L1,L2,[],_,[]).
    eq_restr4(L1,dut,X,[N,TY],Y1) :-
        eq_ext_pred(L1,dut,X,Y),
        adjust_rel(dut,TY,Y,Y1), !.
    eq_restr4(L1,L2,X,_,Y) :-
        eq_ext_pred(L1,L2,X,Y).

Two restrictors-4 are equivalent if they are both [] or if the extended predications they contain are equivalent. If the target language is Dutch the relative element has to be adjusted to the Gender of the head noun. This adjustment is defined below, in 14.2.2. We have now defined what it means for terms to be L1-L2 equivalent. We now proceed to defining equivalence between full terms.

    eq_full_term(L1,L2,[Sel,T,Fun],[Sel1,T2,Fun]) :-
        eq_term(L1,L2,T,T1),
        adjust_mood(L2,[Sel,T1],[Sel1,T2]).

Two full (predicational/propositional) terms are equivalent if they contain equivalent terms and the "mood" (subjunctive) is adjusted to the target language.

    eq_full_term(L1,L2,[Sel,T,Fun],[Sel,T1,Fun]) :-
        eq_term(L1,L2,T,T1).

In all other cases two full terms are equivalent if their selections and functions are identical and the terms they contain are equivalent.

    eq_full_terms(L1,L2,[],[]).
    eq_full_terms(L1,L2,[T|TR],[T1|TR1]) :-
        eq_full_term(L1,L2,T,T1),
        eq_full_terms(L1,L2,TR,TR1).

A whole series of full terms is equivalent if each of the full terms in the L1 series is equivalent to the corresponding full term in the L2 series. We now proceed to defining the equivalence between satellites.

    eq_sat(L1,L2,[*],[*]) :- !.
    eq_sat(L1,L2,[Sel,T,F],[Sel,T1,F]) :-
        eq_term(L1,L2,T,T1), !.
    eq_sat(L1,L2,X,X1) :-
        eq(L1,L2,X,X1).
The module UniTra 213
Two satellites are equivalent if they are both empty or if they have the same selections and functions and contain equivalent terms, or if they are defined as equivalent (in the case of lexical satellites).

    eq_sats(L1,L2,[],[]).
    eq_sats(L1,L2,[X|Tail],[X1|Tail1]) :-
        eq_sat(L1,L2,X,X1),
        eq_sats(L1,L2,Tail,Tail1).

A series of satellites is equivalent to another if all of the satellites are pairwise equivalent. Now that equivalences have been defined between (series) of argument and satellite terms, we define the conditions in which predicates are equivalent.

    eq_pred(L1,L2,[X,Y],[X1,Y1]) :-
        copula(L1,X),
        eq(L1,L2,X,X1),
        eq_full_term(L1,L2,Y,Y1), !.

Two (copular) predicates are equivalent if they have equivalent copulas and equivalent full term predicates: be a big boy = être un grand garçon. We need the same rule at the "deep" level, where the copula has not yet been assigned:

    eq_pred(L1,L2,Y,Y1) :-
        eq_full_term(L1,L2,Y,Y1), !.

The same for copular constructions with a term predicate:

    eq_pred(L1,L2,[X,Y],[X1,Y1]) :-
        copula(L1,X),
        eq(L1,L2,X,X1),
        eq_term(L1,L2,Y,Y1), !.
    eq_pred(L1,L2,Y,Y1) :-
        eq_term(L1,L2,Y,Y1), !.

And for derived comparative predicates:

    eq_pred(L1,L2,[X,[[Y,C],Z]],[X1,[[Y1,C],Z1]]) :-
        copula(L1,X),
        eq(L1,L2,X,X1),
        eq_adj(L1,L2,[Y],[Y1]),
        eq_full_term(L1,L2,Z,Z1), !.
    eq_pred(L1,L2,[[Y,C],Z],[[Y1,C],Z1]) :-
        eq_adj(L1,L2,[Y],[Y1]),
        eq_full_term(L1,L2,Z,Z1), !.

These rules take care of the "true" (copular) and the deep version of comparative predicates such as (be) taller than John = (être) plus grand que Jean. Remember from chapter 9 that these were analysed as complex predicates. Such predicates are equivalent if the copulas are equivalent, the adjectives in e.g. [tall,pos], [grand,pos] are equivalent, the comparative markers are identical, and the full terms are equivalent.
    eq_pred(L1,L2,[X,Y],[X1,Y1]) :-
        copula(L1,X),
        eq(L1,L2,X,X1),
        eq_adj(L1,L2,Y,Y1), !.
    eq_pred(L1,L2,Y,Y1) :-
        eq_adj(L1,L2,Y,Y1), !.

The equivalence between copular and non-copular adjectival predicates: (be) tall = (être) grand.
    eq_pred(L1,L2,S,S1) :-
        eq(L1,L2,S,S1), !.

Otherwise, two predicates are equivalent if they are defined as such. Given these equivalences we can now compositionally define the equivalence between two clause structures:

    eq_nucleus(L1,L2,[S,T,A],NUC1) :-
        eq_pred(L1,L2,S,S1), !,
        eq_full_terms(L1,L2,A,A1),
        adjust_so(L2,[S1,T,A1],NUC1).

At the "deep" level of clause structure, two nuclei are equivalent if their predicates and their argument full terms are equivalent, and if the Subject-Object assignment is adjusted to the target language.

    eq_nucleus(L1,L2,[[V,S],T,A],NUC1) :-
        (V=[act];V=[pass]),
        eq_pred(L1,L2,S,S1), !,
        eq_full_terms(L1,L2,A,A1),
        adjust_so(L2,[[V,S1],T,A1],NUC1).

The same for the "true" level of the fully specified clause, where the Voices must be identical as well.

    eq_core(L1,L2,[P1,NUC,S1],[P1,NUC1,S11]) :-
        eq_nucleus(L1,L2,NUC,NUC1),
        eq_sats(L1,L2,S1,S11).

Two core predications are equivalent if they have the same operators, equivalent satellites-1, and equivalent nuclear predications.

    eq_ext_pred(L1,L2,[P2,CORE,S2],EXTPRED) :-
        eq_core(L1,L2,CORE,CORE1),
        eq_sats(L1,L2,S2,S21),
        adjust_ext_pred(L2,[P2,CORE1,S21],EXTPRED).

Two extended predications are equivalent if they have the same operators, equivalent satellites-2 and equivalent core predications and if they are adjusted to the Gender agreement operative in the target language.
    eq_prop(L1,L2,PROP,PROP2) :-
        PROP=[P3,[P2,[P1,[S,T,ARG],S1],S2],S3],
        member(AA,ARG),
        AA=[[idiom]|Z],
        entail(L1,PROP,PROP1),
        eq_prop(L1,L2,PROP1,PROP2), !.

Just to give an impression of how idiomatic expressions can be translated, this rule checks whether there is any idiomatic argument in the nucleus of the proposition. If so, it derives a non-idiomatic paraphrase through "entail" (as defined below; actually, it is part of the logical component to be discussed in chapter 15), and matches the paraphrase with an equivalent proposition in the target language. Thus, the proposition underlying (12a) is first paraphrased to (12b) and then translated as (12c):

(12) a. John has kicked the bucket.
     b. John has died.
     c. Jean a mouru.

    entail(L,C,C1) :-
        C= [P3,[P2,[P1,NUC,S1],S2],S3],
        mean(L,NUC,NUC1),
        C1=[P3,[P2,[P1,NUC1,S1],S2],S3].

A proposition C entails another proposition C1 if the nucleus NUC of C is replaced by the nucleus NUC1 as specified in the meaning definition of the predicate. Two notes on this strategy for translating idioms:

[1] We only get non-idiomatic translations for idiomatic expressions. What would be needed to remedy this is a method of "re-idiomatizing" the target expression (Van der Korst 1987). Alternatively, we could directly match L1 idiomatic expressions with L2 idiomatic expressions of more or less the same meaning and idiomatic flavour.

[2] The rules as presently formulated work only at the "deep" translation level. The relevant procedure for the "true" level has not yet been taken care of.

    eq_prop(L1,L2,[P3,EP,S3],[P3,EP1,S31]) :-
        eq_ext_pred(L1,L2,EP,EP1),
        eq_sat(L1,L2,S3,S31).

Two propositions are equivalent if they have the same operators, equivalent satellites-3, and equivalent extended predications.

    eq_clause(L1,L2,[P4,PROP,S4],[P4,PROP1,S41]) :-
        eq_prop(L1,L2,PROP,PROP1),
        eq_sat(L1,L2,S4,S41).

Two clauses (either clause structures or fully specified clauses) are equivalent if they have the same operators, equivalent satellites-4, and equivalent propositions.
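The paraphrase-then-translate strategy for idioms can be tried out on a much reduced scale. The following is only a sketch under stated assumptions: the terms nuc/2 and idiom/1, the predicates eq3/3, translate/4 and translate_args/4, and the sample meaning definition are hypothetical simplifications introduced for illustration, and do not reproduce the layered ProfGlot structures.

```prolog
:- use_module(library(lists)).

% Hypothetical mini-lexicon: triplets English-French-Dutch.
eq3(john, jean, jan).
eq3(die,  mour, sterv).

% Select the English-French pair from a triplet.
eq(eng, fre, X, Y) :- eq3(X, Y, _).

% Hypothetical meaning definition: the idiomatic nucleus is
% paraphrased by a plain, non-idiomatic nucleus.
mean(eng, nuc(kick, [john, idiom(bucket)]), nuc(die, [john])).

% If the nucleus contains an idiomatic argument, paraphrase it
% first and translate the paraphrase.
translate(L1, L2, nuc(P, Args), Out) :-
    member(idiom(_), Args), !,
    mean(L1, nuc(P, Args), Plain),
    translate(L1, L2, Plain, Out).
% Otherwise translate predicate and arguments item by item.
translate(L1, L2, nuc(P, Args), nuc(P1, Args1)) :-
    eq(L1, L2, P, P1),
    translate_args(L1, L2, Args, Args1).

translate_args(_, _, [], []).
translate_args(L1, L2, [A|As], [B|Bs]) :-
    eq(L1, L2, A, B),
    translate_args(L1, L2, As, Bs).
```

A query such as translate(eng,fre,nuc(kick,[john,idiom(bucket)]),X) then yields the non-idiomatic target nucleus, in line with note [1] above.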
14.2.2. The adjustments

In this section I have collected the adjustments which have to be made in the underlying structures of the target language to ensure that the target output will be acceptable.

    adjust_rel(dut,TY,Y,Y1) :-
        Y = [P2,[P1,[S,T,A],S1],S2],
        member(neut,TY),
        member([SE,[OPS,[[rel,TY1]]],F],A),
        concat([neut],TY1,TY2),
        subst([SE,[OPS,[[rel,TY1]]],F],A,
              [SE,[OPS,[[rel,TY2]]],F],A1),
        Y1 = [P2,[P1,[S,T,A1],S1],S2], !.
    adjust_rel(dut,_,Y,Y).

If the target language is Dutch, this adjustment checks whether the Type of the target head noun contains the feature "neut" (neuter). If so, this feature is added to the Type of the relative element, so as to trigger the correct form in the paradigm of relative pronouns. Otherwise, the input remains unaffected.

    adjust_type(L,N2,TY2) :-
        predn(L,[N2,TY2],_), !.
    adjust_type(L,[N1,TY1],[N2,TY1]).

The type of a translated nominal predicate is adjusted to the target language by taking the Type of the target noun (TY2) along with the target noun itself (N2).
    adjust_so(eng,X,X) :- !.

Subject-Object assignment never needs adjustment when the target language is English, since English has the widest possibilities in this respect.

    adjust_so(L,[S,TY,ARG],[S,TY,ARG1]) :-
        ARG=[A1,[Sel,T,[pt]],[Sel1,T1,[subj,rec]]],
        ARG1=[A1,[Sel,T,[subj,pt]],[Sel1,T1,[rec]]], !.

In both the other languages, Subject assignment to a Recipient is changed to Subject assignment to the Patient, so that (13c) rather than (13b) is produced as a translation of (13a):

(13) a. John has been given the book by Mary.
     b. *Jean a été donné le livre par Marie.
     c. Le livre a été donné à Jean par Marie.

Note that a better translation might require a different lexical choice in French, as in:

(14) Jean a obtenu le livre de (la part de) Marie.
But this requires a more fundamental kind of "adjustment" that UniTra is not yet capable of.

    adjust_so(fre,[S,TY,ARG],[S,TY,ARG1]) :-
        ARG=[A1,[Sel,T,[pt]],[Sel1,T1,[obj,rec]]],
        ARG1=[A1,[Sel,T,[obj,pt]],[Sel1,T1,[rec]]], !.

In French, but not in Dutch, Object assignment to a Recipient is adjusted to Object assignment to the Patient, so as to avoid (15b) and produce (15c) instead as a translation of (15a):

(15) a. John has given Peter a book.
     b. *Jean a donné Pierre un livre.
     c. Jean a donné un livre à Pierre.
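To see the Object adjustment at work in isolation, the French rule can be loaded together with the English rule and a catch-all clause and applied to a toy nucleus. This is a minimal sketch: the atoms a1, s, s1 and ty are placeholders of our own for material that is far richer in ProfGlot, and the Subject rule is left out for brevity.

```prolog
% English never needs Subject-Object adjustment.
adjust_so(eng, X, X) :- !.

% French: Object assignment moves from the Recipient to the Patient.
adjust_so(fre, [S,TY,ARG], [S,TY,ARG1]) :-
    ARG  = [A1,[Sel,T,[pt]],[Sel1,T1,[obj,rec]]],
    ARG1 = [A1,[Sel,T,[obj,pt]],[Sel1,T1,[rec]]], !.

% Catch-all: in all remaining cases nothing changes.
adjust_so(_, X, X).

% Toy nucleus for "give": first argument, Patient, Recipient-Object.
sample_nucleus([[give], ty,
                [a1, [s,book,[pt]], [s1,peter,[obj,rec]]]]).
```

With French as target, sample_nucleus(N), adjust_so(fre,N,N1) moves the feature obj from the Recipient to the Patient; with Dutch or English as target the nucleus comes back unchanged.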
    adjust_so(L,X,X).

If none of the above conditions apply, Subject-Object assignment remains unaffected.

As for the subjunctive mood, if the target language is French, this mood must be added in appropriate circumstances:

    adjust_mood(fre,[Sel,T1],[Sel,T2]) :-
        (on(prop,Sel);on(extpred,Sel)),
        add_mood(T1,T2), !.

When the Selection on the term position in question contains "prop" or "extpred", subjunctive mood must be added to the (Tense of) the embedded predication/proposition. It so happens that all the propositional and predicational terms recognized so far require the subjunctive mood in French. Note that this assignment must be improved for those cases in which a real choice between indicative and subjunctive is involved.

Mood addition must again be formulated for both the "deep" and the "true" level of clause structure. Both forms of "add_mood" must be made to work for both propositional and predicational complements:

    add_mood(X,X1) :-
        X= [O,[[[P3,[[[TE|Z]|ZZ],CORE,S2],S3],TY]]],
        X1=[O,[[[P3,[[[[subjonc,TE]|Z]|ZZ],CORE,S2],S3],TY]]], !.
    add_mood(X,X1) :-
        X= [O,[[[[[TE|Z]|ZZ],CORE,S2],TY]]],
        X1=[O,[[[[[[subjonc,TE]|Z]|ZZ],CORE,S2],TY]]], !.
    add_mood(X,X1) :-
        X= [O,[[[P3,[[TE|ZZ],CORE,S2],S3],TY]]],
        X1=[O,[[[P3,[[[subjonc,TE]|ZZ],CORE,S2],S3],TY]]], !.
    add_mood(X,X1) :-
        X= [O,[[[[TE|ZZ],CORE,S2],TY]]],
        X1=[O,[[[[[subjonc,TE]|ZZ],CORE,S2],TY]]], !.
On the other hand, when we go from French to English or Dutch, the mood must be removed:

    adjust_mood(L,[Sel,T1],[Sel,T2]) :-
        remove_mood(T1,T2), !.

Mood-removal can be conveniently formulated as the converse of mood-addition:

    remove_mood(X,X1) :-
        add_mood(X1,X), !.

If there is no mood to be adjusted either way, "adjust_mood" has no effect:

    adjust_mood(L,X,X).

The agreement component contains a slot for Gender, which is only relevant to French. Therefore, it must be added to the extended predication when French is the target language.

    adjust_ext_pred(L,X,X) :-
        (L=eng;L=dut), !.

This adjustment has no effect in English and Dutch.

    adjust_ext_pred(fre,X,Y) :-
        add_gender(fre,X,Y), !.

In French we must add the Gender feature to the agreement component. This is done by retrieving it from the Subject term:

    add_gender(fre,PRED,PRED1) :-
        PRED=[[[T,N,P,G]|Z],[P1,[S,TY,ARG],S1],S2],
        member(AA,ARG),
        AA=[Sel,[OPS,[[NO,TYY]|RR]],[subj|R]],
        gender(GEN),
        member(GEN,TYY),
        PRED1=[[[T,N,P,GEN]|Z],[P1,[S,TY,ARG],S1],S2].
    adjust_ext_pred(fre,X,X).

Otherwise, the extended predication needs no adjustment.
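Returning briefly to mood removal: defining it as the converse of mood addition works because add_mood is a pure pattern-to-pattern relation, so that it can be called with either argument instantiated. The sketch below illustrates this with a deliberately flattened toy structure [TE,CORE]; this shape is an assumption made for the illustration only, not one of the four patterns given above.

```prolog
% Toy version of add_mood: wrap the tense TE of a two-element
% structure [TE,CORE] in a subjunctive marker.
add_mood([TE,CORE], [[subjonc,TE],CORE]).

% Removal is addition run backwards: X1 is whatever structure
% would yield X once the mood has been added to it.
remove_mood(X, X1) :-
    add_mood(X1, X), !.
```

Calling remove_mood on a structure that carries no subjunctive marker simply fails, which is what allows a final catch-all clause such as adjust_mood(L,X,X) to take over.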
14.1.3. The equivalences

All basic equivalences are formulated as triplets English-French-Dutch. Note that further languages can easily be added to these equivalences. The following rules select the correct L1-L2 pairs from the triplets:

    eq(eng,fre,X,Y) :- eq(X,Y,_).
    eq(eng,dut,X,Y) :- eq(X,_,Y).
    eq(fre,dut,X,Y) :- eq(_,X,Y).
    eq(fre,eng,X,Y) :- eq(Y,X,_).
    eq(dut,eng,X,Y) :- eq(Y,_,X).
    eq(dut,fre,X,Y) :- eq(_,Y,X).
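The selection of pairs from triplets can be checked with a handful of lexical facts. In this small sketch the three triplets are taken from the list of equivalences given in this section, with the cuts omitted for simplicity.

```prolog
% Sample triplets in the order English, French, Dutch.
eq([book],  [livre],  [boek]).
eq([child], [enfant], [kind]).
eq([rose],  [rose],   [roos]).

% Direction-selecting rules: the first two arguments name the
% source and target language, the last two the paired items.
eq(eng, fre, X, Y) :- eq(X, Y, _).
eq(eng, dut, X, Y) :- eq(X, _, Y).
eq(fre, dut, X, Y) :- eq(_, X, Y).
eq(fre, eng, X, Y) :- eq(Y, X, _).
eq(dut, eng, X, Y) :- eq(Y, _, X).
eq(dut, fre, X, Y) :- eq(_, Y, X).
```

For instance, eq(eng,dut,[book],X) binds X to [boek], and eq(dut,fre,[kind],Y) binds Y to [enfant]. Since eq/3 and eq/4 differ in arity, Prolog treats them as two distinct predicates.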
The following rule is needed to define equivalences between derived agent nouns. These are equivalent if the underlying verbs are equivalent:

    eq([V,[agent|Z]],[V1,[agent|Z]],[V2,[agent|Z]]) :-
        eq(V,V1,V2), !.
All the other equivalences immediately correlate grammatical or lexical items as such:

    eq([john],[jean],[jan]) :- !.
    eq([mary],[marie],[marie]) :- !.
    eq([sunday],[dimanche],[zondag]) :- !.
    eq([book],[livre],[boek]) :- !.
    eq([child],[enfant],[kind]) :- !.
    eq([sailor],[marin],[zeemans]) :- !.
    eq([city],[cité],[stad]) :- !.
    eq([bucket],[seau],[emmer]) :- !.
    eq([painter],[peintre],[schilder]) :- !.
    eq([painter],[peintre],[schilderes]) :- !.
    eq([sack],[sac],[zak]) :- !.
    eq([ball],[ballon],[bals]) :- !.
    eq([boy],[garçon],[jongen]) :- !.
    eq([book],[livre],[boek]) :- !.
    eq([professor],[professeur],[professor]) :- !.
    eq([painter],[peintre],[schilder]) :- !.
    eq([painter],[peintre],[schilderes]) :- !.
    eq([man],[homme],[manz]) :- !.
    eq([woman],[femme],[vrouw]) :- !.
    eq([child],[enfant],[kind]) :- !.
    eq([girl],[fille],[meisje]) :- !.
    eq([lady],[dame],[dame]) :- !.
    eq([mother],[mère],[moeder]) :- !.
    eq([student],['étudiant'],[student]) :- !.
    eq([student],['étudiante'],[studente]) :- !.
    eq([father],[père],[vader]) :- !.
    eq([rose],[rose],[roos]) :- !.
    eq([flower],[fleur],[bloem]) :- !.
    eq([garden],[jardin],[tuin]) :- !.
    eq([thingummy],[machin],[dinges]) :- !.
    eq([good],[bon],[goed]) :- !.
    eq([clever],[intelligent],[slim]) :- !.
    eq([tall],[grand],[lang]) :- !.
    eq([short],[petit],[klein]) :- !.
    eq(be,et,zijn) :- !.
    eq([go],[all],[gaa]) :- !.
    eq([walk],[march],[loop]) :- !.
    eq([cheat],[trich],[speel,vals]) :- !.
    eq([kiss],[embrass],[kus]) :- !.
    eq([give],[donn],[geev]) :- !.
    eq([die],[mour],[sterv]) :- !.
    eq([move],[bouge],[beweeg]) :- !.
    eq([kick],[---],[schop]) :- !.
    eq([hit],[touch],[raak]) :- !.
    eq([hit],[frapp],[slaa]) :- !.
    eq([die],[mour],[sterv]) :- !.
    eq([laugh],[rigol],[lach]) :- !.
    eq([cough],[touss],[hoest]) :- !.
    eq([touch],[touch],[raak,aan]) :- !.
    eq([buy],[achet],[koop]) :- !.
    eq([sell],[vend],[verkoop]) :- !.
    eq([deplore],[regrett],[betreur]) :- !.
    eq([expect],[attend],[verwacht]) :- !.
    eq([believe],[croy],[geloov]) :- !.
    eq([want],[voul],[wil]) :- !.
    eq([today],['aujourd''hui'],[vandaag]) :- !.
    eq([tomorrow],[demain],[morgen]) :- !.
    eq([yesterday],[hier],[gisteren]) :- !.
    eq([probably],[probablement],[waarschijnlijk]) :- !.
    eq(['frankly,'],['franchement,'],['eerlijk gezegd,']) :- !.
    eq(['frankly, eeh,'],['franchement, eeh,'],['eerlijk gezegd, eeh,']) :- !.
    eq(very,très,erg).
    eq(rather,assez,nogal).
    eq(reasonably,raisonnablement,redelijk).
    eq(surprisingly,remarquablement,verbazend).
Chapter 15. UniLog: universal logic
15.0. Introduction

One of the essential capacities of natural language users is that they can infer pieces of information from other pieces of information under preservation of logical validity. For example, if (1) is accepted as true, then [a] and [b] must be accepted as true as well:

(1) John is taller than Bill.
    therefore: [a] Bill is shorter than John.
               [b] Bill is less tall than John.
In order to correctly capture such inferences we need a logic which defines which inferences are valid and which are not. For such a logic, again, we need a logical syntax and a logical semantics. The logical syntax defines the wellformed formulae which can serve as input and output to inference rules; the logical semantics defines how these formulae can be interpreted in relation to some domain of interpretation, and what inference rules yield valid inferences when applied mechanically to given input formulae. The strategy followed in designing logical calculi is typically to define some abstract logical syntax and then study the logical properties of this syntax under certain forms of logical semantic interpretation. Such exercises may form a goal in themselves, and need not be claimed to stand in any direct correspondence with natural language expressions, nor with the way in which actual natural language users go about in developing patterns of "natural reasoning". It may also be, however, that a theory of logic does wish to relate to the actual practice of natural language users. In that case, the typical strategy is to translate natural language expressions into expressions of the logical syntax, then do the inferencing at that level, and re-translate the output logical formulae into natural language. It is clear that this process of translating and re-translating becomes more difficult, the greater the distance between the structures of the natural language expressions (the grammatical form) and the logical formulae in terms of which the inferencing process is captured (the logical form). It will further be clear that the underlying clause structures postulated within Functional Grammar have many properties which make them fit for supporting logical inferencing directly. For this reason we adopt the working assumption that the logical form of sentences is identical to their Functional Grammar underlying clause structures. 
In other words, we identify logical form with grammatical form. We saw in the preceding chapters that Functional Grammar underlying clause structures can be reached from different sides: they can be generated
by the generator, reconstructed from an input sentence through "deep-parsing", or reached from another language through "deep-translating". If it can be demonstrated that these same clause structures can also support logical inferencing, the advantages will be clear: we need no transition from the underlying grammatical structure to a level of logical structure, nor back from logical structure to grammatical underlying form. A further unifying effect is achieved if the propositions contained in Functional Grammar underlying structures are also used for knowledge representation (at least, for the non-perceptual part of knowledge). In this way, the "language" of the Functional Grammar underlying clause can indeed provide a multi-purpose "unified cognitive language" (Dik 1987d, 1989a, Weigand 1990).

It is customary to define logical sub-calculi in terms of the types of units of which the logical properties are crucial to the particular rules of interpretation and inference involved. Thus, we speak of "predicate logic", "quantificational logic", "propositional logic", etc. The different layers distinguished within Functional Grammar underlying clause structure provide a natural basis for defining the following sub-calculi:

- Illocutionary Logic which deals with the logical properties of and relations between speech acts, including illocutionary operators and satellites.
- Propositional Logic which deals with the logical properties of and relations between propositions, including propositional operators and satellites (subjective mood, epistemic status, propositional attitude).
- Predicational Logic which deals with the logical properties of and relations between predications, interpreted as designating States of Affairs, including predication operators and satellites (Tense, Aspect, objective mood).
- Predicate Logic which deals with the logical properties of and relations between predicates, interpreted as designating properties and relations, including predicate operators and predicate satellites.
- Term Logic which deals with the internal logical properties of and relations between terms, interpreted as designating entities, including term operators specifying Definiteness, Number, Quantification, etc.
- Lexical Logic which deals with the logically relevant properties internal to lexical predicates; properties such as have to be specified for specific lexical items through meaning postulates.
For all these sub-calculi the Functional Grammar underlying clause structure in principle contains the items in terms of which the relevant logical properties and relations can be specified. Of course, it will be a major undertaking to properly develop these various sub-calculi, but I see no principled reasons why this could not be done. The enormous advantage of this undertaking will be the integration of grammar, logic, and cognition within the model of the natural language user. A logic which takes Functional Grammar underlying clause structures for its logical forms may be called a "Functional Logic". Some basic assumptions of Functional Logic have been discussed and motivated elsewhere (see Dik 1988, 1989a, 1989c, Weigand 1987, 1989, 1990, Dignum 1989b). The present logical module is only a first preliminary attempt at operationalizing some of the assumptions of Functional Logic. Nevertheless, it may give a first impression of how the ideas sketched above may be implemented.
15.1. The module UniLog

UniLog first of all defines a number of basic logical properties of and relations between lexical items of the three languages.
15.1.1. English basic logical relations

    basic_hyponym(eng,[[sailor]|Z],[[[man]|Z]]).

This rule says that sailor is a basic hyponym of man, in other words, "a sailor is a man". Note that this can be seen as part of the meaning definition of the predicate sailor. Likewise we define other hyponymy relations:

    basic_hyponym(eng,[[rose]|Z],[[[flower]|Z]]).
    basic_hyponym(eng,[[father]|Z],[[[man]|Z]]).
    basic_hyponym(eng,[[mother]|Z],[[[woman]|Z]]).
    basic_hyponym(eng,[[flower]|Z],[[[plant]|Z]]).

Hyponymy relations may also obtain between nominal predicates and combinations of noun + adjective. Thus:

    basic_hyponym(eng,[[man]|Z],
        [[[person],[hum,ambi]],[[male],[]]]).
    basic_hyponym(eng,[[woman]|Z],
        [[[person],[hum,ambi]],[[female],[]]]).

A man is a male person and a woman is a female person. Proper names are defined as hyponyms of "male person" and "female person" by virtue of their type features:
    basic_hyponym(eng,[NN,TY],
        [[[person],[hum,ambi]],[[female],[]]]) :-
        TY=[hum,fem,proper|_].
    basic_hyponym(eng,[NN,TY],
        [[[person],[hum,ambi]],[[male],[]]]) :-
        TY=[hum,masc,proper|_].

Another logically important relation is the converse relation:

    converse(eng,[tall,pos],[short,pos]).
    converse(eng,[short,pos],[tall,pos]).

These rules define shorter than as the converse of taller than, and the other way around. A special type of converse relation holds between predicate frames such as those for buy and sell: if X1 sells X2 to X3, then X3 buys X2 from X1 and vice versa:

    converse3(eng,
        [[buy],TY,[[[anim],X1,[ag]],
            [[inanim],X2,[pt]],[[anim],X3,[so]]]],
        [[sell],TY,[[[anim],X3,[ag]],
            [[inanim],X2,[pt]],[[anim],X1,[rec]]]]).
    converse3(eng,
        [[sell],TY,[[[anim],X3,[ag]],
            [[inanim],X2,[pt]],[[anim],X1,[rec]]]],
        [[buy],TY,[[[anim],X1,[ag]],
            [[inanim],X2,[pt]],[[anim],X3,[so]]]]).
15.1.2. French basic logical relations

    basic_hyponym(fre,[[marin]|Z],[[[homme]|Z]]).
    basic_hyponym(fre,[[rose]|Z],[[[fleur]|Z]]).
    basic_hyponym(fre,[[fleur]|Z],[[[plante]|Z]]).
    basic_hyponym(fre,[[homme]|Z],
        [[[personne],[hum,fem]],[[masculin],[]]]).
    basic_hyponym(fre,[[femme]|Z],
        [[[personne],[hum,fem]],[[feminin],[]]]).
    basic_hyponym(fre,[NN,TY],
        [[[personne],[hum,fem]],[[feminin],[]]]) :-
        TY=[hum,fem,proper|_].
    basic_hyponym(fre,[NN,TY],
        [[[personne],[hum,fem]],[[masculin],[]]]) :-
        TY=[hum,masc,proper|_].

Note that personne in French is grammatically of feminine gender even if it is used to refer to male entities.

    converse(fre,[grand,pos],[petit,pos]).
    converse(fre,[petit,pos],[grand,pos]).
    converse3(fre,
        [[achet],TY,[[[anim],X1,[ag]],
            [[inanim],X2,[pt]],[[anim],X3,[so]]]],
        [[vend],TY,[[[anim],X3,[ag]],
            [[inanim],X2,[pt]],[[anim],X1,[rec]]]]).
    converse3(fre,
        [[vend],TY,[[[anim],X3,[ag]],
            [[inanim],X2,[pt]],[[anim],X1,[rec]]]],
        [[achet],TY,[[[anim],X1,[ag]],
            [[inanim],X2,[pt]],[[anim],X3,[so]]]]).
15.1.3. Dutch basic logical relations

    basic_hyponym(dut,[[mans]|Z],[[[mens]|Z]]).
    basic_hyponym(dut,[[vrouw]|Z],[[[mens]|Z]]).
    basic_hyponym(dut,[[schilder]|Z],[[[man]|Z]]).
    basic_hyponym(dut,[[zeemans]|Z],[[[man]|Z]]).
    basic_hyponym(dut,[[roos]|Z],[[[bloem]|Z]]).
    basic_hyponym(dut,[[vader]|Z],[[[man]|Z]]).
    basic_hyponym(dut,[[moeder]|Z],[[[vrouw]|Z]]).
    basic_hyponym(dut,[[bloem]|Z],[[[plant]|Z]]).
    basic_hyponym(dut,[[student]|Z],[[[jongen]|Z]]).
    basic_hyponym(dut,[[mans]|Z],
        [[[persoon],[hum,ambi]],[[manlijk],[]]]).
    basic_hyponym(dut,[[vrouw]|Z],
        [[[persoon],[hum,ambi]],[[vrouwelijk],[]]]).

If the genders of the hyponym and the hyperonym differ, they have to be taken along in defining the hyponymy relation:

    basic_hyponym(dut,[[studente],[A,B|Z]],
        [[[meisje],[A,neut,B|Z]]]).
    basic_hyponym(dut,[NN,TY],
        [[[persoon],[hum,ambi]],[[vrouwelijk],[]]]) :-
        TY=[hum,fem,proper|_].
    basic_hyponym(dut,[NN,TY],
        [[[persoon],[hum,ambi]],[[manlijk],[]]]) :-
        TY=[hum,masc,proper|_].

    converse(dut,[lang,pos],[klein,pos]).
    converse(dut,[klein,pos],[lang,pos]).

    converse3(dut,
        [[koop],TY,[[[anim],X1,[ag]],
            [[inanim],X2,[pt]],[[anim],X3,[so]]]],
        [[verkoop],TY,[[[anim],X3,[ag]],
            [[inanim],X2,[pt]],[[anim],X1,[rec]]]]).
    converse3(dut,
        [[verkoop],TY,[[[anim],X3,[ag]],
            [[inanim],X2,[pt]],[[anim],X1,[rec]]]],
        [[koop],TY,[[[anim],X1,[ag]],
            [[inanim],X2,[pt]],[[anim],X3,[so]]]]).
15.1.4. Universal logical relations and operations

15.1.4.1. Logical properties of and relations between predicates

The following logical properties of and relations between predicates can be formulated without mentioning the particular languages.

    converse(L,[S,neg],[S,pos]).
    converse(L,[S,pos],[S,neg]).

The positive and negative comparatives of any predicate S are each other's converses.

    hyponym(L,X,Z) :-
        basic_hyponym(L,X,Z).
    hyponym(L,X,Z) :-
        basic_hyponym(L,X,Y),
        hyponym(L,Y,Z).

X is a hyponym of Z if X is a basic hyponym of Z, or if X is a basic hyponym of Y, and Y is a hyponym of Z.

    hyperonym(L,Y,X) :-
        hyponym(L,X,Y).

Y is a hyperonym of X when X is a hyponym of Y.

    hyponymous_term(L,T1,T2) :-
        T1 = [[A,B,C],[R1]],
        hyponym(L,R1,[R11,R22]),
        T2 = [[indef,B,C],[R11,R22,[],[]]].

When T1 is a proper name term with restrictor R1 and R1 is a hyponym of [R11,R22] (e.g., john is a hyponym of [person,male]), then T1 is a hyponymous term to T2, where T2 is an indefinite term with the same number B as T1, and [R11,R22] as restrictors. This accounts for such relations as:

(2) John kicked the bucket.
    therefore: A male person kicked the bucket.
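The recursive definition of hyponym given above is a transitive closure over the basic hyponymy facts. A minimal sketch, assuming plain atoms in place of the restrictor structures and leaving out the language argument (both are simplifications made for this illustration):

```prolog
% Basic hyponymy facts, reduced to plain atoms.
basic_hyponym(sailor, man).
basic_hyponym(man,    person).
basic_hyponym(rose,   flower).
basic_hyponym(flower, plant).

% X is a hyponym of Z directly, or via an intermediate Y.
hyponym(X, Z) :- basic_hyponym(X, Z).
hyponym(X, Z) :- basic_hyponym(X, Y), hyponym(Y, Z).

% Hyperonymy is the same relation read in the other direction.
hyperonym(Y, X) :- hyponym(X, Y).
```

Collecting all solutions of hyponym(sailor,Z) gives man and then person. Because the recursive clause first consumes a basic fact, the search terminates as long as the basic facts contain no cycles.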
    hyponymous_term(L,T1,T2) :-
        T1 = [[A,B,C],[R1,[],R3,R4]],
        hyponym(L,R1,[R11]),
        T2 = [[indef,B,C],[R11,[],R3,R4]].

The same for common noun terms with restrictors R1, R2, R3, R4. This accounts for such relations as:

(3) a. A rose is a flower.
    b. The rose is a flower.
    c. Roses are flowers.
    d. The roses are flowers.
and can be used in inferences of the form:

(4) John gave roses to Mary.
    therefore: John gave flowers to Mary.

R2 (the position for adjectival restrictors) has been left empty here, in order to avoid such inferences as:

(5) A good sailor cheated.
    therefore: A good man cheated.

Similar measures will have to be taken with respect to R3 and R4, but these play as yet no role in the entailments as so far formulated.

    hyponymous_term(L,T1,T2) :-
        T1 = [[A,B,C],[R1,R2,R3,R4]],
        hyponym(L,R1,[R11,R22]),
        T2 = [[indef,B,C],[R11,R22,R3,R4]].

The analogous rule for the case in which the hyperonym is a noun + adjective combination. This accounts for such relations as:

(6) A sailor kissed Mary.
    therefore: A male person kissed Mary.

    hyponymous_term(L,T1,T2) :-
        T1 = [[A,B,C],[R1,R2,R3,R4]],
        not(R2=[]),
        T2 = [[indef,B,C],[R1,[],R3,R4]], !.

This rule says that we also get a hyponymous term by leaving out the adjectival modifier. For example, a good sailor is a sailor. Automatically, combinations such as "male person" are defined as hyponyms of "person", etc.
15.1.4.2. Entailments between propositions

Using the language-specific and the universal logical properties and relations we can define a number of entailments between propositions (in the Functional Grammar sense of the term).

    entail(L,C,C1) :-
        C = [P3,[P2,[P1,NUC,S1],S2],S3],
        NUC = [[ST,[SEL2,T2,FUN2]],TY,[[SEL1,T1,FUN1]]],
        converse(L,ST,ST1),
        NUC1 = [[ST1,[SEL1,T1,FUN2]],TY,[[SEL2,T2,FUN1]]],
        C1 = [P3,[P2,[P1,NUC1,S1],S2],S3].
When we have a proposition of which the nucleus contains a predicate ST which is defined as the converse of ST1, we can derive an entailment by replacing ST by ST1 while at the same time reversing the terms T1 and T2. Note that, as discussed in chapter 9, the term representing the standard of comparison is analysed as part of the complex derived predicate. This rule relates such pairs as:

(7) John is taller than Bill.
    therefore: Bill is shorter than John.

(8) John is taller than Bill.
    therefore: Bill is less tall than John.

    entail(L,C,C1) :-
        C= [P3,[P2,[P1,NUC,S1],S2],S3],
        converse3(L,NUC,NUC1),
        C1=[P3,[P2,[P1,NUC1,S1],S2],S3].

A proposition C entails a proposition C1 if the nucleus of C is the "converse3" of the nucleus of C1. This rule relates such clauses as:

(9) John sold the book to Peter.
    therefore: Peter bought the book from John.
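The buy/sell entailment can be reproduced on a small scale. The sketch below is an illustration under simplifying assumptions: nuclei are written as compound terms sell/3 and buy/3 and the proposition as prop/2, none of which is the ProfGlot representation, and the language argument is dropped.

```prolog
% Converse predicate frames: the Agent of "sell" is the Source of
% "buy", and the Recipient of "sell" is the Agent of "buy".
converse3(sell(Seller, Goods, Buyer), buy(Buyer, Goods, Seller)).
converse3(buy(Buyer, Goods, Seller),  sell(Seller, Goods, Buyer)).

% A proposition entails the proposition with the converse nucleus;
% the operators Ops are carried over unchanged.
entail(prop(Ops, Nuc), prop(Ops, Nuc1)) :-
    converse3(Nuc, Nuc1).
```

The query entail(prop([past],sell(john,book,peter)),P) binds P to prop([past],buy(peter,book,john)), which corresponds to example (9).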
    entail(L,C,C1) :-
        C= [P3,[P2,[P1,NUC,S1],S2],S3],
        mean(L,NUC,NUC1),
        C1=[P3,[P2,[P1,NUC1,S1],S2],S3].
    entail(L,C,C1) :-
        C= [P3,[P2,[P1,NUC,[M,I,D,B]],S2],S3],
        mean(L,NUC,NUC1,M1),
        M1=[X,Y,[manner]],
        C1=[P3,[P2,[P1,NUC1,[M1,I,D,B]],S2],S3].
    entail(L,C,C1) :-
        C= [P3,[P2,[P1,NUC,[M,I,D,B]],S2],S3],
        mean(L,NUC,NUC1,I1),
        I1=[X,Y,[instr]],
        C1=[P3,[P2,[P1,NUC1,[M,I1,D,B]],S2],S3].

These rules define entailments between two propositions C and C1 by virtue of meaning definitions between their nuclear predicate frames. The meaning definitions are found in the lexicon. The rules account for such relations as between:

(10) John kissed Mary.
     therefore: John touched Mary with the lips.
(11) John kicked the bucket.
     therefore: [a] literally: John hit the bucket with the foot.
                [b] idiomatic: John died.
    entail(L,C,C1) :-
        C= [P3,[P2,[P1,NUC,S1],S2],S3],
        member(X,S1),
        not(X=[*]),
        C1=[P3,[P2,[P1,NUC,[[*],[*],[*],[*]]],S2],S3].

When a proposition contains a non-empty satellite-1, an entailment can be derived by leaving out the satellite. This accounts for such inferences as:

(12) John hit the bucket with the foot.
     therefore: John hit the bucket.
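The satellite-dropping rule can be run on a toy proposition. In this sketch the language argument is omitted, the empty satellite marker is written as the quoted atom '*' for portability across Prolog systems, and the atoms p1, p2, p3, nuc, s2 and s3 are placeholders of our own for the real layers.

```prolog
:- use_module(library(lists)).

% If the satellite-1 list S1 contains a non-empty satellite, the
% proposition entails the one in which all four slots are empty.
entail(C, C1) :-
    C  = [P3,[P2,[P1,NUC,S1],S2],S3],
    member(X, S1),
    \+ X = ['*'],
    C1 = [P3,[P2,[P1,NUC,[['*'],['*'],['*'],['*']]],S2],S3].

% Toy proposition with an instrument satellite in the second slot.
sample([p3,[p2,[p1,nuc,[['*'],[with,foot],['*'],['*']]],s2],s3]).
```

Note that the rule succeeds once for every non-empty satellite in S1, so a proposition with two such satellites yields the same entailment twice on backtracking; a proposition whose satellites are already empty yields none.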
    entail(L,C,C1) :-
        C= [P3,[P2,[P1,NUC,S1],S2],S3],
        member(X,S2),
        not(X=[*]),
        C1=[P3,[P2,[P1,NUC,S1],[[*],[*],[*],[*]]],S3].

The same applies to non-empty satellites-2. Thus:

(13) John went to London on Sunday.
     therefore: John went to London.

    entail(L,C,C1) :-
        C=[P3,[P2,[P1,[ST,TY,AR],S1],S22],S33],
        member(PT,AR),
        PT=[Sel,Term,Fun],
        hyponymous_term(L,Term,Term1),
        PT1=[Sel,Term1,Fun],
        subst(PT,AR,PT1,AR1),
        C1=[P3,[P2,[P1,[ST,TY,AR1],S1],S22],S33].

This rule defines an entailment from C to C1 by virtue of C containing a term hyponymous to a corresponding term in C1. We get a valid inference by replacing the hyponymous term by the hyperonymous term. Note that the output may again be input to this rule. For example:

(14) The sailor kissed Mary.
     therefore: A man kissed Mary.
     therefore: A male person kissed Mary.
     therefore: A person kissed Mary.

Similarly, Mary may be replaced by female person and the latter by person, thus yielding a number of further inferences, the "final" one of which is:
(15) A person kissed a person.
However, through the meaning definition of kiss and the rule for leaving out satellites we get further inferences including:

(16) X kissed Y
     therefore: X touched Y with the lips
     therefore: X touched Y
These different entailments combine with each other to yield a great number of inferences for a sentence such as The sailor kissed Mary.

    entail(L,C,C1) :-
        C=[P3,[[T,[],A,R],CORE,S22],S33],
        C1=[P3,[[pres,neg,[],R],[[],[[TRUE],[],AR1],
            [[*],[*],[*],[*]]],[[*],[*],[*],[*]]],S33],
        truth_pred(L,TRUE),
        AR1=[[[prop],TT,[zero]]],
        TT=[[*,sg,*],
            [[[P3,[[T,neg,A,R],CORE,S22],[*]],[inanim,masc]]]].

    truth_pred(eng,true).
    truth_pred(fre,vrai).
    truth_pred(dut,waar).

This rule is the equivalent of the rule of double negation:

(17) p = not[not[p]]
The rule says that if we have a positive (= the polarity slot is [ ]) proposition C, we may convert it into the corresponding negative proposition C1, then build C1 into a propositional term TT, and insert TT into a higher, negative proposition containing the "truth predicate" of the language in question (Eng. true, Fr. vrai, Dut. waar). This rule creates inferences of the following form:

(18) John walked.
     therefore: [a] That John did not walk is not true.
                [b] It is not true that John did not walk.
The alternative expressions [a] and [b] are automatically generated by the expression rules. This rule shows that simple rules of standard logic may require more elaborate formulation in Functional Logic. The bonus is, however, that the Functional Logic rules, via the underlying propositions, immediately relate actual sentences to each other.

entail1(L,C,C1) :-
    entail(L,C,C1).
entail1(L,C,C2) :-
    entail(L,C,C1),
    entail1(L,C1,C2),
    not(C2=C).

The different entailments may be placed in a series such that, if P entails Q and Q entails R, then P entails R. This captures the "transitive" nature of the entailment relation. Note that, on further elaboration of Functional Logic, the serial connection may be derived from a meta-predicate "transitive", applying to a predicate such as "entail".
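The serial connection of entailments can be tried out on a small scale. The following is a minimal sketch, assuming a toy relation "toy_entail" whose facts are modelled on example (16); the predicate names and the term shapes are hypothetical and much simpler than ProfGlot's underlying propositions.

```prolog
% Toy immediate entailments (hypothetical), following example (16).
toy_entail(kiss(X, Y), touch_with_lips(X, Y)).
toy_entail(touch_with_lips(X, Y), touch(X, Y)).

% toy_entails(C, C2): C2 follows from C in one or more immediate steps.
% As in the rule above, a chain leading back to the starting point C
% is rejected.
toy_entails(C, C1) :-
    toy_entail(C, C1).
toy_entails(C, C2) :-
    toy_entail(C, C1),
    toy_entails(C1, C2),
    C2 \= C.
```

The query ?- toy_entails(kiss(john,mary), Q). gives Q = touch_with_lips(john,mary) and then Q = touch(john,mary). Note that the guard only blocks a return to the starting point; if the toy facts contained a cycle, the recursion might not terminate, and a list of visited propositions would be one possible refinement.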
15.1.4.3. Properties of and relations between full clause structures

All entailments were formulated as holding between propositions. The following rule defines a clause structure C1 as an inference of C if C1 contains a proposition which is entailed by the proposition in C:

inference(L,C,C1) :-
    C = [P4,PROP,S4],
    entail(L,PROP,PROP1),
    C1 = [P4,PROP1,S4].
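The way a proposition-level entailment is lifted to the level of the clause can be sketched as follows. This is only an illustration under simplifying assumptions: atoms stand in for propositions, a clause structure is a three-element list, and the names "toy_entail" and "toy_inference" are hypothetical.

```prolog
% Hypothetical proposition-level entailment between two toy propositions.
toy_entail(sailor_kissed_mary, man_kissed_mary).

% toy_inference(C, C1): clause structure C1 differs from C only in that
% its proposition is replaced by one which the original proposition
% entails. A toy clause structure is [Illocution, Proposition, Satellites].
toy_inference([P4, Prop, S4], [P4, Prop1, S4]) :-
    toy_entail(Prop, Prop1).
```

The illocutionary layer and the outer satellites are passed on unchanged, so that ?- toy_inference([decl, sailor_kissed_mary, []], C1). binds C1 to [decl, man_kissed_mary, []].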
15.1.4.4. Output mechanisms

In order to get ProfGlot to exercise its logical competence several instructions have been formulated:

infer(L) :-
    repeat,
    clause_structure(L,C),
    specify_clause(L,C,X),
    ex2_clause(L,X,X1),
    nl, nl,
    write('The following sentence:'), nl, nl,
    writelist(X1), nl, nl,
    write('entails:'), nl, nl,
    (specify_clause(L,C,Y) ;
     (inference(L,C,C1), specify_clause(L,C1,Y))),
    ex2_clause(L,Y,Y1),
    writelist(Y1), nl,
    fail.

This instruction has the following effect: it creates an arbitrary clause structure C in language L, specifies and expresses it, and writes the following on the screen (for example):
(19) The following sentence:
     john kissed mary.
     entails:
     (list of inferences)
The inferences are of two kinds:

[a] Inferences derived through re-specifying the clause structure. This yields the original sentence itself, plus all passive and "dative" variants. For example:

(20) The following sentence:
     john gave the book to mary.
     entails:
     john gave the book to mary.
     the book was given by john to mary.
     john gave mary the book.
     mary was given the book by john.

[b] Inferences derived through "inference", fed by the entailments defined. This will yield such output as:

(21) a male person gave the book to mary.
     a person gave the book to a female person.
     that john did not give the book to mary is not true.
     etc. etc.
All in all, a great number of inferences are in this way derived for any clause structure C generated by the "infer" procedure.

explain(L) :-
    deep_parse(L,C),
    inference(L,C,C1),
    specify_clause(L,C1,Y),
    ex2_clause(L,Y,Y1),
    nl, nl,
    write('this implies that:'), nl, nl,
    writelist(Y1), nl, nl,
    fail.

The instruction "explain" takes in an input sentence, deep-parses it, and presents all inferences which can be drawn from the sentence through the logical rules.
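Both output instructions rely on a failure-driven loop: a final "fail" forces Prolog to backtrack into the preceding goals until every alternative has been written out. A minimal sketch of this technique is given below, assuming a toy table "toy_inference" of ready-made sentence pairs (hypothetical data, with atoms standing in for sentences); no parsing or expression rules are involved.

```prolog
% Hypothetical inferences for one toy sentence.
toy_inference('john kissed mary', 'a male person kissed mary').
toy_inference('john kissed mary', 'a person kissed a person').

% show_inferences(S): print S and every inference recorded for it.
% The inner "fail" forces backtracking into toy_inference/2 until no
% alternatives remain; the "true" branch then lets the call succeed.
show_inferences(S) :-
    write('The following sentence:'), nl,
    write(S), nl,
    write('entails:'), nl,
    (   toy_inference(S, I),
        write(I), nl,
        fail
    ;   true
    ).
```

Unlike "infer", which starts with "repeat" and therefore keeps producing new clause structures until it is interrupted, this sketch stops once all recorded inferences for the given sentence have been printed.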
15.1.5. Knowledge base management

Now that ProfGlot has been provided with capacities for generating and parsing sentences, translating these between different languages, and drawing logical inferences from them, we can start thinking about using these capacities in building up, maintaining, and updating a knowledge base. Our assumption here is that conceptual (as opposed to perceptual) knowledge is coded in Functional Grammar propositions (cf. Dik 1987c, 1987d, 1989a). The following rules give a first impression of how a knowledge base can be built up and consulted:

remember(L) :-
    deep_parse(L,Y),
    Y = [[decl,OP],PROP,S4],
    assertz(knowledge1(L,PROP)),
    nl, write('I will remember the proposition:'), nl, nl,
    tab(4), write(PROP), nl, nl.

This instruction tells ProfGlot to deep-parse an input sentence, extract its proposition, and add this at the end of the program as "knowledge1". ProfGlot tells us on the screen which proposition is being remembered.

knowledge(L,X) :-
    knowledge1(L,X).
knowledge(L,Y) :-
    knowledge(L,X),
    entail1(L,X,Y).

"Knowledge" is defined as "knowledge1" (= those propositions which have been explicitly asserted into the program), plus any proposition which is entailed by knowledge. This expresses: what you know is whatever information is explicitly coded in your memory, plus whatever can be derived from this through logical inferencing. Every proposition added to the program falls into a network of serially connected entailments, such that the information available widely exceeds the information actually transmitted.

answer(L) :-
    deep_parse(L,X,Y),
    Y = [[interr,OP],PROP,S4],
    ((knowledge(L,PROP), write('Yes, that is true')) ;
     (not(knowledge(L,PROP)), write('No, that is not true'))),
    nl, nl.

The instruction "answer" deep-parses an interrogative (Yes-No) sentence, extracts its proposition and checks whether that proposition is among the knowledge (whether explicitly listed or inferred) of the system. If so, it answers "Yes, that is true"; if not, it answers "No, that is not true".

inform(L) :-
    knowledge(L,Y),
    Y = [P3,EP,S3],
    specification(L,EP,EP1),
    X = [[decl,[ee,N]],[P3,EP1,S3],[*]],
    ex2_clause(L,X,X1),
    writelist(X1), nl, nl,
    fail.
"Inform" is equivalent to: "Tell me whatever you know". It asks ProfGlot to present all its (listed and inferred) knowledge. By "remember" the relevant proposition is added at the end of the program through "assertz". This information will be lost when we leave Prolog. If we want to preserve the knowledge which has been explicitly added to the program in this way, we can "save" it by writing it into a special file "know", by means of the following clause:

save_knowledge :-
    tell(know),
    listing(knowledge1),
    told.

The file "know" will be preserved when execution is ended, and can be accessed again at a next session, through the clause:

start_knowledge :-
    reconsult(know).
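The interplay of remembering, inferring, and answering can be illustrated by a minimal sketch, assuming atoms in place of Functional Grammar propositions and leaving out the language parameter and the parser. All predicate names prefixed with "toy_" are hypothetical. As a deliberate choice of the sketch, the second clause of "toy_knowledge" starts from the explicitly stored facts rather than calling itself first, so that a question about an unknown proposition is answered in finite time.

```prolog
:- dynamic toy_knowledge1/1.

% Hypothetical immediate entailments between toy propositions.
toy_entail(sailor_kissed_mary, man_kissed_mary).
toy_entail(man_kissed_mary, person_kissed_mary).

% toy_entails(P, Q): Q follows from P in one or more steps.
toy_entails(P, Q) :- toy_entail(P, Q).
toy_entails(P, Q) :- toy_entail(P, R), toy_entails(R, Q).

% toy_remember(P): store proposition P at the end of the fact base.
toy_remember(P) :- assertz(toy_knowledge1(P)).

% toy_knowledge(P): P is stored explicitly or entailed by a stored fact.
toy_knowledge(P) :- toy_knowledge1(P).
toy_knowledge(Q) :- toy_knowledge1(P), toy_entails(P, Q).

% toy_answer(P, A): A is yes if P is known or inferable, otherwise no.
toy_answer(P, A) :-
    (   toy_knowledge(P)
    ->  A = yes
    ;   A = no
    ).
```

After ?- toy_remember(sailor_kissed_mary). the query ?- toy_answer(person_kissed_mary, A). gives A = yes, although only the more specific proposition was stored, while ?- toy_answer(mary_kissed_sailor, A). gives A = no.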
15.2. Conclusion

The present module only gives a very first impression of what an inferencing machine along the lines of Functional Logic could look like. But I think it is sufficient to show that the underlying propositions of Functional Grammar can indeed be used as a logical syntax and as a knowledge representation language. This makes for a tight integration of grammar, logic, and knowledge representation within the overall model of the natural language user.
Notes

Notes to chapter 1. 1. Chapter 13 will clarify what is meant by "an extension of a subset". 2. Since this text was written it has been possible to develop a Danish, a German, and a Spanish module for ProfGlot. 3. LPA Prolog Professional Compiler, version 3.15, 15 June 1990, available from Logic Programming Associates Ltd, London. 4. Whereas most Prolog interpreters require a suffix .PRO for Prolog programs (e.g. BASFAC.PRO etc.), LPA Prolog requires a suffix .DEC (BASFAC.DEC etc.). The ProfGlot modules described in this book are presented without suffixes (BasFac etc.).
Notes to chapter 2. 1. Note that predicates can be formed of several words if these are joined by "_". 2. Cf. Bratko (1986).
Notes to chapter 3. 1. This version incorporates the improvements proposed in Hengeveld (1989). 2. Extensive argumentation for the layered structure of the clause can be found in Hengeveld (1989), Dik (1989f), and Dik, Hengeveld, Vester, and Vet (1990). 3. Compare Connolly (1986).
Notes to chapter 4. 1. Cf. Van der Korst (1989). 2. Cf. Kwee (1979) for this type of strategy.
Notes to chapter 5. 1. The definition of "length" is a built-in feature of LPA-Prolog.
Notes to chapter 6. 1. See Mackenzie (1981).
Notes to chapter 9. 1. Cf. van der Korst (1989) for such an approach. 2. This form does occur in English, but mostly in more specialized senses. 3. A satellite of polarity was introduced in the Danish version of ProfGlot, when it seemed that negation could be better described as the expression of a satellite than as the expression of an operator. This satellite position is not further used in the present grammar. 4. For discussion of how to create an appropriate relative clause structure, see Kwee (1981).
Notes to chapter 10. 1. Cf. Connolly (1986, 1989). 2. The single apostrophe in John's must be written '', otherwise it could be taken as the end of an expression between single quotes: '@#$*!', which indicates the literal form of the expression in question.
Notes to chapter 11. 1. I am grateful to Peter Kahrel for the present formulation of the sandhi rules.
Notes to chapter 13. 1. We shall see in the description of the actual parser that language-dependent parameters do occur at certain points. 2. The built-in LPA-predicate "concat" cannot be used for this purpose.
Notes to chapter 14. 1. The first translator of the type discussed here was built by Van der Korst (1987, 1989).
References
[WPFG = Working Papers in Functional Grammar, available through the Institute for General Linguistics, University of Amsterdam]

Auwera, Johan Van der, and Louis Goossens (eds.)
  1987 Ins and outs of the predication. Dordrecht: Foris.
Bakker, Dik
  1988a "De implementatie van een grammaticamodel" ["The implementation of a model of grammar"], Tijdschrift voor Taal- en Tekstwetenschap 8: 239-260.
  1988b Prolog voor alfa's. [Prolog for humanists, Dept. of Computational Linguistics, University of Amsterdam].
  1989 "A formalism for Functional Grammar expression rules", in: Connolly and Dik (eds.), 45-63.
  1990 "A Functional Grammar machine", in: Hannay and Vester (eds.), 229-250.
Bakker, Dik, Bieke van der Korst, and Gerjan van Schaaik
  1988 "Building a sentence generator for teaching linguistics", in: Zock and Sabah (eds.), vol. 2: 159-174.
Bolkestein, A. Machtelt, Jan Nuyts, and Co Vet (eds.)
  1990 Layers and levels of representation in language theory: a functional view. Amsterdam: Benjamins.
Bratko, Ivan
  1986 Prolog programming for artificial intelligence. Reading, Mass.: Addison-Wesley.
Capel, Casper and Didan Westra
  1987 LIKE: Linguistic Knowledge Base Environment. [MA Thesis, Dept. of Mathematics and Computer Science, Free University Amsterdam].
Clocksin, W. F., and C. S. Mellish
  1987 Programming in Prolog. Heidelberg: Springer.
Connolly, John H.
  1986 "Testing Functional Grammar placement rules using Prolog", International Journal of Man-Machine Studies 24: 623-632.
  1989 "Functional Grammar and artificial intelligence", in: Connolly and Dik (eds.), 217-228.
Connolly, John H., and Simon C. Dik (eds.)
  1989 Functional Grammar and the computer. Dordrecht: Foris.
Covington, Michael A., Donald Nute, and Andre Vellino
  1988 Prolog programming in depth. Glenview, Ill.: Scott, Foresman and Company.
Des Tombe, Louis
  1987 Inleiding deklaratief programmeren (Prolog). [Introduction to declarative programming (Prolog), Dept. of Computational Linguistics, University of Utrecht].
Dignum, Frank
  1989a "Parsing an English Text Using Functional Grammar", in: Connolly and Dik (eds.), 110-134.
  1989b A language for modelling knowledge bases; based on linguistics, founded in logic. [Diss. Free University Amsterdam].
Dignum, Frank, T. Kemme, W. Kreuzen, H. Weigand, and R.P. van de Riet
  1987 "Constraint modelling using a conceptual prototyping language", Data and Knowledge Engineering 2: 213-254.
Dik, Simon C.
  1978a Functional Grammar. Amsterdam: North-Holland [3d printing, Dordrecht: Foris].
  1978b Stepwise lexical decomposition. Lisse: Peter de Ridder Press [Available through WPFG].
  1979 "How to arrive at a procedure for mechanical translation in terms of Functional Grammar". [Paper, Institute for General Linguistics, University of Amsterdam].
  1980 "Seventeen sentences: basic principles and application of Functional Grammar", in: Moravcsik and Wirth (eds.), 45-75.
  1987a "Functional Grammar and its potential computer applications", in: W. Meijs (ed.), Corpus Linguistics and Beyond, 253-268. Amsterdam: Rodopi.
  1987b "Generating answers from a linguistically coded knowledge base", in: Kempen (ed.), 301-314.
  1987c "Linguistically motivated knowledge representation", in: M. Nagao (ed.), Language and artificial intelligence, 145-170. Amsterdam: North-Holland.
  1988 "Concerning the logical component of a natural language generator", in: Zock and Sabah (eds.), vol. 1: 73-91.
  1989a "Towards a unified cognitive language", in: Heyvaert and Steurs (eds.), 97-110.
  1989b "FG*C*M*NLU: Functional Grammar Computational Model of the Natural Language User", in: Connolly and Dik (eds.), 1-28.
  1989c "Relational reasoning in functional logic", in: Connolly and Dik (eds.), 273-288.
  1989d "Idioms in a computational functional grammar", in: M. Everaert and E.-J. van der Linden (eds.), Proceedings of the first Tilburg Workshop on Idioms, 41-55. Tilburg: Tilburg University.
  1989e "The lexicon in a computational Functional Grammar" [Paper, Institute for General Linguistics, University of Amsterdam].
  1989f The theory of Functional Grammar. Part 1: The structure of the clause. Dordrecht: Foris.
  1990 "How to build a natural language user", in: Hannay and Vester (eds.), 203-215.
Dik, Simon C., Kees Hengeveld, Elseline Vester, and Co Vet
  1990 "The hierarchical structure of the clause and the typology of adverbial satellites", in: Bolkestein, Nuyts, and Vet (eds.): 25-70.
Dik, Simon C., and Peter Kahrel
  1991 ProCourse: a Prolog course for linguists. Amsterdam: Amsterdam Linguistic Software.
Gatward, Richard A.
  1989 "Implementation efficiency considerations in parsing Functional Grammar", in: Connolly and Dik (eds.): 77-91.
Gatward, Richard A., and Peter J. Hancox
  1990 "Functional Grammar as a unification grammar: is it a worthwhile investigation?", in: Hannay and Vester (eds.): 217-228.
Gatward, R.A., S.R. Johnson, and J.H. Connolly
  1986 "A natural language processing system based on Functional Grammar", in: Proceedings of the International Conference on Speech Input/Output: Techniques and Applications, London, 24-26 March 1986. IEE Conference Publication No. 258, 125-128. London: IEE.
Gebruers, Rudi
  1987 "Functional Grammar: a useful thinking tool for machine translation research?", in: Heyvaert and Steurs (eds.): 371-385.
Groot, Casper de, and Machiel J. Limburg
  1986 "Pronominal elements: diachrony, typology and formalization in Functional Grammar". WPFG 12.
Hannay, Michael, and Elseline Vester (eds.)
  1990 Descriptive and computational applications of Functional Grammar. Dordrecht: Foris.
Hengeveld, Kees
  1987 "Clause structure and modality in Functional Grammar", in: Auwera and Goossens (eds.): 53-66.
  1989 "Layers and operators in Functional Grammar". Journal of Linguistics 25: 127-157.
  1990 "The hierarchical structure of utterances", in: Bolkestein, Nuyts, and Vet (eds.): 1-23.
Hesp, Cees
  1990 "The Functional Grammar computational natural language user and psychological adequacy", in: Bolkestein, Nuyts, and Vet (eds.): 295-308.
Heyvaert, F.J., and F. Steurs (eds.)
  1989 Worlds behind words; essays in honour of Prof. Dr. F.G. Droste on the occasion of his sixtieth birthday. Leuven: University of Leuven.
Hoekstra, T., H. van der Hulst, and M. Moortgat (eds.)
  1981 Perspectives on Functional Grammar, 175-189. Dordrecht: Foris.
Janssen, Theo M.V.
  1989 "Towards a universal parsing algorithm for Functional Grammar", in: Connolly and Dik (eds.): 65-75.
Kempen, Gerard (ed.)
  1987 Natural language generation; new results in artificial intelligence, psychology and linguistics. Dordrecht: Martinus Nijhoff.
Korst, Bieke van der
  1987 "Twelve sentences; a translation procedure in terms of Functional Grammar". WPFG 19.
  1989 "Functional Grammar and machine translation", in: Connolly and Dik (eds.): 289-316.
Kwee Tjoe Liong
  1979 A68-FG(3); Simon Dik's funktionele grammatika geschreven in algol 68 versie nr. 03. [Simon Dik's functional grammar written in algol68]. Publications of the Institute for General Linguistics 23, University of Amsterdam.
  1981 "In search of an appropriate relative clause", in: Hoekstra et al. (eds.): 175-189. Dordrecht: Foris.
  1987 "A computer model of Functional Grammar", in: Kempen (ed.): 315-331.
  1988a "Natural language generation: one individual implementer's experience", in: Zock and Sabah (eds.), vol. 2: 98-120.
  1988b Computational theoretical linguistics: generating (English) sentences in (Dik's) Functional Grammar. D. Bojadziev, P. Tancig, and D. Vitas (eds.), Proceedings of the VIth (Yugoslav) Conference (on) Computer Processing (and) Society of Applied Linguistics of Slovenija. 75-90.
  1989 "An ATN parser for English FG? Or maybe an active chart?", in: Connolly and Dik (eds.): 93-107.
Lakoff, George A.
  1970 Irregularity in syntax. New York: Holt, Rinehart and Winston.
Mackenzie, J. Lachlan
  1981 "Functions and cases", in: Hoekstra et al. (eds.): 299-318. Dordrecht: Foris.
  1987 "The representation of nominal predicates in the fund". WPFG 25.
Meijs, Willem
  1988 "Knowledge-activation in a large lexical data-base: problems and prospects in the LINKS-project". Amsterdam Papers in English, 1. Dept. of English, University of Amsterdam.
  1989 "Spreading the word: knowledge-activation in a functional perspective", in: Connolly and Dik (eds.): 201-215.
Moravcsik, Edith, and Jessica R. Wirth (eds.)
  1980 Syntax and Semantics 13: Current Approaches to Syntax. New York etc.: Academic Press.
Riet, R.P. van de
  1989 "Van relaties naar realiteit; een linguistische benadering van kennisbanken". ["From relations to reality; a linguistic approach to knowledge banks"], Informatie 3: 901-1016.
Samuelsdorff, Paul O.
  1989 "Simulation of a Functional Grammar in Prolog", in: Connolly and Dik (eds.): 29-44.
Voogt-van Zutphen, Hetty
  1987 "Constructing an FG lexicon on the basis of LDOCE". WPFG 24.
  1989 "Towards a lexicon of Functional Grammar", in: Connolly and Dik (eds.): 151-176.
Vossen, Piek
  1989 "The structure of lexical knowledge as envisaged in the LINKS-project", in: Connolly and Dik (eds.): 177-199.
Weigand, Hans
  1986 "An overview of the conceptual language KOTO". Report nr. IR-112. Dept. of Mathematics and Information Theory, Free University Amsterdam.
  1987 "Functional Grammar as a formal language", in: Auwera and Goossens (eds.): 179-194.
  1989 "A dialectical model of modality", in: Connolly and Dik (eds.): 247-271.
  1990 Linguistically motivated principles of knowledge base systems. Dordrecht: Foris.
Zock, Michael, and Gerard Sabah (eds.)
  1988 Advances in natural language generation; an interdisciplinary perspective. 2 vols. London: Pinter Publishers.
Index and brief description of Prolog predicates
This index contains an alphabetical list of all the Prolog predicates used in the ProfGlot program. Only the major occurrences of these predicates are mentioned, i.e. the places where they are defined and explained. Wherever the variable "L" is used, this means 'in the relevant language(s)'. Variables which stand for integers are indicated by N, M.

add_gender(L,A,B)  218
    adds Gender to the agreement features of predication A if required by target language L, yielding output structure B
add_mood(A,B)  217f
    subjunctive mood is added to the Tense of embedded predication/proposition A, yielding B
adjust_ext_pred(L,A,B)  218
    adjusts the agreement features of predication A to those required by the target language L, yielding predication B
adjust_rel(L,A,B,C)  216
    adjusts predication B to the Type features A of target language L, yielding predication C
adjust_mood(L,A,B)  217f
    adjusts the mood of embedded term A, yielding embedded term B in accordance with the requirements of the target language L
adjust_so(L,A,B)  216f
    adjusts predication A to the subject-object possibilities in target language L, yielding predication B
adjust_type(L,A,B)  216
    adjusts the Type of a noun A to the Type B of the corresponding noun in the target language L
add_redundant_features(A,B)  70
    redundant features can be added to the type of the predicate frame A, yielding B
adj(N)  40
    the chance for adjectives is N
adj_affix(L,A,B,C)  157
    express the influence of features A on adjective stem B as C
adjust_spelling(L,A,B)  144,167
    adjust the spelling of list A to yield list B
affix(A,B,C)  14
    affixing string A to string B yields string C
anaphora(L,A,B)  28,105
    an anaphorical element in predication A is adjusted to a potential antecedent, yielding predication B after adjustment
answer(L)  233
    deep-parse an input sentence and report whether the propositional knowledge it contains is already known to the system
arb_attr(L,A)  90
    choose an arbitrary attributive basic or derived adjective or participle
arb_noun(L,A,B)  90
    choose an arbitrary noun A with selectional features B
arb_pred(L,A)  71,80
    choose an arbitrary predicate A to be used in predicative position
aspect(A)  83
    A is a value for Perfect aspect
assert(A)  15
    clause A is added to the program
asserta(A)  15
    clause A is added at the beginning of the program
assertz(A)  15
    clause A is added at the end of the program
attoperator(A)  95
    A is a value for the attitudinal operator of level 3
auxiliary(L,A,B)  185
    B is an auxiliary verb for expressing operators A
basic_hyponym(L,A,B)  223f
    noun A is a basic hyponym of noun B
bpreda(L,A)  21,47,58,64
    A is a basic adjectival predicate
bpredn(L,A)  21,45f,57f,63f
    A is a basic nominal predicate
bpredv(L,A)  19,21,47,58,64f
    A is a basic verbal predicate
bpredvm(L,A)  49,58,65
    A is a basic verbal matrix predicate
bqterm(A)  87
    A is a basic questioned term
bterm(A)  86
    A is a basic term
choose_arb_adj(L,A)  71
    choose an arbitrary adjectival predicate A
choose_arb_verb(L,A)  71
    choose an arbitrary verbal predicate A
choose_random(A,B)  71
    choose a random item A from list B
clause_specify(L,A,B)  28,100
    clause structure A is specified to yield fully specified clause B
clause_structure(L,A)  98
    A is a clause_structure
conc(A,B,C)  12,41
    list A and list B concatenate to yield list C
concat(A,B,C)  16
    list or string A concatenate with list or string B to yield list or string C
consonant(A)  145
    A is a consonant
converse(L,A,B)  224f
    B is the converse relation of A
converse3(L,A,B)  224f
    B is the converse relation of a three-place relation A
convert(A,B)  167
    remove all occurrences of "1" from A, yielding B
copula(L,A)  145,154,167
    A is a copular verb
copula_support(L,A,B)  29,107
    non-verbal predicates in predication A are provided with a copula, yielding predication B
core_pred_schema(L,A)  80
    A is a schema for the core predication
counter(N)  15,41
    running counter N = 1, 2, 3, 4, 5, ...
decompose(A,B,C)  179
    form A can be decomposed into two parts B and C
deep_parse(L,A,B)  177,191
    parse sentence A onto its underlying clause structure B
deep_return(L,A,B,C)  200
    deep-parse input sentence A onto fully specified clause B and deep-return B as sentence C
deep_return_clause(L,A,B)  200
    express clause structure A and write it as a sentence on the screen
deep_translate(L1,L2)  210f
    deep-parse an input sentence in L1 and deep-translate this into L2
degree(L,A)  49,59,65
    A is a degree adverb
delete(A,B,C)  41
    delete element A from list B to yield list C
dem(A,B)  89
    A is a value for definiteness/demonstratives/quantifiers, sensitive to conditions B
derivational_stem1(L,A,B)  145,164
    B is a derivational stem-1 of A (e.g. A = give, B = giv)
derivational_stem2(L,A,B)  145,165
    B is a derivational stem-2 of A (e.g. A = city, B = citie)
do_all_terms(L,A,B,C)  92
    perform an operation A on all term positions of a predication B to yield predication C
do_one_term(L,A,B,C)  92
    perform an operation A on one term position of predication B to yield predication C
emb(N)  40
    the chance for embedded propositions/predications is N
emb_place(L,A,B)  134
    assign appropriate positions to all constituents of list A in the environment of an embedded construction, to result in list B
entail(L,A,B)  215,227f
    proposition A immediately entails proposition B
entail1(L,A,B)  230f
    proposition A entails proposition B
eq(A,B,C)  219f
    A, B, and C are equivalent between the three languages
eq(L1,L2,A,B)  219
    A and B are equivalent between L1 and L2
eq_adj(L1,L2,A,B)  211f
    A and B are equivalent adjectives between L1 and L2
eq_clause(L1,L2,A,B)  211
    A and B are equivalent clause structures between L1 and L2
eq_core(L1,L2,A,B)  214
    A and B are equivalent core predications between L1 and L2
eq_ext_pred(L1,L2,A,B)  215
    A and B are equivalent extended predications between L1 and L2
eq_full_term(L1,L2,A,B)  212
    A and B are equivalent full terms between L1 and L2
eq_full_terms(L1,L2,A,B)  212
    A and B are equivalent series of full terms between L1 and L2
eq_nucleus(L1,L2,A,B)  214
    A and B are equivalent nuclear predications between L1 and L2
eq_pred(L1,L2,A,B)  213f
    A and B are equivalent predicates between L1 and L2
eq_prop(L1,L2,A,B)  215
    A and B are equivalent propositions between L1 and L2
eq_restr1(L1,L2,A,B)  211
    A and B are equivalent restrictors-1 (nominal restrictors) between L1 and L2
eq_restr2(L1,L2,A,B)  211
    A and B are equivalent restrictors-2 (adjectival restrictors) between L1 and L2
eq_restr3(L1,L2,A,B)  212
    A and B are equivalent restrictors-3 (adpositional modifiers) between L1 and L2
eq_restr4(L1,L2,A,B)  212
    A and B are equivalent restrictors-4 (relative clause structures) between L1 and L2
eq_sat(L1,L2,A,B)  213
    A and B are equivalent satellites between L1 and L2
eq_sats(L1,L2,A,B)  211
    A and B are equivalent series of satellites between L1 and L2
eq_term(L1,L2,A,B)  211f
    A and B are equivalent terms between L1 and L2
equi(L,A,B)  29,106
    marks an embedded anaphorical argument in predication A in appropriate conditions as "equi", yielding predication B
equi_subst(A,B)  107
    one of the arguments in A is marked "equi" in the relevant circumstances
eventreferent(A)  83
    A is a referential index for the State of Affairs designated by the extended predication
ex_adj(L,A,B)  112f,148,157
    express basic or derived adjective A in form B
ex_adj1(L,A,B,C)  148,157
    express adjectival stem B with features A as C
ex_adj2(L,A,B,C)  148
    express the effect of Number value A on adjectival stem B to yield C
ex_aspect(L,A,B)  127,151,161
    express the effect of Perfect aspect on the input A as B
ex_attitude(L,A,B)  127,151,162
    express the effect of attitudinal operators on input structure A as B
ex_clause(L,A,B)  143
    take a fully specified clause A and express it as B
ex2_clause(L,A,B)  143
    express a fully specified clause A as B
ex_det(L,A,B)  111f,147,166f
    express the term operators A in the determiner B
ex_emb_prop(L,A,B)  143
    express an embedded proposition or predication A as B
ex_full_term(L,A,B)  117f,158f,201
    express a single full term (= a term including function(s)) A as B
ex_full_terms(L,A,B)  117
    express a series of full terms (= including function(s)) A as B
ex_fun(L,A,B)  113f,149,158
    express function(s) A as B
ex_fun1(L,A,B)  159
    expression of function A in postpositionally marked term B in Dutch
ex_illo(L,A,B)  128,152,162
    express the effect of illocutionary operators on the input structure A as B
ex_inf(L,A,B)  130,153,164
    express the infinitival predicate A as B
ex_inf_complex(L,A,B)  125
    express the operators and stem in A in the verbal complex B of an infinitival construction
ex_nom(L,A,B)  130
    express the nominalized predicate A as B
ex_nom_clause(L,A,B)  144
    express a nominalized clause A as B
ex_nom_polarity(L,A,B)  130,164
    express the effect of Polarity operators on input structure A as B within the domain of a nominalisation
ex_nom_verbal_complex(L,A,B)  125
    express the operators and stem in A in the verbal complex B of a nominalisation
ex_noun(L,A,B)  111,147,155f
    express derived noun structure A as stem B
ex_number(L,A,B)  111,147,155
    express number on noun stem A to yield B
ex_pap(L,A,B)  126,152f,163
    express the past participle A as B
expl(L,A)  140,154,167
    A is an expletive element
explain(L)  232
    deep-parse an input sentence and give all the inferences derivable from its underlying structure
ex_pol(L,A,B)  113,148,157
    express the polarity A of the comparative adjective as B
ex_polarity(L,A,B)  127,152,162
    express the effect of Polarity operators on input structure A as B
ex_poss_pro(L,A,B)  115,150
    express the abstract possessive pronoun A as B
ex_predadj(L,A,B)  126,148,157
    express the predicative adjective A as B
ex_progr(L,A,B)  126,151,160
    express the effect of Progressive on the input structure A as B
ex_prp(L,A,B)  126,152,164
    express the present participle A as B
ex_restr2(L,A,B,C)  112,148
    express restrictor B with features A as form C
ex_sat1(L,A,B)  122
    express a satellite A of level 1 as B
ex_sat1m(L,A,B)  121f,151,160
    express a manner satellite A as B
ex_sat2cog(L,A,B)  123
    express a cognitive satellite A of level 2 as B
ex_sat2loc(L,A,B)  122
    express a locative satellite A of level 2 as B
ex_sat3(L,A,B)  123,151,160
    express a satellite A of level 3 as B
ex_sat4(L,A,B)  123
    express a satellite A of level 4 as B
ex_sat2pol(L,A,B)  123
    express a polarity satellite A of level 2 as B
ex_sat2temp(L,A,B)  123
    express a temporal satellite A of level 2 as B
ex_sats1(L,A,B)  121
    express a series A of satellites of level 1 as B
ex_sats2(L,A,B)  122
    express a series A of satellites of level 2 as B
extend(A,B)  70
    the feature A can be extended with the redundant features in list B
ex_tense(L,A,B)  128f,152,162f
    express the effect of Tense on the input structure A as B
ex_term(L,A,B)  114f,149f
    express term structure A as B
ext_pred(L,A)  85
    A is an extended predication
ext_pred_schema(L,A)  82
    A is a schema for the extended predication
extract(A,B,C)  42
    extract sublist A from list B to yield list C
ex_verbal_complex(L,A,B)  124
    express the series of operators and the stem A in the verbal complex B
ex_voice(L,A,B,C)  125f,151,160
    express the Voice on the stem B to yield C, subject to conditions A
factive_marker(L,A)  119,154,167
    A is a marker used to express a factive proposition
fail  10
    built-in Prolog predicate which is always interpreted as 'false' or 'failure'
final_devoicing(A,B)  166
    the final consonant of form A is devoiced, yielding form B
find_adj(L,A,B,C)  178f
    finds an adjective in initial position in list A, leaving a rest-list B, and reconstructing its underlying structure in C
find_aux(L,A,B,C)  184
    finds an auxiliary in initial position in list A, leaving a rest-list B, and reconstructing its underlying structure in C
find_bterm(L,A,B,C)  181
    finds a basic term in initial position in list A, leaving a rest-list B, and reconstructing its underlying structure in C
find_determiner(L,A,B,C)  177f
    finds a determiner in initial position in list A, leaving a rest-list B, and reconstructing its underlying structure in C
find_full_term(L,A,B,C,D)  182
    finds a full term in initial position in list B in parsing mode A, leaving a rest-list C, and reconstructing its underlying structure in D
find_illo(L,A,B,C)  189
    identify the interpunction A, interpret its illocutionary value, and insert this into schema B to yield schema C
find_inf(L,A,B,C)  185
    finds an infinitive in initial position in list A, leaving a rest-list B, and reconstructing its underlying structure in C
find_mterm(L,A,B,C)  183
    finds a term with adpositional modifier in initial position in list A, leaving a rest-list B, and reconstructing its underlying structure in C
find_noun(L,A,B,C)  180
    finds a basic or derived noun in initial position in list A, leaving a rest-list B, and reconstructing its underlying structure in C
find_pap(L,A,B,C)  185f
    finds a past participle in initial position in list A, leaving a rest-list B, and reconstructing its underlying structure in C
find_pred(L,A,B,C)  187f
    finds a predicate complex in initial position in list A, leaving a rest-list B, and reconstructing its underlying structure in C
find_prep(L,A,B,C)  182
    finds a preposition in initial position in list A, leaving a rest-list B, and reconstructing its underlying structure in C
find_prop(L,A,B,C,D) 192 finds a proposition in initial position in list B in parsing mode A, leaving a rest-list C, and reconstructing its underlying structure in D
find_prp(L,A,B,C) 185 finds a present participle in initial position in list A, leaving a rest-list B, and reconstructing its underlying structure in C
find_pterm(L,A,B,C,D) 182 finds a prepositional term in initial position in list B in parsing mode A, leaving a rest-list C, and reconstructing its underlying structure in D
find_rest(L,A,B,C) 194 = find_pap(L,A,B,C), find_prp(L,A,B,C), or find_inf(L,A,B,C)
findterm(L,A,B,C,D) 180 finds a term in initial position in list B in parsing mode A, leaving a rest-list C, and reconstructing its underlying structure in D
find_sat1b(L,A,B,C) 184 finds a beneficiary satellite in initial position in list A, leaving a rest-list B, and reconstructing its underlying structure in C
find_sat1d(L,A,B,C) 184 finds a directional satellite in initial position in list A, leaving a rest-list B, and reconstructing its underlying structure in C
find_sat1i(L,A,B,C) 183 finds an instrumental satellite in initial position in list A, leaving a rest-list B, and reconstructing its underlying structure in C
find_sat2loc(L,A,B,C) 184 finds a locative satellite in initial position in list A, leaving a rest-list B, and reconstructing its underlying structure in C
find_sat2temp(L,A,B,C) 184 finds a temporal satellite in initial position in list A, leaving a rest-list B, and reconstructing its underlying structure in C
find_sat3(L,A,B,C) 184 finds a satellite of level 3 in initial position in list A, leaving a rest-list B, and reconstructing its underlying structure in C
find_sub(L,A,B) 182 finds a subordinator in initial position in list A, leaving a rest-list B
flatten(A,B) 43 remove all internal brackets (including empty lists) from list A to yield the "flat" list B
formally_ex_clause(L,A,B) 130 formally express the constituents of fully specified clause A as B
formally_ex_emb_prop(L,A,B) 131 formally express the constituents of an embedded proposition A as B
formally_ex_inf(L,A,B) 131 formally express the constituents of an infinitival construction A as B
formally_ex_nom_clause(L,A,B) 131 formally express the constituents of a nominalized construction A as B
full_place(L,A,B) 133 place all the constituents of list A in appropriate positions in list B
fully_specified_clause(L,A) 69,99 A is a fully specified clause
gamble(A) 40 gamble on setting A
gender(L,A) 104 A is a value for Gender in L
give_all_expr(L,X) 209 give all possible expressions for fully specified clause X
go(L) 110,144 generate sentences in L
hyperonym(L,A,B) 226 noun A is a hyperonym of noun B
hyponym(L,A,B) 226 noun A is a hyponym of noun B
hyponymous_term(L,A,B) 226f term A is hyponymous to term B
identify_lemma(L,A,B) 187 identify the lexical lemma (the predicate frame) for a given form
illo(A) 98 A is an illocutionary value
illooperators(A) 98 A are illocutionary operators
illoreferent(A) 98 A is a referential index for the speech act
infer(L) 231 make a clause structure CS in L and give all the inferences derivable from CS
inference(L,A,B) 231 clause B is a potential inference of clause A by virtue of entailment
infinitival_expression(L,A,B) 144 express an infinitival construction A as B
inform(L) 233 inform the user of all the knowledge you possess
insert(A,B,C) 41 insert element A into list B to yield list C
knowledge(L,A) 233 A is knowledge which has been transmitted to the system, or knowledge inferrable from knowledge
length(A,N) 42 the length of list A is N
liquid(A) 145 A is a liquid consonant
list_perform(L,A,B,C) 92 perform operation A on a list of term positions B to yield list C
match2(A,B) 178 form A and form B have two initial matching letters
match_terms(A,B,C,D,E) 197f in Voice A and parsing mode B the terms in the buffer C can be associated with the argument positions D to yield argument series E
mean(L,A,B) 22,52,59 A means B, i.e., can be paraphrased through B
mean(L,A,B,C) 22,51f,59 A means B, with differentia C
member(A,B) 11,41 A is a member of list B
modf(N) 40 the chance for attributive adpositional modifiers is N
mood_add(A,B) 97 subjunctive mood is added to the Tense of predication A, yielding predication B
name(A,B) 13 when A is a string of letters, B is a list of the ASCII codes corresponding to these letters
nasal(A) 145 A is a nasal consonant
newchance(A,N) 39 define a new chance N for a setting A
nl 10 built-in Prolog predicate 'go to a new line'
nominalisation(L,A,B) 120 nominalize predication A to yield B
nth(A,B) 41 A is the nth member of list B
num(A) 89 A is a value for number
number_agreement(A,B) 103 there is agreement in Number between A and B
obstruent(A) 145 A is an obstruent consonant
pap_affix(L,A,B) 164 B is the affixed form of past-participle stem A
pap_stem(L,A,B) 163 B is the past-participle stem of A
paradigm(L,A,B) 22,38,53f,60f,66f B contains irregular forms of A
paradigm1(L,A,B) 22,38,53f,60f,66f idem
paradigm2(L,A) 55,62,68 A contains ready-made interrogative adverbs
pass(L,A,B,C,D,E) 193f make a pass in mode A through the initial part of input list B, leaving a rest-list C, and insert the resulting analysis in structure D to yield structure E
pass_list(L,A,B,C,D,E) 192f make a sequence of passes in mode A through input list B, leaving a rest-list C, and insert the resulting analysis in structure D to yield structure E
perform(L,A,B,C) 92f,99f perform an operation A on a term position B to yield term position C
person_agreement(A,B) 103 there is agreement in Person between A and B
perspro(A) 106 A is an abstract personal pronoun
place1(L,A,B,C) 135f place the finite and non-finite verb and the satellite-4 in the ordering template C, leaving constituents B to be placed by later placement rules
place2(L,A,B,C,D) 136f place P1-constituents from A in initial position in B, yielding D, and leaving rest C
place3(L,A,B,C,D) 137 place est-ce que in initial position of interrogatives in French, yielding D and leaving rest C
place4(L,A,B,C,D) 137f place satellites-3 from A in appropriate positions in B, yielding D and leaving rest C
place5(L,A,B,C,D) 138 placement not used in the present grammar
place6(L,A,B,C,D) 138f place satellites of level 2 from A in appropriate positions in B, yielding D and leaving rest C
place7(L,A,B,C,D) 140 place a subject clause from A in appropriate position in B, yielding D and leaving rest C
place8(L,A,B,C,D) 140f place an object clause from A in appropriate position in B, yielding D and leaving rest C
place9(L,A,B,C,D) 141f place the subject term from A in appropriate position in B, yielding D and leaving rest C
place10(L,A,B,C) 142 place an object term and remaining arguments from A in appropriate positions in B, yielding C
polarity(A) 83 A is a value for (positive or negative) polarity
poss_paradigm(L,A) 61 list A contains the forms of possessive pronouns
pred1(N) 40 the chance for Progressive aspect is N
pred2(N) 40 the chance for Past tense is N
pred3(N) 40 the chance for Perfect aspect is N
pred4(N) 40 the chance for positive and negative polarity is N
pred5(N) 40 the chance for propositional operators is N
preda(L,A) 70 A is an adjectival predicate with redundant features specified
predaa(L,A) 75 A is a derived attributive participle
predader(L,A) 78 A is a derived comparative predicate for attributive usage
predader1(L,A) 78 A is a derived adjectival phrase consisting of degree + adjective
predaderpred(L,A) 77 A is a derived comparative predicate for predicative usage
predform1(N) 40 the chance for Agent nouns is N
predform2(N) 40 the chance for attributive present participles is N
predform3(N) 40 the chance for comparatives is N
predform4(N) 40 the chance for degree adverb + adjective is N
predform5(N) 40 the chance for term predicates is N
predform6(N) 40 the chance for adpositional predicates is N
predform8(N) 40 the chance for (Dutch) diminutives is N
predic_operators(L,A) 83 A are predication operators (level 2)
predicate_operators(L,A,B) 81 A is a predicate operator sensitive to conditions B
predn(L,A) 70 A is a nominal predicate with redundant features specified
predndim(L,A,B) 74 A is a derived diminutive noun with selectional features B
prednn(L,A,B) 73 A is a derived Agent noun with selectional features B
predsat(L,A) 79f A is a derived satellite predicate
predt(L,A) 78 A is a derived term predicate
preposing_parameter(L,A) 116,167 A determines whether adpositional restrictor may be preposed to noun
predv(L,A) 69 A is a verbal predicate with redundant features specified
predvm(L,A) 70 A is a verbal matrix predicate with redundant features specified
prevocalic_stem(A,B) 166 B is the prevocalic form of stem A
pro(N) 40 the chance for pronouns is N
pronoun(A) 117 A is an (abstract) pronominal element
prop(L,A) 94 A is a proposition
propoperators(A) 95 A are propositional operators
propreferent(A) 95 A is a referential index for a proposition
qext_pred(L,A) 85 A is a questioned extended predication
qterm(A) 88 A is a questioned term with redundant features specified
que(N) 40 the chance for questions is N
reflexive(L,A,B) 28,106 an anaphorical term in predication A which qualifies as such is marked "reflexive", yielding predication B
rel(N) 40 the chance for relative clauses is N
relative(L,A,B,C) 91 A is a relative clause structure compatible with the head noun B and the Number of the term phrase C
remember(L) 233 deep-parse an input sentence onto an underlying clause structure, take out the propositional part of this structure, and remember it as knowledge
remove_mood(A,B) 218 removes subjunctive mood from embedded predication/proposition A, yielding predication/proposition B
remove_voice(A,B) 196 remove the voice from the nucleus of clause A to yield clause B
restr2(L,A) 90 A is an adjectival restrictor within the term structure
restr3(L,A) 91 A is an adpositional restrictor within the term structure
retract(A) 15 retract clause A from the program
reverse(A,B) 42 reverse the members of list A to yield the mirror image list B
rf(A) 89 A is a referential index for first-order entities
same_gender(A,B) 106 type A and type B have the same value for Gender
sandhi(L,A,B,C) 134,154f apply sandhi adjustment to A and B, yielding C
sandhi2(L,A,B,C) 154 idem
sandhi_list(L,A,B) 133 effect all sandhi adjustments between constituents of list A to result in list B
sat1(N) 40 the chance for satellites of level 1 is N
sat1b(L,A,B) 82 A is a beneficiary satellite sensitive to conditions B
sat1d(L,A,B) 82 A is a directional satellite sensitive to conditions B
sat1i(L,A,B) 82 A is an instrumental satellite sensitive to conditions B
sat1m(L,A,B) 81 A is a manner satellite sensitive to conditions B
sat2(N) 40 the chance for satellites of level 2 is N
sat2cog(L,A) 84 A is a cognitive satellite of level 2
sat2loc(L,A,B) 83 A is a local satellite of level 2 sensitive to conditions B
sat2pol(L,A) 84 A is a polarity satellite of level 2
sat2temp(L,A,B) 50,59,66,83f A is a temporal satellite of level 2 sensitive to conditions B
sat3(N) 40 the chance for satellites of level 3 is N
sat3(L,A,B,C) 50,59,66,95 A is a satellite of level 3 sensitive to conditions B and C
sat4(N) 40,59 the chance for satellites of level 4 is N
sat4(L,A,B) 50,66,98 A is a satellite of level 4 sensitive to conditions B
sats1(L,A,B) 81 A are satellites of level 1 sensitive to conditions B
sats2(L,A,B,C) 83 A are satellites of level 2 sensitive to conditions B and C
save_knowledge 234 file all the knowledge you have for later retrieval
sibilant(A) 145 A is a sibilant consonant
so1(A,B) 101 subject assignment to sole first argument turns nucleus A into nucleus B
so2(A,B) 101 subject assignment to first argument turns nucleus A into nucleus B
so3(A,B) 102 subject assignment to first argument and object assignment to second argument turns nucleus A into nucleus B
so4(A,B) 102 subject assignment to second argument turns nucleus A into nucleus B
so5(A,B) 102 subject assignment to first argument and object assignment to third argument turns nucleus A into nucleus B
so6(A,B) 102 subject assignment to third argument turns nucleus A into nucleus B
soassign(L,A,B) 101 potential subject/object assignments for L, turning nucleus A into nucleus B
sonant(A) 145 A is a sonant consonant
specification(L,A,B) 99 extended predication A is specified to yield extended predication B
specify_clause(L,A,B) 100 apply specification to a given clause structure
startknowledge 234 consult knowledge previously stored
stem_fusion(A,B) 165 elements of list A are fused into form B
sub1(L,A) 119,154,167 A is a subordinator-1 of L
subj_obj_assignment(L,A,B) 28,101 subject and object are assigned to the arguments of predication A, yielding predication B
subj_stem(L,A,B,C) 60f,152 C is the subjunctive stem of verb B in tense A
sublist(A,B) 42 list A is a sublist of list B
subsat(N) 40 the chance for satellites in the form of subordinate clauses is N
subst(A,B,C,D) 42 substitute all occurrences of element A in list B by element C to yield list D
tense(A) 83 A is a value for Tense
term(L,A,B) 87f,96f A is a term structure with selectional features B
term1(N) 40 the chance for indefinite terms is N
term2(N) 40 the chance for demonstratives is N
term3(N) 40 the chance for quantifiers is N
term4(N) 40 the chance for plural terms is N
translate1(L1,L2) 209 create a fully specified clause in L1, find the equivalent fully specified clause in L2, and give all possible expressions of both
translate2(L1,L2) 209 create a clause structure in L1, find the equivalent clause structure in L2, specify it as a fully specified clause, and give all possible expressions of the latter
true_parse(L,A,B) 177,192 parse sentence A onto its fully specified underlying clause B
true_return(L,A,B,C) 200 true-parse input sentence A onto fully specified clause B and true-return B as sentence C
true_return_clause(L,A,B) 200 express fully specified clause A and write it as a sentence on the screen
true_translate(L1,L2) 210 true-parse an input sentence in L1 and true-translate this into L2
truth_pred(L,A) 230 A is a truth-predicate
verb_agreement(L,A,B) 28,103 relevant features of Person, Number, and Gender of the subject of predication A are copied onto the Tense operator, yielding predication B
voiceless(A) 145 A is a voiceless consonant
vowel(A) 144f A is a vowel
vowel(L,A) 154 A is a vowel in L
vowelinitial(A) 154 A is a vowel-initial form
write(A) 10 write A on the screen
write_items(A) 44 write the members of a list of lists as sentences without brackets and commas
writelist(A) 43 write the contents of list A (e.g., a sentence) without brackets and commas, and with spaces between members where needed
Index of names and subjects
accusativus cum infinitivo 120
adjectival predicates 47, 58, 64, 90
  expression of 112f, 148f, 157f
  parsing of 178f
adjustment in translation 207f
  of agreement 208, 218
  of gender 208, 216
  of mood 208, 217f
  of relative clause 207f, 216
  of subject-object 208, 216f
adpositional modifier 26, 36, 91
  expression of 116, 147
  parsing of 183
adverbial clauses 37
agent noun 73f
agent noun expression 111, 155
Algol68 3
ambiguity 202
anaphora 28, 104f
anaphorical term 52, 88, 104f
  expression of 116
answering questions 233
backtracking 5
Bakker, D. 2f
BasFac 34, 39f
beneficiary satellite 82
Bratko, I. 2, 5, 43, 235
Capel, C. 3
clause structure 19, 26, 98f, 170f
  formation of 25
  translation of 205f
clitic placement 142
Clocksin, W.F. 2, 5, 15
cognitive satellite 84
Colmerauer, A. 5
comparative formation 75f
concession 84
condition 84
Connolly, J.H. 1, 3, 19, 235
converse relations 224f
copula support 29, 107f
core predication schema 80f
Covington, M.A. 5
declarative programming 1, 5
degree adverbs 49, 59, 65, 78
derivational stem 145f, 165
Des Tombe, L. 2
determiner
  expression of 111f, 147f, 156f
  parsing of 177f
Dignum, F. 3, 223
diminutive
  expression of 155f
  formation of 74f
direction satellite 82
discontinuous constituents 175f, 194f
dummy noun 46
DutExp 34, 155f
DutLex 34, 63f
embedded predication/proposition 37, 49
  formal expression of 131
EngExp 34, 109f
EngLex 34, 45f
entailment 215, 227f
equi marking 29, 106
equi term expression 118
equivalence 211f
  of adjectives 211
  of basic predicates 219f
  of clause structures 216
  of core predications 214
  of extended predications 215
  of full terms 212
  of nuclei 214
  of predicates 213f
  of propositions 215
  of restrictors 211f
  of satellites 213
  of terms 211
expletive element 139
expression rules 19, 29, 109f, 147f
  compositional nature of 110
  formal 29, 109f
extended predication 85f, 94
extended predication schema 82
extraposition 139f
final devoicing 166
flattening a list 43
FreExp 34, 147f
FreLex 34, 57f
full term
  expression of 117f, 158f
  parsing of 182
fully specified clause 19, 29, 69, 99f, 171f
  expression of 143
  formal expression of 130f
  translation of 205f
function expression 113, 149, 158
Functional Grammar 1f, 19f, 170f
  computational 3
Functional Logic 221f
gambling 39f
Gatward, R. 3
gender 57, 64, 103f, 147f
generator 19
Hengeveld, P.C. 235
hyperonymy 226
hyponymous terms 226f
hyponymy 223f
idioms 48f
  parsing of 203
illocution, see: operators
illocutionary logic 222
inferencing 221f, 231
infinitival construction 120, 125
  expression of 130f, 144, 153, 164
  parsing of 185
instrument satellite 82
instruments 46
integrating 173f, 189f
interpreting 173f
Janssen, T. 3
Johnson, S.R. 3
Kahrel, P. 236
knowledge base management 232
Korst, B. van der 3, 235f
Kwee T.L. 3, 235f
Lakoff, G.L. 73
lemma retrieval 187
lexical logic 222
lexicon 45f, 57f, 63f
list 9
  empty 9
  Head and Tail of 9, 42
locative satellite 83
logic 221f
LPA-Prolog, see: Prolog
Mackenzie, J.L. 45f, 236
manner satellite 81f
matrix predicate 49, 58, 65
meaning postulate 22, 51f, 59f, 66, 228f
Meijs, W.J. 3
Mellish, C.S. 2, 5, 15
nominalisation 120, 125
  expression of 130f, 144
nominal predicates 45f, 57f, 63f
  parsing of 180
number expression 111, 147, 156
object placement 141
operators 24f
  illocutionary 24f, 35, 98, 128, 162
  predicate 24f, 35, 81, 160
  predication 24f, 35, 83, 151, 161
  propositional 24f, 35, 95, 127, 151, 162
  term 26f, 35, 88
ordering template 29, 132f, 135
outputting sentences 44
P1-constituent 136f
Panini 2
paradigm 22f, 37f, 52f, 60f, 66f
ParSel 34, 109f
parser 31f, 169f
  capacity of 176
  deep 31f, 170f, 191f
  performance of 201f
  true 31f, 170f, 192f
parsing strategy 172f
participle 126f, 152f, 163f
  formation of 75
  parsing of 185f
  past 126, 152f, 163f
  present 126f, 152, 164
passes in parsing 192f
passive voice 102
pattern matching 5f
perfect aspect 83
  expression of 127, 151, 161
placement rules 29f, 131f
polarity 83
  expression of 127, 152, 162
pragmatic functions 30
predicate 19f
  arbitrary 71, 80
  arguments of 21
  form of 21
  type of 21, 46
predicate formation 36, 72f
predicate frame 19f
  basic 22, 45f, 57f, 63f
  derived 22
predicate logic 222
predication 24f
  core 24f
  extended 24f
  nuclear 24f
predicational logic 222
predicational term 96f
  expression of 119f
premodifier 115f
procedural programming 1, 5
ProfGlot 1f, 31f
  capacities of 1
  structure of 31f
progressive expression 126, 151, 160f
Prolog 2f, 5f
  dialects of 2
  Edinburgh 2
  LPA 2, 6, 14f, 41, 235
Prolog constants 6
Prolog cut 17, 69
Prolog facts 5f
Prolog list, see: list
Prolog questions 5f
Prolog rules 5f
Prolog variables 6
pronoun
  anaphorical 88
  demonstrative 89
  interrogative 87
  personal 86
  possessive 115, 150f
pronoun expression 117f
proper noun 47
proper term 88
proper term expression 114
proposition 24f, 94f
  parsing of 192
propositional logic 222
prepositional term 96f
  expression of 119f
  parsing of 182f
pseudo-phonology 30, 144f
quantifier 89
reason 84
recursion 26, 36f
recursive definition 10f
redundancy rule 69f
reflexive 29, 105f
relative clause 26f, 36f, 88f, 91, 93
restrictor 26f, 88f
Samuelsdorff, P. 3
sandhi 29f, 133f, 153f
satellite 24f, 35f, 95
  expression of 121f, 151, 160
  lexical 49f, 59, 65
  parsing of 183f
  placement of 136f
satellite-derived predicates 79
Schaaik, G. van 3
selection restriction 21
sentence and clause types 35
semantic function 21
separable compound verbs 65, 165
settings 35f, 39f
specification 19, 28f, 99f
spelling adjustment 167
standard of comparison 75f
stop condition 11
subject-object assignment 28, 101f, 206f
subject placement 141
subjunctive 59, 84, 97, 152f
temporal satellite 83f
tense 83
  expression of 128f, 152f, 162f
term 26f, 85f
  basic 86f
  parsing 177f, 180f
  predicational/propositional 96f
term expression 114f, 149f
term formation 85f
term insertion 92f
term logic 222
term matching 196f
term predicates 78
term structures 35
translation 33, 205f
  deep 33, 205f, 210f
  true 33, 205f, 210f
  see: adjustment, equivalence
unification 5
UniExp 34, 109f
UniGen 34, 69f
UniLog 33, 221f
UniPar 31f, 169f
UniTra 33, 205f
variable, see: Prolog
  anonymous 7
verbal complex 124f
  parsing of 184f
verb agreement 28, 103
verbal predicates 47f, 58, 64f
voice expression 125f, 151, 160
Voogt-van Zutphen, H. 3, 65
Vossen, P. 3
Weigand, H. 3, 222f
Westra, D. 3
Simon C. Dik
The Theory of Functional Grammar
Part I: The Structure of the Clause
1989. XIV, 433 pages. Paperback. ISBN 311 013263 X (Functional Grammar Series 9)
This work gives a new systematic treatment of the theory of Functional Grammar (FG) as it has developed out of the initial statement in the author's "Functional Grammar" (1978). It incorporates many theoretical ideas and empirical results which have been generated by a considerable number of scholars in the past decade. The theory of functional grammar is an attempt to develop a general account of linguistic organization in a way which reveals the interplay between the morpho-syntactic, the semantic, and the pragmatic aspects of languages. In doing so, it tries to live up to the standards of pragmatic, psychological, and typological adequacy. This first volume concentrates on the structure of the clause. It can be used as a basic textbook for introductory courses on functional grammar. A second volume, which is in preparation, will treat the properties of complex and derived construction types.
mouton de gruyter Berlin · New York
Erich Steiner
A Functional Perspective on Language, Action, and Interpretation
An Initial Approach with a View to Computational Modeling
1991. X, 289 pages. Cloth. ISBN 3 11 012379 7 (Natural Language Processing 1)
In this research monograph, a functional theory of language is related to a theory of goal-directed action. This functional approach is combined with a cognitive perspective and discussed with a view to formalization and implementation. In contrast to many other studies in computational linguistics, the argumentation is combined with empirical work on dialogues stemming from interactions between children at play. In the first part, after outlining a theory of goal-directed action, situating it within an overall framework of "Systemic Linguistics", one of the functional schools of linguistics, a model of human text production is developed specifically for computational applications. In the second part, predictions concerning semantic complexity derived from the theory suggested in the first part are developed and tested. The linguistic model is the functional model used throughout, yet the specific version discussed here derives from an application of that same theory to machine translation. The author points out the potential for mutually rewarding interaction in linguistic models between specifically computational versions and versions developed originally for different purposes.
mouton de gruyter
Berlin · New York