English Pages 383 [384] Year 1984
Sprache und Information
Sprache und Information Beiträge zur philologischen und linguistischen Datenverarbeitung, Informatik und Informationswissenschaft Herausgegeben von István Bátori, Walther von Hahn, Rainer Kuhlen, Winfried Lenders, Wolfgang Putschke, Hans Jochen Schneider, Harald Zimmermann Band 9
Magdalena Zoeppritz
Syntax for German in the User Specialty Languages System
Max Niemeyer Verlag Tübingen 1984
Dedicated to my parents
CIP-Kurztitelaufnahme der Deutschen Bibliothek
Zoeppritz, Magdalena: Syntax for German in the user specialty languages system / Magdalena Zoeppritz. - Tübingen : Niemeyer, 1984. (Sprache und Information ; Bd. 9) NE: GT
ISBN 3-484-31909-7
ISSN 0722-298-X
© Max Niemeyer Verlag Tübingen 1984. All rights reserved. Without the express permission of the publisher, this book or parts of it may not be reproduced by photomechanical means. Printed in Germany. Printing: Weihert-Druck GmbH, Darmstadt.
Acknowledgement

The work reported here would not have been possible without the help and encouragement of my colleagues and the management of the Heidelberg Science Center of IBM Germany GmbH. Dr. Albrecht Blaser and Professor Dr. Hans-Jörg Schek set the task of describing the grammar and let me continue, even though it took much longer than both they and I had expected. Marlies Gauss, Ursula Mertens, Monika Reisch, Rosemarie Schemer, and Regina Umland helped with the typing whenever things got tight. Dr. Rober Herr, Michael Holz, and Dirk Bethe made it possible for me to use the computer after hours, and Jacques Vaessen helped me cope with the printer. Klaus Horländer wrote the prompt, programs for vocabulary definition and listing of rules by feature. Ernst Christmann and Michael Barabas ran innumerable tests and helped pick out spurious analyses. Nikolaus Ott and Dr. Hubert Lehmann pointed out structures that were omitted or badly analyzed. Dr. Hubert Lehmann discussed many points of syntactic detail with me. Since Dr. Hubert Lehmann and Nikolaus Ott designed and implemented the semantics of the system, all points relating to the interpretation of structures created by the syntax were worked out in close cooperation with both of them. Dr. Hubert Lehmann read and commented on the passages relating to the interpretation in the description of the grammar, and gave me much good advice. Dr. Luis Sopena-Pastor used those parts of the description that were then written as the basis for writing a syntax for User Specialty Languages input in Spanish and pointed out inconsistencies, errors, and unclear wordings. All users who developed applications with the User Specialty Languages System pointed out gaps in the syntax. Sabine Schubert made available to me the grammar written by Dr. Heide Kressin and herself for teaching German to foreign students.

Professor Dr. Broder Carstensen taught me to respect linguistic data, PhDr. Petr Piťha taught me many things about coordination, and Professor Dr. István Bátori introduced me to computational linguistics. Professors Dr. Walther von Hahn and Dr. István Bátori supervised my thesis and helped me make the book presentable.

To all these persons I feel deeply grateful. They are of course not responsible for any errors, omissions, and misrepresentations of their ideas.
CONTENTS

1 Introduction
2 In defense of syntax
3 The User Specialty Languages System
  Table 1: System structure
  3.1 Properties of the system
  3.2 Data structure and vocabulary definition
  Table 2: Data types and standard role-names
  3.3 Analysis and Interpretation
  3.4 Disambiguation in the interpretation process
  3.5 Description of the answer or translation
  3.6 Subject dependent and subject independent words
4 Preface to the grammar
  4.1 History of the syntax and contributors to it
  4.2 Coverage
  4.3 Methodological considerations: Determining needed coverage
    4.3.1 Rule formulation: language system and user language
    4.3.2 Alternative: Simulation
  4.4 The parser: User Language Generator
    4.4.1 Morphology
    4.4.2 Ambiguity
    4.4.3 Rule formulation: Juxtaposition vs. constituency
  4.5 Target structures
    4.5.1 Overview of categories and features
    Table 3: Categories (constructs) and features
  4.6 Strategy of the analysis
    Parse tree of noun phrases, prepositional phrases, and adverbials
    Parse tree of main clauses and yes/no questions: main verb initial or second
    Parse tree of main clauses and yes/no questions with auxiliary: main verb final
    Parse tree of dependent clauses: main verb final
    Parse tree of sentences in passive voice: main verb final
5 Concepts and notation
  5.1 Notation
  5.2 Concepts
  5.3 Processor Extensions: IPE and FPE
  5.4 Order of rule application
6 Symbols, categories and features defined for German
  6.1 Primitives
  6.2 Constructs
  6.3 Features and their values
7 The grammar: Preliminary
8 System commands and strings
  8.1 Nullstrings and STOP
  8.2 System commands
  8.3 Ordered rules, variable token
    8.3.1 Names
    8.3.2 Strings in quotes and numbers
  8.4 System nouns
9 Morphology
  9.1 Affixes
    Table 4: Meanings of inflected forms
    Table 5: Meanings of uninflected forms
    9.1.1 Noun affixes
    9.1.2 Adjective affixes
    9.1.3 Verb affixes
  9.2 Inflection of nouns and names
  9.3 Inflection of nouns with adjective inflection
  9.4 Inflection of adjectives
  9.5 Inflection of verbs
  9.6 Nouns with adjectives
    9.6.1 Adjective as relation (LEV=0)
    9.6.2 Adjective as part of relation name (LEV=1)
    9.6.3 System defined adjective (LEV=3)
  9.7 Nouns with numeric quantifiers
  9.8 Overview of features of NOMEN
    Table 6: Overview of features of NOMEN
10 Noun phrases (NP)
  10.1 Interrogative pronouns
  10.2 Nouns and names without article
    10.2.1 Nouns without article, nominal inflection
    10.2.2 Nouns without article, adjective inflection
    10.2.3 Names without article
  10.3 Articles
    Table 7: Gender case and number on QU
  10.4 Nouns with article
    10.4.1 Nominal inflection: WIEVIEL - WIEVIELE and LAUTER - NUR for inflected nouns
    10.4.2 Nominal inflection: LAUTER - NUR with uninflected nouns
    10.4.3 Nominal inflection: DER - EIN
    10.4.4 Adjective inflection: WIEVIEL - WIEVIELE, LAUTER - NUR
    10.4.5 Adjective inflection: DER - EIN
    10.4.6 Names: WIEVIEL - WIEVIELE, LAUTER - NUR
    10.4.7 Names: DER - EIN
    Table 8: Overview of features of NP from NOMEN or NAME
11 Sentence rules
  11.1 Negation
  11.2 Verb complex and sentence kernel rules: Overview
    Table 9: Overview of verb types
    Main verb initial and main verb second word order
    Main verb final word order
    Sentences with missing complements
    Sentences in passive voice
  11.3 Verb complex and sentence kernel rules: Rules
  11.4 Auxiliary verbs
    11.4.1 Compound tenses: Perfect and pluperfect
    11.4.2 Passive auxiliary
  11.5 Verbs with separable prefixes
12 Syntax for names
13 Genitive attributes
  13.1 Genitive attributes following their head nouns
  13.2 Genitive attributes preceding their head nouns
    13.2.1 Nominal inflection
    13.2.2 Adjective inflection
    13.2.3 Names
14 Appositions
15 Adjectives and comparatives
  15.1 Comparative operators
  15.2 Comparative phrases
  15.3 Noun followed by comparative
  15.4 Verb complex from VC:TYP=NN and adjective
  15.5 Adjectives with complements
  15.6 Quantifier modifiers and comparatives
16 Prepositional phrase syntax
  16.1 Prepositions
    Table 10: Locative and temporal features of prepositions
  16.2 Prepositional phrases
  16.3 Nouns with prepositional complements
  16.4 Verbs with prepositional complements
17 Measure expressions
18 Syntax of locatives
  18.1 Locative adverbs
  18.2 Locative adverbials
  18.3 Nouns with locative complements
  18.4 Verbs with locative complements
19 Syntax of temporals
  Table 11: Temporal expressions
  Table 12: Constructs and features from temporal expressions
  19.1 Dates
  19.2 Temporal adverbs and names
  19.3 Temporal adverbials
  19.4 Nouns with temporal complements
  19.5 Verbs with temporal complements
  19.6 Points left open
20 Relative clause syntax
  20.1 Relative pronouns
  20.2 Sentences with main verb final word order
    20.2.1 Non-relative clauses
    20.2.2 Relative clauses
    20.2.3 Verb and complements
    20.2.4 Adverbials
  20.3 Participial phrases
  20.4 Nouns modified by relative clauses
  20.5 Extraposition
21 Arithmetic
  Arithmetic operators
  Numbers from noun phrases
  Arithmetic expressions
  Nounphrases from numbers
  Variable definition
  Set operations
  Arguments for set operations
  Functions
  Simple numerals
  Compound numerals
22 Less common verb types
  22.1 Verbs with genitive objects
    22.1.1 Main verb initial and main verb second word order
    22.1.2 Main verb final word order
  22.2 Verbs with two accusatives
    22.2.1 Main verb initial and main verb second word order
    22.2.2 Main verb final word order
23 Coordination
  23.1 Excursion: Interpretation of coordinated structures
  23.2 Syntax of coordinated structures
  23.3 Parallelism and balance from a structural point of view
  23.4 Coordinating conjunctions
  23.5 Coordination of adjectives
  23.6 Coordination of nouns, numbers, and names
  23.7 Coordination of noun phrases
  23.8 Coordination of prepositional phrases and adverbials
  23.9 Coordination of verbs and sentences
24 Subordination of clauses
25 Pronouns
  Table 13: Features of NP from pronouns
  25.1 Verbs with pronominal complements
26 Declarative sentences
27 Sentences with missing complements
  27.1 Pattern of rules for sentences with missing complements
  27.2 Remaining rules and examples
28 Sentences in passive voice
  28.1 Pattern of rules for passive voice
  28.2 Rules and examples
29 Conclusion
30 References
31 Subject index
32 Index of names
TABLES

Table 1: System structure
Table 2: Data types and standard role-names
Table 3: Categories (constructs) and features
Table 4: Meanings of inflected forms
Table 5: Meanings of uninflected forms
Table 6: Overview of features of NOMEN
Table 7: Gender case and number on QU
Table 8: Overview of features of NP from NOMEN or NAME
Table 9: Overview of verb types
Table 10: Locative and temporal features of prepositions
Table 11: Temporal expressions
Table 12: Constructs and features from temporal expressions
Table 13: Features of NP from pronouns
1 INTRODUCTION
The grammar presented here was designed and implemented for data communication in restricted German as part of the User Specialty Languages System. The book documents the syntax for this system. At the same time, the rules constitute a coherent formal description of a fragment of German syntax.

The User Specialty Languages System is a natural language interface to a relational data base. The system is not designed for a specific mini-world and does not use knowledge about a specific application, as given by word sense and application context.
Instead, the system uses a framework for the evaluation of sentences that has come into linguistics through Montague (1970), but has predecessors in syntax directed compiling techniques (Irons 1960; 1983): Rules analyzing syntactic constructions are each associated with an interpretation function that provides the semantic interpretation for the construction. The interpretation functions interpret sentences in terms of a model of the application world as given by the structuring of the data in the data base: the values, concepts and relationships represented. Of the two components of the evaluation, syntactic rules and interpretation functions, only the rules are discussed here; the interpretation functions will be discussed separately in Lehmann (forthcoming).

In a framework like this, the syntax plays an important role in recognizing the functional relationships among the constituents in the sentence entered, because they determine the appropriate interpretation functions and their operands.

The body of this book describes the rules proposed for the syntax and the structures they do (or do not) handle. That means that most of the presentation deals with detail, because this is where the problems lie, once the general strategy of the analysis and the target structures have been chosen.

Target structures of analysis grammars depend on the one hand on what is believed to be the most useful division of labor between the syntactic and the semantic component of the analysis, i.e. how much of the structural meaning shall be reflected in the parse trees for the interpretation to work on. On the other hand they depend on how much of the meaning of sentences shall be accounted for by the system. This is, of course, related to the purpose of the system as a whole: text analysis or question answering, information retrieval or fact retrieval, data base query or dialogue simulation, etc.
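The pairing of each syntactic rule with an interpretation function, as described above, can be illustrated with a small sketch. It is a toy in Python and not the rule formalism of the system: the rule names, the relation `EMPLOYEES`, and the hand-built parse tree are all assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical toy "data base": one relation with two attributes.
EMPLOYEES = [
    {"name": "Meier", "city": "Heidelberg"},
    {"name": "Schulz", "city": "Hamburg"},
    {"name": "Weber", "city": "Heidelberg"},
]

@dataclass
class Rule:
    """A syntactic rule paired with the function that interprets its result."""
    name: str
    interpret: Callable[..., Any]

@dataclass
class Node:
    """Parse tree node: the rule that built it and its constituents."""
    rule: Rule
    children: tuple

def evaluate(node):
    # Leaves are plain values; inner nodes apply their rule's interpretation
    # function to the interpretations of their constituents.
    if not isinstance(node, Node):
        return node
    return node.rule.interpret(*(evaluate(child) for child in node.children))

# Illustrative rules (names and meanings are assumptions, not the book's rules).
NOUN = Rule("noun", lambda word: EMPLOYEES if word == "employees" else [])
PP_IN = Rule("pp_in", lambda city: (lambda row: row["city"] == city))
NP_PP = Rule("np_pp", lambda rows, pred: [r for r in rows if pred(r)])
HOW_MANY = Rule("how_many", lambda rows: len(rows))

# Hand-built parse tree for "how many employees in Heidelberg".
tree = Node(HOW_MANY, (Node(NP_PP, (Node(NOUN, ("employees",)),
                                    Node(PP_IN, ("Heidelberg",)))),))
print(evaluate(tree))  # 2
```

The point of the sketch is only that the parse tree determines which function is applied to which operands; a different attachment of the prepositional phrase would select different operands.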
The framework of this syntax requires a certain level of detail in the structuring of the trees to be generated by the grammar and of differentiation in the labelling of the nodes, and consequently a certain amount of categorization and subcategorization in the target structures. Strategy of the analysis means the direction of parsing and sequence of rule application, as enforced by the formulation of rules, not as given by the parsing algorithm. The strategy depends on how the language itself is structured and on the target structures envisaged, both seen in relation to the properties of the parser.
While different strategies may be useful with different parsers, and target structures can be evaluated with respect to what the system shall do, the actual analysis involves the type of computational linguistic questions that Bátori (1977:7) describes as not being asked in linguistics itself:

In other words, linguistics does not ask (and answer) questions such as: how does one recognize the end of a noun phrase (or of any other unit)? or in what order should relative clauses (or other modifiers) be attached when several possibilities are open? Instead, linguistics asks: what is the structure of a noun phrase (or other constituent), which selectional restrictions hold between verbs and nominal groups, etc.

Solutions proposed concerning constituent boundaries and substrategies to the general strategy are largely independent of both target structures and general strategy, but draw upon the structural properties of the language in question: For all grammars analyzing German that shall recognize genitive attributes, there is the problem of finding criteria in the structuring of the language that make recognition possible. This is the aspect of language processing that is emphasized here.

For syntaxes of comparable size, target structures and the strategy of analysis have been described for PLIDIS at the Institut für deutsche Sprache in Kolvenbach (1979) and Berry-Rogghe (1980); the latter includes a listing of the rules and a description of the meaning of nodes and operations. Detailed proposals for morphological analysis are found in the publications of the precursor project (Arbeitsgruppe MasA 1974). For CONDOR, developed by Siemens (Taeuber 1978), homograph reduction is found in Haller (1978) and a draft grammar for structural and content analysis is found in Billmeier (1980). Both PLIDIS and the analysis proposed for CONDOR use ATN-grammars.
Another ATN analysis has been proposed by Koch (1979) as part of a question-answering system at the Zentralinstitut für Kybernetik und Informationsprozesse in Berlin, GDR. Predictive analysis is used in SATAN, developed at Saarbrücken University. Different aspects of the analysis are described in working papers (Linguistische Arbeiten des SFB 100); the strategy and details of the morphological analysis are found in Eggers (1969), which also contains an inventory and classification of homographs in German. More information on the syntactic and semantic disambiguation of SATAN is found in SALEM (Sonderforschungsbereich 100, 1980). Aspects of the analysis have also been described for HAM-RPM at Hamburg University in a series of working papers focussing on the interpretation (von Hahn 1980 and HAM-RPM 1978-1981).

The general strategy and examples of specific structures have been described for all of these systems. Billmeier (1980) and particularly Koch (1979) go further in discussing and motivating proposed solutions to problems in the context of their analyses.

In describing the syntax for the User Specialty Languages System as a whole, two purposes shall be served: First, it shall be possible not only to evaluate the proposed solution schemata, but also their formal integration in the grammar, and second, it shall be possible to evaluate the adequacy of rules with respect to the data that they are intended to account for, and the validity of the data selected.
A full description can also show the extent to which what is proposed for a given structure has actually been incorporated into the rules, and the place of individual structures in the system of the grammar. A language is, after all, not a collection of structures. Neither is it so uniform that the analysis of individual structures follows systematically from chosen strategies and substrategies.

Formalization involves different degrees of reduction of the observable subtleties of language. There are omissions, restrictions, compromises and, in this particular grammar, assumptions about the type of structure to be expected in the context of data base query. This reduction does not only concern questions of structures to be left out entirely, like e.g. adverbial clauses, but it affects the details of the analysis of structures that are included. To vary the quotation from Bátori (1977), the question is not only whether relative clauses are included, and whether the implementors are aware of the structural variety of relative clauses, but also how much of their variety is reflected in the rules and what assumptions are made about the occurrence and structure of relative clauses in the framework of the grammar.

The loss of subtlety that goes with formalization is to some extent counterbalanced by a gain in explicitness and coherence, particularly if there is a computer program that accepts the formalism so that rules can be tested. For any grammatical description that is not confined to a specific aspect of language or a single set of structures, there is the problem of whether the parts fit together and how they interact. That coherence for grammars beyond a certain size cannot be merely asserted was pointed out already in the context of the transformational grammar written for English at the University of California at Los Angeles (Stockwell 1973), and initiated the development of the testing programs for transformational grammars described in Friedman (1971).
The interplay between grammar writing and computer testing is also the subject of Machová (1978). However, while computer testing makes it easier to keep control over the working of rules and their side-effects, and in that sense also permits testing of coherence, testing itself presents the problem of choosing a battery of sentences and phrases for testing systematically. Though such sentences are intended to represent the area of language to be covered, the decision as to what a given sentence represents is influenced by the overall design of the grammar. To give an example: Assuming that a systematic test of the rules for three-place verbs with dative and accusative objects should include instances of all permutations acceptable in German for such verbs: Is it enough to have one instance of each permutation, or is it necessary to also vary the internal structure of the noun phrases used, thereby multiplying the number of sentences for each permutation? The decision depends not only on the facts of the language (e.g. accusative preposing for pronouns), but also on the importance of these facts with respect to the overall design of the grammar (how important is it in a given framework to recognize preposed accusatives unambiguously as accusatives?). Systematic testing is necessary to make sure that test results are not accidental but indeed represent the behavior of the grammar for a given class of cases, but in order for testing to be effective, both the overall design and the design of individual parts of the grammar must allow reliable predictions as to the interaction and side-effects of rules. But these predictions also influence the choice of representative test cases.
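How quickly the test battery grows in the example of three-place verbs can be made concrete with a short calculation. The sketch below is illustrative only: the inventory of noun phrase shapes is an assumption chosen for the example, and all orderings of the three complements are enumerated without checking which of them German actually accepts.

```python
from itertools import permutations, product

# Complements of a three-place verb: subject, dative and accusative object.
COMPLEMENTS = ("NOM", "DAT", "ACC")

# Hypothetical inventory of internal noun phrase structures to vary.
NP_SHAPES = ("pronoun", "name", "article+noun", "article+adjective+noun")

def test_battery(vary_np_structure):
    """Yield one test case description per combination to be tested."""
    for order in permutations(COMPLEMENTS):
        if not vary_np_structure:
            yield order
            continue
        # Every complement position may take every noun phrase shape.
        for shapes in product(NP_SHAPES, repeat=len(order)):
            yield tuple(zip(order, shapes))

print(len(list(test_battery(False))))  # 6 orderings
print(len(list(test_battery(True))))   # 6 * 4**3 = 384 cases
```

With only four noun phrase shapes, the battery for a single verb type grows from 6 to 384 sentences, which is why the choice of representative cases has to rely on predictions about which variations the rules are actually sensitive to.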
The book has the following parts: The next section is an excursion; it contains a plea for syntactic analysis in the form of a polemic against certain attitudes in natural language processing of which a disregard for syntax is only one of the symptoms. Then an overview of the User Specialty Languages System establishes the context for which this syntax was written. This is followed by a preface to the grammar, with its development over time and the people that have contributed to it. The preface includes sections on the coverage of the system, the parser, the target structures with an overview of the categories and features used, and the general strategy of the analysis.

The actual description begins with the section on Concepts and Notation containing the rule formats and notational conventions and including an alphabetical reference list and explanation of the categories and features used in the grammar. Then follows a brief section containing some system oriented matter but also the rules recognizing names and numbers. The remaining sections each contain a brief outline of the grammatical facts with respect to the structures to be covered in the section, followed by a rule-by-rule description of how these facts are accounted for. Where necessary, a class of structures is exemplified by an overview over the expressions that are considered to fall within this class. For the rules, the features assigned and tested are justified and alternative formulations discussed. Where several rules follow the same pattern, that pattern is discussed first; differences are described with the individual rules. Examples of structures analyzed, or not analyzed, by a given rule illustrate its scope.

Readers who want a general outline are referred to the overview in the preface to the grammar, and to the beginnings of sections and subsections. Readers who are interested only in specific questions, e.g. how the grammar handles temporals, are advised to look for a corresponding section in the table of contents and begin with that section, using, if necessary, the cross-references to other sections and the description of categories and features in the alphabetical reference list.
2 IN DEFENSE OF SYNTAX
The fact that human question answering requires knowledge other than recognizing the subject of a sentence, and that knowledge about the world, beliefs, and expectations makes humans understand garbled and distorted sentences, ellipsis, and the style of telegrams and headlines, led to the assumption that syntactic analysis could be dispensed with, provided the semantics and pragmatics of the application world were properly understood (the idea is associated with Schank, and has been widely accepted, see the survey in Waltz 1981. It has been challenged independently by Kac (1982), for reasons similar to the ones given here). An example like the following one seems to support the idea:

Peter verkauft Äpfel ('Peter sells apples')

Apples are usually sold by people, not the other way round, so what does it contribute to the understanding of the sentence to know that Peter is the subject and apples are the object, as evidenced by the position of the noun phrases relative to the verb and number agreement between subject and verb? Maybe it is sufficient to differentiate nominals and verbals and leave the rest to semantics? But compare:

Der Apfel verkauft Peter ('The apple sells Peter')

The statement is quite clear and unambiguous, but what it states contradicts common sense beliefs. We may question the speaker's sanity, but we cannot question the interpretation that the syntactic form of this statement forces us to make. If the syntactic marking is not as clear, as in the following examples,

Äpfel verkaufen Peter
Peter verkaufen Äpfel

one could assume insufficient command of the language and take the common sense interpretation, particularly since 3rd person plural and infinitive have the same form and the infinitive is believed to be a characteristic of 'foreigner talk' (cf. Ferguson 1971). Back to the examples at the beginning:

Peter verkauft Äpfel
Der Apfel verkauft Peter

In the first example, common sense beliefs and syntactic form are in agreement.
To conclude from the redundancy of the syntactic information that it can be dispensed with is wrong, as the second example shows. Where syntax and common sense beliefs conflict, the syntactic form determines the interpretation. That statements in contradiction to common beliefs should require stronger marking than those that conform to them is not really surprising, given normal human inertia. But given the fact that the syntax of utterances determines their interpretation to the extent that meaning can be conveyed even against the hearer's expectations, it seems wrong to ignore it.
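The way case marking and number agreement force an interpretation in the two examples can be sketched in a few lines. This is a deliberately crude illustration, not a rule of the grammar described in this book: the feature sets given for the three noun phrases and the function `syntactic_subject` are assumptions made for the example.

```python
# Hypothetical feature descriptions of the noun phrases in the examples:
# grammatical number and the set of cases the form is compatible with.
PETER = {"text": "Peter", "number": "sg", "cases": {"nom", "dat", "acc"}}
AEPFEL = {"text": "Äpfel", "number": "pl", "cases": {"nom", "gen", "acc"}}
DER_APFEL = {"text": "Der Apfel", "number": "sg", "cases": {"nom"}}

def syntactic_subject(noun_phrases, verb_number):
    """Return the noun phrase that the marking forces to be the subject,
    or None if the marking alone does not decide."""
    # A subject must be compatible with nominative and agree with the verb.
    agreeing = [np for np in noun_phrases
                if "nom" in np["cases"] and np["number"] == verb_number]
    # A noun phrase marked unambiguously as nominative must be the subject.
    forced = [np for np in agreeing if np["cases"] == {"nom"}]
    if len(forced) == 1:
        return forced[0]["text"]
    if len(agreeing) == 1:
        return agreeing[0]["text"]
    return None

# "Peter verkauft Äpfel": agreement with the singular verb selects Peter.
print(syntactic_subject([PETER, AEPFEL], "sg"))     # Peter
# "Der Apfel verkauft Peter": the nominative article decides against belief.
print(syntactic_subject([DER_APFEL, PETER], "sg"))  # Der Apfel
```

Neither decision consults any knowledge about who usually sells what; in the second case the result contradicts such knowledge, which is the point of the example.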
Even a rough syntactic analysis is not sufficient because it is liable to miss the more subtle aspects of syntactic marking that cooperate to convey very specific meanings. I am not disputing the fact that beliefs and expectations guide the interpretation of elliptic or garbled utterances. But I am questioning the usefulness of systems that treat all input as potentially garbled in other than research environments. The argumentation in Waltz (1981) is typical of a confusion of interests that is not Waltz's alone, but can be found in other sources as well: The interest in the cognitive processes underlying human dialogue as a research interest is stated in terms of an interest in constructing systems for actual use.

Other phenomena were noted during the 70's that we still do not know how to deal with well. These include processing language which falls outside of a narrow domain, handling dialogue (which can itself be the topic of conversation in any dialogue), insuring rapid response, and meeting other "human factors" requirements (such as presenting information to a user about the system's abilities and limitations, etc.). (Waltz 1981:15).

While both the study of cognitive processes through systems modelling human language behavior and the study of natural language for data base query systems are legitimate scientific pursuits, there is reason to believe that these two goals are not identical. (Cf. Section 4.3.2) This point shall be illustrated by examples from Waltz (1981). The analysis in PLANES (Waltz 1978) is described as follows:
PLANES uses a "semantic grammar", that is, it has ATN parsers for every kind of phrase that can occur in its world: time phrases, phrases referring to aircraft, places, etc. The goal of the parsing phase of PLANES is a set of semantic constituents; so for example, the sentence, "Which planes had 10 or more flights during January 1970?" yields the semantic constituents "Which plane"; "greater than ten flights"; and "January 1970". The goal of the front end is to take these constituents and fit them into a "query template". The query template may not be filled in completely by the information that was given in the English sentence and in this case, PLANES looks back through the dialogue to locate and fill in the missing item. (Waltz 1981:19)

This sounds quite reasonable, but what is it that actually happens?

PLANES handles speech acts by matching all the speech act words it can find and then ignoring them; PLANES always assumes that the user is requesting information. (Waltz 1981:20).

If the system does not state that it ignores such phrases, it is not surprising that
In testing, we have found that users often ask vague and complex questions. For example, users ask for reports such as in the following: "Give me a month by month status report for F-4's." The system is simply incapable of deciding easily what "status report" means in this example. (Waltz 1981:20).
Also the system

... can deal with unforeseen requests because it is able to use the query template to do expectation-driven analysis of such requests. (Waltz 1981:20)

Again it is not surprising that in tests with casual users 22 of the difficult errors were due to the query generator, where the heuristics led PLANES to answer a question different from what the user had intended. For example, when asked "How many planes had NOR hours in during the time period?", it counted all the planes that had an NOR hours field instead of counting only planes that had greater than 0 NOR hours. (Waltz 1981:22).

As a model of human behavior, even though Waltz does not claim psychological reality for the internals of PLANES, the performance of the system is quite realistic. Humans partially ignore what they hear and misunderstand what does not fit their expectations.

Systems for actual use must be both reliable in their reactions and transparent as to their limitations. Even the most tolerant system for practical use must take the user at his or her word, and heuristics can be used to complement but not in lieu of both general and detailed study of the semantics and pragmatics of questions and answers. The interpretation of null values in a data base is not for a heuristic to decide. A look outside the mini-world would have shown that there is a small set of meanings of null values: value zero, no data, or undefined. Which one of these meanings applies to the null values in a given data field depends on the concepts represented in each field and may be different for different data fields in the same application. Consequently, the system must be prepared in principle to interpret null values in all three different ways, depending on how the data are defined, while the specific meaning of the null values in a given field is part of data definition by the user or the application developer who sets up the data base.
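The proposal that the meaning of null values belongs to data definition rather than to a heuristic can be sketched as follows. The sketch is an assumption-laden illustration in Python, not the data definition facility of the system: the field name `nor_hours`, the sample rows, and the helper functions are hypothetical.

```python
from enum import Enum

class NullMeaning(Enum):
    """The three meanings of null values named in the text."""
    ZERO = "value zero"
    NO_DATA = "no data"
    UNDEFINED = "undefined"

# Hypothetical sample relation; None stands for a null value.
PLANES = [
    {"id": "A1", "nor_hours": 12},
    {"id": "A2", "nor_hours": None},
    {"id": "A3", "nor_hours": 0},
]

def resolve(row, field, null_meanings):
    """Return (value, known) according to the declared meaning of null."""
    value = row[field]
    if value is not None:
        return value, True
    if null_meanings[field] is NullMeaning.ZERO:
        return 0, True
    return None, False  # no data or undefined: nothing can be compared

def count_greater_than_zero(rows, field, null_meanings):
    """Count rows with field > 0; report separately the rows left undecided."""
    counted = unknown = 0
    for row in rows:
        value, known = resolve(row, field, null_meanings)
        if not known:
            unknown += 1
        elif value > 0:
            counted += 1
    return counted, unknown

# The same query under two different data definitions for the same field.
print(count_greater_than_zero(PLANES, "nor_hours", {"nor_hours": NullMeaning.ZERO}))
print(count_greater_than_zero(PLANES, "nor_hours", {"nor_hours": NullMeaning.NO_DATA}))
```

Under the first definition the answer is complete (one plane, none undecided); under the second the system can report that one row could not be decided instead of silently counting or discarding it.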
This calls for an appropriate division of labor between user and system. The system should not presume to know better what its users mean than the users themselves. Nevertheless, systems that are based on sound syntactic and semantic analysis, but without conversational capabilities, appear clumsy and sometimes unnecessarily pedantic. To improve the situation for natural language data base query, research must uncover much more detail on how meaning is actually conveyed, to avoid hasty inferences. In the area of semantics, the spatial semantics Waltz reports on, or the heuristic hint (Berry-Rogghe 1978) appear promising. In the area of syntax, one could think of a system that would use a second pass for utterances that are not understood by a tightly formulated syntax. The second pass could replace specific rules by other rules with a different formulation. Something like this looks possible with 'controlled partitioned grammars' (Mückstein 1975). In the case of unexpected utterances with markings that are ambiguous with respect to certain varieties of the language, the system could be made to recognize such ambiguities. But all this must be based on a detailed study of the forms of the language in question that makes it possible to distinguish between erroneous and unexpected utterances. A system that ignores clearly stated meanings cannot possibly be useful.
It is true that the referent of they in the Winograd (1972) example (quoted after Waltz 1981) cannot be decided on syntactic or purely semantic grounds:

The city councilmen refused to give the women a permit to march because
(a) they feared violence.
(b) they advocated revolution.

In (a) they seems to refer to the city councilmen, whereas in (b) they refers to the women. How do we judge this? The structures of the two sentences are identical, so they cannot help us to distinguish the two cases. The only answer seems to be that we know a great deal about human behavior, and can readily access and apply this knowledge when it is needed in understanding language. (Waltz 1981:16)

But this is not necessarily the type of knowledge that data base query systems should have. In actually ambiguous cases, such systems could ask the user. Or should not councilmen be allowed to change their minds without having to have their information systems reprogrammed?
3 THE USER SPECIALTY LANGUAGES SYSTEM
The following brief description of the User Specialty Languages System outlines the context for which this syntax has been written. This section begins with an overview of the properties of the system, which is followed by more detail in separate subsections. Unlike most natural language systems so far, this system has been evaluated in applications. References to the evaluation studies are found at the end of the overview (Section 3.1).
TABLE 1: SYSTEM STRUCTURE
3.1 PROPERTIES OF THE SYSTEM

The User Specialty Languages System is designed as an application independent natural language interface to a relational data base for data query, analysis, and manipulation including data entry. The underlying data base management system now is System R (Chamberlin 1981); earlier versions of the system interfaced to the Peterlee Relational Test Vehicle (Todd 1975). The User Specialty Languages System analyzes and interprets natural language sentences. From the interpretation, the system generates queries which specify the appropriate answer in the formal data base query language. This language was Information Systems Base Language (ISBL) for the Peterlee Relational Test Vehicle (PRTV); it now is Structured Query Language (SQL) for communication with System R.

The project started out as a joint effort to construct interfaces for several natural languages using the same technology. The basic concepts of the technology derive from the Rapidly Extensible Language (REL) System of Thompson (1969). While both the present REL System (cf. Henisz-Thompson 1978) and the system here differ in important respects from one another and from the technology of REL in 1969, two basic principles have remained. One is the parsing philosophy of the Kay parser (Kay 1967; the actual parser used is User Language Generator, see Section 4.4, with considerable extensions and modifications over the Kay parser), the other is the way in which application independence is achieved.

The parsing philosophy assumes that syntax and parser are separate entities: The set of rules defining a given language is used as data by a general parsing algorithm to compare input sentences against the well-formedness conditions specified in the rules and to construct the specified nodes if these conditions are met. All analyses allowed by the rules are constructed in parallel (for detail, see Section 4.4).
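The separation of rules and parser can be illustrated with a minimal sketch. It is a plain bottom-up chart parser over binary rules, not the Kay parser or the User Language Generator; the toy rules and the lexicon are assumptions made for the example and use construct names from this grammar only loosely.

```python
from collections import defaultdict

# Hypothetical toy grammar: the rules are plain data, the parser below
# knows nothing about German.
RULES = [
    ("NP", ("QU", "NOMEN")),
    ("PP", ("PREP", "NP")),
    ("NP", ("NP", "PP")),
    ("VC", ("VERB", "NP")),
    ("VC", ("VC", "PP")),
    ("SC", ("NP", "VC")),
]
LEXICON = {
    "der": ["QU"], "den": ["QU"], "dem": ["QU"],
    "mann": ["NOMEN"], "hund": ["NOMEN"], "park": ["NOMEN"],
    "sieht": ["VERB"], "in": ["PREP"],
}


def parse(words, rules=RULES, lexicon=LEXICON, root="SC"):
    """Return every tree with category `root` that spans all of `words`."""
    n = len(words)
    # chart[(start, end)] holds (category, tree) pairs for that span.
    chart = defaultdict(list)
    for i, word in enumerate(words):
        for cat in lexicon.get(word, []):
            chart[(i, i + 1)].append((cat, (cat, word)))
    # Build longer spans from shorter ones; all analyses are kept side by side.
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for lcat, ltree in chart[(start, mid)]:
                    for rcat, rtree in chart[(mid, end)]:
                        for lhs, rhs in rules:
                            if rhs == (lcat, rcat):
                                chart[(start, end)].append(
                                    (lhs, (lhs, ltree, rtree)))
    return [tree for cat, tree in chart[(0, n)] if cat == root]


if __name__ == "__main__":
    for tree in parse("der mann sieht den hund in dem park".split()):
        print(tree)
```

For the sample sentence the toy rules allow the prepositional phrase to attach either to the noun phrase or to the verb phrase, so two trees are returned. Changing the language means changing `RULES` and `LEXICON` only.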
Application independence in the present system rests on a model independent interpretation that evaluates sentences with respect to the application world as modelled in the data base (cf. Lehmann 1978c and more recently Scha 1983). As in REL, this depends on distinguishing in natural language between features that have the same form and function irrespective of subject matter and those that do not. Among the first are syntactic constructions and structural words and their interpretations, among the latter are most nouns, verbs, adjectives, and proper names, which are interpreted with respect to the world model (for detail, see Section 3.6). The nouns, verbs, and adjectives that users need for their application are entered by the users themselves via a prompting routine. Each of these words addresses a relation in the data base. For these relations, the User Specialty Languages System assumes a view of data that is close to natural language. The columns of these relations each have one of five data types and one of a set of standard role-names listed in Table 2. The data types indicate the kind of values found in the columns, with consequences for arithmetic operations and how many questions. The role-names correspond to the way in which the words addressed by the relations are used in the user's application jargon. This, in turn, reflects how the underlying concepts are interrelated. In the syntax, features representing the role-names function like (surface) case frames or valence indicators (for detail, see Section 3.2). Having the relations for natural language access stored physically in the data
base would in large data bases lead to duplication of information, update problems, and problems with application programs. Therefore the relations are defined as views on data base relations with a different structure. The translation from queries against views to queries against the base relations is done by the viewoptimizer program (Ott 1982). The views, then, mediate between the linguistic structure and the structure of the data base.

The fact that the system does not rely on a specific data base and specific vocabulary makes its generalizations application independent, not just in principle, but in actual fact. Contrary to claims in Fauser (1981), this distinguishes the User Specialty Languages System from systems like PLIDIS (Berry-Rogghe 1980). Though solutions to problems of natural language analysis in these systems are sought for the general case, not for a single application domain, the interpretation uses information that has to be obtained separately for each data base and that may not necessarily be within the power of unsophisticated users to supply consistently and in accordance with the system's understanding of the information. With this information, however, the analysis can in principle be more subtle and the responses more natural than they can be within the framework used here. On the other hand, recourse to information about the application can lead to putting too much of a burden on lexicon and knowledge base. Rather than using domain specific information, the attempt has been made in the User Specialty Languages System to find general solutions on the syntactic and semantic level that are equivalent in result to solutions possible with domain specific information.
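The mediating role of the views described above can be pictured with a small sketch in SQL, run here through Python's `sqlite3` module. The base table, the sample rows, the view names `W_ARBEITEN` and `W_NAME`, and the bare role-name columns `NOM`, `BEI`, and `VON` are assumptions for illustration; the sketch does not reproduce the viewoptimizer program or the system's actual naming scheme.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical base relation of a personnel data base.
CREATE TABLE PER (Personalnummer INTEGER, Name TEXT, Abteilung TEXT);
INSERT INTO PER VALUES (1, 'Meier', 'Vertrieb');
INSERT INTO PER VALUES (2, 'Schulz', 'Vertrieb');
INSERT INTO PER VALUES (3, 'Braun', 'Einkauf');

-- One view per word; the columns carry role-names instead of the
-- column names of the base relation.
CREATE VIEW W_ARBEITEN AS
    SELECT Personalnummer AS NOM, Abteilung AS BEI FROM PER;
CREATE VIEW W_NAME AS
    SELECT Name AS NOM, Personalnummer AS VON FROM PER;
""")


def who_works_in(department):
    """Names of employees for whom 'arbeiten bei <department>' holds."""
    rows = conn.execute(
        "SELECT N.NOM FROM W_ARBEITEN AS A "
        "JOIN W_NAME AS N ON N.VON = A.NOM "
        "WHERE A.BEI = ? ORDER BY N.NOM",
        (department,),
    )
    return [row[0] for row in rows]
```

A query is phrased against the word-shaped views only; no data are duplicated, since the views are computed from the single base relation.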
In order to test the system, and also to understand more about natural language communication with a data base, a subject about which there is much argument, but very little factual information, the system was offered to users who brought their own data and applications to the system and let us study how they used it. The German version of the User Specialty Languages System has been used in several applications, the English version is now under study at New York University. Reports on the results of the user studies are found in Lehmann (1978b), Krause (1979), Ott (1979), Kettler (1981), and Krause (1982). The results have been encouraging so far. User experience was the basis of many of the extensions and revisions of both syntax and interpretation. Since the studies with users are described in detail in the references cited above, methodology and findings of the studies shall not be repeated here. The evaluation of systems with users working on their own projects provides valuable insight both into user requirements as regards natural language query systems (cf. Zoeppritz 1983) and into language behavior in a new communication situation.
3.2 DATA STRUCTURE AND VOCABULARY DEFINITION
As outlined above, the User Specialty Languages System assumes a view of data in which each noun, verb, or adjective addresses a relation. These relations are normally defined as virtual relations, or views, on the user's base relations. The columns in these virtual relations or views have standard role-names that correspond to the cases, prepositions, or adverbials that are governed by the word as used in the application. So the views can be regarded as templates or, surface oriented, case frames. The tuples in these relations then can be regarded as assertions about the data base world (cf. also Scha 1983). A role-name is defined for each column in a relation, together with the data type of the values in the column. Data types and role-names are summarized in Table 2.

TABLE 2: DATA TYPES AND STANDARD ROLE-NAMES

Type  Name      Values              Example
W     WORD      Proper name         Employee Meier
C     CODE      Coded name          Inventory number 4326
Z     NUMBER    Measure/amount      Salary of 2345
Q     QUANTITY  Counted number      Population figure 2345
D     DATE      Time specification  Meeting on June 15

Role-name     Role                Values in the column
NOM           Nominative          Values addressed by the subjects of verbs, set having the property denoted by nouns or adjectives
GEN           Genitive            Values addressed by oblique cases governed by verbs and adjectives
DAT           Dative              (as for GEN)
ACC           Accusative          (as for GEN)
AN, MIT, ...  Prepositions        Governed prepositions
TS            Temporal Source     Time data denoting starting point
TG            Temporal Goal       Time data denoting end point
TP            Temporal Point      Time data denoting point in time
TD            Temporal Duration   Time data denoting time span
LS            Locative Source     Locations denoting origin
LG            Locative Goal       Locations denoting goal
LP            Locative Place      Locations denoting place
LV            Locative Path       Locations denoting points on a route
LD            Locative Distance   Locations denoting distances
MOD           Modality            Values for degree of membership in the set denoted by an adjective

The relationship between base relations and views is straightforward in most cases: Given a relation PER of a personnel data base with the following columns:
PER: Personalnummer, Name, Abteilung, ...

The views can be defined as follows:

Name: WNOM_Name, CVON_Mitarbeiter
Abteilung: CNOM_Abteilung, CVON_Mitarbeiter
Arbeiten: CNOM_Mitarbeiter, CFURBEI_Abteilung

The surface orientation is needed if users are to define their own vocabulary and relations. The loss in flexibility that a purely surface oriented definition could cause is counteracted in two ways: Firstly, more than one role-name can be defined for a given column, so that e. g. the verb arbeiten can be defined as both arbeiten bei Abteilung and arbeiten für Abteilung and then be used with either preposition. Secondly, several systematic correspondences are reflected in syntax and interpretation, for instance correspondences between genitives, possessive pronouns, and the verb haben 'to have', or between dative and prepositional phrases with the preposition an, and between the different prepositions introducing adverbials of the same type. Details on data types and role-names are found in Lehmann (1977 and forthcoming). Quantity was introduced in Zoeppritz (1979); criticism of Zoeppritz (1979) is found in Krause (1982).

The words that shall address the relations are defined via a prompting routine. For the design of this routine it was important to find a way to elicit the necessary information reliably and consistently from the user, without falling into the trap of asking systematic-linguistic questions and getting and accepting ontological answers: The responses that are elicited should reflect language use rather than beliefs as to the properties of things: to run is something animate bodies do, machines are inanimate, but machines also 'run'. Of course, this is not to say that there are no correspondences, but there are important differences, of which the quasi metaphoric use of run for the functioning of machines is one of the more obvious cases.
Similar, and specific to languages with gender like German, is neuter gender for diminutives, whether they denote females or males: das Mädchen, das Brüderchen. Reference to such nouns can be by grammatical gender or by natural gender. In designing the prompting routine for vocabulary definition, the attempt has been made to ask for the information needed in terms of language use by asking for the evaluation of examples using the word in question, rather than for statements about the word. The following examples shall illustrate the type of question asked. Intermediate steps have been omitted and the layout changed to save space.
If users are defining a noun, they are asked for the singular form and for the article (yielding the features masculine, feminine, or neuter). If the shape of the noun indicates that it could be a member of the class that takes adjective endings, for instance Bevölkerungsdichte, the next prompt is as follows:

Sind folgende Formen korrekt? Der Bevölkerungsdichte, ein Bevölkerungsdichter

Depending on the answer, the corresponding feature is set.
If the user says that he has time data, he is asked:

In welcher Form fragen Sie nach der Zeitangabe?
1 Wann ?
2 Von wann, ab wann, seit wann ?
3 Bis wann ?
4 wie lange ?
0 falls keiner der Fälle zutrifft
Geben Sie die zugehörige Nummer ein, Kombinationen sind möglich.

Similarly for locational data:

In welcher Form fragen Sie nach der Ortsangabe?
1 Wo ?
2 Von wo, woher ?
3 Nach wo, wohin ?
0 Falls keiner der Fälle zutrifft
Geben Sie die zugehörige Nummer ein. Kombinationen sind möglich.

For governed prepositions, a list is offered for choosing the appropriate prepositions.
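How answers to the two prompts might be turned into adverbial valence features can be sketched as follows. The pairing of menu numbers with features (Wann with TP, Von wann with TS, Bis wann with TG, wie lange with TD, and correspondingly for the locatives) is an assumption read off the role-names in Table 2, and the function name and accepted input format are hypothetical; the actual routine is not documented here.

```python
# Assumed pairing of menu numbers with role-names (see Table 2).
TIME_OPTIONS = {"1": "TP", "2": "TS", "3": "TG", "4": "TD"}
PLACE_OPTIONS = {"1": "LP", "2": "LS", "3": "LG"}


def features_from_answer(answer, options):
    """Translate an answer such as '1,3' or '13' into a list of features.

    '0' means that none of the offered question forms applies and must
    not be combined with other numbers.
    """
    digits = [ch for ch in answer if ch.isdigit()]
    if "0" in digits:
        if len(set(digits)) > 1:
            raise ValueError("0 cannot be combined with other numbers")
        return []
    features = []
    for digit in digits:
        if digit not in options:
            raise ValueError(f"no such option: {digit}")
        feature = "+" + options[digit]
        if feature not in features:
            features.append(feature)
    return features
```

The user is only asked which question forms sound right; the feature names never appear in the dialogue.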
If a verb is being defined, the verb type is elicited by showing the verb in sentence context. The following questions appear if the verb is auftreten and the prepositions are bei and mit:

Sie haben Präpositionen angegeben. Für diesen Fall stehen folgende Verbtypen zur Wahl:
TYP=NPZ   Wer tritt bei/mit wem auf?
TYP=NAP   Wer tritt was bei/mit wem auf?
TYP=NDP   Wer tritt wem bei/mit wem auf?
TYP=NDAP  Wer tritt wem was bei/mit wem auf?
TYP=NPP   Wer tritt bei wem mit wem auf?
Geben Sie die Typenbezeichnung ein.

Additional questions prompt for different forms, in case there are several stems, and for the name of the relation to be addressed. The rules are then written by the routine:

brother> would be spurious with respect to some representations (also the one here) but not to others.
Consequently, restricting spurious syntactic ambiguity involves three different, but related concerns: subcategorizing sufficiently to exclude non-constituents, avoiding artefacts of rule formulation that do not reflect structural differences in the language, and neutralizing differences in the language that are not also differentiated in the semantic representation. Subcategorization to distinguish between constituents and non-constituents is the subject of the next section.
4.4.3 RULE FORMULATION: JUXTAPOSITION VS. CONSTITUENCY
Many rules of the grammar are restricted to apply only to acceptable sequences. In the context of language analysis, it may seem odd if rules are restricted to exclude unacceptable sequences. If the sequences are indeed unacceptable, they will not occur, so they need not be specifically excluded. On the other hand, such restrictions may seem dangerously close to censoring the users' language. Though it is an unfortunate side-effect of several of these restrictions that less than native-like command of German will often not be understood by the system (e. g. adjective endings), this effect is not the purpose of the restrictions. If one thinks of unacceptable sequences, one tends to think in terms of sequences within a phrase or constituent. There, indeed, it would be unnecessary to specifically exclude those that are unacceptable. In other words, the sequences are unacceptable as members of the same constituent. However, such sequences can occur on both sides of constituent boundaries. Therefore the presence of an unacceptable sequence that is not caused by a grammatical error indicates that the elements of the sequence are not part of the same constituent, even if they belong to categories that form constituents. This makes it possible to distinguish between juxtaposed and constituent elements. By checking for unacceptable sequences before applying a rule that creates a construct, spurious ambiguities can be avoided that might result from creating constituents from juxtaposed elements and across constituent boundaries. Another reason for excluding unacceptable sequences is that an inflected language like German relies on inflection to convey meaning. This is most obvious in the case of nominal complements of the verb. In the absence of case marking, the position of the complements with respect to the verb indicates their function (cf. Zoeppritz 1976a for discussion). With case marking, the order is relatively free.
So excluding unacceptable sequences of verb and complement can reduce spurious ambiguity that would result from not taking into account that function can be indicated by position. However, position marking and an awareness of the loss of inflection is not yet as much part of the language system of all speakers of German as is free order of complements with reliance on case marking. Unmarked free word order sentences are still written, but they create problems for the reader. This can be observed when one finds oneself rereading sentences, because the interpretation of the first part does not make sense with the way the sentence continues. The following example is particularly illustrative because of the evidence contained not only in the example itself but also in the commentary from the reader's point of view. Note that the dative is specially marked in the commentary: 'an den Bundespräsidenten'.
'Die Hessisch-Niedersächsische Allgemeine Zeitung brachte folgende Nachricht:

Fussballtrainer Dettmar Cramer hat Bundespräsident Carstens in Anerkennung seiner Verdienste mit dem Bundesverdienstkreuz ausgezeichnet.

Leider ging aus der Meldung nicht hervor, wann die ungewöhnliche Auszeichnung an den Bundespräsidenten übergeben wurde, ob es sich um ausgesprochen fussballerische Verdienste handelte, und ob Cramer überhaupt berechtigt ist, Orden zu verleihen.' (HÖR ZU No. 38, September 9, 1980, p. 3).

The reader's commentary shows that, on first reading, he read the initial nominal as the subject of the sentence and that he regards the sentence as odd. These observations justify taking the reader's point of view in a grammar for analysis and using the position of unmarked noun phrases as an indication of their function in the sentence. However, position marking can be used to full advantage only if the rules are formulated tightly enough so that such odd sequences do not become constituents.
4.5 TARGET STRUCTURES

In a system like User Specialty Languages, in which meaning is not inferred from the word sense of application dependent words, the meaning that is conveyed by means other than word sense, supporting or modifying word senses in the language, becomes particularly important. Apart from the meaning of prepositions, quantifiers, copula verbs, etc., the structural properties of sentences and the functional dependencies among sentence elements that are indicated by syntactic structures carry most of the meaning of sentences. This determines the target structures aimed for. The parse tree shall reflect the functional relationships that obtain in the sentence as specifically as possible, limited possibly by meaning differences that do not change the content of the query. As a consequence, several trees should result where the structure of the sentence does not allow unambiguous recognition of dependencies. On the other hand, there should not be more trees created for a given sentence than correspond to meaningful analyses of the class of sentences that the specific sentence represents. This is facilitated by the syntacto-semantic information contained in the valence features. These features are obtained from the user to match the shape of the relation addressed and reflect the valence of a word in a given application context. In this respect the syntax here differs from those in PLIDIS and CONDOR, which both aim at a single parse (see Krause 1982, for a comparison of PLIDIS and User Specialty Languages, including the question of number of parses) and leave decisions as to type and governor of prepositional phrases and adverbials to the interpretation (introduction of valence is planned for CONDOR). The several parses resulting from syntactic analysis are disambiguated (cf. Section 3.4), but in most cases not rearranged, in the course of the interpretation (an exception are adverbials and prepositional phrases with haben and sein). In addition to
the functional dependencies, categorial information is important, where the categories have consequences for the interpretation. The interpretation of a given structure is specified in the rules analyzing that structure by giving the name of the interpretation routine. Therefore syntactically similar structures are distinguished by the grammar if they have different meanings: Prepositional phrases are analyzed as ABL, ABT, or PP, depending on whether the phrases function as locative, temporal, or prepositional complements. Conversely, adverbs, prepositional phrases, and measure expressions, though structurally different, are treated alike above a certain level, if they represent instances of locatives or temporals. The target structures just outlined show how the analysis tasks are distributed between syntax and interpretation: The syntactic analysis shall reflect structural meaning as closely as possible, for the interpretation to work on. In addition, it shall transmit the information that is used by the interpretation to determine more specific aspects of meaning: time calculation, reference, scope of negation, meaning of coordinated structures depending on number and part of speech, etc. (see Lehmann forthcoming). The interpretation does not establish functional dependencies, they are established in the syntax using syntacto-semantic information. Conversely, the syntax does not decide on the probability of structures with respect to the data base content.
4.5.1 OVERVIEW OF CATEGORIES AND FEATURES

The categories (here called constructs) and features, their motivation and use in the grammar is described in Sections 6, where the descriptions of categories and features are found in alphabetical order for easy reference. The outline here shall serve as an introduction.

TABLE 3: CATEGORIES (CONSTRUCTS) AND FEATURES

Primary categories

Word    Name      Meaning
auf     PREP      Preposition
nur     QUMOD     Quantifier modifier
        ADC       Operator
klein   ADJ       Adjective
sehr    ADOMOD    Adjective modifier
Haus    NOMEN     Noun
wer     NP        Noun phrase
Paul    NAME      Proper name
Prof.   TITEL     Title
Firma   NHEAD     Head of a name
&Co.    NTAIL     Tail of a name
der     QU        Quantifier
1.      TAG       Day
Mai     MONAT     Month
1956    JAHR      Year
heute   ABT       Temporal adverb
wo      ABL       Locative adverb
1234    CNU       Numeric
vier    ZW        Numeral
+       OPERATOR  Operator
lauf-   VERB      Verb
-en     FINMOT    Suffix
auf-    PREFIX    Prefix
und     CONJ      Conjunction
nicht   NEGAT     Negation particle
=       STOP      Null strings/special symbols
debug   SENT      System commands

Derived categories

Name      Meaning
PP        Prepositional phrase
QU        Quantifier
ADJ       Adjective
NP        Noun phrase
SENT      Sentence
ABT       Temporal adverbial
ONU       Ordinal number
ARGLIST   Argument list
FUNCTION  Function
VC        Verb phrase
SC        Sentence kernel
TABLE 3: FEATURES

Valence features (cases and prepositions) on VERB, NOMEN, ADJ:

  TYP=    Case feature with values:
          NI   Nominative only
          NA   Nominative Accusative
          Similarly: NN ND NG NPZ NPP NDA NAG NAP NAA NDAP NDP
          DAP  from NDAP
          DA   from NDA and NDAP
          Similarly: AP AG PPZ DP AA PZ NZ, and AZ GZ DZ below
          AZ   Accusative
          GZ   Genitive
          DZ   Dative
  Use: Values of TYP indicate the required cases and prepositional phrases for verbs and adjectives. Additional values result where required cases are found. Adjectives have values only for a single case complement, nouns have no case complements. The prepositional complements of nouns and adjectives are indicated by values of LAB and LABI only.

Adverbial features on VERB, NOMEN, ADJ and PREP, ABT, ABL:

  +LP   Place
  +LS   Source
  +LG   Goal
  +LD   Distance
  +LV   Path
  +TP   Point in time
  +TS   Start time
  +TG   End time
  +TD   Duration
  Use: Valence features for locatives and temporals on nouns, verbs and adjectives. The same features subcategorize prepositions and adverbials.

Features indicating satisfied valence on VC, SC, NOMEN, NP:

  +CN   Required nominative found
  +CNA  First of 2 nominatives
  +CG   Genitive found
  +CD   Dative found
  +CA   Accusative found
  Similarly: +CAA +CP +CPA
  For locatives and temporals: CLP CLS CLG CLD CLV CTP CTS CTG CTD
  Use: Complement features distinguish between verb phrases that are derived from different valence classes, when they have the same value of TYP at some stage.

Features on PREFIX and FINMOT:

  ALEV=n  Function with adjective
  NLEV=n  Function with noun
  VLEV=n  Function with verb
  Use: Different values identify marking functions of affixes, for case, tense, etc., depending on the category of the stem.

Tense, case, gradation on NOMEN, NP, VERB, VC, or ADJ:

  +FLE  Inflected form        An affix has been attached.
  +ADF  Adjective inflection  The noun takes adjective endings.

  +SG   Singular
  +PL   Plural
  +NEU  Neutral
  Use: Number features on nouns, verbs, and adjectives. Nouns unmarked for number have +NEU.
  +MAS  Masculine
  +FEM  Feminine
  +NTR  Neuter
  Use: Gender features on nouns.

  +NOM  Nominative
  +GEN  Genitive
  +DAT  Dative
  +ACC  Accusative
  Use: Case features on nouns, articles and their derivations. Nouns unmarked for case have several case features.

  +PRS  Present
  +PST  Past
  +PPR  Present participle
  +PPE  Past participle
  +IMP  Imperative
  Use: Tense and mode features on verbs and derived constructs. Tense features are set according to inflection, compound tenses have no special features.

  +GRD  Comparative
  +SPL  Superlative
  Use: Gradation features on adjectives.

Referent features on noun phrases:

  +RSG   Referent singular
  +RPL   Referent plural
  +RMAS  Referent masculine
  +RFEM  Referent feminine
  +RNTR  Referent neuter
  Use: Gender and number of the referent may differ from that of the noun phrase containing the relative pronoun. The referent features are copied into this noun phrase.

Features on CONJ and coordinated constructions:

  +CON   Coordinating
  +SUBO  Subordinating
  +AB    No coordination below NP
  Also: +PARR +PARL +NPF
  Use: Different types of conjunction are distinguished. CON and AB are also used on coordinated constructions.

Features for inherent properties of different constructs:

  UNIT=n (1-8)  Unit of measure
  Use: Different values distinguish units of different dimensions.

  +RLM    Measure relative to span
  +SUB    Subdivision of a unit
  +DATUM  Date: day, month, year
  +DAY    Name or number of day
  +JHR    Number denoting year
  +MON    Name or number of month
  +UOT    Unit of time, used as duration
  Use: Temporal expressions are subcategorized for internal structure.

  +NAM    Derived from NAME
  +QUOT   Enclosed in quotes
  +NOCNU  Not usable as number
  Use: Features on noun phrases indicating derivational history.

  +UM    Only with numerics
  +PARE  Two part comparative
  Use: UM and PARE subcategorize quantifier modifiers.
  +ON +CHAR +SGN SZ= 1-5002   1 Character; Numeric; Type of numeral
  Use: Features tracing derivation from digits, nominals used in arithmetic expressions, and numerals with different orders of magnitude.

  LAB=n   Preposition or prefix
  LABI=n  One of two required PREP
  LEV=n   Identifier for prefix, operator, adjective class
  Use: Values of LAB, LABI, and LEV identify underlying or governed words. LEV can also indicate level of complexity.
ART=n Article type 1-5 including possessives +DEF Defective paradigm +REC Reciprocal +REF Reflexive +REL Relative +WPR0 Interrogative +PR0N Personal +P1 +P2 10, ONU becomes NOMEN. The second rule is not used now because the system does not yet handle units of measure. There are no features on ONU.
OPERATOR
Construct for arithmetic operators. The construct is assigned by fixed token rule to the words and symbols for the arithmetic operators. The construct is used in grammar rules joining OPERATOR and CNU. The feature LEV on OPERATOR establishes the order of arithmetic operations. LEV is the only feature on OPERATOR.

PP
Construct for prepositional phrases. In the context of this grammar, PP is one of three categories assignable to prepositions followed by noun phrases. The other two are ABL for locative adverbials and ABT for temporal adverbials. Accordingly, the term prepositional phrase is restricted here to non-locative and non-temporal uses of preposition noun sequences. By this classification, constructions are differentiated according to their function with higher constructs, i. e. with respect to the valences they satisfy, and not according to internal structure. PP are constructed by grammar rules from PREP and NP, and from PREP and PREFIX from wo(r)-. PP with NEGAT and coordinated PP become PP. PP having PREP from von are attached as genitive attributes to NOMEN, NAME, and NP. PP having PREP from zu are joined as appositions to NAME and CNU. PP are attached to NOMEN, NAME, ADJ, VC, and SC as prepositional complements. PP extraposed from their head nouns are attached to VC or SC from the verbs haben and sein.
PREFIX
Construct. PREFIX, like FINMOT and NOTEST, is a reserved name used by the IPE-TEST, and cannot be changed without changing IPE-TEST. The construct is assigned to the participial prefix ge- and to the separable prefixes of verbs. It is also assigned to the compound form of the word Durchschnitt and to the relative/interrogative wo(r)- before prepositions, as in woran, wodurch.

PREP
Construct for prepositions. The construct is assigned by fixed token rules to prepositions. The individual prepositions are distinguished by different values of LAB. Prepositions taking either dative or accusative are treated as two separate prepositions, each with its own LAB feature. PREP from prepositions introducing temporal or locative phrases are subcategorized for the type of adverbial they introduce. PREP and NP with -TIM become PP. Where PREP contain locative or temporal features, PREP and NP also become ABL with NP having -TIM and ABT with NP having +TIM. PREP with PREFIX from wo(r)- becomes PP and ABL. Some PREP are attached to ABL or ABT. In temporal phrases, PREP can be attached to CNU to become ABT, and PREP from um with QU from wieviel is attached to NP and SC.
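The way a preposition and a following noun phrase are assigned to PP, ABL, or ABT can be sketched as a small function. This is one reading of the description above, taking ABL to require a locative feature and ABT a temporal feature on PREP; the function name and the representation of features as Python sets are assumptions, and the actual grammar rules carry further conditions.

```python
LOCATIVE_FEATURES = {"LP", "LS", "LG", "LD", "LV"}
TEMPORAL_FEATURES = {"TP", "TS", "TG", "TD"}


def constructs_for(prep_features, np_has_tim):
    """List the constructs that PREP followed by NP can become.

    prep_features: set of adverbial features on the PREP.
    np_has_tim:    True if the NP carries +TIM, False for -TIM.
    """
    results = []
    if not np_has_tim:
        # PREP and NP with -TIM always yield the plain PP reading ...
        results.append("PP")
        # ... and in addition ABL if the PREP is subcategorized as locative.
        if prep_features & LOCATIVE_FEATURES:
            results.append("ABL")
    elif prep_features & TEMPORAL_FEATURES:
        # NP with +TIM yields ABT if the PREP can introduce temporals.
        results.append("ABT")
    return results
```

A locative preposition before a non-temporal noun phrase thus gives two parallel analyses, PP and ABL, and it is left to the valence of the governing word to decide which of them can be attached.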
QU
Construct for articles (or determiners or quantifiers). The construct is assigned in fixed token rules to definite and indefinite articles, interrogative, demonstrative, and possessive pronouns functioning as determiners, and quantifiers like mehrere, einige, and to one interpretation of nur.
QUMOD
Construct for quantifier modifiers. The construct is assigned by fixed token rule to words like nur, fast. QUMOD is assigned by grammar rule to pairs of QUMOD and pairs of ADC. QUMOD is attached to NAME and NP to become NP and to CNU and ZW to become CNU.
SC
Construct. SC is assigned by fixed token rule to the system defined command verbs. These SC become command sentences with VC, VERB, NP, and ADJ. In grammar rules, SC results from VC for which all complements reflected in the TYP feature have been found: SC is assigned to the resulting construct in the rules incorporating the last nominal or prepositional complement. SC is also assigned to incomplete VC, to allow for sentences that contain at least the nominative complement but do not have all complements defined for a given VERB. Which complements are usually dropped depends on the verb type.
SENT
Construct for the root of the tree. SENT in the at signs set by the parser as input delimiters are recognized as full parses. The construct is assigned by fixed token rule to system defined commands and by grammar rule to STOPs from formatting commands, to SC from system defined command words and their complements, to the word key and its complements, to SC having +IMP, to SENT with CMD in delimiters, to SC with +WPRO or +HS and -WPRO in delimiters, and to variable definitions from NAME and CNU or NP.
STOP
Construct. This construct was designed for different types of strings that the grammar should recognize but that should be kept separate from the grammar. Different types of STOP are distinguished by values of the feature LEV. STOP is constructed by fixed token rules from blank, period, and the reflexive sich. The blank and the reflexive are made invisible to the grammar. The STOP from period is used to transmit the period as a parameter for concatenation with the string in NAME in rules processing names like U.S.A. Since the rules constructing NAME treat the period as a delimiter of names, the period is not included in NAME. The STOP from period is also used with system defined abbreviations like Fa., which are defined without the period to prevent their initial part from being recognized as NAME. STOP without LEV (LEV=0) is assigned to words that occur in phrases recognized by fixed token rules as entire phrases, to prevent the individual components of these phrases from being analyzed as NAME. System defined STOPs of this type are e.g. transitiv from transitive Hülle and the empty subject es of gibt es. Corresponding STOPs are also defined by the user where his nouns, verbs, or adjectives contain blanks, hyphens, slashes, or other characters recognized as delimiters of NAME, to prevent spurious parses.
TAG
Construct. The construct is assigned in variable token rules to digits and delimiters indicating the day part of dates. TAG is used in grammar rules constructing NOMEN from combinations of TAG, MONAT, and JAHR.
TITEL
Construct for titles like Professor. The construct is assigned by fixed token rule to full forms and abbreviated forms of titles. The period is left off in abbreviations, to prevent double parsing of the strings both as TITEL and as NAME before period. The period is attached by the grammar rule attaching STOP. TITEL and NAME become NAME.
VC
Construct assigned by grammar rule to VERB after FINMOT is incorporated and before the last nominal complement is found. This construct is also assigned by fixed token rules of the grammar to the full forms of system defined verbs with the exception of the participle of haben.
VERB
Construct. VERB is assigned to the verb stems of regular verbs, to the present and past stems of irregular verbs, and to the past participles of irregular verbs. The construct VERB is assigned by users to user defined verb forms. In the grammar it is assigned to gehabt, the past participle of the irregular verb haben. The construct is used in rules attaching FINMOT and PREFIX.
ZW
Construct for numerals. The construct is assigned in fixed token rules to the primitive elements of the numerals up to and including Million. The combination form ein is constructed by variable token rule to avoid ambiguity with QU. ZW are attached to ZW, with or without intervening CONJ from und, for compound numerals like einundzwanzig. ZW with different privileges of occurrence are distinguished by the feature SZ. ZW are attached as numeric quantifiers to NOMEN resulting in NOMEN and to NOMEN from units of measure resulting in NP. QUMOD and ZW result in CNU.
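The attachment of ZW to ZW with an intervening und can be illustrated by a minimal sketch in Python. The sketch is not the grammar's rule notation: the two small lexicons, the function name compound_value, and the idea of computing a numeric value are assumptions made for illustration only, and the feature SZ is not modeled.

```python
# Hypothetical lexicons of primitive numeral elements (small subsets only).
UNITS = {"ein": 1, "zwei": 2, "drei": 3, "vier": 4, "fünf": 5,
         "sechs": 6, "sieben": 7, "acht": 8, "neun": 9}
TENS = {"zwanzig": 20, "dreißig": 30, "vierzig": 40, "fünfzig": 50}


def compound_value(word):
    """Return the value of a primitive numeral or of a compound of the
    form unit + 'und' + tens; return None if the word is not covered."""
    if word in UNITS:
        return UNITS[word]
    if word in TENS:
        return TENS[word]
    # Compound numerals: a unit, the conjunction 'und', and a multiple of ten.
    for unit, unit_value in UNITS.items():
        prefix = unit + "und"
        if word.startswith(prefix):
            rest = word[len(prefix):]
            if rest in TENS:
                return unit_value + TENS[rest]
    return None


print(compound_value("einundzwanzig"))
```

In this sketch einundzwanzig is split into ein, und, and zwanzig, which corresponds to two ZW joined by a CONJ from und.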
6.3 FEATURES AND THEIR VALUES
The features used in the grammar are listed here in alphabetic order, and not in the sequence of the corresponding feature declarations in the grammar. An overview of constructs and features is found in Table 3. Each feature is described by meaning and use in the grammar. Where the same feature is used for different purposes on different constructs (e.g. LEV), the uses are described separately for each construct on which the feature is used.
AA
Case value of the TYP feature. TYP=AA is assigned to VC in rules incorporating the nominative into VC of type NAA.
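The relation between TYP values such as NAA and AA can be pictured with a small Python sketch. It assumes that a TYP value can be read as a string of case letters for the complements still to be found, and that incorporating a complement removes one letter. This reading, and the names incorporate and construct_for, are assumptions for illustration and not the notation of the grammar. Only NAA and AA are taken from the text.

```python
def incorporate(typ, case):
    """Return the TYP value left after a complement of the given case
    has been incorporated (assumed encoding: one letter per complement)."""
    if case not in typ:
        raise ValueError(f"TYP={typ} expects no complement of case {case}")
    # Remove only one occurrence, so that NAA minus N leaves AA.
    return typ.replace(case, "", 1)


def construct_for(typ):
    """A VC with no complements left becomes SC; otherwise it stays VC.
    The assignment of SC to incomplete VC is not modeled here."""
    return "SC" if typ == "" else "VC"


typ = incorporate("NAA", "N")   # the nominative is incorporated
print(typ, construct_for(typ))
```

Applying incorporate twice more with case A empties the value, at which point construct_for returns SC.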
AB
Logical feature on CONJ. The feature is set in the fixed token rules constructing CONJ from aber and sondern. The feature is used to restrict the scope of quantifiers, negation, and complements to one conjunct where the conjunction is aber or sondern. The feature is checked for absence where NEGAT from nicht is attached, and where complements are attached to NP:+CON. The feature is checked for presence in ADJ and NP coordination and for absence in rules coordinating NOMEN, NAME, and CNU. The feature is also checked for absence in the rule combining ZW and in the rules combining units of measure. AB is copied from CONJ into the resulting constructs to block attaching nicht, genitive attributes, and complements to constructs coordinated by aber or sondern.
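How a feature like AB is set, copied, and checked can be sketched in Python as follows. The class Construct, the functions coordinate and may_attach_nicht, and the requirement that both conjuncts be of the same construct are assumptions made for illustration. The sketch covers only the behaviour stated above: AB permits coordination of ADJ and NP, excludes coordination of NOMEN, NAME, and CNU, and blocks the attachment of nicht to the result.

```python
from dataclasses import dataclass, field

# Constructs whose coordination rules check AB for presence (per the text).
AB_ALLOWED = {"ADJ", "NP"}


@dataclass
class Construct:
    name: str
    features: set = field(default_factory=set)


def coordinate(left, conj, right):
    """Coordinate two constructs of the same kind; return None if blocked."""
    if left.name != right.name:          # simplifying assumption
        return None
    has_ab = "AB" in conj.features
    if has_ab and left.name not in AB_ALLOWED:
        return None                      # e.g. NOMEN aber NOMEN is rejected
    result = Construct(left.name, {"CON"})
    if has_ab:
        result.features.add("AB")        # AB is copied from CONJ
    return result


def may_attach_nicht(target):
    """NEGAT from nicht is attached only where AB is absent."""
    return "AB" not in target.features


aber = Construct("CONJ", {"AB"})
und = Construct("CONJ")
np_aber = coordinate(Construct("NP"), aber, Construct("NP"))
print(np_aber.features, may_attach_nicht(np_aber))
```

Keeping AB on the resulting construct is what restricts later attachments to a single conjunct, since every rule that would attach to the whole coordination can test for its absence.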
ACC
Logical feature for accusative case. In the same way as NOM, GEN, and DAT below, the feature ACC is assigned by grammar rule to NOMEN, NP, QU, NHEAD, and ADJ. ACC is assigned to NOMEN and NAME in rules analyzing appositions with zu like Schmidt zum Manager, 3500 zum Gehalt. ACC is copied into NP, NAME, CNU, ABL, ABT, and PP. The feature is checked on NP where NP are attached as complements to VC. The feature is set on VC where a nominative NP is attached and the incorporated nominative also has the feature +ACC. This is to prevent ambiguity in declarative sentences where case is indicated by the order of complements if neither nominative nor accusative is unambiguously marked for case.
ACON
Logical feature set on NP from article and coordinated nouns. The feature indicates that the article has been attached to the coordinated construction and not to the individual elements. -ACON is checked where complements are attached to the right of NP:+CON. The complements in question are also attached to NOMEN. The rules complementing NP shall be used only if there is no underlying NOMEN:+CON.
AD
Integer feature indicating presence of an adjective, values are 0, 1, and 2. AD=1 is set for NOMEN, NAME, and NP in rules attaching ADJ preceding NOMEN, NAME, or NP. AD=2 is set for ADJ from comparative phrases following NOMEN or NP. AD=0 is checked in rules incorporating FINMOT and system defined NOMEN and PREFIX. It is also checked in rules incorporating ADJ to make sure that several ADJ preceding NOMEN are coordinated before incorporation into NOMEN. AD