English Pages 347 [356] Year 1999
The Mathematics of Syntactic Structure
1999
Studies in Generative Grammar 44
Editors
Jan Koster Henk van Riemsdijk
Mouton de Gruyter Berlin · New York
The Mathematics of Syntactic Structure Trees and their Logics
Edited by
Hans-Peter Kolb Uwe Mönnich
Mouton de Gruyter Berlin · New York
1999
Mouton de Gruyter (formerly Mouton, The Hague) is a Division of Walter de Gruyter & Co., Berlin.
The series Studies in Generative Grammar was formerly published by Foris Publications Holland.
∞ Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.
Library of Congress Cataloging-in-Publication Data
The Mathematics of syntactic structure : trees and their logics / edited by Hans-Peter Kolb, Uwe Mönnich. p. cm. -- (Studies in generative grammar ; 44) Includes bibliographical references and index. ISBN 3-11-016273-3 (alk. paper) 1. Grammar, Comparative and general -- Syntax. 2. Mathematical linguistics. 3. Computational linguistics. 4. Generative grammar. I. Kolb, Hans-Peter, 1954- . II. Mönnich, Uwe, 1939- . III. Series. P291.M354 1999 410'.1'51--dc21 99-24939 CIP
Die Deutsche Bibliothek Cataloging-in-Publication Data
The mathematics of syntactic structure : trees and their logics / ed. by Hans-Peter Kolb ; Uwe Mönnich. Berlin ; New York : Mouton de Gruyter, 1999 (Studies in generative grammar ; 44) ISBN 3-11-016273-3
© Copyright 1999 by Walter de Gruyter GmbH & Co. KG, D-10785 Berlin. All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Printing: Arthur Collignon GmbH, Berlin. Binding: Lüderitz & Bauer GmbH, Berlin. Printed in Germany.
Preface
This book is intended to show that formal methods can enlarge our understanding of issues in theoretical and computational linguistics. It is designed for scholars and graduate students in linguistics. A certain affinity for formal matters on the part of the reader will be required, even though we have tried to make the material accessible to students of cognitive science in general by sketching in the introduction the principal lines of development which have led to the topics treated in the contributions to this volume.
Mathematical linguistics has had a rather chequered history during the last four decades. Alongside the abiding and central subjects of automata theory and formal languages, the development of logic as a medium for specifying syntactic and semantic structures has undergone several changes. With hindsight, it seems fair to say that the pronounced disregard for computational issues which characterized the "semantic turn" of model-theoretic type theory, as exemplified by the Montagovian paradigm, caused widespread neglect of logical methods for an extended period of time. It was the rise of complexity theory which finally led to a revival of interest in logical methods; from that moment on its impact on the field of mathematical linguistics cannot be overestimated. Finite model theory in particular, with its emphasis on the relationship between logical and computational resources, has created new bridges between declarative and procedural approaches towards theoretical accounts of linguistic structures. We hope that the present collection will suggest new lines of research and testify to the fruitfulness of logical methods as a unifying theme in formal linguistics.
The idea for this volume was born at the workshop on "The Mathematics of Syntactic Structure," held during the European Summer School on Logic, Language, and Information 1996 in Prague.
We wish to thank the program committee of the summer school for entrusting us with the organization of this workshop, and the participants for the stimulating discussions which made it such a rewarding event. The editors gratefully acknowledge the support provided by the Deutsche Forschungsgemeinschaft through its funding of the Sonderforschungsbereich 340 "Sprachtheoretische Grundlagen für die Computerlinguistik" at the Universities of Stuttgart and Tübingen.
H.-P. K. & U. M.
Contents
Introduction
Hans-Peter Kolb and Uwe Mönnich  1
The descriptive complexity of generalized local sets
James Rogers  21
Monadic second-order logic, tree automata, and constraint logic programming
Frank Morawietz  41
Principle languages and principle-based parsing
Hugo Volger  83
The expressivity of tree languages for syntactic structure
Adi Palm  113
Underspecification in Tree Description Grammars
Laura Kallmeyer  153
On cloning context-freeness
Uwe Mönnich  195
Macros for Minimalism? Towards weak descriptions of strong structures
Hans-Peter Kolb  231
Adjunction structures and syntactic domains
Marcus Kracht  259
Representational Minimalism
Thomas L. Cornell  301
Index  341
Introduction Hans-Peter Kolb and Uwe Mönnich
1
After a long period of almost exclusively empirical work which characterized the Principles & Parameters (P&P, a.k.a. Government/Binding, G/B) era of Generative Grammar research (Chomsky 1981, 1986), with the advent of Minimalism (Chomsky 1995) formal language and complexity theoretic considerations have reentered the stage, even though in a somewhat disguised form. This seems to be a mixed blessing, however. While the P&P program is certainly not unproblematic from a formal point of view, its attempt to discover substantive properties of Natural Language has borne considerable fruit. Today's minimalist focus on the means of driving derivations is at first glance more amenable to, and more aware of the need for, formal analysis, yet it seems to be somewhat less successful in stimulating the empirical component of P&P research. The two approaches illustrate very different attitudes towards formalization. To clarify the issue let us briefly consider the original research program of Generative Grammar as set forth in Chomsky (1975, 1965). Ignoring important details and putting it in current terms, it rests on the conviction that
(1)
a. there is a formal framework which allows exactly the specification of the class of Natural Languages: any theory (i.e., any consistent set of axioms) within this framework specifies a potential Natural Language;1
b. there exists a non-trivial set of axioms (the initial state, Universal Grammar), any learnable extension of which specifies a possible Natural Language, and every Natural Language has a theory which is a learnable extension of the initial state.
The goal of linguistic theorizing, then, is fourfold:
(2)
a. to determine the primitive building blocks of linguistic theory (the substantive universals), e.g., the inventories of linguistic categories or features;
b. to determine the permissible operations and/or relations on these primitives;
c. to determine the initial state;
d. to specify a learning procedure (the Language Acquisition Device, LAD) which maps initial state and primary data onto the steady state, the theory (grammar) of a particular language.
Since the steady state is heavily under-determined by the primary data, (1b, 2c/d) comprises much of the gist of the Generative paradigm: a highly structured initial state reduces the burden on the LAD and increases the explanatory force of any grammar established this way, provided the underlying framework supports the establishment of consequences in a rigorous way. Hence, ideally, the formal framework is controlled to a considerable extent by substantive considerations such as questions of the structures admitted ("strong generative capacity"), or degrees of transparency in the expression of structural regularities.2 Conceptually, on the other hand, (1) and (2) are independent of each other. (2) can be understood as an entirely empirical agenda, to be explored under the very weak assumption that there exists a framework expressive enough to eventually formalize the results. Yet the advances of Formal Language Theory in the 50s and 60s (not least through important contributions by Chomsky, e.g., Chomsky 1956, 1959), which had provided much initial evidence for the feasibility of the basic approach, seemed to suggest a stronger role of the formal framework in Generative Grammar research. Formal Language Theory had established, for the classes of the Chomsky hierarchy of formal languages, an intimate connection between the form of a grammar (the rule types employed), the computational resources needed to process it, and the type of (string) language specified by it. Applied to Natural Language, this line of proceeding aims at establishing a formal system, defined automata-theoretically and/or via rule types,3 which by its inherent restriction of expressive power provides guidance through the empirical mine-fields of linguistic theory formation. Consequently, (1a) became a major concern of pre-P&P Generative theorizing, albeit somewhat slanted towards considerations of computational complexity.
Intricate (and not always accurate) arguments placed the natural languages on the Chomsky hierarchy somewhere between context-free and context-sensitive, and various varieties of transformational models were explored with the aim of discovering the one "exactly right" formal framework, which, by our remarks above, would "automatically" provide for the right type of structures in a maximally transparent way. A first seed of doubt about the adequacy of this approach was planted by Ross (1967), who, based on careful analysis of a vast amount of empirical data, had demonstrated that many important regularities of Natural Language can only be expressed by means orthogonal to the descriptive apparatus of phrase-structural and transformational rules. Even though he had formulated his constraints on variables in syntax within the transformational framework, namely as restrictions on the interpretation of transformational rules, it was manifest that they were most naturally construed as substantive constraints on the structure of Natural Language. So when in the early 70s Peters and Ritchie formulated their famous (though not necessarily well-understood) result that there is an Aspects-style Transformational Grammar for any recursively enumerable language, i.e., that the formal framework of TG in itself not only fails to restrict the class of natural languages in any way, but does not even ensure computability of the grammars couched in these terms (Peters and Ritchie 1971, 1973), this was the straw that broke the camel's back.4 The original enthusiasm for questions of computational complexity was seriously abated, and the stage was set for a major shift in emphasis of linguistic theorizing towards substantive issues, i.e., from (1a) towards (2c): the primary goal was now no longer to get the formal framework "exactly right" but to discover "exactly the right" structural properties of Natural Language, in whatever way formulated.
Again, the expectations ran high: the view of UG as a system of substantive structural constraints with only a very limited range of variation, directly implemented in the mind as part of the human endowment, was supposed to trivialize to a considerable extent the problems of language acquisition,5 and, even more so, Formal Language Theory:6
In fact, the questions do not even arise [ . . . ] if UG permits only a finite class of grammars. It might, for example, turn out that these grammars characterize languages that are not recursive or even not recursively enumerable, or even that they do not generate languages at all without supplementation from other faculties of mind, but nothing of much import would necessarily follow, contrary to what has often been assumed. (Chomsky 1981:12f)
P&P theory, the culmination of this development for the time being, has been very successful in expanding the scope of syntactic research, in depth (new phenomena/constructions) as well as in breadth (different languages). However, as a formal system it looks largely incoherent. A formal frame of reference is, if at all, only very loosely determined, and formal considerations have a low priority. As a consequence, P&P theory often gives the impression of a mere collection of "interesting" facts which is largely data driven and where every new phenomenon may lead to new (ad hoc) formal devices, often incompatible, and without a measure by which to compare, and/or decide between, conflicting analyses in a meaningful way. In this respect it seems to constitute a fundamental departure from the original means and methods of Generative Grammar research. One of the goals of this book is to show that this is a contingent fact and not a necessary consequence of the concentration on (2c). As a tacit corollary of this fact it follows that the explanatory potential of the P&P approach has not yet been pursued to the limit and that the step (back?) towards strictly formalism-driven theory formation as ventured by the Minimalist Program is somewhat premature. It should be noted that the conceptual reorientation from generating to licensing theories is by no means restricted to the Chomskyan variety of Generative Grammar. The evolution of HPSG from its "generating" GPSG roots via the somewhat ambiguous approach of Pollard and Sag (1987) to the clearly "licensing" based Pollard and Sag (1994) is a case in point, as are recent developments in TAG theory towards the use of (partial) tree descriptions (cf., e.g., Rambow et al. 1995). None of these revisions has led to a comparable disintegration of the respective formal systems.7 However, except for lip-service, most of the little formal work that has been done on the basis of P&P theory has not been concerned with questions of (1) at all.
Either formalization was restricted to some fixed version of the theory (usually in some "machine-readable" language under the heading of G/B-parsing) without consideration for alternatives or future developments, or a strong meta-theory like full predicate logic (maybe even of higher order) was used in order to accommodate the wide variety of means of P&P theory formation. Despite the indisputable merits of this work in elucidating many aspects of P&P theory, neither strategy is likely to yield a formal framework for Generative Grammar in the sense of (1). The former, while relatively close to the ultimate purpose of the P&P model, is obviously premature: change seems to be about the only constant in linguistic theorizing, and consequently there has not been a consistent, let alone complete, set of principles at any stage of G/B-development. The latter, on the other hand, seems to go nicely with actual linguistic practice in that it provides all the flexibility needed to express new generalizations. However, in a Swiss-Army-Knife-type framework the particular structural properties of Natural Language grammars will always just be accidents without any explanatory force. Such formalizations, due to their unrestrictive character, cannot provide any significant guidance for linguistic theorizing, nor are there general decision procedures for such strong logics; hence these attempts are always in danger of formalizing for formalization's sake. This book advocates a different perspective, which warrants a more ambitious goal for the formal investigation of Natural Language syntax: the development of a specific meta-theory for the expression of theories of natural language syntax on the basis of a weak logic, which makes direct use of the primitive notions of grammatical theories. This line puts complexity theory and formal language theory firmly back on the map, although in a guise slightly different from the early days. In particular, it reconciles the rift between formal and substantive universals, providing a firm ground (i.e., clear formal bounds) for the formulation of substantive constraints.
2 Descriptive complexity, descriptive theory of recognizability, and linguistic theorizing
The key concept for the formal treatment of linguistic theories of the licensing kind is that of descriptive complexity. While the computational complexity of a problem is usually defined in terms of the resources required for its computational solution on some machine model, descriptive complexity looks at the complexity of describing the problem (seen as a collection of relational structures) in a logic, measuring logical resources such as the type and number of variables, quantifiers, operators, etc. It is a non-trivial fact that there is a close correspondence between these two, with many natural logics corresponding exactly to independently defined complexity classes (cf. Fagin 1974, Immerman 1987). With hindsight, one can look upon Trakhtenbrot's theorem, according to which the set of all finitely satisfiable sentences of first-order logic is not decidable, as suggesting the problem of characterizing the cardinalities of finite models of first-order sentences. This problem, which has been dubbed the spectrum problem, was generalized by Fagin to the case of existential second-order, or Σ¹₁, sentences. In this context the problem consists in delineating generalized spectra, i.e., the classes of finite models of Σ¹₁ sentences.
It is easy to see that a spectrum is a decidable set of natural numbers. Given a finite signature τ and a natural number n, there are, up to isomorphism, only a finite number of τ-structures of cardinality n. For each of these structures, it takes only a finite amount of work to verify whether a first-order sentence φ of signature τ holds in it. By a similar procedure it can be shown that generalized spectra are decidable.
In order to relate logical expressibility with computational complexity, finite structures have to be encoded as strings to turn them into possible inputs for Turing machines. Once a suitable convention for representing structures as strings has been chosen, one can study the computational properties of finite structures. The remarkable result known as Fagin's theorem establishes the historically first equivalence between a problem in complexity theory and a problem in logical expressibility: given a finite signature τ and an isomorphism-closed set K of finite τ-structures, K is a generalized spectrum if and only if K's set of encodings is in NP, where NP designates the class of problems whose individual instances a Turing machine can decide in nondeterministic polynomial time. This close connection between a syntactically restricted class of logical formulae and a time-restricted class of computations has attracted considerable attention both in the logic and the computer science community and was the beginning of a series of important results that provide analogous logical descriptions of other complexity classes. This research program, i.e., the systematic study and development of logical resources that capture complexity classes, is well documented in the recent monograph Ebbinghaus and Flum (1995), to which we refer for further information. The basic aim of this program (descriptive complexity theory) consists in giving a characterization of the degree of complexity of a family of problems by looking into the structure of formal specifications in which these problems can be stated.
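The decidability argument for spectra can be made concrete with a small brute-force sketch. It is only an illustration under stated assumptions: the signature is taken to consist of a single binary relation E, and the first-order sentence chosen here (E is irreflexive, symmetric, and gives every element exactly one partner) is a hypothetical example, not one discussed in the text. The names `holds` and `in_spectrum` are likewise invented for the sketch.

```python
from itertools import product

def holds(n, E):
    """Check the example sentence in the structure ({0..n-1}, E).

    Assumed sentence: E is irreflexive, symmetric, and every element
    has exactly one E-partner.
    """
    dom = range(n)
    irreflexive = all((x, x) not in E for x in dom)
    symmetric = all((y, x) in E for (x, y) in E)
    one_partner = all(sum((x, y) in E for y in dom) == 1 for x in dom)
    return irreflexive and symmetric and one_partner

def in_spectrum(n):
    """Decide whether n is in the spectrum by enumerating every binary
    relation on an n-element domain (finitely many, as the text notes)."""
    pairs = [(x, y) for x in range(n) for y in range(n)]
    for bits in product([False, True], repeat=len(pairs)):
        E = {p for p, b in zip(pairs, bits) if b}
        if holds(n, E):
            return True
    return False

# Only small cardinalities are feasible: there are 2**(n*n) candidate relations.
print([n for n in range(1, 5) if in_spectrum(n)])
```

The search terminates for every n, which is all that decidability requires; the exponential number of candidate structures shows why the interesting question is the complexity of such a decision, not its mere possibility.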
In the words of Fagin (1993), theorems that relate two distinct fields, logic and complexity theory in the present connection, are to be regarded as "mixed" theorems. They provide a precise justification for the use of methods from one field to solve problems in the other field, in the case at hand the employment of model-theoretic methods to attack problems in complexity theory, for example. As far as we can see, descriptive complexity theory in general has not exercised any direct influence on linguistic theorizing. The topic to which the contributions of this volume belong, and where from the computational perspective finite automata are taken into account rather than resource-bounded Turing machines, has been called descriptive theory of recognizability in the recent overview by Thomas (1997). In terms of the logical specifications at issue, recognizability amounts to a restriction of existential second-order logic to its subsystem where only quantification of monadic second-order variables is permitted. We shall sketch in the following some of the remarkable achievements in this subfield of descriptive complexity theory that are directly related to central topics in theoretical linguistics. Along the way, we will try to point out in which way the interplay between computational and descriptive complexity reflects the change of paradigm from generating to licensing models in linguistics recounted in the first half of this introduction. Before we start with this line of reasoning we have to insert a remark on the difference between two types of relations between syntax and semantics of formal systems. They can be illustrated with respect to Fagin's theorem. In one direction the theorem informs us that the question whether an arbitrary finite structure of signature τ belongs to the class of models of an existential second-order sentence of that signature can be decided in nondeterministic polynomial time. In other words, the theorem contains an upper bound for the complexity of the satisfaction relation of sentences in Σ¹₁. This satisfaction relation must not be confused with the satisfiability problem of theories stated in terms of the syntactic resources of a certain logic. The intuitive argument outlined above, to the effect that the relation A ⊨ φ between a finite structure A and a first-order sentence φ is decidable, is easily extended to a wide variety of logics. Fagin's theorem and subsequent results on the computational requirements of this relation lead to logical characterizations of a whole family of sub-recursive complexity classes.
These results do not imply that the theory of an arbitrary structure or of a family of structures is a decidable set of sentences, where by the theory of a structure we designate, as is customary, the set of sentences satisfied by the structure (or by the whole family of structures). Once the restriction to finite structures or the restriction to a single finite structure is lifted, the question whether a sentence φ holds in an infinite structure or in a class of finite structures may become undecidable. It was shown in Rabin (1969) that S2S, the monadic second-order theory of two successor functions, is decidable. As soon as the syntactic means available for the formulation of the properties of the intended model of S2S, the infinite binary tree, are enriched by a single binary relation, the resulting theory specified in terms of this enriched signature becomes undecidable. Even the restriction to the Σ¹₁ fragment of this extended theory suffices to obtain this negative result. This constitutes no contradiction with Fagin's result, as should be obvious. What it shows is the importance of the interplay between the class of models under consideration and the syntactic resources at hand in order to state their structural properties. One of the main attractions of descriptive complexity theory consists in this promising avenue it has opened for the systematic application of model-theoretic methods to problems of computational complexity. In our discussion above of Chomsky's attitude towards issues in computational theory we have emphasized the internal relation between fundamental tenets of the constraint-based variant of the Generative model and a pronounced distance towards questions of computational complexity. Under a certain perspective the considerations devoted a moment ago to the decisive role of the intended (class of) models can be read as a vindication of Chomsky's view defended in G/B. Under this perspective the fact that a specific logical theory is undecidable should be no cause of concern as long as its model can be discarded for independent reasons having their origin in the nature of the human language faculty. Should, e.g., cognitivist assumptions of this sort require that the hypothesised principles that constrain this faculty of the mind be tested only against finite structures, decidability becomes a nonissue. We shall return to this topic in connection with Rogers' formalization of the theory of Government/Binding. The logical characterization of the complexity class NP was predated by a couple of results, especially the characterization of regular string and tree languages by means of monadic second-order logic of one (S1S) and multiple (SnS) successors, respectively. It is these earlier characterizations that provide the formal foundations for a renewed interest in logical approaches to grammar specifications. The papers in the present volume, while belonging to the logical tradition in linguistic theorizing, are particularly sensitive to the potential offered by logical techniques in generalizing formal language results from strings to other structures like trees and special types of finite graphs.
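The correspondence between monadic second-order definability over strings and finite automata can be illustrated on a toy case. The sketch below is an assumption-laden example, not a construction from the volume: it takes the property "the word has even length", states it with one existentially quantified set of positions X (the first position is in X, membership in X alternates along the successor relation, the last position is not in X), evaluates that statement by brute force over all subsets, and compares the result with a two-state automaton. The function names are hypothetical, and treating the empty word as even is a convention adopted here.

```python
from itertools import product

def mso_even_length(word):
    """Brute-force evaluation of: there is a set X of positions such that
    0 is in X, i in X iff i+1 not in X, and the last position is not in X."""
    n = len(word)
    if n == 0:
        return True  # convention adopted for this sketch
    for bits in product([False, True], repeat=n):
        X = {i for i, b in enumerate(bits) if b}
        first_in = 0 in X
        alternates = all((i in X) != (i + 1 in X) for i in range(n - 1))
        last_out = (n - 1) not in X
        if first_in and alternates and last_out:
            return True
    return False

def dfa_even_length(word):
    """Two-state automaton: state 0 is initial and accepting."""
    state = 0
    for _ in word:
        state = 1 - state
    return state == 0

# The two descriptions agree on every word over {a, b} up to length 6.
agree = all(
    mso_even_length(w) == dfa_even_length(w)
    for n in range(7)
    for w in product("ab", repeat=n)
)
print(agree)
```

The logical description quantifies over a set and says nothing about how to process the word, while the automaton is purely procedural; the agreement of the two on all tested words is a small instance of the declarative and operational views coinciding.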
To be a little more specific, the original results in the descriptive theory of recognizability establish a tight connection between logical formalisms and language classes by providing translation procedures that transform logical specifications into finite automata equivalent to the language classes and vice versa. Büchi (1960) and Elgot (1961) have shown that regular string languages represented through finite (string) automata can be expressed by sentences in the weak monadic second-order logic with one successor. For tree languages an analogous result is well known to the effect that a tree language is definable in weak monadic second-order logic with multiple successors if and only if it is recognizable by a finite tree automaton (Doner 1970, Thatcher and Wright 1968). The logical approach to the specification of language classes has a number of advantageous properties that have paved the way to its application to linguistic issues. First, the equivalence between automata-theoretic operational and logic-oriented declarative formalisms leads to a number of closure properties of the defined language classes that follow immediately from the closure of the specification logics with respect to the traditional operations like negation, conjunction, alternation and (universal and existential) quantification. Second, the transition from strings to finite model-theoretic structures of arbitrary signatures requires no extra conceptual or technical ideas in the logical framework, whereas in formal language theory the step from string to tree languages and the concomitant distinction between weak and strong generative capacity constitutes a significant extension of the research agenda. Third, since the logical approach does not depend on an operational process which, starting from some given objects, generates its space of interpretation, but refers directly to an assumed universe of structures, its statements can be phrased in terms of linguistically significant notions that enter into universal principles and language-particular constraints. Finally, those logical languages that capture complexity classes indicate lower bounds on the computing resources that a system must make available if it is to use those structures that fall within the classes correlated with the corresponding logical language. This spectrum of desirable properties constitutes by itself no guarantee for the success of a framework that avails itself of model-theoretic techniques for the study of linguistic structures. Languages that are expressive enough to define significant properties of natural languages may be too powerful from the perspective of complexity theory or even classical recursion theory.
On the other hand, logics which are well-behaved in terms of the language families definable by them may lack the syntactic resources necessary for the direct determination of relevant linguistic notions and principles. Given this threat of a possible failure between an overpowering Scylla and an impoverished Charybdis, the value of the formalization of major parts of Head-Driven Phrase Structure Grammar achieved in Paul King's Speciate Re-entrant Logic (King 1989, 1994) and of the formalization of essential aspects of G/B theory attained within a definitional extension of monadic second-order logic of multiple successors in Rogers (1994) cannot be overestimated. The intended models of King's logic being a special type of graphs with labeled edges, we restrict our attention in the following to Rogers' work, because he is concerned with another type of graphs, category-labeled finite trees, that have been the focus of interest in the Generative tradition and that constitute the preferred "domain of discourse" of the papers assembled in this volume.
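Since category-labeled finite trees and their recognizability by finite tree automata are central to what follows, a minimal sketch of a deterministic bottom-up tree automaton may help. Everything in it is assumed for illustration: the tree encoding as nested tuples, the labels "a", "b" and "f", and the recognized property (some leaf is labelled "a"). Swapping the set of final states of this complete deterministic automaton also illustrates the closure under negation mentioned above.

```python
def run(tree, leaf_delta, node_delta):
    """Compute the state reached at the root, working bottom-up.

    A tree is a tuple (label, child_1, ..., child_k); a leaf is (label,).
    """
    label, *children = tree
    if not children:
        return leaf_delta[label]
    states = tuple(run(c, leaf_delta, node_delta) for c in children)
    return node_delta[(label, states)]

# State 1 means "some leaf labelled 'a' occurs below", state 0 means "none does".
LEAF_DELTA = {"a": 1, "b": 0}
NODE_DELTA = {("f", (l, r)): max(l, r) for l in (0, 1) for r in (0, 1)}
STATES = {0, 1}
FINAL = {1}

def accepts(tree, final=FINAL):
    return run(tree, LEAF_DELTA, NODE_DELTA) in final

def accepts_complement(tree):
    # The automaton is deterministic and complete, so flipping the
    # final states recognizes the complement language.
    return accepts(tree, STATES - FINAL)

t1 = ("f", ("b",), ("f", ("a",), ("b",)))
t2 = ("f", ("b",), ("b",))
print(accepts(t1), accepts(t2), accepts_complement(t2))
```

The property recognized here is also expressible by the simple sentence "there is a node labelled a", so the example is far weaker than what monadic second-order logic allows; it is meant only to show the shape of a run.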
As was pointed out above, the weak variant of the logic chosen by Rogers is severely restricted from the point of view of descriptive complexity theory. Languages are definable model-theoretically in this logic just in case they are an instance of a recognizable tree language, where we disregard in the context of the present discussion the issue of unbounded finite out-degree of nonterminal nodes in trees belonging to such a language. In spite of the modest computing resources needed to decide membership in a language specified by means of this monadic second-order logic, the logic is surprisingly flexible in capturing central concepts of the P&P approach. This apparent conflict can be easily accounted for by looking at particular definitions proposed in Rogers' work. It then becomes immediately clear that two main ingredients underlie the success of this attempt to couch a linguistic theory of the licensing type in a precise logical medium. There is, on the one hand, the device of second-order quantification which allows one to speak about arbitrary sets of nodes in a tree model. While this device already implies a great advantage in comparison with the definitional power offered by first-order logic, it would by itself not carry very far without the possibility of directly addressing linguistically significant tree configurations. The (extended) signature of the logic contains relations like dominance [ . . . ]
[ . . . ] ¬Component(Y)]
Component(X) ≡ Path(X) ∧ ∀x, y[X(x) ∧ X(y) → F.Eq(x, y)] ∧ ∀x, x′∃y∀z[X(x) ∧ X(x′) ∧ x < x′ → ¬Adj(x′) ∧ x < y ∧ y ≈ x′ ∧ Adj(y) ∧ (x [ . . . ]
[ . . . ] ∈ A(Qs) and π₂ ∘ τ is a run of A on T}. Which is to say that (T, τ) ∈ A′(Qs) iff the first projection of T is a tree in A(Qs) and the second projection of τ encodes a run of A(Qs) on that tree. Thus A(Qs) is a projection of A′(Qs). It is not hard to show that A′(Qs) is generated by a CFG and is, therefore, a local set. □
Corollary 10
A language is a CFL iff it is the yield of a recognizable set.
This follows from the fact that we might as well label the leaves in Ji(Qs) with I .
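The relation between context-free languages, local sets and yields can be sketched in a few lines. The grammar below (S → a S b, S → a b) and the tree encoding as nested tuples with bare strings as terminal leaves are assumptions made for the example, as are the function names. A tree belongs to the local set of the grammar when its root is the start symbol and every local tree (a node together with its children) matches a rule; its yield is the left-to-right sequence of its leaves.

```python
RULES = {("S", ("a", "S", "b")), ("S", ("a", "b"))}
TERMINALS = {"a", "b"}
START = "S"

def root(tree):
    """Label of the root: a bare string is a terminal leaf."""
    return tree if isinstance(tree, str) else tree[0]

def licensed(tree):
    """Every local tree must be an instance of some rule."""
    if isinstance(tree, str):
        return tree in TERMINALS
    label, *children = tree
    rule = (label, tuple(root(c) for c in children))
    return rule in RULES and all(licensed(c) for c in children)

def is_derivation_tree(tree):
    return root(tree) == START and licensed(tree)

def tree_yield(tree):
    """Concatenate the leaves from left to right."""
    if isinstance(tree, str):
        return tree
    return "".join(tree_yield(c) for c in tree[1:])

good = ("S", "a", ("S", "a", "b"), "b")
bad = ("S", "a", "a")
print(is_derivation_tree(good), tree_yield(good), is_derivation_tree(bad))
```

Checking membership inspects only one node and its children at a time, which is the sense in which the set of derivation trees is local; the string language appears only after taking yields.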
4 Descriptive characterization
We are now in a position to establish the characterization we took as our starting point, which is, in essence, a characterization of the recognizable sets by definability in L²_{K,P}. The result was originally established for SnS, the monadic second-order theory of n successor functions (Rabin 1969). As we will show shortly, L²_{K,P} and SnS are equivalent in their descriptive power; we work in L²_{K,P} because it has a relatively natural signature for reasoning about trees in the manner typical of theories of syntax. Thus, while encodings of linguistically significant relationships in SnS tend to be rather difficult to decipher, in L²_{K,P} they are reasonably transparent.
Definition 11 (SnS) Let Nₙ = (Tₙ, ε, S, [ . . . ]
[ . . . ] equivalently L²_{K,P}.
Corollary 20 A language is context-free iff it is the yield of a set of finite trees definable in SnS for some n < ω; equivalently, if it is the yield of a set of finite trees with bounded branching that is definable in L²_{K,P}.
5 Unbounded branching
As should be clear from the discussion of the previous section, there is something of a misfit between $L^2_{K,P}$ and the recognizable and local sets. While the requirement that grammars and automata be finite restricts the latter to defining sets of trees with finitely bounded branching, $L^2_{K,P}$ suffers from no such restriction. The distinction would be, perhaps, a minor point if it were not for the fact that sets of trees in which there is no finite bound on the branching are characteristic of some linguistic analyses, in particular flat analyses of coordination. Perhaps the best known presentation of this is in GPSG (Gazdar et al. 1985). The relevant component consists of three (finite) rule schemas and a linear precedence rule:

$X \to H[\mathrm{CONJ}\ \alpha_0],\ H[\mathrm{CONJ}\ \alpha_1]^{+}$, where $\langle \alpha_0, \alpha_1 \rangle \in \{\langle and, NIL \rangle, \langle NIL, and \rangle, \langle neither, nor \rangle, \langle or, NIL \rangle, \langle NIL, or \rangle\}$
$X[\mathrm{CONJ}\ NIL] \to H$
$X[\mathrm{CONJ}\ \alpha] \to \{[\mathrm{SUBCAT}\ \alpha]\},\ H$, where $\alpha \in \{and, both, but, neither, nor, or\}$
$[\mathrm{CONJ}\ \alpha_0] \prec [\mathrm{CONJ}\ \alpha_1]$, where $\alpha_0 \in \{both, either, neither, NIL\}$ and $\alpha_1 \in \{and, but, nor, or\}$

Here $X$ is not a variable but, rather, is a radically underspecified category (no features are specified at all). The effect of the first schema (the iterating coordination schema) is to allow any category to be expanded to a sequence of two or more categories, all of which are heads (and therefore are of the same basic grammatical type although they can differ considerably in details), in which any one is marked as a CONJ of type $\alpha_0$ while all the rest are marked as CONJ of the corresponding type $\alpha_1$. The second and third schemas expand categories marked as [CONJ NIL] simply as the same category unmarked, and those marked [CONJ $\alpha$] as and X, both X, etc. for $\alpha$ in and, both, etc. Finally, the linear precedence rule requires categories marked with CONJ of type both, either, etc. to precede in the sequence all categories marked with CONJ of type and, but, etc. The effect is to license expansions of any category X into one of

X and X; X or X; neither X nor X
X ... X and X; X and X ... and X; X ... X or X; neither X nor X ... nor X

Such an account is easy to capture in $L^2_{K,P}$. We will assume that $P$ is the (finite) set of feature sequences employed in the GPSG account (see note 5). One can assert, for example, that the $\langle and, NIL \rangle$ instance of the iterating coordination schema is satisfied in the local tree rooted at the node assigned to $x$ with:

$(\exists y_0, y_1)[$ …
… $x \prec y]$.

Which just says that whenever $x$ and $y$ are siblings, $x$ is marked [CONJ NIL] and $y$ is marked [CONJ and], then $x$ will precede $y$.
6 Generalized local and recognizable sets
The iterating coordination schema is stateable in GPSG because GPSG specifically allows the Kleene star to be applied to categories on the right-hand side of its rewrite rules. Thus these grammars are no longer finite, but are still finitely presentable. That is to say, the grammar consists of an infinite set of rewrite rules, but that set itself is generated by a finite grammar, in this case a regular grammar. Such a notion of grammar is not new. Langendoen (1976) refers to them as hypergrammars and motivates them with a flat account of coordination quite similar to the GPSG account. A class of hypergrammars is determined by both the class of the generated grammar and the class of the generating grammar, with the weak generative capacity being the larger of the capacities of the two. Here we will focus on the generated grammars, referring to infinite, but finitely presentable, sets of context-free rewrite rules as generalized CFGs, and to the sets of trees generated by them as generalized local sets. We will refer to the equivalent notions in the realm of tree automata as generalized tree automata and generalized recognizable sets. We are interested, in particular, in the variants in which the generated grammar (or automaton) is a regular set. This gives a slight generalization of the GPSG-style grammars in that, rather than restricting application of the Kleene star to individual non-terminals, we in essence admit arbitrary regular expressions on the right-hand side of rewrite rules.
Definition 21 (Regular Generalized Context-free Grammars) A regular generalized context-free grammar is a regular set of local trees over some alphabet $\Sigma$: $G \subseteq \Sigma \times \Sigma^*$, regular.
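The idea of a finitely presented but infinite rule set can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: the tree encoding `(label, children)`, the label names `Xnil` and `Xand`, the terminal `w`, and the helper `licensed` are all hypothetical and are not taken from the GPSG account. Each parent label is paired with one regular expression over the string of its children's labels, so a single entry stands for infinitely many rewrite rules.

```python
import re

# Hypothetical terminals: "w" stands for any word, "and" for the conjunction.
TERMINALS = {"w", "and"}

# One regular expression per parent label, read over the space-separated
# labels of the children. The entry for "X" loosely mimics the "and" cases
# of an iterating coordination schema: X ... X and X, or X and X ... and X.
RULES = {
    "X": re.compile(r"Xnil( Xnil)* Xand|Xnil( Xand)+|w"),
    "Xnil": re.compile(r"X"),
    "Xand": re.compile(r"and X"),
}

def licensed(tree):
    """Return True iff every local tree in `tree` matches its parent's regex."""
    label, children = tree
    if not children:
        return label in TERMINALS
    rule = RULES.get(label)
    child_string = " ".join(child[0] for child in children)
    if rule is None or rule.fullmatch(child_string) is None:
        return False
    return all(licensed(child) for child in children)

# A flat coordination "X X X and X" with unboundedly many conjuncts allowed.
x = ("X", [("w", [])])
conj_nil = ("Xnil", [x])
conj_and = ("Xand", [("and", []), x])
flat = ("X", [conj_nil, conj_nil, conj_nil, conj_and])
misordered = ("X", [conj_and, conj_nil])
```

Here `licensed(flat)` holds for any number of `Xnil` conjuncts, while `misordered` is rejected because its child string is not in the regular child language of `X`.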
Definition 22 (Regular Generalized Local Sets) A set of trees is a regular generalized local set iff it is admitted by a regular generalized CFG in the sense of Definition 4.
Definition 23 (Regular Generalized Tree Automata) A regular generalized tree automaton over an alphabet $\Sigma$ and a finite set of states $Q$ is a regular set of triples: $A \subseteq \Sigma \times Q \times Q^*$, regular.
Definition 24 (Regular Generalized Recognizable Sets) A set of trees is a regular generalized recognizable set iff it is $A(Q_S)$ for some regular generalized tree automaton $A$ and $Q_S$, where $A(Q_S)$ is defined as in Definition 6.
As the fact that the iterating coordination schema of GPSG can be captured in $L^2_{K,P}$ suggests, $L^2_{K,P}$ suffices to define any regular generalized local set, and in fact any regular generalized recognizable set.
Lemma 25
Every regular generalized recognizable set is definable in $L^2_{K,P}$.
Proof. The construction of the proof of Theorem 17 is limited to bounded branching, first of all, because it employs individual variables to pick out the children of a node and only a fixed number of these can occur in a formula. We can circumvent this obstacle by using a single set variable in their stead, but this is still not quite enough. Since the set of triples is, in general, infinite, the disjunction over the formulae encoding the triples, which is used to require one of these to hold at every local tree, will not necessarily be finite. The way we overcome this is to follow the approach used in capturing the iterating coordination schema. Rather than having a distinct formula for each triple, we use a distinct formula for each label/state pair $(P, Q) \in P \times Q$, which will require the string of states assigned to the children of a point with label $P$ and state $Q$ to be in the set of sequences of states associated with $(P, Q)$ in $A$. (If we regard the triples of the automaton as a set of $(P \times Q) \cup Q$ labeled local trees, this set is just the child language $\mathrm{Ch}_{(P,Q)}(A)$.) Since $A$ is a regular generalized tree automaton, this set of sequences is a regular language and thus, by Theorem 18, definable in S1S. Now, the set of children of a node, when ordered by linear precedence, is isomorphic to a unary branching tree. It follows that the set of sequences is definable in terms of $\prec$ (and some auxiliary labels, $S$, say) on the set of children of a node.

Figure 4. Proof of Lemma 27

Let $\psi_{(P,Q)}(X, Q, S)$ define $\mathrm{Ch}_{(P,Q)}(A)$ on $X$. Using these we can replace the formulae $\varphi_i(x_0, \ldots, x_m, Q)$ with:
(X,Q,^)],
X F E ) ] )
Since $P$ and $Q$ are both finite, there are but finitely many such formulae. With this modification, the proof of Theorem 17 goes through. □
Corollary 26
Every regular generalized local set is definable in $L^2_{K,P}$.
This follows from the fact that the regular generalized local sets are a subclass of the regular generalized recognizable sets. The question now arises whether we can do more than this. While $L^2_{K,P}$ clearly cannot capture any class stronger than the context-free generalized recognizable sets (since such sets can have non-context-free yield languages), is it possible to capture the context-free generalized recognizable sets in $L^2_{K,P}$? The answer is no.
Lemma 27 Every set of finite trees definable in $L^2_{K,P}$ is a regular generalized recognizable set.
Proof. Suppose $\mathbb{T}$ is a set of finite $P$-labeled trees definable in $L^2_{K,P}$. The fact that $\mathbb{T}$ is definable in $L^2_{K,P}$ implies that it is definable in SωS, which in turn implies that $h(\mathbb{T}) \overset{\mathrm{def}}{=} \{h(T) \mid T \in \mathbb{T}\}$ is definable in S2S (where $h$ is the embedding of SωS into S2S). From the proof of Lemma 9 we have, then, that there is a CFG $G_{\mathbb{T}} \subseteq (P' \times Q) \times (P' \times Q)^*$ and some $Q_S \subseteq P' \times Q$ such that $h(\mathbb{T}) = \pi_1(G_{\mathbb{T}}(Q_S))$. (Here $P'$ extends $P$ with some arbitrary labels for the nodes falling between the nodes in the range of $h$.) Now, $\mathbb{T}$ is certainly recognizable in the sense that the states (and hence labels) of any set of children depend only on the state and label of their parent (this follows from the fact that $G_{\mathbb{T}}(Q_S)$ is local), so it is a generalized recognizable set of some sort. It remains only to show that the child languages of $\mathbb{T}$ are regular. Note that, for any $(P, Q) \in P \times Q$, we can pick out the subset of $G_{\mathbb{T}}$ that licenses the images in $G_{\mathbb{T}}(Q_S)$ of local trees in $\mathbb{T}$ rooted at nodes labeled $P$ and assigned state $Q$, and that these form a CFG $G_{(P,Q)}$ generating, in essence, $\mathrm{Ch}_{(P,Q)}(\mathbb{T})$ (modulo taking the first projection of the leaves). In the example of Figure 4, for the pair $(A, s_1)$ this would include

$(A, s_1) \to$ B C D $(A', s_7)$

Because $h$ maps local trees in $\mathbb{T}$ to right-branching trees, these CFGs are all right linear. Consequently, $\mathrm{Ch}_{(P,Q)}(\mathbb{T})$ is regular for all $P$ and $Q$, and $\mathbb{T}$ is a regular generalized recognizable set. □
This, then, establishes the strong generative power of $L^2_{K,P}$ with unrestricted branching.
Corollary 28 A set of finite trees is definable in $L^2_{K,P}$ iff it is a regular generalized recognizable set, equivalently, iff it is a projection of a regular generalized local set.
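The proof turns on an embedding that maps local trees of arbitrary width to right-branching binary trees. The text does not spell out the embedding, so the following Python sketch shows one plausible encoding only as an assumption: the first child becomes the left daughter, and the remaining children hang off a right-branching spine of nodes carrying a hypothetical auxiliary label `AUX` (playing the role of the extra labels in $P'$). The function names `embed` and `unembed` are likewise illustrative.

```python
AUX = "#"  # hypothetical auxiliary label for spine nodes

def embed(tree):
    """Map (label, [children]) to a binary tree (label, left, right)."""
    label, children = tree
    if not children:
        return (label, None, None)
    return (label, embed(children[0]), _spine(children[1:]))

def _spine(children):
    """Chain the remaining siblings along a right-branching spine."""
    if not children:
        return None
    return (AUX, embed(children[0]), _spine(children[1:]))

def unembed(binary_tree):
    """Recover the original tree, showing that the encoding loses nothing."""
    label, left, right = binary_tree
    if left is None:
        return (label, [])
    children = [unembed(left)]
    while right is not None:
        _, child, right = right
        children.append(unembed(child))
    return (label, children)

flat_tree = ("A", [("B", []), ("C", [("E", [])]), ("D", [])])
binary_image = embed(flat_tree)
```

Because the siblings of a node are read off a single right-branching path, a grammar that licenses such spines only ever needs productions of right-linear shape, which is the observation the proof relies on.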
7 Conclusions
The characterization of the recognizable sets in terms of definability in $L^2_{K,P}$ was originally developed as a means of establishing language-theoretic complexity results for theories of syntax that are presented as systems of constraints on the structure of trees. Here we have shown that if we do not require $L^2_{K,P}$ theories to explicitly establish the finite bound on branching that is a characteristic of recognizable sets, then the class of sets they can define is a mild generalization of the class of sets of trees definable by GPSG-style grammars. Thus the natural strong generative capacity of $L^2_{K,P}$ coincides with a linguistically natural class of sets of structures. When coupled with the flexibility of $L^2_{K,P}$ in expressing principles and constraints occurring in a variety of theories of syntax, this offers a new perspective on relationships between these theories. We have, for instance, shown that $L^2_{K,P}$ suffices to capture both a substantially complete G/B account of English D- and S-structure (Rogers 1996a) and the syntactic component of the GPSG account of English of Gazdar et al. (1985) (see Rogers 1997a). Because $L^2_{K,P}$ gives a common formalization that is independent of the grammars and mechanisms employed to present these theories, it provides a framework for direct comparison of the properties of the sets of structures they license. In the case of G/B and GPSG, issues that have been the focus of many contrasts of the two, like the distinction between transformational and monostratal theories, turn out to have little in the way of actual formal consequences. Distinctions that have been previously overlooked, on the other hand, in particular differences in the nature of linguistic universals, appear to be formally more significant (Rogers 1996b). As this model-theoretic approach is extended to more complex classes of sets of structures and is applied to a broader range of theories, it promises to offer considerable insight into the regularities of natural language syntax that transcend specific theories and into the issues that actually distinguish them.
Notes
1 Doner's result was in terms of wSnS, a definable sub-theory of SωS.
2 The requirement that the trees be finite is a simple consequence of our assumption that they can be generated by a finite process. This restriction is without great significance.
3 Note that finiteness of the trees will still require all branching to be finite; branching of any finite degree, however, may be admitted.
4 For full details see Rogers (1996a).
5 For details of this and other aspects of the treatment see Rogers (1997a).
References
Büchi, J. R. (1960): Weak second-order arithmetic and finite automata. Zeitschrift für mathematische Logik und Grundlagen der Mathematik 6:66-92
Doner, J. (1970): Tree acceptors and some of their applications. Journal of Computer and System Sciences 4:406-451
Gazdar, G., E. Klein, G. Pullum, and I. Sag (1985): Generalized Phrase Structure Grammar. Harvard University Press
Gécseg, F. and M. Steinby (1984): Tree Automata. Budapest: Akadémiai Kiadó
Gorn, S. (1967): Explicit definitions and linguistic dominoes. In: J. F. Hart and S. Takasu (eds.), Systems and Computer Science, Proceedings of the Conference held at Univ. of Western Ontario, 1965, Univ. of Toronto Press
Kayne, R. S. (1984): Connectedness and Binary Branching. Dordrecht: Foris
Langendoen, D. T. (1976): On the weak generative capacity of infinite grammars. CUNY Forum 1:13-24
Rabin, M. O. (1969): Decidability of second-order theories and automata on infinite trees. Transactions of the American Mathematical Society 141:1-35
Rogers, J. (1996a): A Descriptive Approach to Language-Theoretic Complexity. Studies in Logic, Language, and Information, CSLI Publications, To appear
Rogers, J. (1996b): A model-theoretic framework for theories of syntax. In: 34th Annual Meeting of the Association for Computational Linguistics, UC Santa Cruz
Rogers, J. (1997a): "Grammarless" phrase structure grammar. Linguistics and Philosophy, To appear
Rogers, J. (1997b): Strict LT2 : Regular :: Local : Recognizable. In: C. Retoré (ed.), Logical Aspects of Computational Linguistics: First International Conference, LACL '96 (Selected Papers), Springer, vol. 1328 of Lecture Notes in Computer Science/Lecture Notes in Artificial Intelligence, pp. 366-385
Thatcher, J. W. (1967): Characterizing derivation trees of context-free grammars through a generalization of finite automata theory. Journal of Computer and System Sciences 1:317-322
Thatcher, J. W. and J. B. Wright (1968): Generalized finite automata theory with an application to a decision problem of second-order logic. Mathematical Systems Theory 2(1):57-81
Monadic second-order logic, tree automata, and constraint logic programming
Frank Morawietz
1 Introduction
Model-theoretic techniques are the underlying principle of a number of grammar formalisms for natural language. In this paper, we focus on the use of monadic second-order (MSO) logic for formalizations of Principles and Parameters (P&P) approaches in the Government and Binding tradition (Chomsky 1981). The suitability of this formalism for P&P grammars has been shown in a recent monograph by Rogers (To appear). Constraint-based formalisms characterize objects with logical description languages declaratively, i.e., without specifying how to generate admissible structures. To be able to use them in applications, computational linguistics has to provide a connection between model theory and theorem proving on the one hand, and natural language processing on the other. We can bridge the gap between the two by exploiting the connection between constraints in MSO logic on trees and tree automata.
Our purpose in this paper is twofold. Firstly, we discuss the operational interpretations of MSO logic. Since the solutions to constraints expressed in MSO logic are represented by tree automata which recognize the assignment trees satisfying the formulas, we can directly use the automata as the operational interpretation of our formalism; the aim of the paper is to exploit Rogers' formalization in relation to tree automata. Secondly, we discuss possible extensions of MSO logic via the addition of inductive definitions, with respect to both the operational interpretation and the generative capacity, since MSO logic characterizes only the context-free languages, but natural languages are generally believed to be higher in the Chomsky hierarchy. More concretely, since inductive definitions can be seen as special cases of constraint logic programming (CLP) clauses and the specification of a CLP interpreter re-introduces the operational concept of derivations, we propose the embedding of MSO logic into the constraint logic programming scheme of Höhfeld and Smolka (1988) to define an even more powerful and flexible language. This CLP language allows a stepwise compilation of tree automata guided by appropriate programs and dependent on an initial goal.
The paper is organized as follows. In the first section we review the results on MSO logic and tree automata, briefly discussing advantages and disadvantages of the formalism. The next section is centered on constraint logic programming. We apply the Höhfeld and Smolka scheme to derive a relational extension, discuss its properties and present some examples. In the last section, we argue for the usefulness of the approach with examples from P&P-based parsing.
2 Trees, tree automata, and MSO logic
2.1 Trees and tree automata
For convenience, we present short definitions of (binary) trees and tree automata here. A proper introduction to tree automata can be found in Gécseg and Steinby (1997). We actually give two definitions here, suitable for different aspects of the following discussions. Assume an alphabet $\Sigma = \Sigma_0 \cup \Sigma_2$ with $\Sigma_0 = \{\lambda\}$ and $i$ the arity of the elements in $\Sigma_i$. The set of binary trees $T_\Sigma$ over $\Sigma$ is the set of terms built by letting $\lambda$ denote the empty tree and by letting $\sigma(t_1, t_2)$, $\sigma \in \Sigma_2$, $t_1, t_2 \in T_\Sigma$, denote the tree with label $\sigma$ and subtrees $t_1, t_2$. This algebraic perspective facilitates the automata theory. In the model theory underlying MSO logics it is more convenient to assume a tree $t$ to be a function from the addresses of a binary tree domain $T_2$ to labels $\Sigma$. Intuitively, a tree domain is a way to canonically label the nodes in an infinite tree. More formally, a tree domain $T$ is a subset of strings over a linearly ordered set which is closed under prefix and left sister. $T_2$ results from an alphabet of strings over, for example, $\{0, 1\}$.
A deterministic (bottom-up) tree automaton $\mathfrak{A}$ on binary trees is a tuple $(A, \Sigma, a_0, F, \alpha)$ with $A$ the (finite) set of states, $\Sigma$ as defined above, $a_0 \in A$ the initial state, $F \subseteq A$ the final states and $\alpha : (A \times A \times \Sigma_2) \to A$ the transition function. We omit the transitions on $\Sigma_0$ here and in the following. Instead, we just designate the initial state. Formally we would need a transition $\alpha(\lambda) = a_0$. We can extend the transition function homomorphically to trees by inductively defining $h_\alpha(\lambda) = a_0$ and $h_\alpha(\sigma(t_1, t_2)) = \alpha(h_\alpha(t_1), h_\alpha(t_2), \sigma)$, $t_1, t_2 \in T_\Sigma$, $\sigma \in \Sigma_2$. An automaton $\mathfrak{A}$ accepts a tree $t \in T_\Sigma$ iff $h_\alpha(t) \in F$. The language recognized by $\mathfrak{A}$ is denoted by $T(\mathfrak{A}) = \{t \mid h_\alpha(t) \in F\}$. Emptiness of the language $T(\mathfrak{A})$ is decidable by a fixpoint construction on the reachable states.
Intuitively, the automaton creeps up the tree from the frontier to the root, using the labels as symbols for the transitions and assigning a state to each subtree. Recognition of a given tree $t$ is achieved trivially by running an automaton on the input tree, i.e., computing $h_\alpha(t)$.

$\mathfrak{A} = (\{a_0, a_1\}, \{A, B\}, a_0, \{a_0\}, \alpha)$
$\alpha(a_0, a_0, A) = a_0$    $\alpha(a_0, a_1, A) = a_1$
$\alpha(a_1, a_0, A) = a_1$    $\alpha(a_1, a_1, A) = a_1$
$\alpha(a_0, a_0, B) = a_1$    $\alpha(a_0, a_1, B) = a_1$
$\alpha(a_1, a_0, B) = a_1$    $\alpha(a_1, a_1, B) = a_1$
Figure 1. A simple tree automaton

As an example, the automaton in Figure 1 recognizes trees from an alphabet with $\Sigma_2 = \{A\}$. This admittedly very simple tree automaton recognizes all binary trees whose interior nodes are labeled with $A$ by staying in the initial and, at the same time, final state $a_0$. As soon as we encounter a node labeled $B$, we go into a sink state ($a_1$). We will eliminate the transitions to the sink state in the remainder of the paper since they do not contribute knowledge on acceptable structures.
Bottom-up tree automata are closed under complementation, conjunction, disjunction, projection and cylindrification of alphabets, determinization and minimization. Most of the constructions are adaptations of the corresponding ones on finite state automata. We will use the constructions during the compilation from formulas to automata to parallel connectives and quantifiers in MSO formulas. More details on both the specification and implementation of tree automata and their constructions can be found in Morawietz and Cornell (1997) and Klarlund (1998).
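To make the homomorphic extension and the emptiness check concrete, here is a minimal Python sketch of the automaton of Figure 1. The encoding is an assumption made for illustration: the empty tree $\lambda$ is represented by `None`, an interior node by a triple `(label, left, right)`, and the names `alpha`, `run`, `accepts`, `reachable_states` and `is_empty` are hypothetical.

```python
from itertools import product

LEAF = None           # stands for the empty tree (lambda)
INITIAL = "a0"        # state assigned to the empty tree
FINAL = {"a0"}
LABELS = {"A", "B"}

def alpha(left_state, right_state, label):
    """Transition function: stay in a0 only under A with both subtrees in a0."""
    if (left_state, right_state, label) == ("a0", "a0", "A"):
        return "a0"
    return "a1"       # sink state

def run(tree):
    """Bottom-up run: compute the state the automaton assigns to `tree`."""
    if tree is LEAF:
        return INITIAL
    label, left, right = tree
    return alpha(run(left), run(right), label)

def accepts(tree):
    return run(tree) in FINAL

def reachable_states():
    """Fixpoint construction: close {INITIAL} under the transition function."""
    reached = {INITIAL}
    while True:
        step = {alpha(l, r, s) for l, r, s in product(reached, reached, LABELS)}
        if step <= reached:
            return reached
        reached |= step

def is_empty():
    """The recognized language is empty iff no final state is reachable."""
    return not (reachable_states() & FINAL)

all_a = ("A", ("A", LEAF, LEAF), LEAF)
with_b = ("A", ("B", LEAF, LEAF), LEAF)
```

With these definitions `accepts(all_a)` holds and `accepts(with_b)` does not; the fixpoint terminates because the state set is finite and `reached` only grows.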
2.2 MSO logic on trees
We choose MSO logic on trees since it is decidable, flexible and has a direct operational interpretation via its decidability proof. Both theorem proving and inductive definitions are well defined. Furthermore, MSO logic has a descriptive complexity result: it characterizes exactly the trees whose yields are context-free languages.
An MSO logical language is like predicate logic extended with variables ranging over sets and quantifiers ranging over these MSO variables. More specifically, it has a syntax of both first and (monadic) second-order quantification, all the usual logical connectives, first-order variables ranging over nodes and a (countably infinite) set of monadic second-order variables ranging over finite sets. Terms and formulas are built as usual.
The techniques we are using come originally from an old result in logic, namely that the weak MSO theory of two successor functions (WS2S) is decidable (Thatcher and Wright 1968, Doner 1970). A "weak" second-order theory is one in which the set variables are allowed to range only over finite sets. There is a more powerful result available: it has been shown by Rabin (1969) that the strong second-order theory (variables range over infinite sets) of even countably many successor functions is decidable. However, in our linguistic applications we need only to quantify over finite sets, so the weaker theory is enough, and the techniques are correspondingly simpler. In fact, using strong SnS is not an option open to us: we are interested in using the technique of the decidability proof for natural language processing, the proof works by showing a correspondence between formulas in the language of WS2S and tree automata, and there does not exist an efficient minimization algorithm for the corresponding class of Rabin automata on infinite sets. All of these are generalizations to trees of a result on strings originally due to Büchi (1960). The applications we mention here could be adapted to strings, with finite-state automata replacing tree automata.
Informally, we create a tree description logic by fixing the domain of the interpretation to "trees" and adding binary relations to the syntax which will be interpreted as the successor functions. So, for the structure of WS2S, we are going to assume a tree domain with the extension of (at least) the two successor relations. These correspond intuitively to the relations of left and right daughter and are used to navigate through the tree. The structure can be extended with interpretations of other definable relations we may want to use. We will call this basic structure of WS2S $N_2$.
Definition 1 The structure of WS2S ($N_2$) is a tuple $(T_2, \varepsilon, s_0, s_1)$ such that $T_2$ is a binary tree domain with root $\varepsilon$ and $s_0$, $s_1$ the left and right successor relations respectively.
We will overload the term WS2S to mean the structure of two successor functions as well as its MSO language.
Intuitively, MSO predicates, i.e., monadic second-order variables, pick out sets of nodes. We can think of the predicates as features labeling the nodes. A tree, then, is just a rooted, dominance-connected subset $T$ of the domain of $N_2$. A labeled tree is a $(k+1)$-tuple $(T, F_1, \ldots, F_k)$ of the tree $T$ and $k$ features. Therefore, MSO formulas with the underlying interpretation on $N_2$ are constraints on trees. A grammar in this setting becomes just the specification of a $(k+1)$-ary relation picking out the well-formed trees. Formally, each MSO formula represents a constraint on the valuation of its free variables, which is determined by the assignment of the variables to (sets of) nodes.
Definition 2 Let $T$ be a tree domain and VAR a set of (MSO) variables. A variable assignment is a total function $\alpha : \mathrm{VAR} \to \wp(T)$. We call the set of all those mappings ASS.
Obviously, satisfaction is relative to these assignments. We will write satisfaction as $N_2 \models \varphi[\alpha]$ for $\varphi$ an MSO formula and $\alpha$ a variable assignment. Since these assignments map variables to nodes in a tree, i.e., the assignments together with the domain of interpretation form a (labeled) tree, we will also speak of assignment trees.
Since the proofs of the decidability results are inductive on the structure of MSO formulas, we can choose our particular tree description language rather freely, knowing (a) that the resulting logic will be decidable and (b) that the translation to automata will go through as long as the atomic formulas of the language represent relations which can be translated (by hand if necessary) to tree automata recognizing the "right" assignments to their free variables. We will see how this is done in the next section. Note that it requires further proof that these languages have the full power of WS2S, though. Therefore, the use of the decidability result is not fixed to a particular area of natural language formalisms.
For example, Ayari et al. (1998) have investigated the usefulness of these techniques in dealing with record-like feature trees which unfold feature structures; there the attributes of an attribute-value term are translated to distinct successor functions. On the other hand, Rogers (To appear) has developed a language rich in long-distance relations (dominance and precedence) which is more appropriate for work in G/B theory. Compact automata can be easily constructed to represent dominance and precedence relations. One can imagine other possibilities as well: as we will see in Section 2.3, the automaton for Kayne-style asymmetric, precedence-restricted c-command (Kayne 1994) is also very compact, and makes a suitable primitive for a description language along the lines developed by Frank and Vijay-Shanker (1998).
2.2.1 An example language: $L^2_{K,P}$
In this paper, we draw our examples from tree description logics used in the P&P paradigm. In particular $L^2_{K,P}$, the logic proposed in Rogers (To appear), will serve as our main source. Note that $L^2_{K,P}$ has been demonstrated to offer concise and well-founded formalizations of concepts involved in P&P approaches. In fact, Rogers encodes in his monograph most of the proposals made in Relativized Minimality by Rizzi (1990). Although Rogers has shown that $L^2_{K,P}$ is inter-translatable with SnS and therefore not limited to either binary or finite trees, we use it only in the weak sense over finite binary trees. The language of $L^2_{K,P}$ is designed to express relationships between nodes in trees representing linguistic structures. There are local relations on nodes such as the immediate dominance relation as well as nonlocal ones such as the reflexive transitive closure of immediate dominance, simply called dominance. Various other theory-independent relations for reasoning about relations between nodes can be defined in WS2S and added, e.g., proper precedence to express ordering information. We parameterize the language with both individual and predicate constants.
Example 3 Let $L^2_{K,P}$ be defined by a set $K$ of countably many individual constants; a set $P$ of countably many predicate constants; a countably infinite set $X = X_0 \cup X_1$ of first-order and monadic second-order variables; $\wedge, \vee, \neg$ (logical connectives); $\forall, \exists$ (quantifiers); $( , ), [ , ]$ (brackets); $\triangleleft$, …

Figure 3. The automaton for AC-Com(x, y)

… with free variables $X$. We can safely ignore individual variables since they can be encoded as properly constrained set variables with the formula for Sing given in (5). For each relation $r(X)$ from the signature of the MSO logic, we have to define a tree automaton $\mathfrak{A}^r$ recognizing the proper assignment trees.
Then we can inductively build a tree automaton $\mathfrak{A}_\varphi$ for each MSO formula $\varphi(X)$, such that

(8) $N_2 \models \varphi[\tau] \iff t \in T(\mathfrak{A}_\varphi)$

where $\tau$ assigns sets of nodes to the variables $X$ and $t$ is the corresponding assignment tree. The automaton compactly represents the (potentially infinite) number of valuations, i.e., the solutions to constraints.
Consider as an example the tree automaton corresponding to Kayne's asymmetric c-command relation from (6), see Figure 3. For readability, we denote the alphabet of node labels as tuples indicating for each free variable whether a node is assigned to it or not. In this case we have only the free variables $x$ and $y$, so the alphabet consists of the tuples $(x, y)$, $(\neg x, y)$, $(x, \neg y)$ and $(\neg x, \neg y)$. On closer examination of the transitions, we note that we just percolate the initial state as long as we find nodes which are neither $x$ nor $y$. From the initial state on both the left and the right subtree we can either go to the state denoting "found $x$" ($a_3$) if we read symbol $(x, \neg y)$ or to the state denoting "found $y$" ($a_1$) if we read symbol $(\neg x, y)$. After finding a dominating node while being in $a_1$, which switches the state to $a_2$, we can percolate $a_2$ as long as the other branch does not immediately dominate $x$. If we come into the situation that we have $a_3$ on the left subtree and $a_2$ on the right one, we go to the final state $a_4$, which again can be percolated as long as empty symbols are read. Clearly, the automaton recognizes all trees which have the desired c-command relation between the two nodes. But in addition to the arbitrary number of intermediate nodes, we can also have an arbitrary number of $(\neg x, \neg y)$ labeled nodes dominating the constellation or being dominated by it, such that we recognize the closure of the (AC-Com) relation in the sense of Doner. The relation can now serve as a new primitive in further MSO formulas.
To keep the automata as small as possible, we employ the suitably adapted minimization of finite state automata. The minimization procedure also gives us an alternative way to decide emptiness, since the canonical empty automaton has only one state, which is initial but not final.
The non-elementary complexity is caused by alternation between quantifier blocks. Quantification results in nondeterministic automata, but negation needs deterministic ones as input. So for each universal quantifier followed by an existential one, we need the subset construction, which has an exponential worst case complexity: the number of states can grow according to a function with a stack of exponents whose height is determined by the number of $\forall$-$\exists$ quantifier alternations.
2.4 Automata-based theorem proving Automated theorem proving is the standard approach to processing with constraint-based grammars. The decidability proof gives us a procedure to do theorem proving since the compilation produces an automaton recognizing the empty language iff the formula is unsatisfiable. Practically this means that a formula is satisfiable if the resulting automaton has a non-empty set of final states and valid if the resulting automaton has exactly one state which is initial and final such that all assignments are recognized. So, theorem proving and compilation are in fact one and the same process. An obvious goal for the use of the discussed approach is the (offline) generation of a tree automaton representing an entire grammar. Then parsing would be just proving that the grammar implies the input. Unfortunately, as mentioned above, the problem to generate the tree automata from arbitrary MSO formulas is highly exponential because of the necessary determinization. And even worse: the alphabet space is an additional source for combinatorial explosion. Features are free variables and therefore alphabet symbols for the automata. Since there is no a priori limit on any combination not occurring, we have to represent all their permutations. In extreme cases, this dominates
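even the state space. As an example, consider a naive specification of the lexicon:

\[
\begin{array}{rcl}
\mathit{Lexicon}(x) & \stackrel{\mathrm{def}}{\Longleftrightarrow} & (\mathit{Sees}(x) \wedge \mathit{V}(x) \wedge \mathit{3rd}(x) \wedge \dots) \\
 & \vee & (\mathit{Jan}(x) \wedge \mathit{N}(x) \wedge \mathit{Masc}(x) \wedge \dots) \\
 & \vee & (\mathit{Lena}(x) \wedge \mathit{N}(x) \wedge \mathit{Fem}(x) \wedge \dots)
\end{array}
\tag{9}
\]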
We have defined a relation called Lexicon via a disjunctive specification of lexical labels, realized for example as the second-order variable Sees, and the appropriate combination of features, e.g., V. The way we presented it, Lexicon looks like a monadic predicate with just one argument. But considering the automaton to be generated from the formula, one has to know that the alphabet of the automaton has to contain every free variable to encode the right assignments. Therefore all the free variables contained in the formula are implicitly present: Lexicon is a relation. Remember that since constants are free variables to the compiler, using them does not change the argument. And since every feature and every lexical entry is represented by a new free set variable, the alphabet size will be $2^{n+m}$, where $n$ is the number of lexical entries and $m$ the number of features. Nevertheless we will continue to leave these "global" variables implicit.

For tests, we used the tool MONA (Klarlund and Møller 1998). The major improvement of MONA over our own prototype (Morawietz and Cornell 1997) is the use of binary decision diagrams (BDDs) for the representation of large alphabets. The graph structure of BDDs allows compression of the transitions by removing redundant tests and nodes. But the amount of compression which can be achieved depends very much on the ordering of the free variables, and finding an optimal ordering is NP-hard. An introduction to BDDs can be found, for example, in Andersen (1996). It is still an open problem whether the generation of an entire grammar can be handled. The question is not the efficiency of the compilation, but rather its feasibility, since the size of an automaton representing a grammar is still indeterminate. And it is unlikely that even the use of BDDs helps sufficiently with the representation of a realistic lexicon.
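The growth of the alphabet can be made concrete with a small sketch. It assumes, as described above, that every lexical entry and every feature is a free set variable and that an alphabet symbol records, for each of these variables, whether the current node belongs to it. The function names and the toy lexicon are illustrative assumptions only.

```python
from itertools import product


def alphabet_size(num_entries, num_features):
    """Number of alphabet symbols for n lexical entries and m features."""
    return 2 ** (num_entries + num_features)


def alphabet(free_vars):
    """Enumerate all symbols: one bit per free set variable."""
    return [
        dict(zip(free_vars, bits))
        for bits in product((0, 1), repeat=len(free_vars))
    ]


# Toy lexicon in the spirit of (9): three entries and five features.
ENTRIES = ["Sees", "Jan", "Lena"]
FEATURES = ["V", "N", "3rd", "Masc", "Fem"]
SYMBOLS = alphabet(ENTRIES + FEATURES)

if __name__ == "__main__":
    print(len(SYMBOLS))              # explicit enumeration for the toy lexicon
    print(alphabet_size(1000, 50))   # far too large to enumerate explicitly
```

Explicit enumeration is hopeless already for modest lexicon sizes, which is why a shared representation such as BDDs is needed at all.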
However, the tests have shown that we can compile modules and by this gain insight into parts of the formalization (e.g., see Morawietz and Cornell (To appear) for a report on the compilation of Rogers' entire X-Bar theory). As an example consider the c-command relation between categories instead
Figure 4. A syntax tree from Rogers (To appear)
of nodes following the presentation of Rizzi's ideas from Relativized Minimality in Rogers' monograph. This is a nontrivial part of the theory, but turns out to have a compact automaton. Informally, a category is a node which has been split into segments through adjunction. Categories are used in P&P approaches since, for example, in a WH-question the auxiliary is supposed to c-command its trace. Following Rizzi, this is explained by the fact that the node is adjoined to its head and therefore not dominated by it. In Figure 4 one can see that $I_j$ is indeed not dominated by the category formed by the two nodes labeled C. $I_j$ c-commands its trace $t_j$ since all categories which dominate it also dominate $t_j$. The definition of categories can be found in (10) and (11). In the formulas, we assume some additional predicates: Feq ensures agreement on some set of relevant features; Path can be understood as a dominance-connected ordered subgraph; and Adj is the feature marking an adjoined node.

\[
\begin{array}{rcl}
\mathit{Component}(X) & \stackrel{\mathrm{def}}{\Longleftrightarrow} & \mathit{Path}(X) \wedge (\forall x, y)[(x \in X \wedge y \in X) \rightarrow \mathit{Feq}(x, y)] \wedge {} \\
 & & (\forall x, x')(\exists y)(\forall z)[(x \in X \wedge x' \in X \wedge x \triangleleft x') \rightarrow {} \\
 & & \quad (x' \in \mathit{Adj} \wedge x \triangleleft y \wedge y \not\approx x' \wedge y \notin \mathit{Adj} \wedge (x \triangleleft z \rightarrow (z \approx x' \vee z \approx y)))]
\end{array}
\tag{10}
\]

So, a component is a set of nodes which (a) form a path in the tree; (b) agree
on a set of particular features; and (c) have exactly one node in every local tree which is the result of an adjunction. A category is then just a maximal component.

\[
\mathit{Category}(X) \stackrel{\mathrm{def}}{\Longleftrightarrow} \mathit{Component}(X) \wedge (\forall Y)[(X \subseteq Y \wedge Y \not\approx X) \rightarrow \neg\mathit{Component}(Y)]
\tag{11}
\]
Note that a number of additional constraints on the distribution of adjuncts are necessary to achieve the "right" behavior. Building on the given definition, we can define primitives of domination, precedence and c-command in terms of categories rather than nodes. We can relativize all of them to nodes, e.g. Category(x, y) means that there exists a category such that $x$ and $y$ are both members of it, thereby avoiding third-order notions. As might be suspected considering (10) and (11), the complete presentation of the formulas encoding category-based c-command is fairly complex and therefore beyond the scope of this paper, although c-command itself is just a lifted version of what we saw in (6), using the new category-based primitives for domination instead of the ones from the signature. The reader is encouraged to check the full formalization in Rogers' monograph. But actually, after existentially binding the global variables, the resulting automaton is small. It has only 5 states and 16 transitions, see Figure 5. Since c-command on categories is a complicated relation, we cannot give a detailed explanation of the automaton.

One can see from the discussions in this section that the approach of pure theorem proving, i.e., using only the automata as the operational interpretation of our formalism, is computationally not without problems. But we can define new primitives and work with them since we can represent MSO constraints, i.e., knowledge about variable bindings on trees, efficiently with tree automata. In the following sections we are going to use constraint logic programming to reintroduce a notion of derivation into the purely modeltheoretic presentation. By this we provide a way to use the previously compiled parts or primitives of formalizations separately under the control of programs and gain the desired extension of the generative capacity.
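The satisfiability test that underlies pure theorem proving can be sketched as an emptiness check on a bottom-up tree automaton over binary trees. The sketch below is an assumption-laden illustration, not the procedure of MONA or of our prototype: instead of minimizing the automaton and inspecting its final states, it computes the states reachable from the initial state, which is assigned to empty subtrees. The transition table `SOME_X` is a hypothetical automaton accepting trees in which some node carries the label x.

```python
def reachable_states(delta, initial):
    """States that can be reached at the root of some tree.

    delta maps (left_state, right_state, symbol) to a set of states;
    `initial` is the state assigned to empty subtrees.
    """
    reached = {initial}
    changed = True
    while changed:
        changed = False
        for (left, right, _symbol), targets in delta.items():
            if left in reached and right in reached and not targets <= reached:
                reached |= targets
                changed = True
    return reached


def is_satisfiable(delta, initial, finals):
    """The recognized language is non-empty iff a final state is reachable."""
    return bool(reachable_states(delta, initial) & set(finals))


STATES = ("a0", "a1")
# a1 means "an x-labeled node has been seen in this subtree".
SOME_X = {
    (left, right, sym): {"a1" if sym == "x" or "a1" in (left, right) else "a0"}
    for left in STATES
    for right in STATES
    for sym in ("x", "-")
}

if __name__ == "__main__":
    print(is_satisfiable(SOME_X, "a0", {"a1"}))
    print(is_satisfiable(SOME_X, "a0", {"a2"}))
```

The fixpoint computation is linear in the size of the transition table per round, so the cost of theorem proving lies in constructing the automaton, not in testing it.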
3 Constraint logic programming

Logic programming (LP) reflects nicely the separation of the actual logic of a program and the control of its execution (Lloyd 1984). Programs in the LP
[Figure 5. The automaton $\mathfrak{A} = (A, \Sigma, a_0, F, \alpha)$ with $A = \{a_0, a_1, a_2, a_3, a_4\}$ and $F = \{a_3\}$.]

[...] $G$ is valid in every model of $\mathcal{S}$. The reduction rule we defined is actually sound, i.e., if we reduce a goal we are at least as specific about the answer as we were before.

Theorem 29 (H&S prop. 5.1) Let $\mathcal{S}$ be a definite clause specification and $G$, $F$ goals. If $G \xrightarrow{\,r\,}_{\mathcal{S},V} F$, then $[\![F]\!]^{\mathcal{A}} \subseteq [\![G]\!]^{\mathcal{A}}$ for every model $\mathcal{A}$ of $\mathcal{S}$.

Lemma 30 (Soundness) Let $\mathcal{S}$ be a definite clause specification, $G$ a goal and $\varphi$ an MSO-constraint. If $G \xrightarrow{\,r\,*\,}_{\mathcal{S},V} \varphi$, then $\varphi$ is an $\mathcal{S}$-answer.

Proof. $\varphi$ is an $\mathcal{S}$-answer just in case $(\mathrm{ASS} - [\![\varphi]\!]^{\mathcal{A}}) \cup [\![G]\!]^{\mathcal{A}} = \mathrm{ASS}$ for all models $\mathcal{A}$ of $\mathcal{S}$. This is true by Theorem 29 since $[\![\varphi]\!]^{\mathcal{A}} \subseteq [\![G]\!]^{\mathcal{A}}$ holds. $\Box$

Given a finite set of variables $V$ as above, the notion of a solution is extended accordingly, i.e., $[\![\varphi]\!]^{\mathcal{A}}_{V} = \{\alpha{\restriction}_{V} \mid \alpha \in [\![\varphi]\!]^{\mathcal{A}}\}$, $\alpha{\restriction}_{V}$ being the restriction of $\alpha$ to the variables occurring in $V$. We state in the following corollary that reduction is also complete. Intuitively, if there exists a solution to a goal, then we can find a constraint representing it via goal reduction.
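Corollary 31 (Completeness, H&S cor. 5.3) Let $G$ be a goal, $A$ an atom in $G$, $\mathcal{A}$ a minimal model of $\mathcal{S}$ and $\alpha \in [\![G]\!]^{\mathcal{A}}_{V}$. Then there exists an $\mathcal{S}$-answer $\varphi$ of $G$ such that $G \xrightarrow{\,r\,*\,}_{\mathcal{S},V} \varphi$ and $\alpha \in [\![\varphi]\!]^{\mathcal{A}}_{V}$.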
3.2.4 Properties of $\mathcal{R}$(MSO)

In this section we investigate which properties of MSO are preserved by the extension to $\mathcal{R}$(MSO). Höhfeld and Smolka establish the fact that if MSO is closed under renaming, so is $\mathcal{R}$(MSO). Closure under intersection is covered by definition. But what about generative capacity and decidability? Unfortunately, we have to answer the last point in the negative. We can indeed define undecidable problems. Consider the Post Correspondence Problem (PCP) (Post 1946). A Post correspondence system over an alphabet $\Sigma$ is a finite nonempty set $P = \{(l_i, r_i) \mid i = 1 \dots m \ \&\ l_i, r_i \in \Sigma^{*}\}$. A solution for such a system $P$ is a sequence of indices $1 \le i_1, \dots, i_n \le m$ such that $l_{i_1} \cdots l_{i_n} = r_{i_1} \cdots r_{i_n}$. It is undecidable whether there exists a solution for a given system (if the alphabet has at least two symbols). To facilitate the presentation of the encoding into $\mathcal{R}$(MSO) we sugar our notation. Instead of using the constraint part to deal with identity of sets (unification) and the decomposition of linearly ordered sets or, better, lists, we use the "$\cdot$" to encode list concatenation ($[X \cdot Y]$ for $\mathit{Hd}(X, Z) \wedge \mathit{Tl}(Z, Y)$) directly in the atom and not in an extra constraint in the body. Furthermore we assume that all our variables are constrained to be lists and that there exists an appropriate extension of our List and append examples in (21) to (27) and in Figure 6 to lists of lists. The encoding adapts the one from Hanschke and Würtz (1993). Words are represented as lists, e.g., $a_1 a_2$ [...]

\[
\begin{array}{rcl}
(a,1) & \to & (a,1)(a,0) \mid (a,0)(a,1) \mid b\,(a,0) \mid (a,0)\,b \\
(a,0) & \to & (a,0)(a,0) \\
b & \to & (a,0)(a,0)
\end{array}
\]
It can be shown that the extension of the set {a, b} cannot be avoided in this example. More generally, one can consider the class ECFG($\mathcal{L}$) of extended contextfree grammars where regular expressions rather than finite sequences over A are used as the right side of productions (cf. Thatcher 1967). The interpretation of the regular expression in an extended contextfree production yields a derived set of contextfree productions which will be infinite in general. As before, the grammars in ECFG($\mathcal{L}$) define classes of attributed trees with labels in A. Moreover, in this case the degree of branching in the class of parse trees may be unbounded, as the one-b example below will show. Once again there is a model-preserving translation $tr_2$ from ECFG($\mathcal{L}$) to PDL($\Delta$, $\mathcal{L}$). In this case the translation is based on the formula below, which states that the sequence of values of the children of a given node with value $a$ is an instance of the regular expression $r$:

\[
a \Rightarrow \langle\downarrow\rangle(\mathit{first} \wedge \langle t(r)\rangle\,\mathit{last})
\]

The modal operator $t(r)$ associated with the regular expression $r$, which tests whether the value sequence of the children of a node is an instance of $r$, is built up inductively from tests $?l$ for the symbols $l$ in $\mathcal{L}$. To describe the one-b trees without any bound on the branching the symbol
Principle languages and principle-based parsing
$a$ has to be split as above. Then the finite one-b trees are determined by the following set of extended productions:

\[
\begin{array}{rcl}
(a,1) & \to & (a,0)^{*}\,(a,1)\,(a,0)^{*} \mid (a,0)^{*}\,b\,(a,0)^{*} \\
(a,0) & \to & (a,0)^{*} \\
b & \to & (a,0)^{*}
\end{array}
\]
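The effect of the splitting can be checked mechanically. The following sketch computes the split label of each node bottom-up and accepts a tree iff a production applies at every node. It rests on assumptions not stated in the text: trees are encoded as pairs of a label and a list of children, only the labels a and b occur, and $(a,1)$ and $b$ are taken to be the start symbols. The function names are illustrative.

```python
A0, A1, B = ("a", 0), ("a", 1), "b"


def split(tree):
    """Return the split label of the root, or None if no production applies."""
    label, children = tree
    kids = [split(child) for child in children]
    if None in kids:
        return None
    zeros = kids.count(A0)
    marked = kids.count(A1) + kids.count(B)
    if label == "b":
        # b -> (a,0)*
        return B if zeros == len(kids) else None
    if zeros == len(kids):
        # (a,0) -> (a,0)*
        return A0
    if marked == 1 and zeros == len(kids) - 1:
        # (a,1) -> (a,0)* (a,1) (a,0)*  |  (a,0)* b (a,0)*
        return A1
    return None


def is_one_b(tree):
    """Assumed start symbols: (a,1) and b."""
    return split(tree) in (A1, B)


def count_b(tree):
    """Reference check: number of b-labeled nodes in the tree."""
    label, children = tree
    return (label == "b") + sum(count_b(child) for child in children)


LEAF = ("a", [])
WIDE = ("a", [LEAF, LEAF, ("b", [LEAF]), LEAF, LEAF])
TWO_B = ("a", [("b", []), LEAF, ("b", [])])

if __name__ == "__main__":
    print(is_one_b(WIDE), is_one_b(TWO_B))
```

The second component of a split label thus records whether the subtree contains the single b, which is exactly the information a local production cannot see without the extension of the label set.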
The same splitting can be used to describe the even-depth trees. The symbol $(a,0)$, which is both a start symbol and a terminal symbol, describes nodes at an even distance from the root. Then the even-depth trees are determined by the following set of productions:

\[
\begin{array}{rcl}
(a,0) & \to & (a,1)^{*} \\
(a,1) & \to & (a,0)^{*}
\end{array}
\]
It is well known that the regular operators in an extended contextfree production can be simulated with contextfree productions involving additional symbols, so the word languages defined by extended contextfree grammars are still contextfree. However, the tree structure is changed by this simulation. Thus the class of one-b trees shows that CFG($\mathcal{L}$) and ECFG($\mathcal{L}$) are weakly equivalent but not strongly equivalent. The principle languages and their translations discussed above are collected in the proposition below:
Proposition 1 FO($\Delta$, $\mathcal{L}$) [...]

\[
Qx\Bigl(\bigwedge_{i=1}^{m} (\rho_i(x) \Rightarrow \gamma_i(x))\Bigr)
\]

where $\rho_i(x)$ is a negation-free structural constraint and $\gamma_i(x)$ is a conjunction of atomic value constraints. To get a more compact representation one considers the type structure L. It is an extension of $\mathcal{L}$ obtained by introducing negations of values in $\mathcal{L}$ and special values for value equations and inequations as in Schneider (1993). With the help of the type structure L the value of a node can be computed incrementally. Starting from the empty information $\top$ one can take care of a value constraint by unifying it with the actual value.
5.2 Parsing in the strongly local case

Having described the internal grammar we have to deal with the parsing operations. During the parsing process a parser must deal with incomplete parse trees, the objects to which the parsing operations are applied. Since the parsing process need not be deterministic, it has to handle finite sets of incomplete parse trees which represent the set of options to be considered. They can be described as disjunctions of conjunctions of atomic constraints, i.e. finite sets of constraints. The structural part of such a constraint determines a quasi-tree in the sense of Rogers and Vijay-Shanker (cf. Rogers and Vijay-Shanker 1992, Vijay-Shanker 1992). For the general case we have nothing worthwhile to offer if we stick to our requirements for a principle-based parser. However, in the strongly local case the situation is much better. A principle $\varphi$ is called strongly local if in its normal form $Qx(\bigwedge_{i=1}^{m} (\rho_i(x) \Rightarrow \gamma_i(x)))$ in the sense of (2) the positive structural constraints $\rho_i(x)$ contain only the immediate structural relations in $\Delta$. Since any set of positive local structural constraints on a sequence of nodes may be interpreted as a set of trees, the parser can work with a set of trees in this situation. As outlined above, we consider parser operations for the universal grammar which do not presuppose any information from the internal grammar. Thus
they are general operations operating on attributed trees. But the applicability of a parser operation in a particular situation is controlled by the input sentence and the internal grammar. In the strongly local case the state of the parsing process is given by a stack of trees and the part of the input sentence still to be read. Here we shall discuss a setup with three basic parser operations, read, insert and expand, which use at most two trees at the top of the stack. The question of how many trees below the top tree can take part in decisions of the parser is a linguistic question which needs to be decided on linguistic evidence. The read operation creates a one-node tree from the next symbol (= word) of the input sentence and pushes this tree on top of the stack. The insert operation is a structure-integrating operation which takes care of the restriction principles. It integrates the top tree of the stack into the second top-most tree of the stack and, after removing both trees from the stack, places the new tree on the top of the stack. Thus ins($t_2$, $t_1$, $u$, $r$) inserts the tree $t_1$ into the tree $t_2$ at the position $u$ in $t_2$ in the direction $r$. Its applicability is controlled by the internal grammar by means of the restriction grammar relation for $r$. The expand operation is a structure-generating operation which takes care of the generation principles. It creates a new node and adds it to the top tree of the stack. Thus expand($t_1$, $x$, $u$, $r$) expands the tree $t_1$ at the node $u$ by a new node with label $x$ in the direction $r$. Again its applicability is controlled by the internal grammar by means of the licensing grammar relation for $r$. In our approach the state of a parsing process is described by configurations, which are pairs of the form $(s, w)$ where $s$ is a stack of trees and $w$ is the part of the input still to be read. The parser proceeds by transitions between the configurations.
A transition from the configuration $(s, w)$ to the configuration $(s', w')$ is a quadruple of the form $(s, x, \mathit{op}, s')$, where $x \in \Sigma \cup \{\varepsilon\}$ is an input symbol and $\mathit{op} \in \{\mathit{read}, \mathit{insert}, \mathit{expand}\}$ is a parser operation such that $s'$ is the result of applying $\mathit{op}$ to $s$ and $w = xw'$. Clearly, we must have $x = \varepsilon$ whenever $\mathit{op} \neq \mathit{read}$. A computation for the input sentence $w$ is a sequence of configurations where successor configurations are obtained by transitions and which starts with the initial configuration $(s_0, w)$, where $s_0$ is the empty stack. A computation is successful if the last configuration is a stop configuration with a stack consisting of one tree, a parse tree of $w$, the result of the computation. To obtain a concrete parser generator we have to fix a parsing strategy. A parsing strategy regulates the choice of transitions which can be used by the parser in a particular configuration. If we adhere to our general concept it follows that the parsing strategy works at the level of the universal grammar.
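The configurations and the three operations can be illustrated by a minimal sketch. Several details are assumptions made only for this illustration and are not fixed by the description above: a position $u$ is encoded as a list of child indices, the directions are "left", "right" and "up" (a new parent, supported only at the root), and the internal grammar is represented by two hypothetical sets of triples, `LICENSING` and `RESTRICTION`. No parsing strategy is included; `demo` simply replays one admissible computation.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tree:
    label: str
    children: List["Tree"] = field(default_factory=list)


def node_at(tree, path):
    """Follow a list of child indices from the root."""
    for index in path:
        tree = tree.children[index]
    return tree


def attach(parent, child, direction):
    """Add `child` as the leftmost or rightmost child of `parent`."""
    if direction == "left":
        parent.children.insert(0, child)
    else:
        parent.children.append(child)


def read(stack, words):
    """Push a one-node tree for the next input word."""
    if not words:
        return None
    return stack + [Tree(words[0])], words[1:]


def insert(stack, words, path, direction, restriction):
    """Integrate the top tree t1 into the second tree t2 at position `path`."""
    if len(stack) < 2:
        return None
    t2, t1 = copy.deepcopy(stack[-2]), copy.deepcopy(stack[-1])
    target = node_at(t2, path)
    if (target.label, t1.label, direction) not in restriction:
        return None  # not admissible for this internal grammar
    attach(target, t1, direction)
    return stack[:-2] + [t2], words


def expand(stack, words, label, path, direction, licensing):
    """Add a new node with `label` to the top tree at position `path`."""
    if not stack:
        return None
    t1 = copy.deepcopy(stack[-1])
    target = node_at(t1, path)
    if (target.label, label, direction) not in licensing:
        return None  # not admissible for this internal grammar
    if direction == "up":
        if path:  # sketch: a new parent is only supported at the root
            return None
        t1 = Tree(label, [t1])
    else:
        attach(target, Tree(label), direction)
    return stack[:-1] + [t1], words


# Hypothetical internal grammar: (label at u, other label, direction).
LICENSING = {("Lena", "NP", "up"), ("NP", "S", "up"), ("sleeps", "VP", "up")}
RESTRICTION = {("S", "VP", "right")}


def demo():
    """Replay one computation from the initial to a stop configuration."""
    config = ([], ["Lena", "sleeps"])
    config = read(*config)
    config = expand(*config, "NP", [], "up", LICENSING)
    config = expand(*config, "S", [], "up", LICENSING)
    config = read(*config)
    config = expand(*config, "VP", [], "up", LICENSING)
    config = insert(*config, [], "right", RESTRICTION)
    return config
```

The operations themselves never inspect the grammar; they only consult the two relations passed in, which mirrors the separation between universal operations and the internal grammar.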
Hugo Volger
Therefore the definition of a strategy should not depend on the particular internal grammar with which it is used. However, the strategy may require a test of the applicability of a certain parser operation. By this route the internal grammar interacts with the strategy. Using this approach we obtain a parsing schema which yields a parser generator after fixing a parsing strategy. Moreover, in our approach we can address the question of correctness and completeness of a given parser with respect to a parsing strategy. A computation is called admissible for an internal grammar if all transitions of the computation are admissible, i.e. the parser operation used satisfies the applicability constraints of the internal grammar. Thus we can study the question whether an admissible computation yields only parse trees which satisfy the given set of principles and whether the existing parse trees can be obtained. Using the approach described so far, Schneider has implemented a parser for an interesting fragment of German grammar (cf. Schneider 1993). Obviously, he resorted to ad-hoc solutions in dealing with some nonlocal principles. The parser turned out to be quite efficient although no special tuning for efficiency was used. In addition, he has obtained a correctness result (cf. Schneider 1995b). Restricting the principles to a special class of strictly local $\forall\exists$-formulas satisfying a set of consistency properties, he has proved a correctness result which is independent of the parsing strategy considered.
6 Summary

For the study of principle-based grammar formalisms like the G/B-theory we have advocated the use of dedicated principle languages with an efficient principle compiler. As an example we have presented our first-order language CLAT for attributed trees. In the context of a survey of principle languages we have discussed two expressivity results of A. Palm for CLAT (cf. Palm 1999 in this volume). The second result, which uses stack values and yields linear indexed languages, gets us for the first time beyond contextfree languages. In addition, we have suggested that a principle-based parser should be a parser generator for the universal grammar which is determined by a set of universal parsing operations and a parsing strategy. Following ideas of K.-M. Schneider we have shown how to separate the structural information in the principles from the value information. The latter is collected into the internal grammar, i.e. a set of node-free relations over attribute values, and contains the grammatical knowledge of the particular grammar. In the case of
local universal-existential principles K.-M. Schneider has implemented such a parser, which turned out to be quite efficient.
Notes

1 Actually the original version of this language in Feldmeyer (1991) also contained some definable operations like conditional functional uncertainties.
2 As a local fragment one obtains the fairly weak modal logic PML($\Delta$, $\mathcal{L}$) with a translation $st_0$ from PML($\Delta$, $\mathcal{L}$) to the local first-order logic FO($\Delta$, $\mathcal{L}$).
3 A candidate for the question mark in the table is the class ECFG$_1$($\mathcal{L}$) of 1-extended contextfree grammars where the right side of a production must be a $*$-free expression over L. There is a translation into FO($\Delta^{+}$, $\mathcal{L}$). However, we have not succeeded in defining a direct translation into PDL$_1$($\Delta$, $\mathcal{L}$).
References

Abney, S. and J. Cole (1985): A government-binding parser. In: Proceedings of NELS 15
Aho, A. (1968): Indexed grammars - an extension of contextfree grammars. Journal of the Association for Computing Machinery 15:647-671
Backofen, R., J. Rogers, and K. Vijay-Shanker (1995): A first-order axiomatization of the theory of finite trees. Journal of Logic, Language and Information 4:5-39
Barton, G., Jr. (1984): Toward a Principle-based Parser. Cambridge, Mass.: MIT Press
Blackburn, P., W. Meyer-Viol, and M. de Rijke (1995): A proof system for finite trees. Tech. rep., Universität des Saarlandes
Blackburn, P. and E. Spaan (1993): A modal perspective on the computational complexity of attribute value grammar. Journal of Logic, Language and Information 2:129-169
Büchi, J. (1960): Weak second-order arithmetic and finite automata. Zeitschrift für mathematische Logik und Grundlagen der Mathematik 6:66-92
Carpenter, B. (1992): The Logic of Typed Feature Structures; with Applications to Unification Grammars and Constrained Resolution, vol. 32 of Cambridge Tracts in Theoretical Computer Science. Cambridge University Press
Chomsky, N. (1981): Lectures on government and binding. Dordrecht: Foris
Chomsky, N. (1986): Knowledge of language. New York: Praeger
Doner, J. (1970): Tree acceptors and some of their applications. Journal of Computer and System Sciences 4:406-451
Ebbinghaus, H.-D. and J. Flum (1995): Finite Model Theory. Berlin: Springer
Fanselow, G. and S. Felix (1987): Sprachtheorie, eine Einführung in die generative Grammatik, Band 1: Grundlagen und Zielsetzungen. Tübingen: Francke
Feldmeyer, M. (1991): CLAT: Eine Sprache zur Beschreibung von attribuierten Bäumen durch globale Constraints. Master's thesis, Universität Passau, Fakultät für Mathematik und Informatik
Felix, S. (1990): The structure of functional categories. Linguistische Berichte 125:46-71
Fong, S. (1991a): The computational implementation of principle-based parsers. In: S. Abney and C. Tenney (eds.), Principle-based parsing: Computation and psycholinguistics, Dordrecht: Kluwer
Fong, S. (1991b): Computational properties of principle-based parsers. Ph.D. thesis, MIT, Dept. of Electrical Engineering and Computer Science
Gazdar, G. (1985): Applicability of indexed grammars to natural language. Tech. Rep. CSLI-85-34, Center for the Study of Language and Information, Stanford
Gazdar, G., E. Klein, G. Pullum, and I. Sag (1985): Generalized Phrase Structure Grammar. Oxford: Blackwell
Kozen, D. (1983): Results on the propositional μ-calculus. Theoretical Computer Science 27:333-354
Kracht, M. (1993): Mathematical aspects of command relations. In: Proceedings of 6th Conference of the EACL, pp. 240-249
Kracht, M. (1995): Syntactic codes and grammar refinement. Journal of Logic, Language and Information 4:41-60
Palm, A. (1992): Erweiterung der Sprache CLAT für attribuierte Bäume. Master's thesis, Universität Passau, Fakultät für Mathematik und Informatik
Palm, A. (1997): Transforming tree constraints into rules of grammars. DISKI 173, Infix Verlag
Palm, A. (1999): The expressivity of tree languages for syntactic structure. This volume, pp. 113-152
Pollard, C. and I. Sag (1994): Head-Driven Phrase Structure Grammar. Chicago: Chicago University Press
Pratt, V. (1981): A decidable μ-calculus, preliminary report. In: Proceedings 22nd IEEE Symposium on Foundations of Computer Science, pp. 421-427
Rabin, M. (1969): Decidability of second-order theories and automata on infinite trees. Transactions of the American Mathematical Society 141:1-35
Rogers, J. (1994): Studies in the logic of trees with applications to grammar formalisms. Ph.D. thesis, University of Delaware
Rogers, J. (1996): On descriptive complexity, language complexity, and GB. In: P. Blackburn and M. de Rijke (eds.), Specifying Syntactic Structures, CSLI Publications
Rogers, J. (1999): The descriptive complexity of generalized local sets. This volume, pp. 21-40
Rogers, J. and K. Vijay-Shanker (1992): Reasoning with description of trees. In: Proceedings of 25th Annual meeting of the ACL, pp. 72-80
Schneider, K.-M. (1993): Entwurf und Implementierung eines GB-Parsers für ein Fragment des Deutschen. Master's thesis, Universität Passau, Fakultät für Mathematik und Informatik
Schneider, K.-M. (1995a): An attribute-value vector formalism in a principle-based parser. Tech. rep., Universität Passau
Schneider, K.-M. (1995b): Description of a parser model in a many-sorted first-order language. Tech. rep., Universität Passau
Shieber, S. (1986): An Introduction to Unification-based Approaches to Grammar, vol. 4 of CSLI Lecture Notes. Stanford: Center for the Study of Language and Information
Thatcher, J. (1967): Characterizing derivation trees of contextfree grammars through a generalization of finite automata theory. Journal of Computer and System Sciences 1:317-322
Thatcher, J. and J. Wright (1968): Generalized finite automata theory with an application to a decision problem of second order logic. Mathematical Systems Theory 2:57-81
Vijay-Shanker, K. (1992): Using descriptions of trees in a tree adjoining grammar. Journal of Computational Linguistics 18:481-517
Vijay-Shanker, K. and D. Weir (1994): The equivalence of four extensions of context-free grammars. Mathematical Systems Theory 27:511-546
The expressivity of tree languages for syntactic structures Adi Palm
1 Introduction

During the last decades, several different formalisms in linguistics and computational linguistics have emerged, varying from context-free, or even regular, grammars to constraint-based and principle-based grammar formalisms. All are concerned with the same issue of describing and constraining syntactic structures of natural languages. They differ, however, in the way they specify syntactic structures. Context-free grammars focus on the detailed construction of labeled trees: they employ a set of rules to describe the set of trees constructed by these rules. A principle-based formalism, on the other hand, emphasizes partial descriptions of trees by employing well-formedness conditions. Thus, a principle-based specification focuses on a particular property of a tree while stating less about the construction of a tree. In contrast, a rule-based description, e.g. a context-free grammar, provides detailed information concerning the explicit construction. The notion of principles, however, allows us to concentrate on particular aspects of the structure without constantly needing to keep the other properties in mind. Hence, we can utilize principles to formulate the universal properties of natural language. This declarative notion of principles has served as the formal foundation of Chomsky's Theory of Government and Binding (G/B) (Chomsky 1981, 1986). The G/B-principles state certain structural relations between tree nodes having certain properties. In addition, we should mention some constraint-based grammar formalisms, e.g. LFG (Kaplan and Bresnan 1982), GPSG (Gazdar et al. 1985) and HPSG (Pollard and Sag 1987, 1994). They employ a similar notion of 'constraints' to specify syntactic structures. However, these constraints apply to context-free rules, cf. Shieber (1986, 1992),
rather than to certain structural relations between tree nodes. Nevertheless, such constraints constitute a particular type of local principles. Although the principle-based approaches offer a number of formal benefits, they lack a detailed description of the trees specified. Since a principle emphasizes only a certain aspect of a tree, e.g. case assignment, binding, etc., it leaves open the properties of the nodes which are not regarded by this principle. Consequently, a principle-based specification of a set of trees provides only partial information on the shape of the trees. For lack of a complete description we do not exactly know the set of trees being defined by the principles. However, these sets of possible trees denote the expressivity of a principle-based approach. In essence, we formulate the expressivity of a formalism specifying sets of trees by means of formal grammars and their derivation trees, since a rule-based approach provides total descriptions of a set of trees rather than a partial description as provided by principles. In other words, the expressivity of a tree language states its strong generative power. In this article we discuss a series of tree languages, i.e. languages to specify sets of trees. We establish a stepwise transformation from classical logic into rule-based formal grammars. We start by discussing the Constraint Language for Attributed Trees (CLAT($\mathcal{L}$)), which is formally based on first-order logic. The node attributes used here are the simple labels of the finite label domain $\mathcal{L}$. Besides the Boolean operators and the quantifiers, the language CLAT($\mathcal{L}$) employs two binary structural relations. The dominance relation holds between a node and every other node below it, while the precedence relation states that a node stands somewhere to the left of another one. In addition, each label $l$ of the finite label domain $\mathcal{L}$ constitutes a unary label predicate.
Those structural relations and the label predicates are sufficient to specify most of the structural relations for linguistic purposes, especially within the G/B framework. To establish the expressivity of CLAT($\mathcal{L}$) and, hence, of G/B-principles formulated in CLAT($\mathcal{L}$), we deal with some intermediate formalisms. First we give up the notion of structural relations and replace them with paths. A path describes the route from one node, called the source node, to another one, called the target node. The formal foundation of this path formalism is propositional dynamic logic, cf. Harel (1984). Hence, PDL$_T$($\mathcal{L}$) denotes the propositional dynamic logic for labeled ordered trees. After establishing the relationship between CLAT($\mathcal{L}$) and PDL$_T$($\mathcal{L}$) we turn to a restricted version called the 'single-step-path' language PML$_T$($\mathcal{L}$, $n$) based on propositional modal logic. The crucial difference to PDL$_T$($\mathcal{L}$) is the absence of long-distance paths, so a PML$_T$($\mathcal{L}$, $n$) formula only states local constraints
on tree nodes. We compensate for this reduction of expressivity by making use of the auxiliary label domain $\{0,1\}^n$. In a particular representation, the local constraints expressed by PML$_T$($\mathcal{L}$, $n$) formulae specify types of tree nodes and possible structural relations among them. Such formulae correspond to disjunctive or regular rational trees, which provide a graph-based definition of certain sets of trees. Moreover, these types of rational trees coincide with a slightly extended version of context-free grammars. Consequently, we achieve a transformation from CLAT($\mathcal{L}$) principles into (extended) context-free grammars. In addition, we briefly discuss an extension of our approach that also deals with an infinite label domain and certain kinds of context-sensitive grammar formalisms.
2 A first-order language for principles

The first tree language we consider is based on first-order logic (see e.g. Partee et al. 1990). This formalism serves to formulate principles such as the ones used within G/B. A principle is a logical formula constraining a tree structure with respect to some specific aspects. Therefore, a corresponding formalism dealing with G/B principles must be based on the structural properties of trees. The underlying notion of trees regards them as a particular kind of graph with certain properties. Some similar approaches to specifying G/B-principles have been provided by Stabler (1992) and Rogers (1995). Stabler has utilized the Horn clause fragment of first-order logic to specify principle-based logic programming. Like our formalism, Rogers' monadic second-order language $L^2_{K,P}$ focuses on the description of trees rather than on establishing a principle-based parsing method. In contrast to Rogers, we employ first-order logic, which is properly weaker than (monadic) second-order logic. Nevertheless, first-order logic provides sufficient expressive power to specify G/B principles. Basically, we employ labeled ordered trees to represent syntactic structures. In essence, an ordered tree consists of a set of tree nodes $\mathcal{N}$ whose elements are ordered by two binary relations, the dominance and the precedence relation. The properties of these relations mainly characterize the typical shape of an ordered tree. Consider the example in Figure 1: A node, say $n_i$, dominates another node, say $n_j$, if $n_i$ stands above $n_j$. Every tree has a unique node, called the root, that dominates all other nodes and that is not dominated by any other one. If a node $n_i$ immediately dominates the node $n_j$, we say $n_i$ is the parent of $n_j$ and $n_j$ is a child of $n_i$. Two nodes $n_i$ and $n_j$ are
116
Adi Palm
Figure 1. A tree
siblings if they have the same parent. Similarly, a node $n_i$ precedes a node $n_j$ if $n_i$ stands to the left of $n_j$. For a precedence relation between $n_i$ and $n_j$ it is not necessary that they are siblings; e.g., $n_2$ of the tree above precedes all other nodes except $n_0$ and $n_1$. Obviously, if $n_i$ dominates $n_j$ or vice versa, then neither $n_i$ precedes $n_j$ nor $n_j$ precedes $n_i$. If the nodes $n_i$ and $n_j$ are siblings and $n_i$ immediately precedes $n_j$, then $n_j$ is the right-hand sibling of $n_i$ and, conversely, $n_i$ is the left-hand sibling of $n_j$. The immediate neighbors of a node are its children, its parent, and its left-hand and right-hand siblings. For instance, the immediate neighbors of the node $n_3$ are the parent $n_1$, the left-hand sibling $n_2$, the right-hand sibling $n_6$ and the children $n_4$ and $n_5$. In addition, a labeled ordered tree includes a labeling function $\alpha$ that associates every tree node with a certain label of the label domain $\mathcal{L}$.

For a first-order formalism, it is sufficient to employ only the dominance and the precedence relation, since we can represent each of the other structural relations mentioned above by a corresponding first-order formula. Accordingly, we introduce $\mathrm{CLAT}(\mathcal{L})$, the Constraint Language for Attribute Trees. In detail, $\mathrm{CLAT}(\mathcal{L})$ employs only a single attribute of a tree node, which is a label $l \in \mathcal{L}$.

Definition 1 $\mathrm{CLAT}(\mathcal{L})$ is a first-order language on a set of tree nodes $N$ that includes the following predicate symbols:

$l \in \mathcal{L}$ denotes a monadic label predicate
$\delta \subseteq N \times N$ denotes the irreflexive dominance relation
$\pi \subseteq N \times N$ denotes the irreflexive precedence relation

A $\mathrm{CLAT}(\mathcal{L})$-principle is a first-order formula using only these predicates.

Besides the dominance and precedence relations, $\mathrm{CLAT}(\mathcal{L})$ includes the node equivalence $\approx$, since we assume that the dominance and the precedence relations are irreflexive, i.e. a node does not dominate or precede itself. Moreover, the language $\mathrm{CLAT}(\mathcal{L})$ includes neither constant symbols for tree nodes nor function symbols. The only way to make statements about nodes is to use variables with the corresponding quantifiers. For instance, if we want to state that every node $x$ with the label $l_1 \in \mathcal{L}$ must precede one of its siblings $y$ with the label $l_2 \in \mathcal{L}$, we can write the following formula:

$$\forall x\,[\,l_1(x) \Rightarrow \exists y\,[\,\pi(x,y) \wedge l_2(y) \wedge \forall z\,[\,\delta(z,x) \Leftrightarrow \delta(z,y)\,]\,]\,]$$
To restrict the precedence relation to siblings, i.e. nodes with a common parent node, we use the additional condition $\forall z\,[\delta(z,x) \Leftrightarrow \delta(z,y)]$, which states that $x$ and $y$ must have the same nodes $z$ dominating them. Since we often make use of this restricted version of the precedence relation, we define a special structural relation called the sibling precedence:

$$\pi_s(x,y) := \pi(x,y) \wedge \forall z\,[\,\delta(z,x) \Leftrightarrow \delta(z,y)\,]$$
The index $s$ indicates the restriction to siblings. Note that the sibling precedence is not a real extension of $\mathrm{CLAT}(\mathcal{L})$. It is rather a shorthand for a particular $\mathrm{CLAT}(\mathcal{L})$ formula. In the same manner we may define some other frequently used 'auxiliary' structural relations, for instance the reflexive dominance relation $\delta^{\circ}$, the immediate dominance relation $\delta^1$ and the immediate sibling precedence relation $\pi_s^1$:

$$
\begin{aligned}
\delta^{\circ}(x,y) &:= \delta(x,y) \vee x \approx y \\
\delta^{1}(x,y) &:= \delta(x,y) \wedge \forall z\,[\,\delta(z,y) \Rightarrow (\delta(z,x) \vee z \approx x)\,] \\
\pi_s^{1}(x,y) &:= \pi_s(x,y) \wedge \forall z\,[\,\pi_s(z,y) \Rightarrow (\pi_s(z,x) \vee z \approx x)\,]
\end{aligned}
$$
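The derived relations above can be evaluated mechanically on a finite tree. The following is a minimal sketch, assuming a small hypothetical tree given as ordered child lists (the node names and the helper names `delta`, `pi`, `pi_s`, `delta_1`, `pi_s1` are illustrative and not part of the formalism); the quantifier over $z$ is implemented by enumerating all nodes.

```python
from itertools import product

# Hypothetical example tree: each node maps to its children in left-to-right order.
CHILDREN = {"n0": ["n1", "n4", "n5"], "n1": ["n2", "n3"],
            "n2": [], "n3": [], "n4": [], "n5": []}
NODES = list(CHILDREN)

def descendants(x):
    """All nodes properly dominated by x."""
    out = []
    for c in CHILDREN[x]:
        out.append(c)
        out.extend(descendants(c))
    return out

# Irreflexive dominance as a set of pairs.
DELTA = {(x, y) for x in NODES for y in descendants(x)}

# Irreflexive precedence: x precedes y iff they sit in (or below) two
# distinct children a, b of a common node with a to the left of b.
PI = set()
for kids in CHILDREN.values():
    for i, a in enumerate(kids):
        for b in kids[i + 1:]:
            PI |= set(product([a] + descendants(a), [b] + descendants(b)))

def delta(x, y):
    return (x, y) in DELTA

def pi(x, y):
    return (x, y) in PI

def pi_s(x, y):
    """Sibling precedence: x precedes y and both have the same dominators."""
    return pi(x, y) and all(delta(z, x) == delta(z, y) for z in NODES)

def delta_1(x, y):
    """Immediate dominance: every dominator of y dominates x or is x."""
    return delta(x, y) and all(
        (not delta(z, y)) or delta(z, x) or z == x for z in NODES)

def pi_s1(x, y):
    """Immediate sibling precedence."""
    return pi_s(x, y) and all(
        (not pi_s(z, y)) or pi_s(z, x) or z == x for z in NODES)
```

On this tree, `pi("n2", "n4")` holds while `pi_s("n2", "n4")` does not, which illustrates that precedence reaches across subtrees whereas sibling precedence is confined to nodes with a common parent.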
Although $\mathrm{CLAT}(\mathcal{L})$ has a well-defined syntax and semantics provided by the properties of first-order languages, we can never be sure that the models of $\mathrm{CLAT}(\mathcal{L})$ formulae are ordered trees. Therefore, we assume an axiomatization of ordered trees such that the models of these axioms are exactly the labeled ordered trees. Such a first-order axiomatization has already been established by Partee et al. (1990), Backofen et al. (1995) and Palm (1997). In essence, these axioms must assert that the dominance and precedence relations are strict partial orders, i.e. irreflexive, asymmetric and transitive relations. Moreover, a tree must include a single root, and every node except the root must be immediately dominated by exactly one other node. The sibling precedence must impose a linear order on the children of a node. In addition, we assume that the trees considered are finite. Note that finiteness cannot be expressed by means of first-order logic, but we can establish an axiom stating that every satisfiable formula corresponds to at least one finite tree, cf. Backofen et al. (1995).
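As an illustration, the conditions named in this paragraph can be checked on a finite relational structure. This is a sketch under stated assumptions: it tests only the properties listed above (strict partial orders, a single root, a unique parent, linearly ordered siblings) and is not a complete axiomatization in the sense of the cited works; the function names are hypothetical.

```python
def is_strict_partial_order(nodes, rel):
    """Irreflexive, asymmetric and transitive."""
    irreflexive = all((x, x) not in rel for x in nodes)
    asymmetric = all((y, x) not in rel for (x, y) in rel)
    transitive = all((x, z) in rel
                     for (x, y) in rel for (y2, z) in rel if y == y2)
    return irreflexive and asymmetric and transitive

def parents(nodes, delta, y):
    """Dominators of y that dominate no other dominator of y."""
    doms = [x for x in nodes if (x, y) in delta]
    return [x for x in doms if not any((x, z) in delta for z in doms)]

def is_ordered_tree(nodes, delta, pi):
    if not (is_strict_partial_order(nodes, delta)
            and is_strict_partial_order(nodes, pi)):
        return False
    # Exactly one node is undominated, and it dominates all other nodes.
    roots = [x for x in nodes if not any((y, x) in delta for y in nodes)]
    if len(roots) != 1:
        return False
    root = roots[0]
    if any((root, x) not in delta for x in nodes if x != root):
        return False
    # Every non-root node has exactly one parent.
    parent = {}
    for y in nodes:
        if y != root:
            ps = parents(nodes, delta, y)
            if len(ps) != 1:
                return False
            parent[y] = ps[0]
    # Distinct siblings are ordered by precedence in exactly one direction.
    for x in parent:
        for y in parent:
            if x != y and parent[x] == parent[y]:
                if ((x, y) in pi) == ((y, x) in pi):
                    return False
    return True
```

For example, a root `a` with children `b` and `c` passes the check once `b` precedes `c`, and fails if the two siblings are left unordered.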
Every satisfiable $\mathrm{CLAT}(\mathcal{L})$ formula that is compatible with the tree axioms specifies a non-empty set of (finite) trees. Therefore, the tree axioms correspond to the set of all ordered (finite) trees. Actually, we consider a tree as a $\mathrm{CLAT}(\mathcal{L})$ structure, i.e. a domain to interpret a $\mathrm{CLAT}(\mathcal{L})$ formula. In essence, $\mathrm{CLAT}(\mathcal{L})$ structures must provide an interpretation for the node variables, the label predicates, the dominance relation and the precedence relation. Accordingly, we define a labeled ordered tree as a $\mathrm{CLAT}(\mathcal{L})$ structure that satisfies the tree axioms. In addition, we require that an ordered tree is finite.

Definition 2 A $\mathrm{CLAT}(\mathcal{L})$ structure $t = \langle N^t, \delta^t, \pi^t, \alpha^t \rangle$ denotes a labeled ordered tree iff $t$ satisfies the tree axioms and $N^t$ is finite.

The set $N^t$ represents the nodes of the tree $t$, $\delta^t \subseteq N^t \times N^t$ is the corresponding dominance relation and, analogously, $\pi^t \subseteq N^t \times N^t$ is the precedence relation. In addition, a tree includes the function $\alpha^t : N^t \to \mathcal{L}$ that associates each tree node with a label $l \in \mathcal{L}$. Note that every tree $t$ provides a particular interpretation of the nodes, the structural relations and the label predicates occurring in a $\mathrm{CLAT}(\mathcal{L})$ formula. The superscript $t$ indicates this.

By employing the language $\mathrm{CLAT}(\mathcal{L})$ we can formalize some structural notions and principles of G/B. The structural foundation of the trees described in G/B is the 'x-bar scheme', see e.g. Chomsky (1981). Basically, the x-bar scheme states that a leaf node uniquely projects its category upwards. The uniqueness asserts that the projection describes a line that does not meet any other projection line. In addition, the x-bar scheme assigns a 'level of projection' to each tree node. The level head denotes the underlying leaf of a projection line, the level bar denotes an intermediate node of the projection line and the level maximal denotes the unique ending point of a projection line. In general, we obtain structures of the shape exemplified in Figure 2. This tree describes the typical structure of an English main clause. For a category $X$, we mark the maximal projection by $XP$, an intermediate node by $\bar{X}$, and the head node by $X^0$. In contrast to the standard definition of the x-bar scheme, we propose a particular version that does not distinguish several levels of a projection. We assume only a predicate max indicating whether a node is the maximal element of a projection. In essence, the x-bar scheme asserts the unique projection of a category, i.e. every non-leaf node uniquely dominates the head of its projection. We employ the predicate max, which indicates the end of a projection line, to establish the unique projection.
Actually, the non-maximal nodes mark the projection line. Hence, we achieve a unique projection line if every node has at most one non-maximal child or, alternatively, if every non-maximal node has only maximal siblings. Moreover, the x-bar scheme demands that a projection line includes only elements of the
Figure 2. An x-bar tree
same category. Finally, the root node must be maximal. According to these requirements, we employ some partial constraints specifying these properties. The monadic predicate unique(x) asserts the uniqueness of the projection by stating that a non-maximal node has only maximal siblings. For the projection line, we employ the predicate $\mathrm{proj}_k(x)$. It states that a maximal node $x$ of the category $\mathrm{cat}_k$ dominates some leaf $y$ such that $y$ and the nodes $z$ between $x$ and $y$ are not maximal and have the category $\mathrm{cat}_k$. The index $k$ indicates that we actually require a version of $\mathrm{proj}_k(x)$ for every category $\mathrm{cat}_k$, e.g. N, V, A, P, etc. Together with the predicate unique(x), the predicate $\mathrm{proj}_k$ asserts a unique line of projection. Finally, the predicate maxroot states that the root must be maximal. Sometimes it is assumed that the x-bar scheme restricts the trees to binary branching. Therefore, we define the predicate binary, which states that at least the left-hand or the right-hand sibling is absent.

$$
\begin{aligned}
\mathrm{unique}(x) &:= \neg\mathrm{max}(x) \Rightarrow \neg\exists y\,[\,(\pi_s(y,x) \vee \pi_s(x,y)) \wedge \neg\mathrm{max}(y)\,] \\
\mathrm{proj}_k(x) &:= (\mathrm{max}(x) \wedge \mathrm{cat}_k(x)) \Rightarrow \exists y\,[\,\delta(x,y) \wedge \neg\exists z\,[\delta(y,z)] \wedge \mathrm{cat}_k(y) \wedge \neg\mathrm{max}(y) \\
&\qquad \wedge\ \forall z\,[\,(\delta(x,z) \wedge \delta(z,y)) \Rightarrow (\mathrm{cat}_k(z) \wedge \neg\mathrm{max}(z))\,]\,] \\
\mathrm{maxroot}(x) &:= \neg\exists y\,[\delta(y,x)] \Rightarrow \mathrm{max}(x) \\
\mathrm{binary}(x) &:= (\neg\exists y\,[\pi_s(x,y)]) \vee (\neg\exists y\,[\pi_s(y,x)])
\end{aligned}
$$
Altogether, the x-bar scheme asserts that every node $x$ must satisfy unique(x), $\mathrm{proj}_k(x)$, maxroot(x) and binary(x):

$$\text{x-bar} := \forall x\,[\,\mathrm{unique}(x) \wedge \mathrm{proj}_k(x) \wedge \mathrm{maxroot}(x) \wedge \mathrm{binary}(x)\,]$$
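The four constraints can be tested on a concrete tree. The sketch below is only an illustration under assumptions: the tree, the category assignment and the set of maximal nodes are hypothetical data, dominance and sibling precedence are computed directly from ordered child lists rather than from their first-order definitions, and $\mathrm{proj}_k$ is checked for every category $k$ occurring in the tree.

```python
def make_relations(children):
    """Return dominance and sibling precedence for ordered child lists."""
    parent = {c: p for p, kids in children.items() for c in kids}

    def delta(x, y):
        # x properly dominates y iff x lies on the parent chain of y
        while y in parent:
            y = parent[y]
            if y == x:
                return True
        return False

    def pi_s(x, y):
        if x not in parent or y not in parent or parent[x] != parent[y]:
            return False
        kids = children[parent[x]]
        return kids.index(x) < kids.index(y)

    return delta, pi_s

def xbar(children, cat, maximal):
    """True iff every node satisfies unique, proj_k, maxroot and binary."""
    nodes = list(children)
    delta, pi_s = make_relations(children)

    def is_max(x):
        return x in maximal

    def unique(x):
        return is_max(x) or not any(
            (pi_s(y, x) or pi_s(x, y)) and not is_max(y) for y in nodes)

    def proj(k, x):
        if not (is_max(x) and cat[x] == k):
            return True
        return any(
            delta(x, y) and not children[y] and cat[y] == k and not is_max(y)
            and all(cat[z] == k and not is_max(z)
                    for z in nodes if delta(x, z) and delta(z, y))
            for y in nodes)

    def maxroot(x):
        return any(delta(y, x) for y in nodes) or is_max(x)

    def binary(x):
        return (not any(pi_s(x, y) for y in nodes)
                or not any(pi_s(y, x) for y in nodes))

    return all(unique(x) and all(proj(k, x) for k in set(cat.values()))
               and maxroot(x) and binary(x) for x in nodes)

# Hypothetical example: VP -> NP V0, NP -> N0
CHILDREN = {"VP": ["NP", "V0"], "NP": ["N0"], "N0": [], "V0": []}
CAT = {"VP": "V", "V0": "V", "NP": "N", "N0": "N"}
```

With `{"VP", "NP"}` as the maximal nodes the scheme is satisfied; dropping `"VP"` from that set violates maxroot, and adding a third child under `VP` violates binary.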
Within the principles of G/B we often employ certain structural relations, for instance the command relations and the government relation. The c-command relation between two nodes $x$ and $y$ states that either $x$ precedes $y$ or $y$ precedes $x$ and every node $z$ that dominates $x$ must also dominate $y$. Similarly, the m-command considers pairs of nodes $x$ and $y$ where either $x$ precedes $y$ or $y$ precedes $x$ and every maximal node $z$ that dominates $x$ must also dominate $y$. In a simplified version, the government relation denotes a restricted version of the m-command between $x$ and $y$ where, additionally, every maximal node $z$ dominating $y$ also dominates $x$:

$$
\begin{aligned}
\text{c-comm}(x,y) &:= (\pi(x,y) \vee \pi(y,x)) \wedge \forall z\,[\,\delta(z,x) \Rightarrow \delta(z,y)\,] \\
\text{m-comm}(x,y) &:= (\pi(x,y) \vee \pi(y,x)) \wedge \forall z\,[\,(\mathrm{max}(z) \wedge \delta(z,x)) \Rightarrow \delta(z,y)\,] \\
\mathrm{govern}(x,y) &:= \text{m-comm}(x,y) \wedge \forall z\,[\,(\mathrm{max}(z) \wedge \delta(z,y)) \Rightarrow \delta(z,x)\,]
\end{aligned}
$$
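To see how the three relations differ, they can be evaluated on a small clause-like tree. This is a sketch with hypothetical data (the node names and the choice of maximal nodes are assumptions made for illustration); precedence is computed from a preorder traversal, which agrees with the precedence relation on ordered trees for nodes that do not dominate each other.

```python
# Hypothetical tree: IP -> NP I', I' -> I0 VP, VP -> V0 NP2
CHILDREN = {"IP": ["NP", "I'"], "NP": [], "I'": ["I0", "VP"],
            "I0": [], "VP": ["V0", "NP2"], "V0": [], "NP2": []}
MAXIMAL = {"IP", "NP", "VP", "NP2"}
NODES = list(CHILDREN)
PARENT = {c: p for p, kids in CHILDREN.items() for c in kids}

def preorder(x):
    out = [x]
    for c in CHILDREN[x]:
        out += preorder(c)
    return out

POS = {x: i for i, x in enumerate(preorder("IP"))}

def delta(x, y):
    """x properly dominates y."""
    while y in PARENT:
        y = PARENT[y]
        if y == x:
            return True
    return False

def pi(x, y):
    """x precedes y: neither dominates the other and x comes first."""
    return (x != y and not delta(x, y) and not delta(y, x)
            and POS[x] < POS[y])

def is_max(x):
    return x in MAXIMAL

def c_comm(x, y):
    return (pi(x, y) or pi(y, x)) and all(
        delta(z, y) for z in NODES if delta(z, x))

def m_comm(x, y):
    return (pi(x, y) or pi(y, x)) and all(
        delta(z, y) for z in NODES if is_max(z) and delta(z, x))

def govern(x, y):
    return m_comm(x, y) and all(
        delta(z, x) for z in NODES if is_max(z) and delta(z, y))
```

Here `I0` m-commands and governs the subject `NP` without c-commanding it, while it m-commands but does not govern `NP2`, because the maximal node `VP` dominates `NP2` but not `I0`.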
Likewise, we can establish all structural notions occurring within G/B principles. However, $\mathrm{CLAT}(\mathcal{L})$ fails, in general, to capture the notion of binding, since the finite label domain prevents us from using an arbitrary number of indices to represent references and co-indexation. We discuss this problem in Section 6. Nevertheless, in some cases we can infer the co-indexation from other (structural) relations, see e.g. Rogers (1995), Kracht (1995), so we need not specify it explicitly. Therefore, $\mathrm{CLAT}(\mathcal{L})$ captures most principles of the G/B framework.
3 Propositional dynamic logic for trees

The basic idea of the formalism presented here is to replace the notion of structural relations with paths. Simply put, a path describes the route from one tree node to another within the tree. In contrast to the structural relations employed in $\mathrm{CLAT}(\mathcal{L})$, paths are directed. While we can use a structural relation in both directions, a path cannot 'look' back to the node where it started. Consequently, the notion of paths seems to be more restricted than the first-order formalism CLAT. However, we will be able to establish the contrary. The basic advantages of specifying principles in a path-based framework are the more compact representation and some built-in properties of trees. Actually, a path corresponds to a sequence of tree nodes that are linked by certain structural relations. For instance, we consider the path from a node $n_0$ to a node $n_k$ via the nodes $n_1, n_2, \ldots, n_{k-1}$. Moreover, we assume the
structural relation $r_i$ between the nodes $n_{i-1}$ and $n_i$ for $1 \le i \le k$. The result is a path description of the following shape:

$$n_0 \xrightarrow{\;r_1\;} n_1 \xrightarrow{\;r_2\;} n_2 \longrightarrow \cdots \xrightarrow{\;r_k\;} n_k$$

In a more abstract view we ignore the intermediate nodes $n_1$ to $n_{k-1}$ and focus only on the structural relations. Hence, we achieve a simpler description $n_0 \xrightarrow{\;r\;} n_k$, where the relation $r = r_1 \circ \ldots \circ r_k$ denotes the concatenation of the structural relations $r_1$ to $r_k$. Thus, $r$ describes the path from $n_0$ to $n_k$. Consequently, we might regard a path corresponding to a structural relation $r$ as a function $r_{path}$ mapping the source node $n_0$ to the target node $n_k = r_{path}(n_0)$. However, the notion of function is too strong since it states the unique existence of the node $n_k$. Actually, it is quite sufficient that some node $n_k$ exists which can be reached from $n_0$ by the structural relation $r$, i.e. $\exists n_k\,[r(n_0, n_k)]$. Therefore, we may consider a path represented by $r$ as a kind of restricted existential quantifier where the structural relation $r$ restricts the domain of quantification.

After these basic considerations, we turn to the kind of paths to employ within our approach. First of all, we assume that the relations $r_1$ to $r_k$ used previously are only the dominance relation, the sibling precedence relation and their corresponding converse relations. Thus, we achieve the four path directions 'up', 'down', 'left', and 'right'. Nevertheless, the structural relations of $\mathrm{CLAT}(\mathcal{L})$ constitute unbounded, long-distance relations. In the path-based formalism considered now we employ simple atomic paths. In addition, we provide a method to establish more complex paths by composing simpler ones. Therefore, we introduce the atomic paths $\downarrow$, $\rightarrow$, $\uparrow$ and [...]

For nodes $n_1$ and $n_2$, the relation $R^t(\cdot)(n_1, n_2)$ assigned to a path is given by [...] for the atomic paths and by the following conditions for the complex paths built from paths $p$ and $q$:

$$
\begin{aligned}
\text{concatenation of } p \text{ and } q:&\quad \exists n \in N^t\,[\,R^t(p)(n_1,n) \wedge R^t(q)(n,n_2)\,] \\
\text{disjunction of } p \text{ and } q:&\quad R^t(p)(n_1,n_2) \vee R^t(q)(n_1,n_2) \\
\text{Kleene star } p^*:&\quad (R^t(p))^*(n_1,n_2) \\
\text{test on } \varphi:&\quad (n_1 \approx n_2) \wedge t, n_1 \models \varphi
\end{aligned}
$$
We use a labeled ordered tree $t$ and a node $n \in N^t$ to interpret a $\mathrm{PDL}_T(\mathcal{L})$ formula. The node $n$ satisfies a label statement $l$ only if $n$ has the corresponding label $l$, i.e. $\alpha^t(n) = l$. For interpreting a path statement $\langle p \rangle \varphi$ we employ the structural relation $R^t(p)$ corresponding to the path $p$. The path statement considered is true only if there is a target node $n_p$ such that the relation $R^t(p)$ applies to $n$ and $n_p$ and, in addition, the target node $n_p$ satisfies the target condition $\varphi$. Moreover, we interpret the Boolean operators as in first-order logic, so we do not discuss them further.

The second part of the interpretation concerns the transformation of paths into a corresponding binary relation. In detail, atomic paths are straightforwardly interpreted by the corresponding immediate structural relations. The concatenation and disjunction of paths agree with the corresponding operations on binary relations. However, the Kleene-star operator and the test are more crucial. For a program $p^*$ we employ the transitive closure of the relation $R^t(p)$. Finally, the test states a restricted version of the node equivalence which is true only if the node considered satisfies the test condition $\varphi$.

Since $\mathrm{PDL}_T(\mathcal{L})$ is unable to distinguish different tree nodes, we interpret $\mathrm{PDL}_T(\mathcal{L})$ formulae with respect to a particular tree node $n$. Nevertheless, there is a notion of interpreting $\mathrm{PDL}_T(\mathcal{L})$ formulae that does not require a particular node $n$. Rather, we consider all nodes of the tree. Consequently, we assume the following interpretation of a $\mathrm{PDL}_T(\mathcal{L})$ formula $\varphi$ with respect to a tree $t = \langle N^t, \delta^t, \pi^t, \alpha^t \rangle$:

$$t \models \varphi \quad\text{iff}\quad \forall n \in N^t\,[\,t, n \models \varphi\,]$$
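The interpretation just described can be sketched as a small evaluator. The following is an illustration under assumptions rather than the definition used in the text: paths and formulae are encoded as nested tuples with hypothetical tags (`"seq"`, `"or"`, `"star"`, `"test"`, `"dia"`), the atomic paths are read as the immediate relations between a node and its parent, children and adjacent siblings, and the star is taken as the reflexive-transitive closure customary in PDL (drop the identity pairs in `closure` if the strict transitive closure is intended).

```python
# Hypothetical labeled ordered tree: children in left-to-right order.
CHILDREN = {"r": ["a", "b"], "a": [], "b": ["c"], "c": []}
LABEL = {"r": "S", "a": "NP", "b": "VP", "c": "V"}
NODES = list(CHILDREN)

def closure(r):
    """Reflexive-transitive closure of r over NODES (assumed reading of p*)."""
    result = {(n, n) for n in NODES} | set(r)
    while True:
        new = {(x, z) for (x, y) in result
               for (y2, z) in result if y == y2} - result
        if not new:
            return result
        result |= new

def rel(p):
    """Binary relation R^t(p) denoted by the path p."""
    kind = p[0]
    if kind == "down":
        return {(x, c) for x, kids in CHILDREN.items() for c in kids}
    if kind == "right":
        return {(k[i], k[i + 1])
                for k in CHILDREN.values() for i in range(len(k) - 1)}
    if kind == "up":
        return {(y, x) for (x, y) in rel(("down",))}
    if kind == "left":
        return {(y, x) for (x, y) in rel(("right",))}
    if kind == "seq":
        r1, r2 = rel(p[1]), rel(p[2])
        return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}
    if kind == "or":
        return rel(p[1]) | rel(p[2])
    if kind == "star":
        return closure(rel(p[1]))
    if kind == "test":
        return {(n, n) for n in NODES if holds(n, p[1])}
    raise ValueError(f"unknown path: {p!r}")

def holds(n, phi):
    """t, n |= phi"""
    kind = phi[0]
    if kind == "top":
        return True
    if kind == "label":
        return LABEL[n] == phi[1]
    if kind == "not":
        return not holds(n, phi[1])
    if kind == "and":
        return holds(n, phi[1]) and holds(n, phi[2])
    if kind == "dia":  # path statement <p>phi
        return any(holds(m, phi[2]) for (x, m) in rel(phi[1]) if x == n)
    raise ValueError(f"unknown formula: {phi!r}")

def valid(phi):
    """t |= phi iff phi holds at every node of t."""
    return all(holds(n, phi) for n in NODES)
```

For instance, `valid(("dia", ("star", ("up",)), ("label", "S")))` states that every node has a (possibly improper) ancestor labeled `S`, which holds in this tree because the root carries that label.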
Obviously, the general method of interpreting $\mathrm{PDL}_T(\mathcal{L})$ formulae agrees with the universally quantified principles used in G/B. We mainly employ $\mathrm{PDL}_T(\mathcal{L})$ as a path-based, intermediate representation of principles on labeled ordered trees. When interpreting statements including a path, we have already seen the basic method to transform $\mathrm{PDL}_T(\mathcal{L})$ formulae into the first-order formalism $\mathrm{CLAT}(\mathcal{L})$. For comparing expressions of both formalisms we assume the existence of a standard translation $st_x$ that translates a $\mathrm{PDL}_T(\mathcal{L})$ [...]

$$
\begin{aligned}
\mathrm{maxroot}' &:= \neg\langle\uparrow\rangle\top \Rightarrow \mathrm{max} \\
\mathrm{binary}' &:= \neg\langle\leftarrow\rangle\top \vee \neg\langle\rightarrow\rangle\top
\end{aligned}
$$

When we formulate the statement unique', we already assume binary branching. So we only consider the immediate siblings of a node. Thus, the $\mathrm{PDL}_T(\mathcal{L})$ version of the x-bar scheme constitutes the following formula:

$$\text{x-bar}' := \mathrm{unique}' \wedge \mathrm{proj}'_k \wedge \mathrm{maxroot}' \wedge \mathrm{binary}'$$
Obviously, we can more or less straightforwardly transform the $\mathrm{CLAT}(\mathcal{L})$ principles specified previously into $\mathrm{PDL}_T(\mathcal{L})$. However, the general method is more complex, cf. Palm (1997), since it must deal with arbitrarily used quantifiers. The principles employed here strongly correspond to the normal form that separates the target condition and the intermediate condition for each structural relation. The basic contrast between $\mathrm{CLAT}(\mathcal{L})$ and $\mathrm{PDL}_T(\mathcal{L})$ lies in the different notions of structural relation and path. A structural relation between two nodes states only optional information, since we do not need to specify it explicitly. For introducing another node, we need only employ a quantifier. In $\mathrm{PDL}_T(\mathcal{L})$ we can relate nodes only by specifying an explicit path from one node to another. So $\mathrm{PDL}_T(\mathcal{L})$ provides a more explicit description of the tree structures specified. This clarity is an important step towards establishing the expressivity by means of formal grammars.
4 Propositional monadic logic for trees

After transforming structural relations into paths, we reduce the paths employed in such a way that only atomic paths occur. Consequently, we must provide an alternative formulation for each compositional path operator, i.e. the concatenation, the disjunction, the test and the Kleene star, to achieve atomic paths only. In detail, we utilize a particular version of the propositional modal logic (PML) (see e.g. Bull and Segerberg 1984) to represent these 'short-path constraints'. Actually, the propositional modal logic is a simplified version of PDL where only atomic programs occur in front of a statement. In general, PML does not include operations on programs; rather, every kind of program constitutes a certain modality or modal operator. For our particular purposes, we additionally assume that a statement preceded by a path includes no further paths. So we can only formulate constraints on a node and its immediate neighbors, i.e. its parent, its children, its left-hand sibling and its right-hand sibling, but not on any other node. This particular version of PML is called the propositional monadic logic for trees ($\mathrm{PML}_T(\mathcal{L}, n)$) with the extended label domain $\mathcal{L} \times \{0,1\}^n$.

An important difference from $\mathrm{PDL}_T(\mathcal{L})$ is the additional component of auxiliary labels in $\{0,1\}^n$. We must introduce this auxiliary label component to express the former long-distance paths by means of local constraints. The auxiliary label component $\{0,1\}^n$ constitutes the domain of n-dimensional binary vectors, where the dimension $n$ depends on the underlying $\mathrm{PDL}_T(\mathcal{L})$ formula. If we do not require any auxiliary labels, we set $n = 0$. Otherwise, the label of a tree node consists of two parts, i.e. a label $l \in \mathcal{L}$ of the former label domain and a binary vector $x \in \{0,1\}^n$ of the auxiliary label domain. Each component of the binary vector serves to encode a former long-distance relation. Therefore, we normally have access to a certain component rather than to the whole binary vector. So we include particular statements in the language $\mathrm{PML}_T(\mathcal{L}, n)$ to have access to the value of a certain component. Besides this modification, the syntax of $\mathrm{PML}_T(\mathcal{L}, n)$ is a proper sublanguage of $\mathrm{PDL}_T(\mathcal{L})$ without complex paths.
Definition 5 $\mathcal{L}$ denotes a finite set of labels and $\{0,1\}^n$ denotes the set of auxiliary labels, i.e. binary vectors with $n$ components. The set $\mathcal{P}_0 = \{\uparrow, \downarrow, \leftarrow, \rightarrow\}$ is the set of paths and the set $S = S(\mathcal{L}, n)$ is the set of $\mathrm{PML}_T(\mathcal{L}, n)$ statements, which is defined as follows:

Boolean constants: $\top, \bot \in S$
label statement: $l \in S$ for every label $l \in \mathcal{L}$
auxiliary statement: $\xi_i \in S$ for every $1 \le i \le n$
path statement: $\langle p_0 \rangle$ [...]
Boolean combinations: [...]
Underspecification in Tree Description Grammars 183

Figure 10. TDG for $\{a_1^n \cdots a_k^n\}$

Figure 11. TDG for $\{w^n ;\ w \in T^*\}$
$L_n$ are not TALs. This can be shown with the pumping lemma for TALs proven by Vijay-Shanker (1987). Furthermore, together with the closure of TALs with respect to intersection with regular languages, the pumping lemma can also be used to show that for $n \ge 3$, $L^n_{copy}$ are not TALs. These two examples show that there are TDLs that are not TALs. At first glance, one might even suspect that TALs are a subset of TDLs. However, it is not at all obvious how to construct an equivalent TDG for
184 Laura Kallmeyer

a given TAG. If nodes allowing adjunction are simply replaced by strong dominances (e.g. [...]

[...] where $\rho$ represents pair formation in the derived alphabet. This example has to be taken with a grain of salt because in the parallel lambda calculus both the redex and its converted form are supposed to have the same meaning, whereas in the context of formal language theory the simple fact that these expressions are different complex symbols embodies two competing proposals for syntactic analysis.

In the last section of the paper we show how to resolve the tension between the definitions for linguistic concepts in terms of structural configurations of the original trees and their counterparts in the lifted context. While the definitions that are informed by the original set-up are extensionally inadequate in the general case (they fail to refer to the context-sensitive dependencies), their lifted counterparts, living in an environment of first-order substitution, can be combined with adequate characterizations of those context-sensitive structures. This combination is made possible by the closure of call-by-value tree languages under deterministic bottom-up tree transducer mappings (see Engelfriet and Schmidt 1978).

We have not attempted to present a defense of macro grammars or of a theory of structural notions embodied in this particular format. A detailed discussion of the (de-)merits of macrogrammatical analyses of a range of syntactic problems is contained in H.-P. Kolb's contribution to the present volume. It is worth emphasizing that our application of tree theory to context-sensitive structures is not intended as a justification for a particular form of syntactic analysis. This task remains to be done and we would be delighted if others investigated the structural restrictions that characterize a program of (derived) syntactic macrotheory.

There are several sources that have influenced the ideas reported here. Apart
200
Uwe Mönnich
from the work on the logical characterization of language classes that was mentioned above, the development in universal algebra that led to a uniform, signature-free treatment of varieties has been our main inspiration. From the very beginning of this development it has been obvious to people working in this field that closed sets of operations are best presented as categories with finite products. When this presentation is retranslated into the language of universal algebra, we are confronted with a signature whose only non-constant operators are symbols for target tupling and functional composition. Algebras with signatures of this type will play a major role in the paper, and they will provide the technical foundation for extending the logical methods of descriptive complexity theory to context-sensitive phenomena.

The first to see the potential of this type of lifted signature for tree language theory was Maibaum (1974). He showed in particular how to map context-free into regular tree production rules. Unfortunately, a substantial part of his results is wrong because he mistakenly assumed that for so-called call-by-name derivations an unrestricted derivation in a lifted signature would leave the generated language unchanged. Engelfriet and Schmidt (1977, 1978) point out this mistake and give fixed-point characterizations for both call-by-name and call-by-value context-free production systems. We hope that the present paper complements the denotational semantics for call-by-name tree languages by giving an operational analysis of the derivation process both on the level of the original and on the level of the lifted signature.
2 Preliminaries

The purpose of this section is to fix notations and to present definitions for the basic notions related to universal algebra. The key notion is that of a derived algebra. As was indicated above, the realization that derived algebras allow for a different presentation based on a lifted signature constitutes the main step in the process of restoring context-freeness. In this context, we will be involved in a generalization of formal language concepts. Many-sorted or heterogeneous algebras provide the proper formal environment to express this generalization. For expository purposes we will not give every definition in its most general form but keep to the single-sorted case where the reader can easily construct the extension to a situation where more than one sort matters.

Definition 1 Let $S$ be a set of sorts (categories). A many-sorted alphabet $\Sigma$ is an indexed family $\langle \Sigma_{w,s} \mid w \in S^*, s \in S \rangle$ of disjoint sets. A symbol in $\Sigma_{w,s}$ is called an operator of type $\langle w, s \rangle$, arity $w$, sort $s$ and rank $l(w)$. If $w = \varepsilon$ then $f \in \Sigma_{\varepsilon,s}$ is called a constant of sort $s$. $l(w)$ denotes the length of $w$.

On cloning context-freeness 201

Note that a ranked alphabet in the traditional terminology can be identified with an $S$-sorted alphabet where $S = \{s\}$. The set $\Sigma_{s^n,s}$ is then the same as $\Sigma_n$. By way of explanation, let us point out that each symbol $f$ in $\Sigma_{w,s}$ represents an operation taking $n$ arguments, the $i$th argument being of sort $w_i$, and yielding an element of sort $s$, where $w = w_1 \cdots w_n$. Alternative designations for many-sorted alphabets are many-sorted signatures or many-sorted operator domains. We list some familiar examples of single-sorted signatures for further reference.

Example 2
a) $\Sigma_0 = \{\varepsilon\} \cup V$, $\Sigma_2 = \{\frown\}$: single-sorted signature of semigroups, extended by a finite set of constants $V$.
b) $\Sigma_0 = \{\varepsilon\}$, $\Sigma_1 = \{a \mid a \in V\}$: single-sorted signature of a monadic algebra.
c) $\Sigma_2 = \{\wedge, \vee\}$: single-sorted signature of lattices.

As was mentioned above, a full description of the theory of a class of algebras of the same similarity type is given by the totality of the derived operations. These operations can be indicated by suitably constructed terms over the basic operators and a set of variables.

Definition 3 For a many-sorted alphabet $\Sigma$, we denote by $T(\Sigma)$ the family $\langle T(\Sigma, s) \mid s \in S \rangle$ of trees of sort $s$ over $\Sigma$. $T(\Sigma, s)$ is inductively defined as follows:
(i) For each sort $s \in S$, $\Sigma_{\varepsilon,s} \subseteq T(\Sigma, s)$.
(ii) For $n \ge 1$ and $s \in S$, $w \in S^*$, if $f \in \Sigma_{w,s}$, $l(w) = n$, and for $1 \le i \le n$, $t_i \in T(\Sigma, w_i)$, then $f(t_1, \ldots, t_n) \in T(\Sigma, s)$.

Definition 4 For a many-sorted alphabet $\Sigma$ and a family of disjoint sets $Y = \langle Y_s \mid s \in S \rangle$, the family $T(\Sigma, Y)$ is defined to be $T(\Sigma(Y))$, where $\Sigma(Y)$ is the many-sorted alphabet with $\Sigma(Y)_{\varepsilon,s} = \Sigma_{\varepsilon,s} \cup Y_s$ and, for $w \ne \varepsilon$, $\Sigma(Y)_{w,s} = \Sigma_{w,s}$. We call a subset $L$ of $T(\Sigma, s)$ a tree language over $\Sigma$ (of sort $s$).
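The inductive definition of sorted trees translates directly into a recursive well-sortedness check. The sketch below assumes a hypothetical two-sorted alphabet (the symbols `zero`, `succ`, `iszero` and the sorts `nat`, `bool` are illustrative only); an alphabet is encoded as a dictionary from symbols to types $\langle w, s \rangle$, a tree as a nested tuple, and `extend` builds $\Sigma(Y)$ by adding the variables as constants.

```python
# Hypothetical many-sorted alphabet: symbol -> (arity w as a tuple of sorts, sort s)
SIGMA = {
    "zero": ((), "nat"),
    "succ": (("nat",), "nat"),
    "iszero": (("nat",), "bool"),
}

def extend(sigma, variables):
    """Sigma(Y): add each variable y of sort s as a constant of sort s."""
    out = dict(sigma)
    out.update({y: ((), s) for y, s in variables.items()})
    return out

def sort_of(sigma, term):
    """Return s if term belongs to T(Sigma, s); raise ValueError otherwise.

    A term is a tuple (symbol, subterm_1, ..., subterm_n).
    """
    symbol, *args = term
    if symbol not in sigma:
        raise ValueError(f"unknown symbol {symbol!r}")
    w, s = sigma[symbol]
    if len(args) != len(w):
        raise ValueError(f"{symbol!r} has rank {len(w)}, got {len(args)} arguments")
    for arg, expected in zip(args, w):
        if sort_of(sigma, arg) != expected:
            raise ValueError(f"argument of {symbol!r} must have sort {expected!r}")
    return s
```

With these definitions, `("iszero", ("succ", ("zero",)))` is a tree of sort `bool` over `SIGMA`, whereas `("succ", ("y1",))` is a tree only over the extended alphabet `extend(SIGMA, {"y1": "nat"})`.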
Having described the syntax of the tree terms and having indicated their intended interpretation, it remains to specify the central notion of an algebra and to give a precise definition of the way in which the formal term symbols induce an operation on an algebra.

Definition 5 Suppose that $S$ is a set of sorts and that $\Sigma$ is a many-sorted alphabet. A $\Sigma$-algebra $\mathfrak{A}$ consists of an $S$-indexed family of sets $A = \langle A_s \rangle_{s \in S}$ and, for each operator $\sigma \in \Sigma_{w,s}$, of a function $\sigma_{\mathfrak{A}} : A^w \to A_s$, where $A^w = A_{w_1} \times \cdots \times A_{w_n}$ and $w = w_1 \cdots w_n$. The family $A$ is called the sorted carrier of the algebra $\mathfrak{A}$ and is sometimes written $|\mathfrak{A}|$.

Different algebras, defined over the same operator domain, are related to each other if there exists a mapping between their carriers that is compatible with the basic structural operations.

Definition 6 A $\Sigma$-homomorphism of $\Sigma$-algebras $h : \mathfrak{A} \to \mathfrak{B}$ is an indexed family of functions $h_s : A_s \to B_s$ ($s \in S$) such that for every operator $\sigma$ of type $\langle w, s \rangle$

$$h_s(\sigma_{\mathfrak{A}}(a_1, \ldots, a_n)) = \sigma_{\mathfrak{B}}(h_{w_1}(a_1), \ldots, h_{w_n}(a_n))$$

[...] $: A^w \to A_s$ called a derived operation. In the place of $Y$ we have used the set of sorted variables $Y_w := \{y_{i,w_i} \mid 1 \le i \le l(w)\}$. The meaning of the derived operation $t_{\mathfrak{A}}$ is defined as follows: for $(a_1, \ldots, a_n) \in A^w$,

$$t_{\mathfrak{A}}(a_1, \ldots, a_n) = \hat{a}(t)$$

where $\hat{a} : T(\Sigma, Y_w) \to \mathfrak{A}$ is the unique homomorphism with $\hat{a}(y_{i,w_i}) = a_i$.

In intuitive terms, the evaluation of a $\Sigma(Y)$-tree $t$ in a given $\Sigma$-algebra proceeds as follows. First, one assigns a value $a \in A$ to every variable in $Y$. Then the operations of $\mathfrak{A}$ are applied to these elements of $A$ as directed by the structure of $t$. There is, though, another description of the action performed by the derived operations on an algebra $\mathfrak{A}$. According to this conception the derived operations of an algebra $\mathfrak{A}$ are the mappings one gets from the projections $y_{\mathfrak{A}}$ by iterated composition with the primitive operations $\sigma_{\mathfrak{A}}$ ($\sigma \in \Sigma$). Given any $\Sigma$-algebra $\mathfrak{A}$ we can describe the process of determining the set of derived operations within the framework of another algebra that is based on a different signature. In this new signature the symbols of the original alphabet $\Sigma$ are now treated as constants, as are the projections. The only operations of non-zero arity are the composition functions and the functions of target tupling.

By composition of operations is meant the construction of an operation $h$ of type $\langle w, s \rangle$ from given operations $f$ of type $\langle v, s \rangle$ and $g_i$ of type $\langle w, v_i \rangle$, where $\langle w, s \rangle, \langle v, s \rangle, \langle w, v_i \rangle \in S^* \times S$. The constructed operation $h$ satisfies the rule

$$h(a) = f(g_1(a), \ldots, g_k(a))$$

where $k = |v|$ and $a \in A^w$. If the operations of target tupling are among the basic operations that are denoted by the symbols of the new signature, the type of composition operations can be simplified to $\langle \langle v, s \rangle \langle w, v \rangle, \langle w, s \rangle \rangle$. Take again the operations $g_i$ with their types as just indicated. By their target tupling is meant the construction of an operation $h$ of type $\langle w, v \rangle$ that satisfies the rule

$$h(a) = (g_1(a), \ldots, g_k(a))$$

where again $a \in A^w$ and the outer parentheses on the right-hand side indicate the ordered $k$-tuple of values. Having introduced composition and target tupling and having indicated their intended interpretation, the only ingredient that remains for us to introduce, before we can define the concept behind the title of the paper, is the collection of projection operations.
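The second description of derived operations can be made concrete with ordinary functions. The sketch below is an illustration under assumptions: the carrier is the integers with two hypothetical primitive operations `plus` and `times`, and the helper names `projection`, `tupling` and `compose` are chosen for this example. It builds the derived operation of the term plus(times(y1, y2), y1) from projections alone, using only target tupling and composition.

```python
def projection(i):
    """The i-th projection (1-based) on a tuple of arguments."""
    return lambda *a: a[i - 1]

def tupling(*gs):
    """Target tupling: h(a) = (g_1(a), ..., g_k(a))."""
    return lambda *a: tuple(g(*a) for g in gs)

def compose(f, g):
    """Composition with a tupled operation g: h(a) = f(g_1(a), ..., g_k(a))."""
    return lambda *a: f(*g(*a))

# Hypothetical primitive operations of a single-sorted algebra on the integers.
def plus(x, y):
    return x + y

def times(x, y):
    return x * y

# Derived operation of the term plus(times(y1, y2), y1).
y1, y2 = projection(1), projection(2)
derived = compose(plus, tupling(compose(times, tupling(y1, y2)), y1))
```

Evaluating `derived(3, 4)` applies `times` to the two projections first and then `plus`, giving `3 * 4 + 3`. Treating `plus`, `times` and the projections as constants and `compose` and `tupling` as the only proper operations mirrors the change of signature described in the text.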
The projection operations on a Cartesian product $A^w$ are the trivial operations $\pi_i^w$ satisfying $\pi_i^w(a) = a_i$.

A closed set of operations or, more briefly, a clone on a family of non-void sets $A = \langle A_s \rangle$ ($s \in S$) is a set of operations on $A$ that contains the projection operations and is closed under all compositions. The relevance of clones for the purposes of understanding the hidden structure of an algebra and of providing a signature-free treatment for universal algebra was first realized by P. Hall. An alternative proposal under the name of algebraic theories and their algebras is due to J. Bénabou and W. Lawvere. We have chosen to follow the example of J. Gallier and of J. Engelfriet and E. M. Schmidt in using the style of presentation familiar from standard universal algebra. Our only departure
from this tradition is the explicit inclusion of the operation of target tupling into an official definition of the clone of term operations.

Definition 7 The clone of term operations of a $\Sigma$-algebra $\mathfrak{A}$, denoted by $\mathrm{Clo}(\mathfrak{A})$, is the smallest set of operations on the carrier $A = \langle A_s \rangle$ that contains the primitive operations of $\mathfrak{A}$ and the projections, and is closed under composition and target tupling. The set of all operations of type $\langle w, s \rangle$ in $\mathrm{Clo}(\mathfrak{A})$ is denoted by $\mathrm{Clo}_{w,s}(\mathfrak{A})$.

[...] where $w \in S^*$, $v \in S^+$ and $s \in S$. Each $\pi_i^v$ is a projection operator of type $\langle \varepsilon, \langle v, v_i \rangle \rangle$, each $(\ )_{w,v}$ is a tupling operator and each $S_{v,w,s}$ is a substitution or composition operator, of types $\langle \langle w, v_1 \rangle \cdots \langle w, v_n \rangle, \langle w, v \rangle \rangle$ and $\langle \langle v, s \rangle \langle w, v \rangle, \langle w, s \rangle \rangle$ respectively, and each $\sigma$ in $\Sigma_{w,s}$ becomes a constant operator in the derived alphabet $D(\Sigma)$ of type $\langle \varepsilon, \langle w, s \rangle \rangle$.

Example 9 We have seen above that trees form the sorted carrier of the $\Sigma$-algebra $T(\Sigma, Y)$. What is of fundamental importance for the further development is the fact that trees with variables can also be seen as a $D(\Sigma)$-algebra. The tree substitution algebra $DT(\Sigma, Y)$ is a $D(\Sigma)$-algebra whose carrier of sort $\langle w, s \rangle$ is the set of trees $T(\Sigma, Y_w)_s$, i.e. the set of trees of sort $s$ that may contain variables in $Y_w$. In order to alleviate our notation we will denote this carrier by $T(w, s)$. Carriers of sort $\langle w, v \rangle$ are $v$-tuples of carriers of sort $\langle w, v_i \rangle$ and are denoted by $T(w, v)$. Each $\sigma$ in $\Sigma_{w,s}$ is interpreted as the tree $\sigma(y_{1,w_1}, \ldots, y_{n,w_n})$, each $\pi_i^v$ is interpreted as $y_{i,v_i}$, each $(\ )_{w,v}(t_1, \ldots, t_k)$ is interpreted as the formation of $v$-tuples $(t_1, \ldots, t_k)$, where each $t_i$ is an element of $T(w, v_i)$ and $l(v) = k$, and each $S_{v,w,s}$ is interpreted as a composition or substitution of trees. An intuitive description of the composite $S_{v,w,s}(t, t')$ in $DT(\Sigma, Y)$ with $t \in T(v, s)$ and $t' = (t_1, \ldots, t_k) \in T(w, v)$ is this: the composite is the term of type $\langle w, s \rangle$ that is obtained from $t$ by substituting the term $t_i$ of sort $v_i$ for the variable $y_{i,v_i}$ in $t$. The formal definition of the
On cloning context-freeness
205
composition operation relies on the unique homomorphism t̂' : T(Σ, Y_v) → T(Σ, Y_w) that extends the function t' : Y_v → T(w, v_i) mapping y_{i,v_i} to t_i in T(w, v_i). Then for any t ∈ T(v, s), we define S_{v,w,s}^{DT(Σ,Y)}(t, t') as the value of t̂' on the term t:
S_{v,w,s}^{DT(Σ,Y)}(t, t') := t̂'(t) = t[t_1, ..., t_k]
where the last term indicates the result of substituting t_i for y_{i,v_i}. Since the derived alphabet D(Σ) leads to a tree algebra T(D(Σ)) in the same way that the alphabet Σ led to the algebra T(Σ), there is a unique homomorphism β : T(D(Σ)) → DT(Σ). It was pointed out by Gallier (1984) that this homomorphism is very similar to β-conversion in the λ-calculus. The explicit specification of its action repeats in a concise form the description of the tree substitution algebra:
β_{w,s}(σ) = σ(y_{1,w_1}, ..., y_{n,w_n})   for σ ∈ Σ_{w,s}
β_{w,s}(π_i^w) = y_{i,w_i}   if w_i = s
β_{w,v}(( )_{w,v}(t_1, ..., t_k)) = (β_{w,v_1}(t_1), ..., β_{w,v_k}(t_k))   for t_i ∈ T(w, v_i)
β_{w,s}(S_{v,w,s}(t, t')) = β_{v,s}(t)[β_{w,v_1}(t_1), ..., β_{w,v_k}(t_k)]   for t ∈ T(v, s), t_i ∈ T(w, v_i) and t' = (t_1, ..., t_k).
Example 10 Suppose that the set of sorts S is a singleton and that Σ contains three symbols f, a and b where Σ_{ss,s} = {f} and Σ_{ε,s} = {a, b} and s is the single sort in S. As is customary in the context of single-sorted alphabets we shall write the type (s^n, s) as n. According to this notational convention the following figure displays a tree t in T(D(Σ)):
[Figure: tree with root S_{2,0} and daughters f and ( )_{0,2}; the node ( )_{0,2} has daughters a and b, i.e. t = S_{2,0}(f, ( )_{0,2}(a, b))]
Applying β_0 to this tree returns as a value the tree β_0(t) = f(a, b) in T(Σ). Displayed in tree form this last term looks as follows:
[Figure: tree with root f and daughters a and b]
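The evaluation of Example 10 can be replayed mechanically. The following is only a minimal sketch for the single-sorted case, not part of the original text: the tuple encoding of trees, the node tags "sym", "pi", "tuple" and "S", and the function names `beta` and `substitute` are all assumptions made for illustration.

```python
VAR = "$x"  # tag for a variable x_i; assumed not to clash with any symbol of the alphabet


def substitute(t, args):
    """Return t[t_1, ..., t_n]: every variable x_i in t is replaced by args[i-1]."""
    if t[0] == VAR:
        return args[t[1] - 1]
    return (t[0],) + tuple(substitute(child, args) for child in t[1:])


def beta(d, rank):
    """Evaluate a tree over the derived alphabet in the tree substitution algebra.

    Derived trees are encoded (hypothetically) as
      ("sym", name)       a symbol of the original alphabet, now a constant
      ("pi", i)           the i-th projection symbol
      ("tuple", d1, ...)  explicit tupling
      ("S", d, dtuple)    composition of d with a tuple of arguments
    """
    kind = d[0]
    if kind == "sym":    # sigma  ->  sigma(x_1, ..., x_n)
        name = d[1]
        return (name,) + tuple((VAR, i) for i in range(1, rank[name] + 1))
    if kind == "pi":     # pi_i  ->  x_i
        return (VAR, d[1])
    if kind == "tuple":  # tupling  ->  tuple of the component values
        return tuple(beta(child, rank) for child in d[1:])
    if kind == "S":      # composition  ->  substitution
        return substitute(beta(d[1], rank), beta(d[2], rank))
    raise ValueError(f"unknown node kind: {kind!r}")


# The tree of Example 10: S_{2,0}(f, ( )_{0,2}(a, b))
rank = {"f": 2, "a": 0, "b": 0}
t = ("S", ("sym", "f"), ("tuple", ("sym", "a"), ("sym", "b")))
print(beta(t, rank))  # ('f', ('a',), ('b',)), i.e. f(a, b)
```

The sort indices on β, S and ( ) are dropped in this sketch; well-sortedness of the input tree is simply assumed rather than checked.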
The unique homomorphism β from the derived tree algebra into the tree substitution algebra has a right inverse LIFT^Σ = (LIFT^Σ_{w,s})_{(w,s) ∈ S* × S},
where LIFT^Σ_{w,s} maps a Σ-tree in T(Σ, Y_w)_s into a D(Σ)-tree in T(D(Σ), (ε, (w, s))). Since we will have no occasion to apply the function LIFT to trees over a many-sorted alphabet we content ourselves with giving its recursive definition for the single-sorted case:
Definition 11 Suppose that Σ is a single-sorted or ranked alphabet. For k ≥ 0, LIFT^Σ_k : T(Σ, X_k) → T(D(Σ), (ε, k)) is the function defined recursively as follows (where X_k = {x_i | 1 ≤ i ≤ k}):
LIFT^Σ_k(x_i) = π_i^k
LIFT^Σ_k(σ) = S_{0,k}(σ)   for σ ∈ Σ_0
LIFT^Σ_k(σ(t_1, ..., t_n)) = S_{n,k}(σ, LIFT^Σ_k(t_1), ..., LIFT^Σ_k(t_n))   for σ ∈ Σ_n.
It should be obvious that for any tree t in T(Σ, X_k), β_k(LIFT^Σ_k(t)) = t. The reader will have noticed that the tupling operator was conspicuous by its absence in the recursive definition of the LIFT-function. According to our official definition for the carrier of the tree substitution algebra a term like σ(t_1, ..., t_n) is the result of composing the n-tuple (t_1, ..., t_n) of k-ary terms t_i with the n-ary term σ(x_1, ..., x_n). This part of the "structural history" is suppressed in the third clause of the LIFT-specification. We shall adhere to this policy of eliminating this one layer in the configurational set-up of tree terms and we shall extend this policy to the level of explicit trees over the derived alphabet D(Σ). This does not mean that we revoke the official, pedantic definition of the symbol set in a derived alphabet, but we shall make our strategy of notational alleviation type consistent by reading the type of each substitution operator S_{v,w,s} as ((v, s)(w, v_1) ··· (w, v_n), (w, s)) instead of ((v, s)(w, v), (w, s)).
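The claim that β undoes LIFT can be checked on small cases. What follows is a hedged sketch for a ranked alphabet, using the convention just described in which tupling is suppressed and a composition node takes its arguments directly; the tuple encoding and the names `lift`, `beta` and `substitute` are illustrative assumptions, not notation from the text.

```python
VAR = "$x"  # tag for a variable x_i; assumed not to clash with any symbol of the alphabet


def substitute(t, args):
    """Return t[t_1, ..., t_n]: every variable x_i in t is replaced by args[i-1]."""
    if t[0] == VAR:
        return args[t[1] - 1]
    return (t[0],) + tuple(substitute(child, args) for child in t[1:])


def lift(t):
    """LIFT with tupling suppressed (the index k is not recorded in this sketch).

    x_i                   ->  ("pi", i)
    sigma(t_1, ..., t_n)  ->  ("S", ("sym", sigma, n), lift(t_1), ..., lift(t_n))
    A constant sigma is the case n = 0 of the second clause.
    """
    if t[0] == VAR:
        return ("pi", t[1])
    head = ("sym", t[0], len(t) - 1)  # the rank is stored with the symbol
    return ("S", head) + tuple(lift(child) for child in t[1:])


def beta(d):
    """Evaluate a derived tree (tupling suppressed) in the tree substitution algebra."""
    kind = d[0]
    if kind == "sym":
        _, name, n = d
        return (name,) + tuple((VAR, i) for i in range(1, n + 1))
    if kind == "pi":
        return (VAR, d[1])
    if kind == "S":
        return substitute(beta(d[1]), [beta(child) for child in d[2:]])
    raise ValueError(f"unknown node kind: {kind!r}")


# f(x_1, g(a, x_2)) over the variables x_1, x_2
t = ("f", (VAR, 1), ("g", ("a",), (VAR, 2)))
assert beta(lift(t)) == t
```

The assertion only exercises one tree, so it illustrates the identity β(LIFT(t)) = t rather than proving it; the general statement follows by induction on t.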
3 Context-free tree languages
The correspondence between trees in explicit form, displaying composition and projection labels, and their converted images as elements of a tree substitution algebra is an example of a situation which is characterized by a meaning preserving relationship between two algebras 𝔄 and 𝔅. Of particular interest to formal language theory is the situation where problems in an algebra 𝔅 can be lifted to the tree level, solved there and, taking advantage of the fact that trees can be regarded as denoting elements in an arbitrary algebra, projected back into their original habitat 𝔅. This transfer of problems to the symbolic level would, of course, produce only notational variants as long as the lifted environment constitutes just an isomorphic copy of the domain in relation to which the problems were first formulated. One might suspect that β-conversion and its right inverse are a case of this type. Despite their suspicious similarity, trees in explicit form and their cousins which are the results of performing the operations according to the instructions suggested by the composition and projection symbols, are sufficiently different to make the transfer of problems to the explicit variants a worthwhile exercise. In intuitive terms, the difference between the two tree algebras is related to the difference between a first-order and a second-order substitution process in production systems. Let us view grammars as a mechanism in which local transformations on trees can be performed in a precise way. The central ingredient of a grammar is a finite set of productions, where each production is a pair of trees. Such a set of productions determines a binary relation on trees such that two trees t and t' stand in that relation if t' is the result of removing in t an occurrence of a first component in a production pair and replacing it by the second component of the same pair. The simplest type of such a replacement is defined by a production that specifies the substitution of a single-node tree t_0 by another tree t_1. Two trees t and t' satisfy the relation determined by this simple production if the tree t' differs from the tree t in having a subtree t_1 that is rooted at an occurrence of a leaf node t_0 in t.
In slightly different terminology, productions of this kind incorporate instructions to rewrite auxiliary variables as a complex symbol that, autonomously, stands for an element of a tree algebra. As long as the carrier of a tree algebra is made of constant tree terms the process whereby auxiliary variables are replaced by trees is analogous to what happens in string languages when a nonterminal auxiliary symbol is rewritten as a string of terminal and non-terminal symbols, independently of the context in which it occurs. The situation changes dramatically if the carrier of the algebra is made of symbolic counterparts of derived operations and the variables in production rules range over such second-level entities. As we have seen in the preceding sections, the tree substitution algebra provides an example for an algebra with this structure. The following example illustrates the gain in generative power to be expected from production systems determining relations among trees that derive from second-order substitution of operators rather than constants.
Example 12 Let V be a finite vocabulary. It gives rise to a monadic signature Σ if all the members of V are assigned rank one and a new symbol ε
is added as the single constant of rank zero. For concreteness, let us assume that we are dealing with a vocabulary V that contains the symbols a and b as its only members. Trees over the associated monadic signature Σ = Σ_0 ∪ Σ_1 where Σ_0 = {ε} and Σ_1 = {a, b} are arbitrary sequences of applications of the operators a and b to the constant ε. It is well known, as was pointed out above, that there is a unique homomorphism from these trees, considered as the carrier of a Σ-algebra, to any other algebra of the same similarity type. In particular, there is a homomorphism into V* when a and b are interpreted as left-concatenation with the symbol a and b, respectively, and when ε is interpreted as the constant string of length zero. This homomorphism establishes a bijection between T(Σ) and V* (cf. Maibaum 1974). When combined with this bijective correspondence the following regular grammar generates the set of all finite strings over V.
G = (Σ, ℱ, S, P)
Σ_0 = {ε}   Σ_1 = {a, b}
ℱ_0 = {S}   ℱ_n = ∅ for n ≥ 1
P = {S → ε | a(S) | b(S)}
L(G, S) = Σ_1*
where we have identified T(Σ) with Σ_1*, P stands for the finite set of productions and S stands for the only non-terminal symbol. V also gives rise to a binary signature Σ' if the members of V are assigned rank zero and two new symbols are added, ε of rank zero and ⌢ of rank two. Trees over this signature are non-associative arc-links between the symbols a and b. When a and b are interpreted as constant strings of length one, ε as the constant string of length zero and the arc ⌢ as (associative) concatenation, V* becomes a Σ'-algebra. Note that the unique homomorphism from T(Σ') to V* is not a bijection this time. When combined with this homomorphism the following grammar generates the string language {a^n b^n}.
G' = […] t'', and trees t_1, ..., t_m ∈ T(Σ) such that
t = t_0[F(t_1, ..., t_m)]
t' = t_0[t''[t_1, ..., t_m]]
t' is obtained from t by replacing an occurrence of a subtree F(t_1, ..., t_m) by the tree t''[t_1, ..., t_m]. By the inside-out restriction on the derivation scheme it is required that the trees t_1 through t_m be terminal trees. Recall from the preceding section that for m, n ≥ 0, t ∈ T(Σ, X_m) and t_1, ..., t_m ∈ T(Σ, X_n), t[t_1, ..., t_m] denotes the result of substituting t_i for x_i in t. Observe that t[t_1, ..., t_m] is in T(Σ, X_n). As is customary, ⇒* denotes the transitive-reflexive closure of ⇒.
Definition 15 Suppose G = (Σ, ℱ, S, P) is a context-free tree grammar. We call
L(G, S) = {t ∈ T(Σ) | S ⇒* t}
the context-free inside-out tree language generated by G from S. We reserve a special definition for the case where ℱ contains only function symbols of rank zero.
Definition 16 A regular tree grammar is a tuple G = (Σ, ℱ, S, P), where Σ is a finite ranked alphabet of terminals, ℱ is a finite alphabet of function or nonterminal symbols of rank zero, S ∈ ℱ is the start symbol and P ⊆ ℱ × T(Σ ∪ ℱ) is a finite set of productions. The regular tree language generated by G is
L(G, S) = {t ∈ T(Σ) | S ⇒* t}
Note that in the case of regular grammars the analogy with the conventional string theory goes through. There is an equivalence of the unrestricted, the rightmost and the leftmost derivation modes where the terms 'rightmost' and 'leftmost' are to be understood with respect to the linear order of the leaves forming the frontier of a tree in a derivation step. Very early in the development of (regular) tree grammars it was realized that there exists a close relationship between the families of trees generated by tree grammars and the family of context-free string languages. This fundamental fact is best described by looking at it from the perspective on trees that views them as symbolic representations of values in arbitrary domains. Recall the unique homomorphism from the introductory example of this section that mapped non-associative concatenation terms into strings of their nullary constituents. This homomorphism is a particular case of a mapping that can easily be specified for an arbitrary signature.
Definition 17 Suppose Σ is a multi-sorted or ranked alphabet. We call yield or frontier the unique homomorphism y that interprets every operator in Σ_{w,s} or Σ_n with l(w) = n as the n-ary operation of concatenation. More precisely
y(σ) = σ   for σ ∈ Σ_{ε,s} (or Σ_0)
y(σ(t_1, ..., t_n)) = y(t_1) ... y(t_n)   for σ ∈ Σ_{w,s} (or Σ_n) and t_i ∈ T(Σ)_{w_i} (or T(Σ))
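The two clauses of the yield homomorphism translate directly into a recursive function. The sketch below assumes a tuple encoding of trees and a parameter `empty` naming the constants that are to be read as the empty string; both are conventions introduced here for illustration only.

```python
def yield_string(t, empty=frozenset()):
    """Concatenate the leaf labels of t from left to right.

    Trees are encoded (hypothetically) as tuples (label, child_1, ..., child_n).
    Labels listed in `empty` are interpreted as the string of length zero.
    """
    label, children = t[0], t[1:]
    if not children:  # y(sigma) = sigma for constants
        return "" if label in empty else label
    # y(sigma(t_1, ..., t_n)) = y(t_1) ... y(t_n)
    return "".join(yield_string(child, empty) for child in children)


# A non-associative concatenation term: arc(arc(a, b), ε)
t = ("arc", ("arc", ("a",), ("b",)), ("ε",))
print(yield_string(t, empty={"ε"}))  # ab
```

Different bracketings such as arc(a, arc(b, ε)) receive the same yield, which is the sense in which this homomorphism is not a bijection.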
Fact A (string) language is context-free iff it is the yield of a regular tree language.
As was shown in the introductory example, the addition of macro operator variables increases the generative power of context-free tree grammars over monadic alphabets considerably. The following example demonstrates that the addition of n-ary macro operator variables leads to a significant extension with respect to arbitrary ranked alphabets. The string language of the following context-free tree language is not context-free.
Example 18 Let us consider a context-free tree grammar G = (Σ, ℱ, S, P) such that its frontier is the set of all cross-dependencies between the symbols a, c and b, d, respectively. The grammar G consists of the components as shown below:
Σ_0 = {ε, a, b, c, d}   Σ_2 = {⌢}
ℱ_0 = {S}   ℱ_4 = {F}
P = { S → F(a, ε, c, ε) | F(ε, b, ε, d) | ε,
F(x_1, x_2, x_3, x_4) → F(⌢(a, x_1), x_2, ⌢(c, x_3), x_4) | F(x_1, ⌢(b, x_2), x_3, ⌢(d, x_4)) | ⌢(⌢(⌢(x_1, x_2), x_3), x_4) }
L(G, S) = {⌢(⌢(⌢(⌢(a, ...), ⌢(b, ...)), ⌢(c, ...)), ⌢(d, ...))}
The number of occurrences of a's and c's and of b's and d's, respectively, has to be the same. By taking the frontier of the tree terms, we get the language L' = {a^n b^m c^n d^m}. The language of the preceding example illustrates a structure that can actually be shown to exist in natural language. Take the following sentences which we have taken from Shieber's (1985) paper:
Example 19
(i) a) Jan säit das mer em Hans es huus hälfed aastriiche.
b) John said that we helped Hans (to) paint the house.
(ii) a) Jan säit das mer d'chind em Hans es huus lönd hälfed aastriiche.
b) John said that we let the children help Hans paint the house.
The NP's and the V's of which the NP's are objects occur in cross-serial order. D'chind is the object of lönd, em Hans is the object of hälfe, and es huus is the object of aastriiche. Furthermore the verbs mark their objects for case: hälfe requires dative case, while lönd and aastriiche require the accusative. It appears that there are no limits on the length of such constructions in grammatical sentences of Swiss German. This fact alone would not suffice to prove that Swiss German is not a context-free string language. It could still be the case that Swiss German in toto is context-free even though it subsumes an isolable context-sensitive fragment. Relying on the closure of context-free languages under intersection with regular languages Huybregts (1984) and Shieber (1985) were able to show that not only the fragment exhibiting the cross-dependencies but the whole of Swiss German has to be assumed as non context-free.
Shieber intersects Swiss German with the regular language given in Example 20 in (iiia) to obtain the result in (iv). As is well known, this language is not context-free.
Example 20
(iii) a) Jan säit das mer (d'chind)* (em Hans)* händ wele (laa)* (hälfe)* aastriiche.
b) John said that we (the children)* (Hans)* the house wanted to (let)* (help)* paint.
(iv) Jan säit das mer (d'chind)^n (em Hans)^m händ wele (laa)^n (hälfe)^m aastriiche.
Swiss German is not an isolated case that one could try to sidestep and to classify as methodologically insignificant. During the last 15 years a core of structural phenomena has been found in genetically and typologically unrelated languages that leaves no alternative to reverting to grammatical formalisms whose generative power exceeds that of context-free grammars. It has to be admitted that the use of macro-like productions is not the only device that has been employed for the purpose of providing grammar formalisms with a controlled increase of generative capacity. Alternative systems that were developed for the same purpose are e.g. tree adjoining grammars, head grammars and linear indexed grammars. Although these systems make highly restrictive claims about natural language structure their predictive power is closely tied to the individual strategy they exploit to extend the context-free paradigm. The great advantage of the tree oriented formalism derives from its connection with descriptive complexity theory. Tree properties can be classified according to the complexity of logical formulas expressing them. This leads to the most perspicuous and fully grammar independent characterization of tree families by monadic second-order logic. Although this characterization encompasses only regular tree sets the lifting process of the preceding section allows us to simulate the effect of macro-like productions with regular rewrite rules. Again, the device of lifting an alphabet into its derived form is not without its alternatives in terms of which a regular tree set can be created that has as value the intended set of tree structures over the original alphabet.
Our own reason for resting with the lifting process was the need to carry through the "regularizing" interpretation not only for the generated language, but also for the derivation steps. A very simple example of a context-free tree grammar that specifies as its frontier the (non-context-free) string language { a n b n c n } will illustrate the
idea. It is presented together with its lifted version and with two production sequences of derivation trees.
Example 21 Consider the context-free tree grammar G = (Σ, ℱ, S, P) which consists of the components as shown below:
Σ_0 = {a, b, c}   Σ_2 = {⌢}
ℱ_0 = {S}   ℱ_3 = {F}
P = { S → F(a, b, c),
F(x_1, x_2, x_3) → F(⌢(a, x_1), ⌢(b, x_2), ⌢(c, x_3)) | ⌢(⌢(x_1, x_2), x_3) }
Applying the S-production once and the first F-production two times we arrive at the sequence of trees in Figure 1. The result of applying the terminal F-production to the last three trees in Figure 1 is shown in Figure 2.
Figure 1. Derivation sequence in the CF tree grammar for {a^n b^n c^n}
Figure 2. Terminal trees corresponding to Figure 1
Transforming the grammar G with the help of the LIFT mapping of Definition 11 into its derived correspondent G_D produces a regular grammar. As will be recalled from the remarks after Definition 8, all symbols from the original alphabet become constant operators in the derived alphabet. In the presentation below the coding of type symbols is taken over from the last example in section 2. It relies upon the bijection between S* × S and ℕ, where S is a singleton. Let ℕ be the set of sorts and let D(Σ) be the derived alphabet. The derived grammar G_D = (D(Σ), D(ℱ), D(S), D(P)) contains the following components:
D(Σ)_0 = {a, b, c}   D(Σ)_2 = {⌢}
D(Σ)_n = {π}   D(Σ)_{n,k} = {S}
D(ℱ)_0 = {S}   D(ℱ)_3 = {F}
P = { S → S(F, a, b, c),
F → S(⌢, S(⌢, π_1, π_2), π_3) | S(F, S(⌢, π_1, S(a)), S(⌢, π_2, S(b)), S(⌢, π_3, S(c))) }
The context will always distinguish occurrences of the start symbol S from occurrences of the substitution operator S. Sample derivations and two specimens of the generated language L(G_D) appear in Figures 3 and 4.
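To see how trees generated by the derived grammar evaluate to trees with frontier a^n b^n c^n, the derived productions can be unfolded by hand and passed through β. The following is a sketch under stated assumptions: the tuple encoding, the helper names (`S`, `step`, `derived_tree`, `frontier`) and the label "arc" for the binary concatenation symbol are all introduced here for illustration, and sort indices are omitted.

```python
VAR = "$x"  # tag for a variable x_i; assumed not to clash with any symbol of the alphabet


def substitute(t, args):
    """Return t[t_1, ..., t_n]: every variable x_i in t is replaced by args[i-1]."""
    if t[0] == VAR:
        return args[t[1] - 1]
    return (t[0],) + tuple(substitute(child, args) for child in t[1:])


def beta(d):
    """Evaluate a derived tree (tupling suppressed) in the tree substitution algebra."""
    kind = d[0]
    if kind == "sym":
        _, name, n = d
        return (name,) + tuple((VAR, i) for i in range(1, n + 1))
    if kind == "pi":
        return (VAR, d[1])
    if kind == "S":
        return substitute(beta(d[1]), [beta(child) for child in d[2:]])
    raise ValueError(f"unknown node kind: {kind!r}")


def frontier(t):
    """Leaf labels of a variable-free tree, read from left to right."""
    if len(t) == 1:
        return t[0]
    return "".join(frontier(child) for child in t[1:])


def S(head, *args):
    """A node labelled with the substitution operator S."""
    return ("S", head) + args


ARC = ("sym", "arc", 2)
A, B, C = ("sym", "a", 0), ("sym", "b", 0), ("sym", "c", 0)

# F -> S(arc, S(arc, pi_1, pi_2), pi_3)   (terminal production)
TERMINAL = S(ARC, S(ARC, ("pi", 1), ("pi", 2)), ("pi", 3))


def step(i, sym):
    """One argument S(arc, pi_i, S(sym)) of the recursive F-production."""
    return S(ARC, ("pi", i), S(sym))


def derived_tree(n):
    """Tree obtained from S -> S(F, a, b, c), n recursive F-steps, then the terminal step."""
    body = TERMINAL
    for _ in range(n):
        body = S(body, step(1, A), step(2, B), step(3, C))
    return S(body, A, B, C)


for n in range(4):
    print(n, frontier(beta(derived_tree(n))))
```

With n recursive steps the evaluated tree has frontier a^(n+1) b^(n+1) c^(n+1), while the derived trees themselves form a regular set; the sketch enumerates only this one derivation shape and does not implement the grammar as a general rewriting system.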
Figure 3. Lifted derivations corresponding to Figure 1
The case illustrated by this example is characteristic of the general situation. An arbitrary context-free tree grammar G can be mapped into its derived counterpart G_D with the help of the LIFT transformation. The result of this transformation process, G_D, is a regular grammar and therefore specifies a context-free language as the yield of its generated tree language L(G_D). This follows directly from the fundamental fact, stated above, that a string language is context-free if and only if it is the leaf or frontier language of a regular tree language. The frontiers of L(G) and of L(G_D) are obviously not the same languages. The yield of L(G_D) in particular consists of strings over the whole alphabet Σ extended by the set of projection symbols. Due to the fact, however, that the composition of the LIFT operation with the β operation is the identity on the elements of T(Σ, X), it is of considerable interest to know whether this close relationship between elements of T(D(Σ)) and of T(Σ, X) is preserved by the derivation process in the context-free grammar and its regular counterpart.
Figure 4. Terminal derived trees corresponding to the example in Figure 2
Before we prove a claim about this relationship in the proposition below a short historical remark appears to be apposite. The central theorem in Maibaum (1974) to the effect that every context-free tree language is the image under the operation β of an effectively constructed regular language is wrong because he confounded the inside-out with the outside-in derivation mode. In the course of establishing a fixed-point characterization for context-free tree grammars in either generation mode Engelfriet and Schmidt (1977) refer to this mistake and state as an immediate consequence of the fixed-point analysis of IO context-free tree grammars within the space of the power-set tree substitution algebra that each IO context-free tree language L is the image of a regular tree language D(L) under the unique homomorphism from T(D(Σ)) into DT(Σ, X) (see their Cor. 4.12). This immediate consequence is but a restatement of the classical Mezei-Wright result that the equational subsets of an algebra are the homomorphic images of recognizable subsets in the initial term algebra. As formulated by Engelfriet & Schmidt, their correction of Maibaum's theorem has a distinctive declarative
content whereas the original claim had a clear operational meaning. It was based on the contention that the individual derivation steps of a context-free tree grammar and its derived counterpart correspond to each other. This is the point of the next lemma.
Lemma 22 Suppose G = (Σ, ℱ, S, P) is a context-free tree grammar and L(G) its generated tree language. Then there is a derived regular tree grammar G_D = (D(Σ), D(ℱ), D(S), D(P)) such that L(G) is the image of L(G_D) under the unique homomorphism from the algebra of tree terms over D(Σ ∪ ℱ) into the tree substitution algebra over the same alphabet. In particular, t' is derived in G from t in k steps, i.e. t ⇒* t' via the productions p_1, ..., p_k in P, if and only if there are productions p_1^D, ..., p_k^D in D(P) such that LIFT(t') is derived in G_D from LIFT(t) via the corresponding productions.
Proof. The proof is based on the closure of inside-out tree languages under tree homomorphisms. The idea of using the LIFT operation for the simulation of derivation steps on the derived level can also be found in Engelfriet and Schmidt (1978). Let h_n be a family of mappings h_n : Σ_n → T(Ω, X) where Σ and Ω are two ranked alphabets. Such a family induces a tree homomorphism ĥ : T(Σ) → T(Ω) according to the recursive stipulations:
ĥ(σ) = h_0(σ)   for σ ∈ Σ_0
ĥ(σ(t_1, ..., t_n)) = h_n(σ)[ĥ(t_1), ..., ĥ(t_n)]   for σ ∈ Σ_n
A production p in P can be viewed as determining such a tree homomorphism p̂ : T(Σ ∪ ℱ) → T(Σ ∪ ℱ) by considering the family of mappings p_n : Σ_n ∪ ℱ_n → T(Σ ∪ ℱ, X_n) where p_n(F) = t for t ∈ T(Σ ∪ ℱ, X_n) and p_n(f) = f(x_1, ..., x_n) for f ≠ F in Σ ∪ ℱ. By requiring that p̂(x_i) = x_i the mapping p̂ can be regarded as a D(Σ ∪ ℱ')-homomorphism from the tree substitution algebra DT(Σ ∪ ℱ, X) into itself, where we have set ℱ' := ℱ \ {F}. By applying the LIFT-operation to the tree homomorphism p̂ we obtain its simulation p̂_D : T(D(Σ ∪ ℱ)) → T(D(Σ ∪ ℱ)) on the derived level:
p^D_0(F) = LIFT_n(p_n(F))
p^D_0(f) = f   for f ≠ F in Σ ∪ ℱ
p^D_0(π_i^n) = π_i^n
p^D_{n+1}(S_{n,m}) = S_{n,m}(x_1, ..., x_{n+1})
Observe that we have treated D(Σ ∪ ℱ) as a ranked alphabet. If we can
show that the diagram below commutes the claim in the lemma follows by induction:
T(D(Σ ∪ ℱ)) ---p̂_D---> T(D(Σ ∪ ℱ))
     | β                      | β
     v                        v
DT(Σ ∪ ℱ, X) ----p̂----> DT(Σ ∪ ℱ, X)
The commutativity is shown by the succeeding series of equations in which the decisive step is justified by the identity of β ∘ LIFT on the tree substitution algebra. Let f be in (Σ ∪ ℱ)_n:
β(p̂_D(f)) = β(LIFT_n(p_n(f)))
= p_n(f)
= p̂(f(x_1, ..., x_n))
= p̂(β(f)). □
The preceding result provides an operational handle on the correspondence between the derivation sequences within the tree substitution algebra and the derived term algebra. As characterized, the correspondence is not of much help in finding a solution to the problem of giving a logical description of the exact computing power needed to analyze natural language phenomena. The formal definition of the β transformation, which mediates the correspondence, is of an appealing perspicuity, but the many structural properties that are exemplified in the range of this mapping make it difficult to estimate the definitional resources necessary to establish the result of the correspondence relation between input trees and their values in the semantic domain of the tree substitution algebra. We know from the classical result for regular tree languages that monadic second-order logic is too weak to serve as a logical means to define the range of the β mapping when it is applied to the space of regular tree languages. What does not seem to be excluded is the possibility of solving our logical characterization problem by defining the range of context-free tree languages within the domain of the regular languages. In this way, we would rest on firm ground and would take a glimpse into unknown territory by using the same logical instruments that helped us to survey our "recognizable" homeland.
4 Logical characterization
Extending the characterization of grammatical properties by monadic second-order logic has been our main motivation. For tree languages the central result is the following: a tree language is definable in monadic second-order logic if and only if it is a regular language. As is well known and as we will indicate below, a similar characterization holds for regular and context-free string languages. The examples and analyses of English syntactic phenomena that are presented in Rogers (1994) make it abundantly clear that monadic second-order logic can be used as a flexible and powerful specification language for a wide range of theoretical constructs that form the core of one of the leading linguistic models. Therefore, the logical definability of structural properties that are coextensive with the empirically testified structural variation of natural languages, would be a useful and entirely grammar independent characterization of the notion of a possible human language. It follows from the cross-serial dependencies in Swiss German and related phenomena in other languages that monadic second-order logic, at least in the form that subtends the characterization results just mentioned, is not expressive enough to allow for a logical solution of the main problem of Universal Grammar: determining the range in cross-language variation. In many minds, this expressive weakness alone disqualifies monadic second-order logic from consideration in metatheoretic studies on the logical foundations of linguistics. Employing the results of the preceding section, we will sketch a way out of this quandary, inspired by a familiar logical technique of talking in one structure about another. To what extent one can, based on this technique, simulate transformations of trees is not yet fully understood. We shall sketch some further lines of research in the concluding remarks.
The major aim of descriptive complexity theory consists in classifying properties and problems according to the logical complexity of the formulas in which they are expressible. One of the first results in this area of research is the definability of the classes of regular string and tree languages by means of monadic second-order logic. The use of this logic is of particular interest since it is powerful enough to express structural properties of practical relevance in the empirical sciences and it remains, in addition of its practical importance, effectively solvable. As a preparation for our logical description of the β transformation we shall recall some concepts needed for expressing the logical characterization of the class of regular languages. For the purpose of logical definability we regard strings over a finite alphabet as model-theoretic structures of a certain kind.
Definition 23 Let Σ be an alphabet and let τ(Σ) be the vocabulary { S(F, G), F → DP(jack),
L2 = {5} G
VP(sleeps)}
Macros for Minimalism? 239
Tree grammars with F_n = ∅, n ≠ 0, are called regular. Since they always just substitute some tree for a leaf-node, it is easy to see that they can only generate recognizable sets of trees, a fortiori context-free string languages (Mezei and Wright 1967). If F_n, n ≠ 0, is non-empty, that is, if we allow the operatives to be parameterized by variables, however, the situation changes. These context-free (or macro-)tree grammars are capable of generating sets of structures, the yields of which belong to the class of mildly context-sensitive languages known as the indexed languages. In fact, they characterize this class exactly (Rounds 1970). As an illustration consider the following macro-tree grammar for the verb-raising construction (2):
(6) Σ_0 (the Lexicon)
Σ_1 = {V, V^e, V^r, V^re, DP, I}
Σ_2 = {proj, adj}
F_0 = {VR}   F_2 = {+V}
S = VR   V = {x, y}
P = { VR → +V(proj(DP, V^e), V),
+V(x, y) → proj(x, adj(y, I)),
+V(x, y) → +V(proj(DP, proj(x, V^re)), adj(y, V^r)) }
plus the obvious rules for Σ_1
"Spell-out" parameterization of adj applies as before. As the sample derivation in figure 3 illustrates, the minimalist and the tree grammar generated structures are identical. Any context-free tree grammar Γ can be transformed into a regular tree grammar which characterizes a (necessarily recognizable) set of trees encoding the instructions necessary to convert them into the ones the original grammar generates (Maibaum 1974). This "LiFTing" is achieved by constructing a new, derived alphabet and translating the terms over the original signature into terms of the derived one via a primitive recursive procedure.
(7) Definition For each n ≥ 0, Σ'_n = {f' | f ∈ Σ_n} is a new set of symbols; V_k = {x_i | 1 ≤ i ≤ k}; for each n ≥ 1 and each i, 1 ≤ i ≤ n, π_i^n is a new symbol, the ith projection symbol of sort n; for each n ≥ 0, k ≥ 0 the new symbol c_{n,k} is the (n, k)th composition symbol.
Σ^L_0 = Σ'_0
Σ^L_n = Σ'_n ∪ {π_i^n | 1 ≤ i ≤ n}   for n ≥ 1
Σ^L_{n,k} = {c_{n,k}}   for n, k ≥ 0
Σ^L_w = ∅   otherwise
240
Hans-Peter Kolb

Figure 3. A (macro-)tree grammar derivation of VR
For k ≥ 0, LIFT^Σ_k : T(Σ, V_k) → T(Σ^L, k) is defined as follows:
LIFT^Σ_k(x_i) = π_i^k
LIFT^Σ_k(f) = c_{0,k}(f')   for f ∈ Σ_0
LIFT^Σ_k(f(t_1, ..., t_n)) = c_{n,k}(f', LIFT^Σ_k(t_1), ..., LIFT^Σ_k(t_n))   for f ∈ Σ_n, n ≥ 1
Note that this very general procedure allows the translation of any term over the original signature. Obviously, a rule of a grammar Γ is just another term over Σ_Γ, but so is, e.g., any structure generated by Γ. Again simplifying things to the verge of inaccuracy (in particular, omitting all 0- and 1-place composition symbols), (7) yields the following LiFTed grammar for (6):
(8) Σ^L_0 = Σ'_0
Σ^L_1 = {V', V^e', V^r', V^re', DP', I'}
Σ^L_2 = {proj', adj', π_1, π_2}
Σ^L_{n,k} = {c_{n,k} | c ∈ C}
F_0 = {VR', +V'}   S = VR'   V = ∅
P = { VR' → c(+V', c(proj', DP', V^e'), V'),
+V' → c(proj', π_1, c(adj', π_2, I')) | c(+V', c(proj', DP', c(proj', π_1, V^re')), c(adj', π_2, V^r')) }
plus the rules for Σ^L_1.
This grammar leads to structures, given in figure 4, which at first sight do not have much in common with the ones we are really after, i.e., the ones generated by (6). However, as mentioned before, there is a mapping h from these structures onto structures interpreting the c ∈ C and the π ∈ Π the way the names we have given them suggest, viz. as compositions and projections, respectively. In fact, h is the unique homomorphism into the appropriate tree substitution algebra. Moreover, as Mönnich (1999) has shown, the image of the set of Γ^L structures under this homomorphism is exactly the set of structures generated by Γ.⁴ So in some sense the Γ^L structures, which form a recognizable set and, therefore, can be described by a context-free grammar or, more importantly, by a set of MSO-formulae, already contain the intended structures. In what follows we will explore how this fact can be brought to bear on an MSO-based metatheoretical treatment of modern generative grammar.
3 Linguistic relations on explicit structures
The minimal requirement for any direct use of the explicitly encoded, recognizable sets of structures is the definition on the derived trees of the linguistic relations that hold in the original structures. The first thing that comes to mind is something like the following:
(9) x dominates^L / c-commands^L / is-in-the-checking-domain-of^L y in T^L iff h(x) dominates / c-commands / is-in-the-checking-domain-of h(y) in h(T^L).
Actually, such a definition would be valid: from Mönnich (1999) we know that h exists and since any tree can be described by a p-formula, so can h(T^L). The definitions of c-commands, dominates and is-in-the-checking-domain-of can either be taken directly from Rogers (1994) or easily derived from his definitions. (9) is not effective, however: There is no way to reconstruct T^L from a set of dominance^L or c-command^L statements alone.
Hans-Peter
Kolb
Figure 4. A sample derivation using the LiFTed grammar
Macros for Minimalism ?
But even if we ensure the general well-formedness of Τ L by other means— definitions ä la (9) seem to introduce unnecessary complexity. A more interesting approach would be to "build h. into the definitions." The aim of such an undertaking is to define a set of relations R h , holding between the nodes η G Ν L of the explicit tree T L which carry a "linguistic" label I G L i F T ( l J n > 0 Σ η ) in such a way, that when interpreting dominatesh G R h as a tree order on the set of "linguistic" nodes and precedesG as the precedence relation on the resulting structure, ( { η | η G N L Λ ί ( η ) G L i F T ( ( J n > 0 Ση)}, dominates*1, precedesh) is in fact h.(T L ). What we are after is a syntactic interpretation (cf., e.g., Ebbinghaus et al. 1996:131ff) of Τ in T L . This involves finding a set of formulae Φ which maps the relevant relations R into T L in such a way that Φ - 1 ( T L ) = Ψϊ* 1 ) is the intended, "linguistic" structure corresponding to T L . Taking (following Rogers 1994) { < , < * , X } as the primitive structural relations on the intended trees and assuming that the corresponding { · * , < } are definable on the LiFTed structures, Φ is given by {(p^L = an appropriate restriction of the nodes of T L , φ Ψ Ψ χ 1 ) f ° r any explicit tree T L Again, since h exists, this construction is possible in principle. What complicates matters in our case are two additional requirements: In order to count as a solution for the expressivity problem Φ must be cast in M S Ο, and to pave the way towards an interface to structures generated by linguistic means, Φ should be effective, i.e., Φ applied to some linguistic tree should enable us to derive a(n equivalence class o f ) TL-structure(s). As a corollary, this requires that linguistic relations between two nodes can be determined locally in T L , i.e., without reference to the complete tree. If successful, one would replace < , · < * , U) = 3z (transw, (x, z) A transw2 (z> y)) = rr«w^(x,y)
$$\mathit{trans}_{w^{*}}(x,y) \;\equiv\; \forall X\,\bigl(\forall v,w\,(v \in X \wedge \mathit{trans}_{w}(v,w) \rightarrow w \in X) \wedge x \in X \rightarrow y \in X\bigr)$$

The resulting formula uses only the edge_n relations, the MSO-definable tests of the original automaton, and, for the Kleene-* case, the closed sets constructed via (14). No recursion is involved: ◁ is MSO-definable:

(21) $x \triangleleft y \;\equiv\; \mathit{trans}_{W_{\triangleleft}}(x,y)$
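The Kleene-star clause can be read operationally: y is related to x exactly when y lies in every set that contains x and is closed under the one-step relation trans_w, which on a finite structure is the same as lying in the least such set. The following Python sketch illustrates that reading together with the concatenation clause. The function names, the encoding of trans_w as a set of node pairs, and the toy data are illustrative assumptions, not part of the original construction.

```python
from itertools import combinations


def trans_concat(step1, step2, nodes, x, y):
    """Concatenation w1.w2: some intermediate z links a w1-step to a w2-step."""
    return any((x, z) in step1 and (z, y) in step2 for z in nodes)


def least_closed_set(step, x):
    """Smallest set that contains x and is closed under the relation `step`."""
    closed, frontier = {x}, [x]
    while frontier:
        v = frontier.pop()
        for a, b in step:
            if a == v and b not in closed:
                closed.add(b)
                frontier.append(b)
    return closed


def trans_star(step, x, y):
    """Kleene star via the least closed set (no recursion on the formula)."""
    return y in least_closed_set(step, x)


def trans_star_all_sets(step, nodes, x, y):
    """Literal reading of the formula: quantify over every subset X of nodes."""
    nodes = list(nodes)
    for r in range(len(nodes) + 1):
        for subset in combinations(nodes, r):
            X = set(subset)
            is_closed = all(b in X for a, b in step if a in X)
            if is_closed and x in X and y not in X:
                return False
    return True


# Hypothetical toy data: five nodes and a one-step relation trans_w.
NODES = range(5)
STEP = {(0, 1), (1, 2), (3, 4)}
```

On this toy relation both readings of the star clause agree on every pair of nodes; the subset enumeration is exponential and is included only to mirror the set quantifier in the formula.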
This approach can be extended to the other members of the dominance family of relations: is just 21^ with the transition (e —(L(x) }—> u) added to δ, and 21^* is 21^+ with F = {u}, and qfi n , as well as any transition referring to it, removed from Q and δ, respectively. The inductive translations of the corresponding walking languages transwM* and transw y ' andxLy thenx'Ly'. Conversely, i f x is not a leaf then χ Ly iff for all x' < x, x' Ly ; likewise for y. Given L, we say that χ precedes y if χ L y and for no ζ we have χ L ζ L y we say that χ immediately precedes y. If xi and xi are sisters and xi Lx2, then xi is called a left sister of χι- If in addition xi immediately precedes X2 then xi is called an immediate left sister ofxj. An ordered tree is a quadruple (T, r, χ and there exists a y such that χ < y < ζ and xRySz. By monotonicity of the relations we may assume y = fR(x). Then z = fs(y) = fs(f R (x)). Definition 3 A binary relation R on a lapelled (ordered) tree is a chain if there exist η and Q. C D, i < n, so that R = Co ο Ci ο . . . ο C n _ i , with Ci := k(Q., X). R is called definable if it is an intersection of finitely many chains. Theorem 4 The set of definable command relations is closed under intersection, union and relation composition. The proof of this theorem can be found in Kracht (1993). As a corollary we note that there is a smallest and a largest command relation. Both are tight. The smallest relation is obtained when we choose Ο := D, the entire set of labels. Then, χ O-commands y iff χ = r or the node immediately dominating x dominates y. This relation is called idc-command. The largest relation is obtained when Ο := 0 . Here, all nodes O-command all other nodes. (This relation is only of mathematical importance.) 2.5 The use of domains In Kracht (1993) it was claimed that relations in G/B theory are definable command relations. It was shown that the system of Köster (1986) can be reformulated using definable command relations. 
Here we will give some more examples, this time from the more canonical literature. First, the notion of c-command is of central importance. It is usually defined in one of two ways, depending on the author: it is identical either to idc-command or to max-command. Here, the relation of max-command is obtained by choosing O to be the set of maximal (= phrasal) nodes. Indeed, with this choice, x max-commands y if all phrasal nodes properly dominating x also dominate
Adjunction structures and syntactic domains
y. In much of the literature it is also required that x and y be incomparable. We call this the Non Overlapping Condition. It has been argued in Barker and Pullum (1990) that including this condition in the definition of a command relation is not a good choice.¹ Suffice it to say that from a mathematical point of view it is better to do without the Non Overlapping Condition. For the purposes of binding theory and other modules such as Case Theory, the relation of c-command (which we now take to be idc-command) is central. Indeed, it is also the smallest command relation, and it is used quite extensively as a diagnostic instrument to analyze the D-Structure of sentences, using evidence from binding theory (for example, see Haider 1992). In Baker (1988) it has been modified somewhat, but this modification will be automatically implemented in adjunction structures.

Let us now turn to some more difficult questions, namely the nonlocal relations. Here, the most prominent one is subjacency. In its most primitive form it says that a constituent may not move across more than one bounding node. This condition can be rephrased easily in the present framework. Let BD be the set of labels corresponding to bounding nodes. Examples are BD = {S, NP} or BD = {S', NP}, in the LGB terminology. The choice between these sets is empirical and does not touch on the question of how to define subjacency. Now, the requirement on subjacency can be rephrased as a condition on the relation between the trace and the antecedent. Let BD also denote the relation of BD-command. Then put

SUB := BD ∘ BD

We claim that y ∈ x SUB (that is, x SUB-commands y) iff y is subjacent to x. To check this, we need to look at the least node z such that z dominates both x and y. Suppose that y is subjacent to x. Then in the set [x, z] − {x} at most two nodes carry a label from BD. Let f be the generating function of BD-command. Then the generating function of subjacency is f ∘ f.
By definition, f(x) is either a node carrying a label in BD, or f(x) = r. Hence, for the node z defined above, z ≤ f ∘ f(x). It follows that x SUB-commands y. Now let conversely x SUB-command y. Let again z be the least node dominating both x and y. Then z ≤ f ∘ f(x), and it is easy to see that at most two nodes with a label in BD can be in the set [x, z] − {x}. So, y is subjacent to x.

In Chomsky's Barriers system (see Chomsky 1986) this definition of movement domain has been attacked on the ground that it is empirically inadequate. The definition that Chomsky gives for subjacency makes use of adjunction structures and an appropriate adaptation of the definition of command relations for those structures. We will return to that question. However, the use of adjunction is not necessary to get at most of the facts that the new notions and definitions are intended to capture. It would take too much time to prove this claim. We will be content here with outlining how different notions of movement domains can achieve the same effect. Crucially, the instrument we are using is that of composing relations. As in the definition of subjacency given above, the relation is defined from tight command relations by means of relation composition. This is no accident. It can be shown that tight command domains cannot be escaped by movement (whence the name). However, subjacency is intended to be a one-step nearness constraint, and it can clearly be violated in a successive step. Now consider the following sentence:

(1) [Von welcher Stadt]_i hast Du [den Beginn [der Zerstörung t_i]] gesehen?
    [Of which city]_i did you witness [the beginning of [the destruction t_i]]?
Figure 1. Wh-movement

Here, in moving the wh-phrase (pied-piping the preposition), two bounding nodes have been crossed. Indeed, to capture wh-movement it seems more plausible not to count intervening nominal heads. If that is so, let us look for an alternative. It has often been suggested that the only escape hatch for a wh-phrase is the specifier of comp. A wh-phrase always targets the next available spec-of-comp. If that spec is filled, movement is blocked. To implement this we take advantage of the fact that the complement of C⁰ is IP. Hence we propose the following domain:
WHM := IP ∘ CP

Figure 1 illustrates a case of wh-movement, where a wh-phrase is moved from inside a verb phrase into spec-of-comp. The domain of the trace is the least CP above the least IP which is above the trace. In the present case, the domain is the clause containing that trace. From spec-of-comp, however, the domain would be the next higher clause! This readily accounts for the fact that in a subsequent movement step the wh-phrase may target the next higher spec-of-comp. (Of course, the present domain allows the constituent to move anywhere within that domain. We assume, however, that independent conditions will ensure that only this position is chosen, if it is available at all.) Although this definition may have its problems, too, what we have shown is that ideas proposed in the literature about movement can be succinctly rephrased using definable domains.
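The two domains SUB := BD ∘ BD and WHM := IP ∘ CP can be tried out on a small tree by working with generating functions, where the generating function of a composition applies the first function and then the second. The sketch below is a minimal Python illustration under stated assumptions: the `Tree` class, the function names, the simplified two-clause tree (loosely modelled on example (1), with the wh-phrase in the lower spec-of-comp) and the choice BD = {IP, NP}, with IP standing in for S, are all hypothetical.

```python
class Tree:
    def __init__(self, parent, label):
        self.parent = parent   # node -> mother (the root maps to None)
        self.label = label     # node -> category label
        self.root = next(n for n, p in parent.items() if p is None)

    def dominates(self, a, b):
        """Reflexive dominance: a >= b."""
        while b is not None:
            if a == b:
                return True
            b = self.parent[b]
        return False


def generating_function(tree, labels):
    """f(x): least node properly dominating x with a label in `labels`, else the root."""
    def f(x):
        n = tree.parent[x]
        while n is not None:
            if tree.label[n] in labels:
                return n
            n = tree.parent[n]
        return tree.root
    return f


def compose(f, g):
    """Generating function of R o S, where f generates R and g generates S."""
    return lambda x: g(f(x))


def commands(tree, f, x, y):
    """x commands y iff y is dominated by f(x)."""
    return tree.dominates(f(x), y)


# Hypothetical tree: [CP1 [IP1 [VP1 [CP2 WH [IP2 [VP2 [NP1 [NP2 t]]]]]]]]
PARENT = {"CP1": None, "IP1": "CP1", "VP1": "IP1", "CP2": "VP1", "WH": "CP2",
          "IP2": "CP2", "VP2": "IP2", "NP1": "VP2", "NP2": "NP1", "t": "NP2"}
LABEL = {n: n.rstrip("12") for n in PARENT}   # e.g. "NP1" -> "NP"
TREE = Tree(PARENT, LABEL)

bd = generating_function(TREE, {"IP", "NP"})  # assumed set of bounding labels
sub = compose(bd, bd)                          # SUB := BD o BD
whm = compose(generating_function(TREE, {"IP"}),
              generating_function(TREE, {"CP"}))  # WHM := IP o CP
```

In this toy tree the SUB-domain of the trace is NP1, so the wh-phrase lies outside it, whereas the WHM-domain of the trace is the lower clause CP2 and the WHM-domain of the wh-phrase in spec-of-comp is the next higher clause CP1.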
3 Adjunction structures An adjunction structure is a structure © = (S,r, c, if all segments of b properly dominate all segments of c. If b includes c, it also contains c. The following characterization of containment and inclusion can be given. Proposition 5
1. b > c iffb0 > c°. 2. b > c iffb0 > c° iffb° > c0.
3. b » c iff b0 > c° iffb0 > c0. We note the following. Corollary 6 < and are irreflexive and transitive. Proof. Since b° < b° cannot hold, < is irreflexive, by (1) of the previous theorem. Furthermore, let b > c > Ö. Then b° > c° > 0°. Hence, by transitivity, b° > 0°, from which b > Ö. Now we turn to b0 cannot hold, is irreflexive. Now let b » c » D. Then b0 > c0 > Ö0. By transitivity, b0 > ö 0 . Hence b > D . • It can also be shown that (1) if α < b c then α •C c and (2) if α < b < c and a < c then b c. As we will see later, this completely characterizes the properties of < and -C with respect to the block structure. We define Tb i.b fb lb
↑b := {c : c > b}
↓b := {c : c < b}
⇑b := {b} ∪ {c : c ≫ b}
⇓b := {b} ∪ {c : c ≪ b}

and call the adjunction structures based on the sets ↑b − {b} and ⇑b − {b} the weak (strong) position and ↓b and ⇓b the weak (strong) constituent of b.
Figure 2. A complex morphological head Proposition 7 Let (S, r, b° as well as 5° > b°. So, c° and 0° are comparable. Then either c° < c° = or c° > 5°. In the first case, c is contained in i , in the second case they are equal, and in the third case case c contains Ö. Now for the strong upper cones. Let c » b and Ö b. Then c0 > b° as well as D0 > b°. Therefore, c° and 9° are comparable. If they are equal c = D. Otherwise, let c0 > 0 o . Then, by linearity of the blocks, c0 > 0 ° , so that 0 is included in c. Similarly if > c0. • The adjunction structure below illustrates that j for all j and fR(y) > j for all j < t. R is strong if fR(y) » y for all f such that f « t , and fR(y) = r otherwise. Then it is clear what we understand by strong O-command and weak O-command. Is there a way to choose between strong and weak relations? The answer is indirect. We will show that if we opt for weak relations, they turn out to be node based, so the entire reasons of introducing adjunction structures in Chomsky (1986) disappear. If we opt for the strong relations, however, we get the intended effects of the barriers system. Let us say a command relation on an adjunction structure is node based if the node-trace is a command relation. We want to require the following property. Node Compatibility. A block based relation is a command relation only if its node trace is a command relation. This excludes the relations where constituents are strong constituents. For the node trace of a strong constituent is not necessarily a constituent. For let c be a two segment block. Assume that D is adjoined to c. Then the strong
cone of c does not contain D. However, c° is a member of the node trace of ⇓c. So, the node trace of ⇓c is not a constituent.

We now want to compute the node trace of the command relations of weak and strong O-command. Let O ⊆ D be a set of labels. Then let O_• be the (node based) command relation based on the set of nodes which have a label from O and are minimal segments in their block. Let O^• be the (node based) command relation based on the set of nodes that have a label from O and are maximal segments. Finally, let O_μ be the relation based on the nodes with a label from O which are both maximal and minimal. Denote by W(O) the node trace of weak O-command, and by S(O) the node trace of strong O-command.

Theorem 27
W(O) = O^•
S(O) = (O_• ∘ O^•) ∩ O_μ
In particular, W(O) is tight.
Proof. Let (x,y) € W(O) and let χ G y and y € t). Then for the least 3 which is > y and has label in 0 , 3 > t) — if it exists. Assume 3 exists. Then 3° is also the least node with label in Ο which is a maximal segment and > y°. It follows that 30 > χ and also that 30 > y. Hence, (x,y) € 0 \ Assume that 3 does not exist. Then no block above y has a label in O. Then no node above χ has a label from O, and so (x,y) G W(O). Conversely, let (x,y) G O*. Let ζ be the least node with label from Ο which is a maximal segment. (If it does not exist, we are done, as can be seen easily.) Let 3 be its block. Then 3 is the least block > y with label from O. So, y weakly O-commands t). Hence (x,y) e W(O). Now for S(O). Let t). This shows the theorem. • As before, we can use a generating function for command relations. A block based command relation R is tight if it satisfies the postulates for tight relations. Tightness. If fn(y) is in the position of t) then fR(tj) = fR(y) or fR(y) is also in the position of fR(tj)·
Clearly, among the tight relations we are interested in the analogues of κ(O, Θ) for O ⊆ D. We put (𝔵, 𝔶) ∈ λ(O, 𝔖) iff for all 𝔷 ≫ 𝔵 such that ℓ(𝔷) ∈ O we have 𝔶 ≤ 𝔷.

Definition 28 Let 𝔖 be a labeled (ordered) adjunction structure with labels over D. A definable command relation over 𝔖 is a command relation generated from relations of the form λ(O, 𝔖), O ⊆ D, using intersection and relation composition.

5.2 K(ayne)-structures

In Kayne (1994), Kayne proposes a constraint on adjunction structures which he calls the Linear Correspondence Axiom (LCA). This axiom connects precedence with antisymmetric c-command. Here, x c-commands y antisymmetrically if x c-commands y but y does not c-command x. Kayne's theory is illustrative of the potential of adjunction structures, yet also of their dangers. The attractiveness of the proposal itself (namely, to link precedence with hierarchy) disappears as soon as one starts to look at the details. For the canonical definition of c-command does not yield the intended result. It is too restrictive. Hence, to make the theory work, a new definition has to be put in its place, one that takes constituents and positions to be strong. Although it too is restrictive (so that in the book Kayne has to go through many arguments to show that it is the right theory), it resorts to a definition of c-command that we have actually discarded for theory-internal reasons, since it takes the wrong notion of a constituent.

Definition 29 x ac-commands y if x and y do not overlap, x c-commands y but y does not c-command x.

A nice characterization of c-command and ac-command can be given in the following way. Let μ(x) be the mother of x. This is undefined if x is the root.

Lemma 30 (1) x c-commands y iff x is the root or μ(x) ≥ y. (2) x ac-commands y iff (a) x and y do not overlap and (b) μ(x) > μ(y).

Proof. (1) Suppose that x c-commands y.
If x is not the root, μ(x) is defined and μ(x) ≥ y, by definition of c-command. The converse is also straightforward. (2) Suppose that x ac-commands y. Then neither x nor y can be the root. Then μ(x) ≥ y. Now, μ(x) = y or μ(x) > y. μ(x) = y cannot hold, since then y overlaps with x. So μ(x) > y and hence μ(x) ≥ μ(y). However, μ(x) = μ(y) implies that y c-commands x, which is excluded. So, μ(x) > μ(y). Conversely, assume that x and y do not overlap and that μ(y) < μ(x). Then neither is the root and μ(x) > y, since y < μ(y). So, x c-commands y. If y c-commands x then μ(y) ≥ x, which in combination with μ(y) < μ(x) gives μ(y) = x. This is excluded. Hence y does not c-command x. ∎

Proposition 31 Antisymmetric c-command is irreflexive and transitive.
Proof. Irreflexivity follows immediately from the definition. Suppose that x ac-commands y and that y ac-commands z. Clearly, none of the three is the root of the tree. Then μ(x) > μ(y) > μ(z), from which μ(x) > μ(z). Now suppose that x overlaps with z. x ≤ z cannot hold, for then μ(x) ≤ μ(z). So, x > z. Hence μ(y) must overlap with x, for also μ(y) > z. Since μ(x) > μ(y), we have x ≥ μ(y) and so x > y. This is a contradiction, for x does not overlap with y. ∎

We will suspend the full definition of ac-command for adjunction structures and state first the LCA. After having worked out the consequences of the LCA for trees we will return to the definition of ac-command. Definition 32 Let Θ = (S,r, c and tj > D such that y ac-commands t). Thus the LCA can be phrased in the following form. Linear Correspondence Axiom, χ is a linear order on the leaves. Note that χ depends on the particular choice of the notion of c-command. We will play with several competing definitions and see how the LCA constrains the structure of adjunction structures depending on the particular definition of c-command. Let us give a special name to the structures satisfying the LCA. Definition 33 Let 𝔖 be an adjunction structure, and X ⊆ S² a binary relation over S. Put *x :=«?,0) : (3o > y)(3w > rj)«ü,w) € X)} 𝔖 is called a K(X)-structure if χ χ is a linear order on the leaves. Recall that a leaf is a block that contains a segment which is a leaf of the underlying tree. First we take X to be the notion of ac-command defined above. The result is quite dissimilar to those of Kayne (1994). The reason is that Kayne chooses a different notion, which we call sc-command. It is defined below. To see what K(AC)-structures look like, let us start with a limiting case,
Adjunction structures and syntactic domains 287 namely trees. Notice that in general, if κ ac-commands y and y > ζ then χ ac-commands ζ as well. Definition 34 Let Χ = (T, r, x, w > y such that ν ac-commands w. Theorem 35 (Kayne) A tree is a K(AC)-tree iff it is at most binary branching, and for every x, y i, yz, yi Φ vj2 such thaty ι -< xandy 2 •< x, either y 1 is a leaf or yi is a leaf, but not both. Proof Let X be a tree. Let χ and y be leaves. We claim that χ χ y and not y χ χ iff χ ac-commands y. Assume that χ χ y and not y χ x. Then χ and y do not overlap. (Otherwise, let u > χ and ν > y. Then u does not accommand ν since u and ν also overlap.) If χ does not ac-command y then either (Case 1) χ does not c-command y or (Case 2) y c-commands x. Case 2 is easily excluded. For then χ χ y simply cannot hold since every node dominating y must c-command x. Suppose Case 1 obtains. By definition of >4 and the remark preceding Definition 34 there is a u such that χ < u and u ac-commands y but u ^ y. Moreover, since y does not c-command χ there is a ν > y such that ν c-commands χ but ν ^ x. Hence, χ does not c-command v, otherwise χ c-commands y. Hence, ν ac-commands x. We now have: u ac-commands y, whence χ χ y, and ν ac-commands x, whence y χ χ. Contradiction. So, χ ac-commands y. If χ ac-commands y then by definition χ χ y. Moreover, if for some ν > y and some u > χ we have that ν ac-commands u, then ν ^ χ and u ^ y, from which follows that u = x, and u c-commands v, a contradiction. This proves our claim. Suppose now that 1 is a K(AC)-tree. Let x be a node with three (pairwise distinct) daughters, y 1, y2 and y3. Let 24. (i e {1,2,3}) be leaves such that 2a < Vi for all i. By LCA, the z\_ are linearly ordered by χ . Without loss of generality we assume ζ·\ χ ζι χ z$. Then z\ ac-commands Z2 and Z2 accommands Z3. Therefore, z\ = yi and Z2 = y2· But then Z2 ac-commands z\, a contradiction. So, any node has at most two daughters. 
Likewise, if χ has exactly two daughters then exactly one must be a leaf. Now assume that X satisfies all these requirements. We will show that it is a K(AC)-tree. First, any node is either a leaf or a mother of a leaf. Mothers of leaves are linearly ordered by >. (For if not, take mothers u and ν that do not overlap. Let w be their common ancestor. *w has at least two daughters of which neither is a leaf.) Now let χ χ y. Let u be the least node dominating χ and y. Then, u has two daughters, of which one is a leaf. This is easily seen to be x. Further, y is not a leaf. So, y χ χ cannot obtain. Hence we have χ χ y iff χ ac-commands
288 Marcus Kracht y. This is irreflexive and transitive. Now, finally, take a leaf χ. It has a mother μ(χ) (unless the tree is trivial). From what we have established, χ χ y for a leaf y iff μ(χ) > μ(υ). But the mothers of leaves are linearly ordered by >, as we have seen. • A few comments are in order. If ac-command would not require χ and y to be incomparable then χ ac-commands y if χ > ζ > y for some z. Then trees satisfying LCA would have height 2 at most. If we would instead use the relation of ec-command, where χ ec-commands y if χ c-commands y but χ ^ y then χ ac-commands y if χ -< y. So, trees satisfying LCA would again be rather flat. We would achieve the same result as above in Theorem 35, however, if the definition of c-command would be further strengthened as follows, χ cc-commands y if χ and y are incomparable and χ c-commands y; χ acc-commands y if χ cc-commands y but y does not cc-command x. (Simply note that acc-command is the same relation as ac-command.) Now let us go over to adjunction structures. The results we are going to provide shall in the limiting case of a tree return the characterizations above. Definition 36 In an adjunction structure, jc wc-commands t) if j ^ t) and for every block u > j w e have u > t). This is c-command as defined earlier for blocks, with the added condition that jc excludes t}. In the tree case this is like c-command, but the clause χ ^ y is added. Definition 37 In an adjunction structure, j: awc-commands t) if y and t) are «^-incomparable, y wc-commands t) but t) does not wc-command y. This is the same as ac-command in the tree case, hence the results carry over. First of all, we assume that adjunction structures do not contain segments which are non-branching and non-minimal. Recall that a morphological head of an adjunction structure is a maximal constituent in which no block includes another block. With respect to heads, the LCA is less strict on adjunction structures. 
The reason is that the exclusion of overlap is replaced by the condition that no block includes the other. It turns out that in adjunction structures the following type of morphological heads are admitted. Theorem 38 Let 21 be a morphological head and a K(AWC)-structure. Then 21 is right branching. That is to say, 21 = (T, u} and V := {tj : t) > o}. Clearly, U n V Φ 0 . Let b be the minimum of U n V . Case 1. u = b or ο = b. Without loss of generality ο = b. Then ο does not sc-command u, but u sccommands o. So, u xi o. Hence (a) holds. It is clear that (b) also holds. Case 2. u Φ b and t> Φ b. Then let c and 0 be 4 u does not hold. Case 2b. c is not an adjunct to b. Then both c and 0 are ^-daughters, and so by (2) one of them is a head. Assume without loss of generality that c is the head. Then Ö is not a head, again by (2). Furthermore, by (3), D is strictly complex. Thus ο t) for some t) < Ö. Furthermore, there is a a «^-daughter of t) and ϋ < 3. Then c asc-commands 3. Hence u >1 0. To show (b), we have to show that ο χ u does not obtain. But c sc-commands 0, so c sc-commands any block < 5. Since c is a head, any block of c sc-commands any block of 0. Therefore, no block of 0 can asc-command any block of c. This shows that ο χ u does not hold. The proof is complete. • These are the requirements as can be found in Kayne.
5.3 Movement invariance of domains Finally, we want to discuss an important feature of the new, block based definitions of domains, namely their invariance under movement. In fact, invariance holds with respect to more transformations than just movement. The simplest of them is deletion. Let Θ = (S, ' when the head
Representational Minimalism
Figure 3. The T-marker, evaluated.
is down and to the right (the right-hand daughter projects over the left, in Chomsky's terminology) and '<' when the head is down and to the left (the left-hand daughter projects over the right). Unlike Stabler, we distinguish X⁰-internal structures, using the symbol V to mark X⁰'s, whose internal head, here, is always to the left.³ Instead of deleting cancelled features we simply wrap them in parentheses to mark them as cancelled. Since we use only a single instance of each lexical item, we use the names for lexical items from Figure 1 as our indices on traces. (We will generally not distinguish heads from the lexical items of which they are instances, using the names for lexical items to name heads as well.)

In slightly more formal terms, we can relate the tree in Figure 2 (let us call it a "T-Marker", recalling the terminology of Chomsky (1975/1955)) to the tree in Figure 3 by interpreting the operation symbols 'Merge' and 'Move' (roughly) in the following way. Given two trees T1 and T2,

$$\mathrm{Merge}(T_1, T_2) = \begin{cases} [_{<}\; T_1\;\, T_2], & \text{if } T_1 \text{ is an } X^0,\\ [_{>}\; T_2\;\, T_1], & \text{otherwise,} \end{cases}$$

where $[_{<}\, A\; B]$ denotes the tree with root '<' and daughters A and B, in that order.
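The case distinction in Merge, together with the convention that '>' points to a head on the right and '<' to a head on the left, can be written down directly. The following Python sketch is only an illustration under stated assumptions: the `Node` class, the string encoding of lexical heads, the marker `"o"` for X⁰-internal structure and the helper `head` are hypothetical, and the side conditions on Merge mentioned in the text are ignored.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Node:
    label: str        # '<', '>' or the assumed X0 marker
    left: "Tree"
    right: "Tree"


Tree = Union[str, Node]   # a bare string stands for a lexical head

X0_MARK = "o"             # hypothetical stand-in for the X0 marking symbol


def is_x0(t: Tree) -> bool:
    """A tree counts as an X0 if it is a bare head or is marked as X0-internal."""
    return isinstance(t, str) or t.label == X0_MARK


def merge(t1: Tree, t2: Tree) -> Tree:
    """Merge(T1, T2): head-initial '<' if T1 is an X0, otherwise '>' with T2 on the left."""
    if is_x0(t1):
        return Node("<", t1, t2)
    return Node(">", t2, t1)


def head(t: Tree) -> str:
    """Follow the direction the labels point in until a lexical head is reached."""
    if isinstance(t, str):
        return t
    if t.label == ">":
        return head(t.right)
    return head(t.left)   # '<' and X0-internal structure: head on the left
```

For example, `merge("V", "DP")` yields a '<' tree headed by `"V"`, and merging that result with a further `"Subj"` yields a '>' tree in which `"Subj"` sits on the left and the head is still `"V"`.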
There are some other side conditions on the application of Merge, but they need not concern us here; see Cornell (1996) for details. For Move we first define an auxiliary function τ. This simply maps moved material to whatever residue it is to leave behind. In the simplest case, this will be a single, constant placeholder symbol, but in the event that the trace of a prior movement can intervene to block some other movement, it may need to preserve some information about the features that used to live there. In the most complex case, it will need to be some unique indicator identifying the trace with a single chain (e.g., a referential index). We leave this open as a parameter along which different formalisms can vary; in the particular case of the derivational system for ℒ_ww (Cornell 1996), the simplest trace function suffices. Against this background, then, we have that, given a tree T1 such that

$$T_1 = [_{<}\; \mathrm{head}(T_1)\;\, T_3]$$

where T3 contains a subtree T2 which can be attracted by the head of T1, then:

$$\mathrm{Move}(T_1) = [_{<}\; [_{V}\; \mathrm{head}(T_1)\;\, T_2]\;\; (T_2 \backslash_{\tau(T_2)} T_3)]$$
where A \_x B is basically the result of replacing an occurrence of A in B with x. (More accurately, it evaluates to the set of trees which would be just like B if every occurrence of x were replaced by A; that is, it is the inverse of the variable-substitution operation.) This interpretation of Move is restricted to head movement. A more complex interpretation is required if we wish to capture XP movement as well. Also, we have not distinguished between movement to cancel strong vs. weak licensors; in ℒ_ww-derivations movement is always to satisfy a strong feature. Feature checking is taken to be an automatic and immediate consequence of the construction of a checking configuration. In particular, it is consequent upon the merging in of a specifier and upon any instance of head movement. Finally, we have been completely vague about the meaning of "can be attracted". We will assume here a simple definition: the attracted head must bear a licensee feature which matches the next sequential licensor feature of head(T1), and there must be no other head which c-commands it which also bears the same licensee feature. If we interpret feature checking as an operation which replaces a checked feature φ with ( […] and complement positions with the symbol '<'.

We consider the (proper) phrasal positions in XP and X⁰ to be attachment sites, and we consider chain positions in XP* and X⁰* to be the elements which are actually attached to these positions. So there is a rather deep sense in which this grammatical framework is "strongly lexicalist", in that the division of labor which we observe at the lexical level in the distinction between licensor and licensee features is projected into the syntax as the distinction between phrasal structures and chains.

Besides the formation of these four head-extensions, our main structure-building operation is attachment, which we also interpret as creating an immediate dominance link. In contrast to the immediate dominance links which order positions within the phrasal extensions X⁰ and XP, which relate positions which are all extensions of the same head, attachment assertions relate positions which are extensions of distinct heads. In particular:

Condition 1 (Complementarity of Attachment) We require that all attachments be of chain positions x extending some head α to phrase-structure positions y (properly) extending a distinct head β.

Condition 2 (Uniformity of Attachment) In all attachments of chain positions x to phrase-structure positions y, x and y are of the same type (i.e., X⁰ or XP).

We consider syntactic structures, like the lexical items they are built out of, to be categorized sound-meaning pairs, so we will apply the projection functions π_pf and π_lf to them as well.
We can use the projection functions in order to associate only the PF substructure or only the LF substructure with particular chain positions, in this way deriving a chain-formation system reflecting the various movement operations of Stabler (1997). So, for example, given a phrase αP, we can associate π_pf(αP) with one chain position and then associate π_lf(αP) with a higher position, simulating covert "LF movement" of the αP. Intuitively, this allows us to construct X⁰ and XP without regard to where they will be spelled out in the final syntactic structure. As promised, we thereby separate the issue of which chain position attaches where (which involves, among other things, the theory of Locality) from the problem of what goes in each chain position (which is the Spell-Out Problem).
3.2 Feature driven licensing

In this section we will try to develop a representational view of the feature driven theory of movement from Chomsky (1995). The essential idea of movement theory which we will attempt to adapt here is that movement can only take place if it puts a clashing pair of features in a checking configuration. A checking configuration is essentially any structural configuration involving a phrasal extension of a head α (α⁰, αP) and a dependent chain position extending a distinct head β, unless that dependent is the structural complement. We will not attempt here to explain the curious shape of this definition, but it is based on the descriptive observation that movement seems to extract structures out of the complement domain and into some other local relation to the head. So we define, by stipulation if necessary, the complement domain as the source of movements and all remaining phrasal positions in the extensions of a head as checking positions, that is, appropriate landing sites for movement. As a matter of terminology, we distinguish the attraction domain of a head from its checking domain. All movement is from the attraction domain of a head to a position in its checking domain. For our purposes the checking domain can be defined as follows.

Definition 3 (Checking Domain) The checking domain of a head α is the set of positions properly extending α either
1. in α⁰, or
2. as a specifier in αP.

Within the more representational framework which we are developing here, we begin from a state in which we are given for each head α a complex of structural positions, namely the tuple (α⁰, αP, α⁰*, αP*), and a lexical item, that is, a complex of features consisting of licensors, licensees and interpretable features.
In a sense we have already done our chain formation ("pre-syntactically" in the terminology of Brody (1995)), in that α0* and αP* already represent proposals for what chains there will be in the syntactic structure as a whole. What we are considering here is more like a theory which licenses these structures, that is, which selects pre-syntactic chain formation hypotheses which will actually work together to yield well formed syntactic structures. The licensing, or feature-distribution, problem is then that we must associate the proper grammatical resources with the constructions that consume them.
Representational Minimalism 319

3.2.1 Checking theory

The core of our system is naturally going to be Checking Theory, which we take to define a map distributing formal features across a syntactic structure in such a way that all features are checked. We will implement this by associating formal features with positions. Intuitively, checking configurations are just attachment arcs, and licensor features belong at the target, licensees at the source of an attachment arc which puts them in a checking configuration. Our first definition will probably seem out of place here: at first glance it seems to have nothing to do with Checking Theory.

Definition 4 (φ-Structure) Given a categorial feature φ, a φ-structure is a syntactic structure in which the only unchecked feature is φ.

We take a φ-structure Σ to represent the judgment that $\pi_{pf}(\Sigma)$ is a grammatical utterance of category φ, under the interpretation $\pi_{lf}(\Sigma)$. When we define a language in this way, we implicitly impose upon it the condition that all (but one) features be placed in checking configurations. As already noted, we take checking configurations to be attachment links where the target is either a specifier position or a head-internal adjunct position, i.e., the target is in the checking domain of its head. We take (proper) X0 and XP positions to correspond to lexical licensor features, and chain positions to correspond to licensee features, so the feature distribution function, in order to respect this interpretation, must assign licensees to chain positions and licensors to phrasal positions, in particular phrasal positions in the checking domain of the head.

Condition 5 (Uniformity of Distribution) The distribution function must map licensees to chain positions and licensors to phrasal positions.

In our terms, Chomsky's principle of Last Resort (Chomsky 1995) requires that every chain position which properly extends some α0 or αP must be licensed by the possession of an appropriate licensee feature.
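The two requirements just stated can be sketched as checks on a candidate feature distribution. The encoding is an assumption made for illustration: a distribution is a Python dictionary from features to positions, and the classes `Feature` and `Position` together with their fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    kind: str  # "licensor" or "licensee"


@dataclass(frozen=True)
class Position:
    name: str
    sort: str     # "phrasal" or "chain"
    proper: bool  # True if the position is a proper extension


def uniform(distribution):
    """Uniformity of Distribution: licensees on chain positions, licensors on phrasal ones."""
    expected = {"licensee": "chain", "licensor": "phrasal"}
    return all(pos.sort == expected[feat.kind] for feat, pos in distribution.items())


def last_resort(positions, distribution):
    """Last Resort: every proper chain position carries some licensee feature."""
    licensed = {pos for feat, pos in distribution.items() if feat.kind == "licensee"}
    return all(p in licensed for p in positions if p.sort == "chain" and p.proper)


spec = Position("spec", "phrasal", True)
chain_min = Position("chain-0", "chain", False)
chain_ext = Position("chain-1", "chain", True)
positions = [spec, chain_min, chain_ext]

good = {Feature("sel", "licensor"): spec, Feature("cat", "licensee"): chain_ext}
bad = {Feature("sel", "licensor"): chain_ext, Feature("cat", "licensee"): spec}

print(uniform(good), last_resort(positions, good))
print(uniform(bad), last_resort(positions, bad))
```

Because a dictionary assigns each key exactly one value, this encoding cannot place one feature in two positions; nothing in it forces distinct features onto distinct positions.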
We add to this a condition of Linear Resource Use, which states, in our terms, that the feature distribution function must be, in fact, a function, so that no feature can be used in more than one checking configuration.

Condition 6 (Last Resort) For all proper chain positions x in α0* ∪ αP*, the distribution function maps some licensee feature from the lexical item of α to x.

If we restrict our attention to φ-structures, it is clear that now all proper chain
positions will have to be attached to phrasal positions with appropriate licensor features. Also all positions in the checking domain of every head will have to host attached chain positions bearing appropriately clashing features. Note in passing that more than one feature can be checked in any given position: we do not require the distribution function to be one-to-one. Note also that the minimal chain positions are not subject to Last Resort, but still can appear in checking configurations. For example, in the structure associated with the tree in Figure 3 the heads a and b are extended trivially, so the sole position which exists in (a0, aP, a0*, aP*) or γP
… $\pi_{lf}(y))$ (resp., $(\pi_{pf}(z) > \pi_{pf}(y))$). Note that under this definition the projection functions preserve headedness information.

If x is a complement position in αP, then x immediately dominates min(αP) = min(α0*) and a structural complement z ∈ βP* (for some β). Then

$\pi_{lf}(x) = (\pi_{lf}(\min(\alpha0^{*})) < \pi_{lf}(z))$

Again, we treat $\pi_{pf}$ analogously.

If x is a head-adjunction position, then it immediately dominates an X0 position y which it extends and also an X0* position z which is attached to it. Then

$\pi_{lf}(x) = (\pi_{lf}(y) \circ \pi_{lf}(z))$
Yet again, we treat $\pi_{pf}$ analogously. Note that y may be the head itself; we have already defined the action of the projection functions on lexical items. The only problem that remains is the determination of the PF position for
an arbitrary chain. As noted, we want this to be the "maximal strong chain position," that is, the position farthest from the minimal chain position which can be reached across chain elements whose features have been checked against strong licensors.

Definition 13 A chain extension is called strong if it creates a position in which a licensee is checked against a strong licensor.

Definition 14 (Maximal Strong Position) The maximal strong position is defined as the maximal element in the closure of the minimal chain position under strong chain extension.

This has the desired consequence that the position we want must be reachable along a continuous sequence of strong extensions, and also that it is the minimal chain position if the first extension is weak or nonexistent. On the other hand it is a relatively weak condition, since it allows strong positions after weak positions, which would correspond in Chomsky's derivational terms to a derivation that would crash, since it would have had to carry a strong feature into the covert component. We could strengthen our definitions by requiring chains to have a continuously strong "prefix" followed by a continuously weak "suffix", but we do not do so at this point.
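Definition 14 amounts to following strong extensions upward from the minimal position until the first weak one. The sketch below assumes a hypothetical encoding of a chain as a list of booleans, one per extension, ordered from the minimal position upward; the function name is illustrative.

```python
def maximal_strong_position(extension_is_strong):
    """Return the index of the maximal strong position (0 is the minimal chain position).

    extension_is_strong[i] is True if the extension that creates position i + 1
    checks a licensee against a strong licensor.
    """
    position = 0
    for strong in extension_is_strong:
        if not strong:
            break  # the closure under strong extension stops at the first weak step
        position += 1
    return position


print(maximal_strong_position([]))                         # no extensions
print(maximal_strong_position([False, True]))              # first extension weak
print(maximal_strong_position([True, True, False, True]))  # strong prefix of length two
```

The last call shows the weakness noted in the text: a strong extension that follows a weak one is tolerated by the definition and simply ignored when the PF position is determined.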
4 The copy language $L_{ww}$ revisited

4.1 Syntactic extension of the lexicon
In general, the extension of heads from the numeration into pre-syntactic structures is non-deterministic. However, in the particular case of $L_{ww}$, extension is deterministic. So in this section we show the "extended lexicon" for $L_{ww}$, which will make the proof in the following section that we have indeed defined $L_{ww}$ a good deal easier. So, let us consider the particular case where we have a head which instantiates the lexical item f_a, i.e., 0 : f*, F°, a° for example. The selectional features F° and a° will require the attachment of two dependent projections, so they require the construction of a phrasal spine with at least complement and specifier positions. No features of f_a will require the attraction of an XP, so those will be all the positions which are required in f_aP. The attachment of a specifier from some chain aP* will
cancel the a° selection feature; that is, the a° licensor will be mapped to the specifier position by the feature distribution map. The attachment of an f* complement will not allow the cancellation of the F° licensor, however. There are in principle two ways in which the F° licensor could then be cancelled. First, the complement f*P could attach one of its chain positions to a further specifier position of f_aP. Alternatively, the f*0 of the complement could attach one of its chain positions into f_a0. We are not certain what should rule out movement of the complement to (non-thematic) specifier positions; this is a particular problem in a formalism which allows multiple specifiers. One possible solution might be to appeal to the relative simplicity of an X0 as compared to an XP, invoking some sort of economy principle to the effect that only heads move unless some yet-to-be-clarified theory of pied-piping requires a larger structure to be displaced. This seems to follow the lines laid out in Chomsky (1995), according to which movement is only of individual features wherever possible. Another possible solution would be to develop the intuition that selection is a head-to-head phenomenon (cf. Koopman (1994), for example), implementing the intuition here via a requirement that selectional features are specifically X0-licensors, while for example case assignment features (if we made use of them) would be XP-licensors. The important point is that the corresponding derivational formalism sketched out in Cornell (1996) is subject to the same problem, and the same solution can most likely be applied in both cases. Certainly the two possibilities indicated above are equally applicable. So we assume without further argument that the only possibility available for checking the category feature of the complement against a selectional licensor is by head movement or, in our terms, by the attachment of an X0* position belonging to the complement into the X0 of the host.
Accordingly we construct f_a0 containing one proper extension of the head. That exhausts the stock of licensor features from the lexical item f_a, so we cannot construct any more phrasal positions. This leaves us with the categorial licensee, the f* feature of f_a itself. As a licensee, it must be associated with some position in either f_aP* or f_a0*. So this question really boils down to the question of how far to extend f_aP* and f_a0*. For example, if min(f_aP*) is attached as a specifier somewhere, then that will cancel the f* feature, but if it is attached as a complement, then the f* feature can only be cancelled in a proper extension of either f_aP* or f_a0*. We can in our particular case examine the lexicon and note that f* categories are always selected as complements, never as specifiers, so in this particular language the f* feature cannot be cancelled in min(f_aP*), and we will have
to create a chain extension. Again we examine the lexicon for $L_{ww}$ and note that all licensors f° or F° are selection features. We have here assumed that selectional features will only attract X0s, so we extend f_a0* with one position and map the f* feature to that position. Given the identifications in (1)-(4), we can display the results as follows.
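The feature distribution reached for f_a in the discussion above can also be summarised as a small data sketch. The encoding is hypothetical: features are named by plain strings, structures by the names used in the text, and a trailing `*` is taken to mark a chain structure. It records only what the prose states and is not a reconstruction of the display.

```python
# Features of the lexical item f_a: two selectional licensors and one categorial licensee.
LICENSORS = {"F", "a"}
LICENSEES = {"f"}

# Feature -> (structure, position), following the prose description.
distribution = {
    "a": ("f_aP", "specifier"),          # cancelled by the attached specifier
    "F": ("f_a0", "proper extension"),   # cancelled by head movement of the complement
    "f": ("f_a0*", "proper extension"),  # checked in the one-step chain extension
}


def is_chain_structure(name):
    """Assumed convention: chain structures are the starred ones."""
    return name.endswith("*")


# Every feature of the item is mapped, and each is mapped exactly once.
all_assigned = set(distribution) == LICENSORS | LICENSEES

# Uniformity of Distribution: licensees on chain positions, licensors on phrasal positions.
uniform = all(is_chain_structure(distribution[f][0]) for f in LICENSEES) and all(
    not is_chain_structure(distribution[f][0]) for f in LICENSORS
)

print(all_assigned, uniform)
```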