Logics for Linguistic Structures
Trends in Linguistics Studies and Monographs 201
Editors
Walter Bisang Hans Henrich Hock (main editor for this volume)
Werner Winter
Mouton de Gruyter Berlin · New York
Logics for Linguistic Structures
Edited by
Fritz Hamm Stephan Kepser
Mouton de Gruyter Berlin · New York
Mouton de Gruyter (formerly Mouton, The Hague) is a Division of Walter de Gruyter GmbH & Co. KG, Berlin.
Printed on acid-free paper which falls within the guidelines of the ANSI to ensure permanence and durability.
Library of Congress Cataloging-in-Publication Data Logics for linguistic structures / edited by Fritz Hamm and Stephan Kepser. p. cm. - (Trends in linguistics ; 201) Includes bibliographical references and index. ISBN 978-3-11-020469-8 (hardcover : alk. paper) 1. Language and logic. 2. Computational linguistics. I. Hamm, Fritz, 1953- II. Kepser, Stephan, 1967- P39.L5995 2008 401-dc22 2008032760
ISBN 978-3-11-020469-8 ISSN 1861-4302 Bibliographic information published by the Deutsche Nationalbibliothek The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available in the Internet at http://dnb.d-nb.de. © Copyright 2008 by Walter de Gruyter GmbH & Co. KG, D-10785 Berlin All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording or any information storage and retrieval system, without permission in writing from the publisher. Cover design: Christopher Schneider, Berlin. Printed in Germany.
Contents
Introduction Fritz Hamm and Stephan Kepser
Type Theory with Records and unification-based grammar Robin Cooper
One-letter automata: How to reduce k tapes to one Hristo Ganchev, Stoyan Mihov, and Klaus U. Schulz
Two aspects of situated meaning Eleni Kalyvianaki and Yiannis N. Moschovakis
Further excursions in natural logic: The Mid-Point Theorems Edward L. Keenan
On the logic of LGB type structures. Part I: Multidominance structures Marcus Kracht
Completeness theorems for syllogistic fragments Lawrence S. Moss
List of contributors
Index
Introduction Fritz Hamm and Stephan Kepser
Logic has long played a major role in the formalization of linguistic structures and linguistic theories. This is particularly true for the area of semantics, where formal logic has been the major tool ever since the Fregean program. In the area of syntax it was the rise of principle-based theories, with the focus shifting away from the generation process of structures to defining general well-formedness conditions of structures, that opened the way for logic. The naturalness with which many types of well-formedness conditions can be expressed in some logic or other led to different logics being proposed and used in diverse formalizations of syntactic theories in general and the field of model theoretic syntax in particular.
The contributions collected in this volume address central topics in theoretical and computational linguistics, such as quantification, types of context dependence and aspects concerning the formalisation of major grammatical frameworks, among others GB, DRT and HPSG. All contributions have in common a strong preference for logic as the major tool of analysis. Two of them are devoted to formal syntax, three to aspects of logical semantics. The paper by Robin Cooper contributes both to syntax and semantics. We have therefore grouped the description of the papers in this preface into a syntactic and a semantic section, with Cooper’s paper as a natural interface between these two fields.
The contribution by Hristo Ganchev, Stoyan Mihov, and Klaus U. Schulz belongs to the field of finite state automata theory and provides a method for reducing multi tape automata to single tape automata. Multi tape finite state automata have many applications in computer science; they are used particularly frequently in many areas of natural language processing.
A disadvantage for the usability of multi tape automata is that certain automata constructions, namely composition, projection, and cartesian product, are a lot more complicated than for single tape automata. A reduction of multi tape automata to single tape automata is thus desirable. The key construction by Ganchev, Mihov, and Schulz in the reduction is the definition of an automaton type that bridges between multi and single tape automata. The authors introduce so-called one-letter automata. These are multi tape automata with a strong restriction. Only one type of transition is permitted, namely one where only a single letter in the k-tuple of signature symbols in the transition differs from the empty word. In other words, all components of the tuple are the empty word with the exception of one component. Ganchev, Mihov, and Schulz show that one-letter automata are equivalent to multi tape automata. Interestingly, one-letter automata can be regarded as single tape automata over an extended alphabet which consists of complex symbols each of which is a k-tuple with exactly one letter differing from the empty word. One of the known differences between multi dimensional regular relations and one dimensional regular relations is that the latter are closed under intersection while the former are not. To cope with this difference, Ganchev, Mihov, and Schulz define a criterion for the essentiality of a component in a k-dimensional regular relation and show that the intersection of two k-dimensional regular relations is regular if the two relations share at most one essential component. This result can be extended to essential tapes of one-letter automata and the intersection of these automata. On the basis of this result Ganchev, Mihov, and Schulz present automata constructions for insertion, deletion, and projection of tapes as well as composition and cartesian product of regular relations, all of which are based on the corresponding constructions for single tape automata. In this way the effectiveness of one-letter automata is shown. It should hence be expected that this type of automata will be very useful for practical applications.
The contribution by Marcus Kracht provides a logical characterisation of the data structures underlying the linguistic frameworks Government and Binding and Minimalism. Kracht identifies so-called multi-dominance structures as the data structures underlying these theories.
A multi-dominance structure is a binary tree with additional immediate dominance relations which have a restricted distribution in the following way. All parents of additional dominance relations must be found on the path from the root to the parent of the base dominance relation. Additional dominance relations provide a way to represent movement of some component from a lower part in a tree to a position higher up. The logic chosen by Kracht to formalize multi-dominance structures is propositional dynamic logic (PDL), a variant of modal logic that has been used before on many occasions by Kracht and other authors to formalize linguistic theories. In this paper, Kracht shows that PDL can be used to axiomatize multi-dominance structures. This has the important and highly desirable
consequence that the dynamic logic of multi-dominance structures is decidable. The satisfiability of a formula can be decided in 2EXPTIME. In order to formalize a linguistic framework it is not enough to provide an axiomatisation of the underlying data structures only. The second contribution of this paper is therefore a formalisation of important grammatical concepts and notions in the logic PDL. This formalisation is provided for movement and its domains, single movement, adjunction, and cross-serial dependencies. In all of these, care is taken to ensure that the decidability result of multi-dominance structures carries over to grammatical notions defined on these structures. It is thereby shown that large parts of the linguistic framework Government and Binding can be formalized in PDL and that this formalisation is decidable.
The contribution by Robin Cooper shows how to render unification based grammar formalisms with type theory using record structures. The paper is part of a broader project which aims at providing a coherent unified approach to natural language dialog semantics. The type theory underlying this work is based on set theory and follows Montague’s style of recursively defining semantic domains. There are functions and function types available in this type theory, providing a version of the typed λ-calculus. To this base records are added. A record is a finite set of fields, i.e., ordered pairs of a label and an object. A record type is accordingly a finite set of ordered pairs of a label and a type. Records and record types may be nested. The notions of dependent types and subtype relations are systematically extended to be applicable to record types. The main contribution of this paper is a type theoretical approach to unification phenomena. Feature structures of some type play an important role in almost all modern linguistic frameworks. Some frameworks like LFG and HPSG make this rather explicit.
They also provide a systematic way to combine two feature structures partially describing some linguistic object. This combination is based on ideas of unification even though this notion need no longer be explicitly present in the linguistic frameworks. In a type theoretical approach, records and their types render feature structures in a rather direct and natural way. The type theoretical tools to describe unification are meet types and equality. Cooper assumes the existence of a meet type for each pair of types in his theory including record types. He provides a function that recursively simplifies a record type. This function is particularly applicable to record types which are the result of the construction of a meet record type and should be interpreted as the counterpart of unification in type theory with records. There are, though, important differences from feature structure unification. One of them is that type simplification never fails. If the meet of incompatible types is constructed, the simplification returns a distinguished empty type. The main advantage of this approach is that it provides a kind of intensionality which is not available for feature structures. This intensionality can be used, e.g., to distinguish equivalent types, for instance by the source of grammatical information. It can also be used to associate different empty types with different ungrammatical phrases. This may provide a way to support robust parsing in that ungrammatical phrases can be processed and the consistent parts of their record types may contain useful information for further processing. Type theory with records also offers a very natural way to integrate semantic analyses into syntactic analyses based on feature structures.
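The idea that combining record types never fails can be illustrated with a toy sketch. It is not Cooper's simplification function: record types are modelled as nested Python dictionaries, atomic types as strings compared by equality, and the `("Empty", ...)` marker as well as the name `merge_record_types` are assumptions made only for this illustration.

```python
def merge_record_types(left, right):
    """Combine two toy record types field by field; a clash never raises."""
    merged = dict(left)
    for label, right_type in right.items():
        if label not in merged:
            merged[label] = right_type          # field only in `right`
            continue
        left_type = merged[label]
        if isinstance(left_type, dict) and isinstance(right_type, dict):
            # Both sides are record types: combine them recursively.
            merged[label] = merge_record_types(left_type, right_type)
        elif left_type != right_type:
            # Incompatible types: keep a hypothetical "empty type" that
            # remembers which types clashed instead of failing.
            merged[label] = ("Empty", left_type, right_type)
    return merged


# Hypothetical agreement clash between a singular subject and a plural verb.
subject = {"agr": {"num": "Sg", "pers": "Third"}}
verb = {"agr": {"num": "Pl"}, "tense": "Past"}
result = merge_record_types(subject, verb)
print(result)
```

In this sketch the clash is confined to the `num` field, while the consistent fields `pers` and `tense` survive, which is the kind of partial information that robust parsing could still use.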
The paper by Eleni Kalyvianaki and Yiannis Moschovakis contains a sophisticated application of the theory of referential intensions developed by Moschovakis in a series of papers (see for instance (Moschovakis 1989a,b, 1993, 1998)), and applied to linguistics in (Moschovakis 2006). Based on the theory of referential intensions the paper introduces two notions of context dependent meaning, factual content and local meaning, and shows that these notions solve puzzles in philosophy of language and linguistics, especially those concerning the logic of indexicals. Referential intension theory allows one to define three notions of synonymy, namely referential synonymy, local synonymy, and factual synonymy. Referential synonymy, the strongest concept, holds between two terms A and B iff their referential intensions are the same; i.e., int(A) = int(B). Here the referential intension of an expression A, int(A), is to be understood as the natural algorithm (represented as a set-theoretical object) which computes the denotation of A with respect to a given model. Thus referential synonymy is a situation independent notion of synonymy. This contrasts with the other two notions of synonymy, which are dependent on a given state a. Local synonymy is synonymy with regard to local meaning, where the local meaning of an expression A is computed from the referential intension of A applied to a given state a. It is important to note that for the constitution of the local meaning of A the full meanings of the parts of A have to be computed. In this respect the concept of local meaning differs significantly from the notion of factual content and for this reason from the associated notion of synonymy as well. This is best explained by way of an example.
If in a given state a her(a) = Mary(a), then the factual content of the sentence John loves her is the same as the factual content of John loves Mary. The two sentences are therefore synonymous with regard to factual content. But they are not locally synonymous since the meaning of her in states other than a may well be different from the meaning of Mary. The paper applies these precisely defined notions to Kaplan’s treatment of indexicals and argues for local meanings as the most promising candidates for belief carriers. The paper ends with a brief remark on what aspects of meaning should be preserved under translation.
The paper by Edward L. Keenan tries to identify inference patterns which are specific to proportionality quantifiers. For instance, given the premisses (1-a) and (1-b) in (1) we may conclude (1-c).
(1) a. More than three tenths of the students are athletes.
    b. At least seven tenths of the students are vegetarians.
    c. At least one student is both an athlete and a vegetarian.
This is an instance of the following inference pattern:
(2) a. More than n/m of the As are Bs.
    b. At least 1 − n/m of the As are Cs.
    c. Ergo: Some A is both a B and a C.
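Pattern (2) holds because the two premisses together force the Bs and Cs among the As to number more than the As themselves, so they must overlap. The following brute-force check over small finite sets is only a sanity check of that counting argument, not a proof; the function names are introduced here for illustration, and B and C are restricted to subsets of A since only their intersections with A matter.

```python
from fractions import Fraction
from itertools import combinations


def subsets(universe):
    """Yield every subset of `universe` as a frozenset."""
    items = sorted(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def pattern_holds(domain_size, ratio):
    """Check pattern (2) for A = {0, ..., domain_size - 1} and all B, C within A."""
    a = frozenset(range(domain_size))
    for b in subsets(a):
        for c in subsets(a):
            more_than = Fraction(len(b), len(a)) > ratio        # premiss (2-a)
            at_least = Fraction(len(c), len(a)) >= 1 - ratio    # premiss (2-b)
            if more_than and at_least and not (b & c):
                return False  # premisses true, conclusion (2-c) false
    return True


# Try every ratio n/m with 1 <= m <= 5 and 0 <= n <= m on domains of size 1 to 5.
all_ok = all(
    pattern_holds(size, Fraction(n, m))
    for size in range(1, 6)
    for m in range(1, 6)
    for n in range(0, m + 1)
)
print(all_ok)
```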
Although proportionality quantifiers satisfy inference pattern (2), other quantifiers do so as well, as observed by Dag Westerståhl. Building on (Keenan 2004) the paper provides an important further contribution to the question whether there are inference patterns specific to proportionality quantifiers. The central result of Keenan’s paper is the Mid-Point Theorem and a generalization thereof.
The Mid-Point Theorem. Let p, q be fractions with 0 ≤ p ≤ q ≤ 1 and p + q = 1. Then the quantifiers (BETWEEN p AND q) and (MORE THAN p AND LESS THAN q) are fixed by the postcomplement operation.
The postcomplement of a generalized quantifier Q is that generalized quantifier which maps a set B to Q(¬B). The following pair of sentences illustrates this operation:
(3) a. Exactly half the students got an A on the exam.
    b. Exactly half the students didn’t get an A on the exam.
The mid-point theorem therefore guarantees the equivalence of sentences (4-a) and (4-b), and analogously the equivalence of sentences formed with (MORE THAN p AND LESS THAN q).
(4) a. Between one sixth and five sixths of the students are happy.
    b. Between one sixth and five sixths of the students are not happy.
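The theorem can be checked on small finite models. The sketch below is a minimal illustration under stated assumptions: a quantifier is modelled as a function of two sets A and B that looks only at the proportion of As that are Bs, empty A is skipped because that proportion is undefined there, and all function names are introduced here for illustration.

```python
from fractions import Fraction
from itertools import combinations


def subsets(universe):
    """Yield every subset of `universe` as a frozenset."""
    items = sorted(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def between(p, q):
    """(BETWEEN p AND q): the proportion of As that are Bs lies in [p, q]."""
    def quantifier(a, b):
        return p <= Fraction(len(a & b), len(a)) <= q
    return quantifier


def more_than_and_less_than(p, q):
    """(MORE THAN p AND LESS THAN q): the proportion lies strictly between p and q."""
    def quantifier(a, b):
        return p < Fraction(len(a & b), len(a)) < q
    return quantifier


def postcomplement(quantifier, universe):
    """Map B to Q(not B), with the complement taken relative to `universe`."""
    def result(a, b):
        return quantifier(a, universe - b)
    return result


def fixed_by_postcomplement(quantifier, universe):
    """True if Q and its postcomplement agree on all non-empty A and all B."""
    post = postcomplement(quantifier, universe)
    return all(
        quantifier(a, b) == post(a, b)
        for a in subsets(universe) if a   # assumption: skip empty A
        for b in subsets(universe)
    )


UNIVERSE = frozenset(range(4))
sixth, five_sixths = Fraction(1, 6), Fraction(5, 6)
print(fixed_by_postcomplement(between(sixth, five_sixths), UNIVERSE))
print(fixed_by_postcomplement(more_than_and_less_than(sixth, five_sixths), UNIVERSE))
print(fixed_by_postcomplement(between(Fraction(1, 4), Fraction(1, 2)), UNIVERSE))
```

The first two calls use p + q = 1 and should report that the quantifier is fixed; the last call uses p + q = 3/4 and should report that it is not, since one quarter of the As being Bs means three quarters are not.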
However, this and the generalization of the mid-point theorem are still only partial answers to the question concerning specific inference patterns for proportionality quantifiers, since non-proportional determiners exist which still satisfy the conditions of the generalized mid-point theorem.
The paper by Lawrence S. Moss studies syllogistic systems of increasing strength from the point of view of natural logic (for a discussion of this notion, see Purdy (1991)). Moss proves highly interesting new completeness results for these systems. More specifically, after proving soundness for all systems considered in the paper, the first result states the completeness of the following two axioms for L(all), a syllogistic fragment containing only expressions of the form All X are Y:
\[
\frac{}{\text{All } X \text{ are } X}
\qquad\qquad
\frac{\text{All } X \text{ are } Z \qquad \text{All } Z \text{ are } Y}{\text{All } X \text{ are } Y}
\]
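Derivability from these two rules amounts to reachability: All X are Y follows from a set of premisses exactly when Y can be reached from X by chaining premisses, with the axiom All X are X covering the case of zero steps. The sketch below illustrates this; the encoding of a sentence All X are Y as a pair `(X, Y)` and the function name `derivable` are assumptions made for the example.

```python
def derivable(premises, goal):
    """Decide whether goal = (X, Y), read 'All X are Y', follows from `premises`
    using only the axiom 'All X are X' and the transitivity rule."""
    start, target = goal
    reached = {start}        # axiom: All X are X
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for left, right in premises:
            if left == current and right not in reached:
                reached.add(right)       # one application of transitivity
                frontier.append(right)
    return target in reached


# Hypothetical premisses: All dogs are mammals, All mammals are animals.
gamma = {("dog", "mammal"), ("mammal", "animal")}
print(derivable(gamma, ("dog", "animal")))
print(derivable(gamma, ("animal", "dog")))
```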
In addition to completeness the paper studies a further related but stronger property, the canonical model property. A system which has the canonical model property is also complete, but the converse does not hold. Roughly, a model M is canonical for a fragment F, a set Γ of sentences in F and a logical system for F if for all S ∈ F, M ⊨ S iff Γ ⊢ S. A fragment F has the canonical model property if every set Γ ⊆ F has a canonical model. The canonical model property is a rather strong property. Classical propositional logic, for instance, does not have this property, but the fragment L(all) has it. Some but not all of the systems in the paper have the canonical model property. Other systems studied in Moss’ paper include Some X are Y, combinations of this system with L(all) and sentences involving proper names, systems with Boolean combinations, a combination of L(all) with There are at least as many X as Y, and logical theories for Most and Most + Some. The largest logical system for which completeness is proved adds ∃≥ to the theory L(all, some, no, names) with Boolean operations, where ∃≥(X, Y) is considered true in case X contains at least as many elements as Y. Moss’ paper contains two interesting digressions as well. The first is concerned with sentences of the form All X which are Y are Z, the second with most. For instance, Moss proves that the following two axioms are complete for most.
\[
\frac{\text{Most } X \text{ are } Y}{\text{Most } X \text{ are } X}
\qquad\qquad
\frac{\text{Most } X \text{ are } Y}{\text{Most } Y \text{ are } Y}
\]
Moreover, if Most X are Y does not follow from a set of sentences Γ then there exists a model of Γ with cardinality ≤ 5 which falsifies Most X are Y.
All papers collected in this volume grew out of a conference in honour of Uwe Mönnich which was held in Freudenstadt in November 2004. Four years have elapsed since this event. But another important date is now imminent, Uwe’s birthday. Hence we are in the fortunate position to present this volume as a Festschrift for Uwe Mönnich on the occasion of his 70th birthday.
Tübingen, July 2008
Fritz Hamm and Stephan Kepser
References
Keenan, Edward L. 2004 Excursions in natural logic. In Claudia Casadio, Philip J. Scott, and Robert A.G. Seely (eds.), Language and Grammar: Studies in Mathematical Linguistics and Natural Language. Stanford: CSLI.
Moschovakis, Yiannis 1989a The formal language of recursion. The Journal of Symbolic Logic 54: 1216–1252.
Moschovakis, Yiannis 1989b A mathematical modeling of pure recursive algorithms. In Albert R. Meyer and Michael Taitslin (eds.), Logic at Botik ’89, LNCS 363. Berlin: Springer.
Moschovakis, Yiannis 1993 Sense and denotation as algorithm and value. In Juha Oikkonen and Jouko Väänänen (eds.), Logic Colloquium ’90. Natick, USA: Association for Symbolic Logic, A.K. Peters, Ltd.
Moschovakis, Yiannis 1998 On founding the theory of algorithms. In Harold Dales and Gianluigi Oliveri (eds.), Truth in Mathematics. Oxford University Press.
Moschovakis, Yiannis 2006 A logical calculus of meaning and synonymy. Linguistics and Philosophy 29: 27–89.
Purdy, William C. 1991 A logic for natural language. Notre Dame Journal of Formal Logic 32: 409–425.
Type Theory with Records and unification-based grammar Robin Cooper
Abstract We suggest a way of bringing together type theory and unification-based grammar formalisms by using records in type theory. The work is part of a broader project whose aim is to present a coherent unified approach to natural language dialogue semantics using tools from type theory.
1. Introduction
Uwe Mönnich has worked both on the use of type theory in semantics and on formal aspects of grammar formalisms. This paper suggests a way of bringing together type theory and unification as found in unification-based grammar formalisms like HPSG by using records in type theory which provide us with feature structure like objects. It represents a small offering to Uwe to thank him for many kindnesses over the years sprinkled with insights and rigorous comments.
This work is part of a broader project whose aim is to present a coherent unified approach to natural language dialogue semantics using tools from type theory. We are seeking to do this by bringing together Head Driven Phrase Structure Grammar (HPSG) (Sag et al. 2003), Montague semantics (Montague 1974), Discourse Representation Theory (DRT) (Kamp and Reyle 1993; van Eijck and Kamp 1997, and much other literature), situation semantics (Barwise and Perry 1983) and issue-based dialogue management (Larsson 2002) into a single type-theoretic formalism. A survey of our approach to the semantic theories (i.e., Montague semantics, DRT and situation semantics) and HPSG can be found in (Cooper 2005b). Other work in progress can be found on http://www.ling.gu.se/~cooper/records. We give a brief summary here: Record types can be used as discourse representation structures (DRSs). Truth of a DRS corresponds to there being an object of the appropriate record type and this gives us the effect of simultaneous binding of discourse referents (corresponding to labels in records) familiar from the semantics of DRSs in (Kamp and Reyle 1993). Dependent function types provide us with the classical treatment of donkey anaphora from DRT in a way corresponding to the type theoretic treatment proposed by Mönnich (1985), Sundholm (1986) and Ranta (1994). At the same time record types can be used as feature structures of the kind found in HPSG since they have recursive structure and induce a kind of subtyping which can be used to mimic unification. Because we are using a general type theory which includes records we have functions available and a version of the λ-calculus. This means that we can use Montague’s λ-calculus based techniques for compositional interpretation. From the HPSG perspective this gives us the advantage of being able to use “real” variable binding which can only be approximately simulated in pure unification based systems. From the DRT perspective this use of compositional techniques gives us an approach similar to that of Muskens (1996) and work on λ-DRT (Kohlhase et al. 1996).
In this paper we will look at the notion of unification as used in unification-based grammar formalisms like HPSG from the perspective of the type theoretical framework. This work has been greatly influenced by work of Jonathan Ginzburg (for example, Ginzburg in prep, Chap. 3). In Section 2 we will give a brief informal introduction to our view of type theory with records. The version of type theory that we discuss has been made more precise in (Cooper 2005a) and in an implementation called TTR (Type Theory with Records) which is under development in the Oz programming language. In Section 3 we will discuss the notion of subtype which records introduce (corresponding to the notion of subsumption in the unification literature). We will then, in Section 4, propose that linguistic objects are to be regarded as records whereas feature structures are to be regarded as corresponding to record types.
Type theory is “function-based” rather than “unification-based”. However, the addition of records to type theory allows us to get the advantages of unification without having to leave the “function-based” approach. We show how to do this in Section 5 treating some classical simple examples which have been used to motivate the use of unification. Section 6 deals with the way in which unification analyses are used to allow the extraction of linguistic generalizations as principles in the style of HPSG. The conclusion (Section 7) is that by using record types within a type theory we can have the advantages of unification-based approaches together with an additional intensionality not present in classical unification approaches and without the disadvantage of leaving the “function-based” approach which is necessary in order to deal adequately with semantics (at least).
2. Records in type theory
In this section¹ we give a very brief intuitive introduction to the kind of type theory we are employing. A more detailed and formal account can be found in (Cooper 2005a) and work in progress on the project can be found on http://www.ling.gu.se/~cooper/records. While the type theoretical machinery is based on work carried out in the Martin-Löf approach (Coquand et al. 2004; Betarte 1998; Betarte and Tasistro 1998; Tasistro 1997) we are making a serious attempt to give it a foundation in standard set theory using Montague style recursive definitions of semantic domains. There are two main reasons for this. The first is that we think it important to show the relationship between the Montague model theoretic tradition which has been developed for natural language semantics and the proof-theoretic tradition associated with type theory. We believe that the aspects of this kind of type theory that we need can be seen as an enrichment of Montague’s original programme. The second reason is that we are interested in exploring to what extent intuitionistic and constructive approaches are appropriate or necessary for natural language. For example, we make important use of the notion “propositions as types” which is normally associated with an intuitionistic approach. However, we suspect that our Montague-like approach to defining the type theory to some extent decouples the notion from intuitionism. We would like to see type theory as providing us with a powerful collection of tools for natural language analysis which ultimately do not commit one way or the other to philosophical notions associated with intuitionism.
The central idea of records and record types can be expressed informally as follows, where $T(a_1, \ldots, a_n)$ represents a type $T$ which depends on the objects $a_1, \ldots, a_n$. If $a_1 : T_1$, $a_2 : T_2(a_1)$, \ldots, $a_n : T_n(a_1, a_2, \ldots, a_{n-1})$, a record:
\[
\left[\begin{array}{lcl}
l_1 & = & a_1 \\
l_2 & = & a_2 \\
\ldots \\
l_n & = & a_n \\
\ldots
\end{array}\right]
\]
is of type:
\[
\left[\begin{array}{lcl}
l_1 & : & T_1 \\
l_2 & : & T_2(l_1) \\
\ldots \\
l_n & : & T_n(l_1, l_2, \ldots, l_{n-1})
\end{array}\right]
\]
A record is to be regarded as a finite set of fields ⟨ℓ, a⟩, which are ordered pairs of a label and an object. A record type is to be regarded as a finite set of fields ⟨ℓ, T⟩, which are ordered pairs of a label and a type. The informal notation above suggests that the fields are ordered with types being dependent on previous fields in the order. This is misleading in that we regard record types as sets of fields on which a partial order is induced by the dependency relation. Dependent types give us the possibility of relating the values in fields to each other and play a crucial role in our treatment of both feature structures and semantic objects. Both records and record types are required to be the graphs of functions, that is, if ⟨ℓ, α⟩ and ⟨ℓ′, β⟩ are distinct members of a given record or record type then ℓ ≠ ℓ′. A record r is of record type R just in case for each field ⟨ℓ, T⟩ in R there is a field ⟨ℓ, a⟩ in r (i.e., with the same label) and a is of type T. Notice that the record may have additional fields not mentioned in the type. Thus a record will generally belong to several record types and any record will belong to the empty record type. This gives us a notion of subtyping which we will explore further in Section 3.
Let us see how this can be applied to a simple linguistic example. We will take the content of a sentence to be modelled by a record type. The sentence a man owns a donkey corresponds to a record type:
\[
\left[\begin{array}{lcl}
\text{x} & : & \text{Ind} \\
\text{c}_1 & : & \text{man(x)} \\
\text{y} & : & \text{Ind} \\
\text{c}_2 & : & \text{donkey(y)} \\
\text{c}_3 & : & \text{own(x,y)}
\end{array}\right]
\]
A record of this type will be:
\[
\left[\begin{array}{lcl}
\text{x} & = & a \\
\text{c}_1 & = & p_1 \\
\text{y} & = & b \\
\text{c}_2 & = & p_2 \\
\text{c}_3 & = & p_3
\end{array}\right]
\]
where
a, b are of type Ind, individuals
p1 is a proof of man(a)
p2 is a proof of donkey(b)
p3 is a proof of own(a, b).
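The typing condition for records can be sketched in a few lines. This is a toy model and not the TTR implementation mentioned in the paper: records are Python dictionaries, each field of a record type is a checking function that may consult the whole record (to mimic dependent types), and a "proof" of man(a) is simply encoded as the tuple `("man", "a")`. The names `INDIVIDUALS`, `ind`, `proof_of` and `is_of_type` are all assumptions of this illustration.

```python
INDIVIDUALS = {"a", "b"}  # hypothetical domain of individuals


def ind(value, record):
    """The type Ind: the value must be an individual."""
    return value in INDIVIDUALS


def proof_of(predicate, *labels):
    """Dependent type of proofs of predicate(record[l1], ..., record[ln])."""
    def check(value, record):
        if any(label not in record for label in labels):
            return False
        # Toy encoding: the only proof is the tuple naming the fact itself.
        return value == (predicate, *(record[label] for label in labels))
    return check


def is_of_type(record, record_type):
    """A record is of a record type if every field of the type has a
    well-typed field with the same label; extra record fields are allowed."""
    return all(
        label in record and check(record[label], record)
        for label, check in record_type.items()
    )


man_owns_donkey = {
    "x": ind,
    "c1": proof_of("man", "x"),
    "y": ind,
    "c2": proof_of("donkey", "y"),
    "c3": proof_of("own", "x", "y"),
}

record = {
    "x": "a", "c1": ("man", "a"),
    "y": "b", "c2": ("donkey", "b"),
    "c3": ("own", "a", "b"),
    "extra": 42,  # an additional field does not affect typing
}

print(is_of_type(record, man_owns_donkey))
print(is_of_type(record, {}))  # every record is of the empty record type
```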
Note that the record may have had additional fields and still be of this type. The types ‘man(x)’, ‘donkey(y)’, ‘own(x,y)’ are dependent types of proofs (in a convenient but not quite exact abbreviatory notation; we will give a more precise account of dependencies within record types in Section 3). The use of types of proofs for what in other theories would be called propositions is often referred to as the notion of “propositions as types”. Exactly what type ‘man(x)’ is depends on which individual you choose in your record to be labelled by ‘x’. If the individual a is chosen then the type is the type of proofs that a is a man. If another individual d is chosen then the type is the type of proofs that d is a man, and so on. What is a proof? Martin-Löf considers proofs to be objects rather than arguments or texts. For nonmathematical propositions proofs can be regarded as situations or events. For useful discussion of this see (Ranta 1994, p. 53ff). We discuss it in more detail in (Cooper 2005a).
There is an obvious correspondence between this record type and a discourse representation structure (DRS) as characterised in (Kamp and Reyle 1993). The characterisation of what it means for a record to be of this type corresponds in an obvious way to the standard embedding semantics for such a DRS which Kamp and Reyle provide.
Records (and record types) are recursive in the sense that the value corresponding to a label in a field can be a record (or record type)². For example,
\[
r = \left[\begin{array}{lcl}
\text{f} & = & \left[\begin{array}{lcl}
  \text{f} & = & \left[\begin{array}{lcl} \text{ff} & = & a \\ \text{gg} & = & b \end{array}\right] \\
  \text{g} & = & c
\end{array}\right] \\
\text{g} & = & \left[\begin{array}{lcl}
  \text{g} & = & a \\
  \text{h} & = & \left[\begin{array}{lcl} \text{h} & = & d \end{array}\right]
\end{array}\right]
\end{array}\right]
\]
is of type
\[
R = \left[\begin{array}{lcl}
\text{f} & : & \left[\begin{array}{lcl}
  \text{f} & : & \left[\begin{array}{lcl} \text{ff} & : & T_1 \\ \text{gg} & : & T_2 \end{array}\right] \\
  \text{g} & : & T_3
\end{array}\right] \\
\text{g} & : & \left[\begin{array}{lcl}
  \text{g} & : & T_1 \\
  \text{h} & : & \left[\begin{array}{lcl} \text{h} & : & T_4 \end{array}\right]
\end{array}\right]
\end{array}\right]
\]
given that $a : T_1$, $b : T_2$, $c : T_3$ and $d : T_4$. We can use path-names in records and record types to designate values in particular fields, e.g.
\[
r.\text{f} = \left[\begin{array}{lcl}
\text{f} & = & \left[\begin{array}{lcl} \text{ff} & = & a \\ \text{gg} & = & b \end{array}\right] \\
\text{g} & = & c
\end{array}\right]
\qquad
R.\text{f}.\text{f}.\text{ff} = T_1
\]
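Path-names behave like successive lookups in a nested structure. The sketch below assumes the nesting in which r has a field f containing a record with fields f (itself holding ff and gg) and g, and a field g containing fields g and h; records and record types are both modelled as nested dictionaries, with object and type names as plain strings. The helper name `follow` is hypothetical.

```python
def follow(structure, path):
    """Return the value designated by a dotted path-name such as 'f.f.ff'."""
    value = structure
    for label in path.split("."):
        value = value[label]   # descend one field per label
    return value


# Toy encodings of the record r and the record type R from the example.
r = {
    "f": {"f": {"ff": "a", "gg": "b"}, "g": "c"},
    "g": {"g": "a", "h": {"h": "d"}},
}
R = {
    "f": {"f": {"ff": "T1", "gg": "T2"}, "g": "T3"},
    "g": {"g": "T1", "h": {"h": "T4"}},
}

print(follow(r, "f"))
print(follow(R, "f.f.ff"))
```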
The recursive nature of records and record types will be important later in the paper when we use record types to correspond to linguistic feature structures. Another important aspect of the type theory we are using is that types themselves can also be treated as objects.³ A simple example of how this can be exploited is the following representation for a girl believes that a man owns a donkey. This is a simplified version of the treatment discussed in (Cooper 2005a).
\[
\left[\begin{array}{lcl}
\text{x} & : & \text{Ind} \\
\text{c}_1 & : & \text{girl(x)} \\
\text{c}_2 & : & \text{believe(x, } \left[\begin{array}{lcl}
  \text{y} & : & \text{Ind} \\
  \text{c}_3 & : & \text{man(y)} \\
  \text{z} & : & \text{Ind} \\
  \text{c}_4 & : & \text{donkey(z)} \\
  \text{c}_5 & : & \text{own(y, z)}
\end{array}\right] \text{)}
\end{array}\right]
\]
The treatment of types as first class objects in this way is a feature which this type theory has in common with situation theory and it is an important component in allowing us to incorporate analyses from situation semantics in our type theoretical treatment.
The theory of records and record types is embedded in a general type theory. This means that we have functions and function types available giving us a version of the λ-calculus. We can thus use Montague’s techniques for compositional interpretation. For example, we can interpret the common noun donkey as a function which maps records r of the type [x : Ind] (i.e. records which introduce an individual labelled with the label ‘x’) to a record type dependent on r. We notate the function as follows:
\[
\lambda r{:}[\text{x}{:}\text{Ind}]\,([\text{c}{:}\text{donkey}(r.\text{x})])
\]
The type of this function is
\[
P = ([\text{x}{:}\text{Ind}])\text{RecType}
\]
This corresponds to Montague’s type ⟨e,t⟩ (the type of functions from individuals (entities) to truth-values). In place of individuals we use records introducing individuals with the label ‘x’ and in place of truth-values we use record types which, as we have seen above, correspond to an intuitive notion of proposition (in particular a proposition represented by a DRS). Using the power of the λ-calculus we can treat determiners Montague-style as functions which take two arguments of type P and return a record type. For example, we represent the indefinite article by
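\[
\begin{array}{l}
\lambda R_1{:}([\text{x}{:}\text{Ind}])\text{RecType} \\
\quad \lambda R_2{:}([\text{x}{:}\text{Ind}])\text{RecType} \\
\qquad \left(\left[\begin{array}{lcl}
  \text{par} & : & [\text{x} : \text{Ind}] \\
  \text{restr} & : & R_1 \,@\, \text{par} \\
  \text{scope} & : & R_2 \,@\, \text{par}
\end{array}\right]\right)
\end{array}
\]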
Here we use F @ a to represent the result of applying function F to argument a. The type theory includes dependent function types. These can be used to give a classical treatment of universal quantification corresponding to DRT’s ‘⇒’. For example, an interpretation of every man owns a donkey can be the following record type:
\[
\left[\begin{array}{lcl}
\text{f} & : & (r : \left[\begin{array}{lcl} \text{x} & : & \text{Ind} \\ \text{c}_1 & : & \text{man(x)} \end{array}\right])
\left[\begin{array}{lcl} \text{y} & : & \text{Ind} \\ \text{c}_2 & : & \text{donkey(y)} \\ \text{c}_3 & : & \text{own}(r.\text{x},\text{y}) \end{array}\right]
\end{array}\right]
\]
Records of the type
\[
(r : \left[\begin{array}{lcl} \text{x} & : & \text{Ind} \\ \text{c}_1 & : & \text{man(x)} \end{array}\right])
\left[\begin{array}{lcl} \text{y} & : & \text{Ind} \\ \text{c}_2 & : & \text{donkey(y)} \\ \text{c}_3 & : & \text{own}(r.\text{x},\text{y}) \end{array}\right]
\]
map records r of type
\[
\left[\begin{array}{lcl} \text{x} & : & \text{Ind} \\ \text{c}_1 & : & \text{man(x)} \end{array}\right]
\]
to records of type
\[
\left[\begin{array}{lcl} \text{y} & : & \text{Ind} \\ \text{c}_2 & : & \text{donkey(y)} \\ \text{c}_3 & : & \text{own}(r.\text{x},\text{y}) \end{array}\right]
\]
Our interpretation of every man owns a donkey requires that there exist a function of this type. Why do we use the record type with the label ‘f’ rather than the function type itself as the interpretation of the sentence? One reason is to achieve a uniform treatment where the interpretation of a sentence is always a record type. Another reason is that the label gives us a handle which can be used to anaphorically refer to the function. This can, for example, be exploited in so-called paycheck examples (Karttunen 1969) such as Everybody receives a paycheck. Not everybody pays it into the bank immediately, though.
The final notion we will introduce which is important for the modelling of HPSG typed feature structures as record types is that of manifest field.
This notion is introduced in (Coquand et al. 2004). It builds on the notion of singleton type. If a : T, then T_a is a singleton type and b : T_a iff b = a. A manifest field in a record type is one whose type is a singleton type, e.g.

\[
[\,x : T_a\,]
\]

written for convenience as

\[
[\,x{=}a : T\,]
\]

This notion allows record types to be “progressively instantiated”, i.e. intuitively, for values to be specified within a record type. A record type that only contains manifest fields is completely instantiated and there will be exactly one record of that type. We will allow dependent singleton types, where a in T_a can be represented by a path in a record type. Manifest fields are important for the modelling of HPSG-style unification in type theory with records.

3. Dependent types and the subtype relation
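We are now in a position to give more detail about the treatment of dependencies in record types. Dependent types within record types are treated as pairs consisting of functions and sequences of path-names corresponding to the arguments required by the functions. Thus the type on p. 12 corresponding to a man owns a donkey is in official, though less readable, notation:

\[
\left[\begin{array}{l}
x : \mathit{Ind} \\
c_1 : \langle \lambda v{:}\mathit{Ind}(\mathit{man}(v)), \langle x \rangle \rangle \\
y : \mathit{Ind} \\
c_2 : \langle \lambda v{:}\mathit{Ind}(\mathit{donkey}(v)), \langle y \rangle \rangle \\
c_3 : \langle \lambda v{:}\mathit{Ind}(\lambda w{:}\mathit{Ind}(\mathit{own}(v, w))), \langle x, y \rangle \rangle
\end{array}\right]
\]

This enables us to give scope to dependencies outside the object in which the dependency occurs. Thus if, on the model of the type for a girl believes that a man owns a donkey on p. 14, we wish to construct a type corresponding to a girl believes that she owns a donkey (where she is anaphorically related to a girl), this can be done as follows:

\[
\left[\begin{array}{l}
x : \mathit{Ind} \\
c_1 : \langle \lambda v{:}\mathit{Ind}(\mathit{girl}(v)), \langle x \rangle \rangle \\
c_2 : \left\langle \lambda u{:}\mathit{Ind}\left(\mathit{believe}\left(u, \left[\begin{array}{l}
  z : \mathit{Ind} \\
  c_4 : \langle \lambda v{:}\mathit{Ind}(\mathit{donkey}(v)), \langle z \rangle \rangle \\
  c_5 : \langle \lambda v{:}\mathit{Ind}(\mathit{own}(u, v)), \langle z \rangle \rangle
\end{array}\right]\right)\right), \langle x \rangle \right\rangle
\end{array}\right]
\]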
There are two kinds of path-names which can occur in such types: relative path-names which are constructed from labels, ℓ1.….ℓn, and absolute path-names which refer explicitly to the record, r, in which the path-name is to be evaluated, r.ℓ1.….ℓn.

A dependent record type is a set of pairs of the form ⟨ℓ, T⟩ where ℓ is a label and T is either a type, a dependent record type or a pair consisting of a function and a sequence of path-names as characterized above.

An anchor for a dependent record type T of the form

\[
\left[\begin{array}{l}
\ell_1 : T_1 \\
\ldots \\
\ell_n : T_n
\end{array}\right]
\]

is a record, h, such that for each Ti of the form ⟨f, ⟨π1, …, πm⟩⟩, each of π1, …, πm is either an absolute path or a path defined in h, and for each Ti which is a dependent record type, h is also an anchor for Ti. In addition we require that the result of anchoring T with h as characterized below is well-defined, i.e., that the anchor provides arguments of appropriate types to functions and provides objects of appropriate types for the construction of singleton types as required by the anchoring.

The result of anchoring T with h, T[h], is obtained by replacing

1. each Ti in T of the form ⟨f, ⟨π1, …, πm⟩⟩ with f(π1[h]) … (πm[h]) (where πi[h] is the value of h.πi if πi is a relative path-name and the value of πi if πi is an absolute path-name)
2. each Ti in T which is a dependent record type with Ti[h]
3. each basic type which is the value of a path π in T and for which h.π is defined, with T_a, where a is the value of h.π, i.e. the singleton type obtained from T and the value of h.π.

A dependent record type T is said to be closed just in case each path which T requires to be defined in an anchor for T is defined within T itself. It is the closed dependent record types which belong to our type universe. If T is a closed dependent record type then r : T if and only if r : T[r].

Let us return to our type above for a girl believes that she owns a donkey. An anchor for this type is

\[
[\,x = m\,]
\]

(where m is an object of type Ind) and the result of anchoring the type with this record is

\[
\left[\begin{array}{l}
x{=}m : \mathit{Ind} \\
c_1 : \mathit{girl}(m) \\
c_2 : \mathit{believe}\left(m, \left[\begin{array}{l}
  z : \mathit{Ind} \\
  c_4 : \langle \lambda v{:}\mathit{Ind}(\mathit{donkey}(v)), \langle z \rangle \rangle \\
  c_5 : \langle \lambda v{:}\mathit{Ind}(\mathit{own}(m, v)), \langle z \rangle \rangle
\end{array}\right]\right)
\end{array}\right]
\]
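The three anchoring clauses can be sketched in Python as follows. This is an illustrative sketch, not the TTR implementation: dependent record types are assumed to be dicts, a dependent field is a pair of a curried Python function and a list of relative paths (tuples of labels), absolute path-names are left out, and the embedded type for that she owns a donkey is kept as strings so that only the externally scoped dependency is resolved.

```python
def value_at(h, path):
    """Value of a relative path (a tuple of labels) in the record h."""
    for label in path:
        h = h[label]
    return h

def anchor(T, h, prefix=()):
    """Return T[h] for a dependent record type T and an anchor h."""
    result = {}
    for label, t in T.items():
        if isinstance(t, tuple):                 # clause 1: <f, <paths>>
            f, paths = t
            for p in paths:                      # curried application
                f = f(value_at(h, p))
            result[label] = f
        elif isinstance(t, dict):                # clause 2: nested record type
            result[label] = anchor(t, h, prefix + (label,))
        else:                                    # clause 3: basic type
            try:
                a = value_at(h, prefix + (label,))
                result[label] = ("singleton", t, a)
            except (KeyError, TypeError):
                result[label] = t                # path not defined in h
    return result

# "a girl believes that she owns a donkey"; inner dependencies stay symbolic.
girl_believes = {
    "x": "Ind",
    "c1": (lambda v: f"girl({v})", [("x",)]),
    "c2": (lambda u: ("believe", u, {
        "z": "Ind",
        "c4": "<λv:Ind(donkey(v)), <z>>",
        "c5": f"<λv:Ind(own({u}, v)), <z>>",
    }), [("x",)]),
}

anchored = anchor(girl_believes, {"x": "m"})
```

With the anchor [x = m], the field x becomes the singleton type over m, c1 becomes girl(m), and only the occurrence of u inside the believed type is replaced by m.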
Notice that the anchor has no effect on dependencies with scope within the argument type corresponding to that she owns a donkey but only on the dependency with scope external to it.

We now turn our attention to the subtype relation. Record types introduce a notion of subtype which corresponds to what is known as subsumption in the unification literature. The subtype relation can be characterized model theoretically as follows:

If T1 and T2 are types, then T1 is a subtype of T2 (T1 ⊑ T2) just in case {a | a :_M T1} ⊆ {a | a :_M T2} for all models M,

where the right-hand side of this equivalence refers to sets in the sense of classical set theory and models as defined in (Cooper 2005a). These models assign sets of objects to the basic types and sets of proofs to proof-types constructed from predicates with appropriate arguments, e.g. if k is a woman according to model M then M will assign a non-empty set of proofs to the type woman(k). Such models correspond to sorted first-order models.

If this notion of subtype is to be computationally useful, we need some way of computing whether two types stand in the subtype relation without having to compute the sets of objects which belong to those types in all possible models. Thus we define another relation ⊑_c which is computed without reference to the models. The approach taken to this in the implementation TTR is to instantiate (dependent) record types, R, recursively as an anchor for R introducing arbitrary formal objects guaranteed to be of the appropriate type. Basic types and types constructed with predicates are instantiated to arbitrary formal objects guaranteed to be of the type (in the implementation, pairings of gensym atoms with the type); singleton types are instantiated to the object used to define the type; record type structures are instantiated to records containing the instantiations of the types (or anchors for the dependent record types) in each field; similar instantiations are given for other complex types.
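Thus the instantiation of the record type

\[
\left[\begin{array}{l}
f : T_1 \\
g{=}b : T_2 \\
h : \left[\begin{array}{l} i : r_1(f, g) \\ j : r_2(g, f) \end{array}\right]
\end{array}\right]
\]

can be represented as:

\[
\left[\begin{array}{l}
f = a0\#T_1 \\
g = b \\
h = \left[\begin{array}{l} i = a1\#r_1(a0\#T_1, b) \\ j = a2\#r_2(b, a0\#T_1) \end{array}\right]
\end{array}\right]
\]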
We use Inst(T) to represent such an instantiation of type T. T1 ⊑_c T2 just in case Inst(T1) : T2. One advantage of this approach is that the computation of the subtype relation will be directly dependent on the of-type relation. If T1 is a record type containing a superset of the fields of the record type T2 then T1 ⊑_c T2 as desired for the modelling of subsumption in unification systems. Thus, for example,

\[
\left[\begin{array}{l} f : T_1 \\ g : T_2 \\ h : T_3 \end{array}\right]
\sqsubseteq_c
\left[\begin{array}{l} f : T_1 \\ g : T_2 \end{array}\right]
\]

This method of computing the subtype relation appears to be sound with respect to the models but not necessarily complete since it does not take account of the logic associated with predicates. For example, later in the paper we will make use of an equality predicate. The predicate eq is such that a : eq(T, x, y) iff a = ⟨x, y⟩, x, y : T, and x = y. Now consider that the type

\[
\left[\begin{array}{l} x : T \\ y : T \\ c : r(x) \end{array}\right]
\]

is not a subtype of

\[
\left[\begin{array}{l} x : T \\ y : T \\ c : r(y) \end{array}\right]
\]

whereas according to the model theoretic definition

\[
\left[\begin{array}{l} x : T \\ y : T \\ c : r(x) \\ d : \mathit{eq}(T, x, y) \end{array}\right]
\sqsubseteq
\left[\begin{array}{l} x : T \\ y : T \\ c : r(y) \end{array}\right]
\]

since anything of the first type must also be of the second type. The instantiation of the first type

\[
\left[\begin{array}{l}
x = a0\#T \\
y = a1\#T \\
c = a2\#r(a0\#T) \\
d = a3\#\mathit{eq}(T, a0\#T, a1\#T)
\end{array}\right]
\]

will not, however, be computed as being of the second type unless we take account of the import of the equality predicate. This is easily fixed, for example by normalizing the instantiation so that all arbitrary objects which are required to be identical are represented by the same symbols, in this case, for example, substituting a0 for all occurrences of a1:

\[
\left[\begin{array}{l}
x = a0\#T \\
y = a0\#T \\
c = a2\#r(a0\#T) \\
d = a3\#\mathit{eq}(T, a0\#T, a0\#T)
\end{array}\right]
\]

This will then have the desired effect on the computation of the subtype relation. However, there is no guarantee that it will always be possible to give a complete characterization of the subtype relation if the logic of the predicates is incomplete. But we should not let this stop us exploiting those inferences about subtyping which we can draw in a computational implementation.

4. Records as linguistic objects
We will consider linguistic objects to be records. Here is a simple linguistic object which might correspond to the word man.

\[
\left[\begin{array}{l}
\text{phon} = [\text{man}] \\
\text{cat} = \text{n} \\
\text{agr} = \left[\begin{array}{l} \text{num} = \text{sg} \\ \text{gen} = \text{masc} \\ \text{pers} = \text{third} \end{array}\right]
\end{array}\right]
\]

It is a record with three fields. The field for phon(ology) has as value a (singleton) list of words (following the traditional HPSG simplifying assumption about phonology). For the cat(egory) field we will use atomic categories like n(oun), although nothing in our approach excludes the complex categories normally used in HPSG analyses. We include three agr(eement) features: num(ber), in this case with the value s(in)g(ular), gen(der), in this case with the value masc(uline) and pers(on) in this case with the value third (person).

Not all words correspond to a single linguistic object. For example, the English word fish can be singular or plural and masculine, feminine or neuter. This means that there will be six records corresponding to the single record for man. Here are three of them:

\[
\left[\begin{array}{l}
\text{phon} = [\text{fish}] \\
\text{cat} = \text{n} \\
\text{agr} = \left[\begin{array}{l} \text{num} = \text{sg} \\ \text{gen} = \text{neut} \\ \text{pers} = \text{third} \end{array}\right]
\end{array}\right]
\qquad
\left[\begin{array}{l}
\text{phon} = [\text{fish}] \\
\text{cat} = \text{n} \\
\text{agr} = \left[\begin{array}{l} \text{num} = \text{pl} \\ \text{gen} = \text{neut} \\ \text{pers} = \text{third} \end{array}\right]
\end{array}\right]
\qquad
\left[\begin{array}{l}
\text{phon} = [\text{fish}] \\
\text{cat} = \text{n} \\
\text{agr} = \left[\begin{array}{l} \text{num} = \text{sg} \\ \text{gen} = \text{masc} \\ \text{pers} = \text{third} \end{array}\right]
\end{array}\right]
\]

Now let us consider types of linguistic objects. Nouns correspond to objects which have a phonology, the category n and the agreement features for number, gender and person. That is, we can define a record type Noun as follows:

\[
\mathit{Noun} \equiv
\left[\begin{array}{l}
\text{phon} : \mathit{Phon} \\
\text{cat=n} : \mathit{Cat} \\
\text{agr} : \left[\begin{array}{l} \text{num} : \mathit{Number} \\ \text{gen} : \mathit{Gender} \\ \text{pers} : \mathit{Person} \end{array}\right]
\end{array}\right]
\]

where:

Phon ≡ [Lex] (i.e. type of list of objects of type Lex)
the, a, fish, man, men, swim, swims, swam, . . . : Lex
n, det, np, v, vp, s, . . . : Cat
sg, pl : Number
masc, fem, neut : Gender
first, second, third : Person

We can further define types for determiners, verbs and agreement:
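\[
\mathit{Det} \equiv
\left[\begin{array}{l}
\text{phon} : \mathit{Phon} \\
\text{cat=det} : \mathit{Cat} \\
\text{agr} : \left[\begin{array}{l} \text{num} : \mathit{Number} \\ \text{gen} : \mathit{Gender} \\ \text{pers} : \mathit{Person} \end{array}\right]
\end{array}\right]
\]

\[
\mathit{V} \equiv
\left[\begin{array}{l}
\text{phon} : \mathit{Phon} \\
\text{cat=v} : \mathit{Cat} \\
\text{agr} : \left[\begin{array}{l} \text{num} : \mathit{Number} \\ \text{gen} : \mathit{Gender} \\ \text{pers} : \mathit{Person} \end{array}\right]
\end{array}\right]
\]

\[
\mathit{Agr} \equiv
\left[\begin{array}{l} \text{num} : \mathit{Number} \\ \text{gen} : \mathit{Gender} \\ \text{pers} : \mathit{Person} \end{array}\right]
\]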
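Now we can define the type of linguistic objects corresponding to the word man.

\[
\mathit{Man} \equiv
\left[\begin{array}{l}
\text{phon=[man]} : \mathit{Phon} \\
\text{cat=n} : \mathit{Cat} \\
\text{agr} : \left[\begin{array}{l} \text{num=sg} : \mathit{Number} \\ \text{gen=masc} : \mathit{Gender} \\ \text{pers=third} : \mathit{Person} \end{array}\right]
\end{array}\right]
\]

This type identifies a unique linguistic object (namely the record corresponding to man which we introduced above). It is a singleton (or fully specified) type. It is also a subtype (or specification) of Noun in the sense that if an object is of type Man it is also of type Noun. We define the type for the plural men in a similar way.

\[
\mathit{Men} \equiv
\left[\begin{array}{l}
\text{phon=[men]} : \mathit{Phon} \\
\text{cat=n} : \mathit{Cat} \\
\text{agr} : \left[\begin{array}{l} \text{num=pl} : \mathit{Number} \\ \text{gen=masc} : \mathit{Gender} \\ \text{pers=third} : \mathit{Person} \end{array}\right]
\end{array}\right]
\]

The type Fish corresponding to the noun fish is a less specified type, however:

\[
\mathit{Fish} \equiv
\left[\begin{array}{l}
\text{phon=[fish]} : \mathit{Phon} \\
\text{cat=n} : \mathit{Cat} \\
\text{agr} : \left[\begin{array}{l} \text{num} : \mathit{Number} \\ \text{gen} : \mathit{Gender} \\ \text{pers=third} : \mathit{Person} \end{array}\right]
\end{array}\right]
\]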
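The objects which are of this type will be the six records which we identified earlier. Fish is also a subtype of Noun. We can also define two types IndefArt and DefArt corresponding to the indefinite and definite articles which have different degrees of specification:

\[
\mathit{IndefArt} \equiv
\left[\begin{array}{l}
\text{phon=[a]} : \mathit{Phon} \\
\text{cat=det} : \mathit{Cat} \\
\text{agr} : \left[\begin{array}{l} \text{num=sg} : \mathit{Number} \\ \text{gen} : \mathit{Gender} \\ \text{pers=third} : \mathit{Person} \end{array}\right]
\end{array}\right]
\]

\[
\mathit{DefArt} \equiv
\left[\begin{array}{l}
\text{phon=[the]} : \mathit{Phon} \\
\text{cat=det} : \mathit{Cat} \\
\text{agr} : \left[\begin{array}{l} \text{num} : \mathit{Number} \\ \text{gen} : \mathit{Gender} \\ \text{pers=third} : \mathit{Person} \end{array}\right]
\end{array}\right]
\]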
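Both of these are subtypes of Det. IndefArt is specified for both number and person whereas DefArt is only specified for person. Similar differences in specification arise in verbs (see note 4):

\[
\mathit{Swims} \equiv
\left[\begin{array}{l}
\text{phon=[swims]} : \mathit{Phon} \\
\text{cat=v} : \mathit{Cat} \\
\text{agr} : \left[\begin{array}{l} \text{num=sg} : \mathit{Number} \\ \text{gen} : \mathit{Gender} \\ \text{pers=third} : \mathit{Person} \end{array}\right]
\end{array}\right]
\]

\[
\mathit{Swim} \equiv
\left[\begin{array}{l}
\text{phon=[swim]} : \mathit{Phon} \\
\text{cat=v} : \mathit{Cat} \\
\text{agr} : \left[\begin{array}{l} \text{num=pl} : \mathit{Number} \\ \text{gen} : \mathit{Gender} \\ \text{pers=third} : \mathit{Person} \end{array}\right]
\end{array}\right]
\]

\[
\mathit{Swam} \equiv
\left[\begin{array}{l}
\text{phon=[swam]} : \mathit{Phon} \\
\text{cat=v} : \mathit{Cat} \\
\text{agr} : \left[\begin{array}{l} \text{num} : \mathit{Number} \\ \text{gen} : \mathit{Gender} \\ \text{pers=third} : \mathit{Person} \end{array}\right]
\end{array}\right]
\]

These three types are all subtypes of V.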
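The claims made in this section (Man is a singleton type, Fish has exactly six records, and both are subtypes of Noun) can be checked with a small Python sketch. The encoding is an assumption for illustration only: a field is a pair of a basic type name and an optional manifest value, Phon is replaced by a finite stand-in so that records can be enumerated, and the computed subtype relation follows the Inst-based idea of Section 3 with formal objects modelled by a hypothetical Formal tuple.

```python
from collections import namedtuple
from itertools import count, product

Formal = namedtuple("Formal", "name basic")   # arbitrary formal object a_n#T

# Finite stand-ins for the basic types (Phon is an illustrative simplification).
BASIC = {
    "Phon": [("man",), ("men",), ("fish",)],
    "Cat": ["n", "det", "np", "v", "vp", "s"],
    "Number": ["sg", "pl"],
    "Gender": ["masc", "fem", "neut"],
    "Person": ["first", "second", "third"],
}

def agr(num=None, gen=None, pers=None):
    return {"num": ("Number", num), "gen": ("Gender", gen), "pers": ("Person", pers)}

NOUN = {"phon": ("Phon", None), "cat": ("Cat", "n"), "agr": agr()}
MAN = {"phon": ("Phon", ("man",)), "cat": ("Cat", "n"), "agr": agr("sg", "masc", "third")}
FISH = {"phon": ("Phon", ("fish",)), "cat": ("Cat", "n"), "agr": agr(pers="third")}

def of_type(r, T):
    """r : T for records as nested dicts; extra fields in r are allowed."""
    for label, t in T.items():
        if label not in r:
            return False
        v = r[label]
        if isinstance(t, dict):
            if not (isinstance(v, dict) and of_type(v, t)):
                return False
        else:
            basic, value = t
            if isinstance(v, Formal):        # formal object: only of its own basic type
                ok = value is None and v.basic == basic
            else:
                ok = v in BASIC[basic] and value in (None, v)
            if not ok:
                return False
    return True

def records(T):
    """Enumerate all records of exactly the fields of T (finite basic types)."""
    labels = list(T)
    choices = []
    for label in labels:
        t = T[label]
        if isinstance(t, dict):
            choices.append(list(records(t)))
        else:
            basic, value = t
            choices.append([value] if value is not None else BASIC[basic])
    for combo in product(*choices):
        yield dict(zip(labels, combo))

def inst(T, fresh=None):
    """Inst(T): manifest fields keep their value, others get a fresh formal object."""
    fresh = count() if fresh is None else fresh
    out = {}
    for label, t in T.items():
        if isinstance(t, dict):
            out[label] = inst(t, fresh)
        else:
            basic, value = t
            out[label] = value if value is not None else Formal(f"a{next(fresh)}", basic)
    return out

def subtype_c(T1, T2):
    """Computed subtype relation: T1 is a subtype of T2 just in case Inst(T1) : T2."""
    return of_type(inst(T1), T2)
```

For example, `len(list(records(FISH)))` is 6 and `subtype_c(FISH, NOUN)` holds, while `subtype_c(NOUN, MAN)` does not, because the formal objects introduced for the unspecified fields of Noun are not of the singleton types required by Man.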
5. A type theoretical approach to unification phenomena
The types that we introduced in Section 4 lay the basis for a kind of unification phenomenon in language which has been discussed in the classical literature on unification approaches to natural language grammar (e.g. Shieber 1986). The sentence (1) is underspecified with respect to number.

(1) The fish swam.

Note, however, that either all the words are singular or all the words are plural. It cannot be the case that fish is regarded as singular while swam is regarded as plural, for example. This is because there are requirements that the determiner and the noun agree in number and that the subject noun-phrase and the verb agree in number. In a unification-based grammar this is expressed by requiring that the number features of the relevant phrases unify. In the terms of our previous discussion it means that there are two linguistic objects corresponding to (1) rather than eight. Note that the sentences in (2) are all singular.

(2) a. a fish swam.
    b. the man swam.
    c. the fish swims.

However, the “source” of the singularity is different in each case. It is the fact that a, man, and swims respectively are specified for singular, together with the requirement that all the number features unify, which has as a consequence that all the words are specified for singular in the single linguistic object corresponding to each of these sentences. Unification is regarded as a useful tool in linguistic analysis because it reflects the lack of “directionality” of the agreement phenomenon, that is, as long as one of the words is specified for number they all have to be specified for the same number. Unification is traditionally regarded as partial, that is, it can fail, and this is used to explain why the strings of words in (3) are not sentences of English, that is, they do not correspond to linguistic objects allowed by the grammar of English.

(3) a. *a fish swim.
    b. *the man swim.
    c. *a men swims.
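The classical, partial view of agreement described here can be sketched in Python. This is a deliberately minimal illustration and not a full graph unification algorithm: only the number feature is modelled, None stands for an unspecified value, the sentinel FAIL marks unification failure, and the table NUM simply transcribes the number specifications of the types in Section 4.

```python
FAIL = object()   # sentinel: classical unification is partial and may fail

def unify(*values):
    """Unify atomic feature values; None means 'unspecified'."""
    result = None
    for v in values:
        if v is None:
            continue
        if result is None:
            result = v              # first specified value wins, whatever its source
        elif result != v:
            return FAIL             # clash between two specified values
    return result

# Number specifications of the word types from Section 4.
NUM = {"a": "sg", "the": None, "fish": None, "man": "sg", "men": "pl",
       "swam": None, "swims": "sg", "swim": "pl"}

def sentence_number(det, noun, verb):
    """Number of a determiner-noun-verb sentence, or FAIL if agreement fails."""
    return unify(NUM[det], NUM[noun], NUM[verb])
```

Here `sentence_number("the", "fish", "swam")` stays unspecified, each sentence in (2) comes out as "sg" regardless of which word contributes the value, and each string in (3) yields FAIL.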
On our type theoretical view the intuitive notion of unification is related to meet (as in the meet, or conjunction, of two types) and equality. In the definition of our type theory in (Cooper 2005b) we introduce meet-types in the following way:

If T1 and T2 are types, then T1 ∧ T2 is also a type.
a : T1 ∧ T2 iff a : T1 and a : T2

If T1 and T2 are record types then there will always be a record type (not a meet) T3 which is equivalent to T1 ∧ T2 (in the sense that a : T3 iff a : T1 ∧ T2). Let us consider some examples:

\[
[\,f : T_1\,] \wedge [\,g : T_2\,] \approx \left[\begin{array}{l} f : T_1 \\ g : T_2 \end{array}\right]
\]

\[
[\,f : T_1\,] \wedge [\,f : T_2\,] \approx [\,f : T_1 \wedge T_2\,]
\]

Below we present some informal pseudocode for a function μ which will simplify meets of record types, returning an equivalent record type. The algorithm is similar in essential respects to the graph unification algorithm used in classical implementations of feature based grammar systems (Shieber 1986). One important respect in which it differs from the classical unification algorithm is that it never fails. In cases where the corresponding unification would have failed it will return a record type which is equivalent to the distinguished empty type ⊥ (see note 5). Another way in which it differs from the classical unification algorithm is that it applies to all types, reducing meets of record types to non-meet types and recursively performing this reduction within record types and otherwise returning the original type. The algorithm that is informally presented here is a simplification of the one that is implemented in TTR which has some additional special cases and also has an additional level of complication for the handling of dependent types, using the technique of environments which we referred to in Section 3.

In order to understand the intention of this pseudocode it is important to remember that record types are considered to be finite sets of ordered pairs (representing the fields) as described above in Section 2. When we write Map(T, λl, v[Φ]) we mean that each field ⟨ℓ, T′⟩ in T is to be replaced by the result of applying the function λl, v[Φ] to ℓ and T′. When we say that T.ℓ is defined we mean that for some T′, ⟨ℓ, T′⟩ ∈ T. We assume that the type theory will define an incompatibility relation which holds between certain basic types such that if T1 and T2 are incompatible then there will be no a such that a : T1 and a : T2. For example, one might require that all basic types are pairwise incompatible.
μ(T) =
  if for some T1, T2, T = T1 ∧ T2 then
    let T1′ = μ(T1)
        T2′ = μ(T2)
    in
      if T1′ ⊑ T2′ then T1′
      elseif T2′ ⊑ T1′ then T2′
      elseif T1′ and T2′ are incompatible then ⊥
      elseif T1′ and T2′ are record types then
        Map(T1′, λl, v[if T2′.l is defined then ⟨l, μ(v ∧ T2′.l)⟩ else ⟨l, v⟩])
          ∪ (T2′ − {⟨l, v⟩ ∈ T2′ | T1′.l is defined})
      else T1′ ∧ T2′
      end
    end
  elseif T is a record type then
    Map(T, λl, v[⟨l, μ(v)⟩])
  else T
  end
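The pseudocode can be transcribed almost line by line into Python. The following is a sketch under explicit assumptions, not the TTR implementation: basic types are strings, singleton types are tagged tuples built by the hypothetical helper single, meets are tagged tuples built by meet, record types are dicts, the subtype test is a purely syntactic stand-in for ⊑, and incompatibility is assumed to hold between distinct basic types and between distinct singletons. Dependent types are ignored.

```python
BOTTOM = "⊥"

def meet(t1, t2):
    return ("meet", t1, t2)

def single(basic, value):
    return ("single", basic, value)      # the singleton type basic_value

def is_meet(t):
    return isinstance(t, tuple) and t[0] == "meet"

def is_single(t):
    return isinstance(t, tuple) and t[0] == "single"

def subtype(t1, t2):
    """Simplified syntactic stand-in for the subtype relation."""
    if t1 == t2 or t1 == BOTTOM:
        return True
    if is_single(t1) and t1[1] == t2:    # T_a is a subtype of T
        return True
    if isinstance(t1, dict) and isinstance(t2, dict):
        return all(l in t1 and subtype(t1[l], t2[l]) for l in t2)
    return False

def incompatible(t1, t2):
    """Assumption: distinct basic types, and distinct singletons, share no objects."""
    def atomic(t):
        return isinstance(t, str) or is_single(t)
    def base(t):
        return t[1] if is_single(t) else t
    if not (atomic(t1) and atomic(t2)):
        return False
    if base(t1) != base(t2):
        return True
    return is_single(t1) and is_single(t2) and t1[2] != t2[2]

def mu(t):
    """Reduce meets of record types to record types; never fails."""
    if is_meet(t):
        t1, t2 = mu(t[1]), mu(t[2])
        if subtype(t1, t2):
            return t1
        if subtype(t2, t1):
            return t2
        if incompatible(t1, t2):
            return BOTTOM
        if isinstance(t1, dict) and isinstance(t2, dict):
            out = {l: mu(meet(v, t2[l])) if l in t2 else v for l, v in t1.items()}
            out.update({l: v for l, v in t2.items() if l not in t1})
            return out
        return meet(t1, t2)
    if isinstance(t, dict):
        return {l: mu(v) for l, v in t.items()}
    return t

# Agreement types of IndefArt, Fish and Men from Section 4.
a_agr = {"num": single("Number", "sg"), "gen": "Gender", "pers": single("Person", "third")}
fish_agr = {"num": "Number", "gen": "Gender", "pers": single("Person", "third")}
men_agr = {"num": single("Number", "pl"), "gen": single("Gender", "masc"),
           "pers": single("Person", "third")}
```

With these definitions `mu(meet(a_agr, fish_agr))` returns the agreement type of the indefinite article, whereas `mu(meet(a_agr, men_agr))` does not fail but returns a record type whose num field is ⊥, which by note 5 is equivalent to ⊥.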
If we know a : T1 and b : T2 and in addition know a = b then we know a : T1 ∧ T2 and a : μ(T1 ∧ T2). Intuitively, equality of objects corresponds to meet (or “unification”) of types. This can be expressed in terms of the following rules of inference.

\[
\frac{a : T_1 \qquad b : T_2 \qquad a = b}{a : T_1 \wedge T_2}
\qquad\qquad
\frac{a : T_1 \wedge T_2}{a : \mu(T_1 \wedge T_2)}
\]

We can exploit this in characterizing a type NP which allows noun-phrases consisting of a determiner and a noun which agree in number.
\[
\mathit{NP} \equiv
\left[\begin{array}{l}
\text{phon=append(daughters.first.phon, daughters.rest.first.phon)} : \mathit{Phon} \\
\text{cat=np} : \mathit{Cat} \\
\text{daughters} : \left[\begin{array}{l}
  \text{first} : \mathit{Det} \\
  \text{rest} : \left[\begin{array}{l} \text{first} : \mathit{Noun} \\ \text{rest=nil} : [\mathit{Sign}] \end{array}\right]
\end{array}\right] \\
\text{agr=daughters.rest.first.agr} : \mathit{Agr} \\
\text{c} : \text{eq(Number, daughters.first.agr.num, daughters.rest.first.agr.num)}
\end{array}\right]
\]

In the definition of NP, Sign is to be thought of as a recursively defined type defined by:

1. if a : Det then a : Sign
2. if a : Noun then a : Sign
3. if a : NP then a : Sign
. . . (similarly for other word and phrase types)
n. no object is of type Sign except as required by the above clauses

The predicate eq is as defined in Section 3. The type NP can be further specified to a type where the first daughter is of type DefArt and the second daughter is of type Man since these types are subtypes of Det and Noun respectively.

\[
\left[\begin{array}{l}
\text{phon=append(daughters.first.phon, daughters.rest.first.phon)} : \mathit{Phon} \\
\text{cat=np} : \mathit{Cat} \\
\text{daughters} : \left[\begin{array}{l}
  \text{first} : \left[\begin{array}{l}
    \text{phon=[the]} : \mathit{Phon} \\
    \text{cat=det} : \mathit{Cat} \\
    \text{agr} : \left[\begin{array}{l} \text{num} : \mathit{Number} \\ \text{gen} : \mathit{Gender} \\ \text{pers=third} : \mathit{Person} \end{array}\right]
  \end{array}\right] \\
  \text{rest} : \left[\begin{array}{l}
    \text{first} : \left[\begin{array}{l}
      \text{phon=[man]} : \mathit{Phon} \\
      \text{cat=n} : \mathit{Cat} \\
      \text{agr} : \left[\begin{array}{l} \text{num=sg} : \mathit{Number} \\ \text{gen=masc} : \mathit{Gender} \\ \text{pers=third} : \mathit{Person} \end{array}\right]
    \end{array}\right] \\
    \text{rest=nil} : [\mathit{Sign}]
  \end{array}\right]
\end{array}\right] \\
\text{agr=daughters.rest.first.agr} : \mathit{Agr} \\
\text{c} : \text{eq(Number, daughters.first.agr.num, daughters.rest.first.agr.num)}
\end{array}\right]
\]

Note that this type represents that the singularity of the phrase has its source in man. Similarly we can create the type corresponding to a fish where the source of the singularity is the determiner.

\[
\left[\begin{array}{l}
\text{phon=append(daughters.first.phon, daughters.rest.first.phon)} : \mathit{Phon} \\
\text{cat=np} : \mathit{Cat} \\
\text{daughters} : \left[\begin{array}{l}
  \text{first} : \left[\begin{array}{l}
    \text{phon=[a]} : \mathit{Phon} \\
    \text{cat=det} : \mathit{Cat} \\
    \text{agr} : \left[\begin{array}{l} \text{num=sg} : \mathit{Number} \\ \text{gen} : \mathit{Gender} \\ \text{pers=third} : \mathit{Person} \end{array}\right]
  \end{array}\right] \\
  \text{rest} : \left[\begin{array}{l}
    \text{first} : \left[\begin{array}{l}
      \text{phon=[fish]} : \mathit{Phon} \\
      \text{cat=n} : \mathit{Cat} \\
      \text{agr} : \left[\begin{array}{l} \text{num} : \mathit{Number} \\ \text{gen} : \mathit{Gender} \\ \text{pers=third} : \mathit{Person} \end{array}\right]
    \end{array}\right] \\
    \text{rest=nil} : [\mathit{Sign}]
  \end{array}\right]
\end{array}\right] \\
\text{agr=daughters.rest.first.agr} : \mathit{Agr} \\
\text{c} : \text{eq(Number, daughters.first.agr.num, daughters.rest.first.agr.num)}
\end{array}\right]
\]

A difference between our record types and feature structures is that the record types preserve the information of the source of the number information in these examples whereas in feature structures this information is lost once the feature structures have been unified. An additional difference is that we are able to form types corresponding to ungrammatical phrases such as *a men.

\[
\left[\begin{array}{l}
\text{phon=append(daughters.first.phon, daughters.rest.first.phon)} : \mathit{Phon} \\
\text{cat=np} : \mathit{Cat} \\
\text{daughters} : \left[\begin{array}{l}
  \text{first} : \left[\begin{array}{l}
    \text{phon=[a]} : \mathit{Phon} \\
    \text{cat=det} : \mathit{Cat} \\
    \text{agr} : \left[\begin{array}{l} \text{num=sg} : \mathit{Number} \\ \text{gen} : \mathit{Gender} \\ \text{pers=third} : \mathit{Person} \end{array}\right]
  \end{array}\right] \\
  \text{rest} : \left[\begin{array}{l}
    \text{first} : \left[\begin{array}{l}
      \text{phon=[men]} : \mathit{Phon} \\
      \text{cat=n} : \mathit{Cat} \\
      \text{agr} : \left[\begin{array}{l} \text{num=pl} : \mathit{Number} \\ \text{gen=masc} : \mathit{Gender} \\ \text{pers=third} : \mathit{Person} \end{array}\right]
    \end{array}\right] \\
    \text{rest=nil} : [\mathit{Sign}]
  \end{array}\right]
\end{array}\right] \\
\text{agr=daughters.rest.first.agr} : \mathit{Agr} \\
\text{c} : \text{eq(Number, daughters.first.agr.num, daughters.rest.first.agr.num)}
\end{array}\right]
\]
This is a well-formed type but one that cannot have any elements since sg and pl are not identical. This type is thus equivalent to ⊥. Such types might be usefully exploited in robust parsing. Note that even though the type is empty it contains a great deal of information about the phrase. In particular if our types were to include information about the meaning or content of a phrase it might be possible to extract information about the meaning of a phrase even though it does not actually correspond to any well-formed linguistic object. This could potentially be exploited in a way similar to that suggested by Fouvry (2003) for weighted feature structures.

6. Using unification to express generalizations
Unification is also exploited in unification grammars to extract generalities from individual cases. For example, the noun-phrase agreement phenomenon that we discussed in Section 5 requires that the agreement features on the noun be the same as those on the NP. This is an instance of the head feature principle which requires that the agreement features of the mother be the same as those of the head daughter. If we identify the head daughter in phrases then we can extract this principle out by creating a new type HFP which corresponds to the head feature principle.

\[
\mathit{HFP} \equiv
\left[\begin{array}{l}
\text{hd-daughter} : \mathit{Sign} \\
\text{agr=hd-daughter.agr} : \mathit{Agr}
\end{array}\right]
\]

We also define a new version of the type NP, NP′, which identifies the head daughter but does not contain the information corresponding to the head feature principle.

\[
\mathit{NP}' \equiv
\left[\begin{array}{l}
\text{phon=append(daughters.first.phon, daughters.rest.first.phon)} : \mathit{Phon} \\
\text{cat=np} : \mathit{Cat} \\
\text{hd-daughter=daughters.rest.first} : \mathit{Noun} \\
\text{daughters} : \left[\begin{array}{l}
  \text{first} : \mathit{Det} \\
  \text{rest} : \left[\begin{array}{l} \text{first} : \mathit{Noun} \\ \text{rest=nil} : [\mathit{Sign}] \end{array}\right]
\end{array}\right] \\
\text{agr} : \mathit{Agr} \\
\text{c} : \text{eq(Number, daughters.first.agr.num, daughters.rest.first.agr.num)}
\end{array}\right]
\]
The record type that characterizes noun-phrases is now μ(NP′ ∧ HFP).
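As a worked illustration of this reduction, the two fields shared by NP′ and HFP can be followed through the simplified μ of Section 5. This is a sketch that ignores the additional handling of dependent types in the implemented algorithm; it writes T_a for the singleton type behind a manifest field and uses Noun ⊑ Sign, which follows from clause 2 of the definition of Sign.

\[
\begin{array}{lcl}
\mu(\mathit{Noun}_{\text{daughters.rest.first}} \wedge \mathit{Sign}) & = & \mathit{Noun}_{\text{daughters.rest.first}} \\
\mu(\mathit{Agr} \wedge \mathit{Agr}_{\text{hd-daughter.agr}}) & = & \mathit{Agr}_{\text{hd-daughter.agr}}
\end{array}
\]

In both cases one conjunct is a subtype of the other, so μ returns the more specific one. All remaining fields occur only in NP′ and are carried over unchanged, so the result is NP′ with the field agr : Agr replaced by agr=hd-daughter.agr : Agr. Since hd-daughter is manifestly daughters.rest.first, this expresses the same agreement requirement as the type NP of Section 5.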
7. Conclusions
We have shown how a type theory with records gives us a notion of subtyping corresponding to subsumption in the unification literature and a way of reducing meets of record types to record types which is similar to the graph unification used in unification-based grammar formalisms. Using record types instead of feature structures gives us a kind of intensionality which is not available in feature structures. This intensionality allows us to distinguish equivalent types which preserve information which is lost in the unification of feature structures, such as the source of grammatical information associated with a phrase. This intensionality can also be exploited by associating empty types with ungrammatical phrases. Such types may contain information which could be used in robust parsing. While it may appear odd to refer to this property as “intensionality” in the context of parsing, we do so because it is the same kind of intensionality which is important for our approach to the semantic analysis of attitudes such as know and believe. The type theory provides us with a level of abstraction which permits us to make generalizations across phenomena in natural language that have previously been treated by separate theories. Finally, this approach to unification is embedded in a rich “function-based” type theoretical framework which provides us with the kind of tools that are needed for semantics while at the same time allowing us to import unification into our semantic analysis.

Acknowledgements

This work was supported by Swedish Research Council projects numbers 2002-4879 Records, types and computational dialogue semantics and 2005-4211 Library-based Grammar Engineering. I am grateful to Thierry Coquand, Dan Flickinger, Jonathan Ginzburg, Erhard Hinrichs, Bengt Nordström and Aarne Ranta for discussion in connection with this work and to an anonymous referee for this volume for making a number of useful suggestions.
Notes

1. This section contains revised material from (Cooper 2005a).
2. There is a technical sense in which this recursion is non-essential. These records could also be viewed as non-recursive records whose labels are sequences of atomic labels. See (Cooper 2005a) for more discussion.
3. In order to do this safely we stratify the types. We define the type system as a family of type systems of order n for each natural number n. The idea is that types which are not defined in terms of other types are of order 0 and that types which are defined in terms of types of order n are of order n + 1. We will not discuss this in detail here but rely on the discussion in (Cooper 2005a). In this paper we will suppress reference to order in the specification of our types.
4. We are making the simplifying assumption that all the verb forms represented here are third person.
5. Any record type which has ⊥ in one of its fields will be such that there are no records of that type and thus the type will be equivalent to ⊥.

References

Barwise, Jon and John Perry. 1983. Situations and Attitudes. Bradford Books. Cambridge, Mass.: MIT Press.
Betarte, Gustavo. 1998. Dependent Record Types and Algebraic Structures in Type Theory. Ph.D. thesis, Department of Computing Science, Göteborg University and Chalmers University of Technology.
Betarte, Gustavo and Alvaro Tasistro. 1998. Extension of Martin-Löf’s type theory with record types and subtyping. In Giovanni Sambin and Jan Smith (eds.), Twenty-Five Years of Constructive Type Theory, number 36 in Oxford Logic Guides. Oxford: Oxford University Press.
Cooper, Robin. 2005a. Austinian truth, attitudes and type theory. Research on Language and Computation 3: 333–362.
Cooper, Robin. 2005b. Records and record types in semantic theory. Journal of Logic and Computation 15(2): 99–112.
Coquand, Thierry, Randy Pollack, and Makoto Takeyama. 2004. A logical framework with dependently typed records. Fundamenta Informaticae XX: 1–22.
Fouvry, Frederik. 2003. Constraint relaxation with weighted feature structures. In IWPT 03, International Workshop on Parsing Technologies. Nancy (France).
Gabbay, Dov and Franz Guenthner (eds.). 1986. Handbook of Philosophical Logic, Vol. III. Dordrecht: Reidel.
Ginzburg, Jonathan. In prep. Semantics and interaction in dialogue. Draft available from http://www.dcs.kcl.ac.uk/staff/ginzburg/papers.html.
Kamp, Hans and Uwe Reyle. 1993. From Discourse to Logic. Dordrecht: Kluwer.
Karttunen, Lauri. 1969. Pronouns and variables. In Robert I. Binnick, Alice Davison, Georgia M. Green, and Jerry L. Morgan (eds.), Papers from the Fifth Regional Meeting of the Chicago Linguistic Society, 108–115. Department of Linguistics, University of Chicago, Chicago, Illinois.
Kohlhase, Michael, Susanna Kuschert, and Manfred Pinkal. 1996. A type-theoretic semantics for λ-DRT. In Paul Dekker and Martin Stokhof (eds.), Proceedings of the 10th Amsterdam Colloquium, 479–498. ILLC, Amsterdam.
Larsson, Staffan. 2002. Issue-based Dialogue Management. Ph.D. thesis, University of Gothenburg.
Mönnich, Uwe. 1985. Untersuchungen zu einer konstruktiven Semantik für ein Fragment des Englischen. Habilitationsschrift, Universität Tübingen.
Montague, Richard. 1974. Formal Philosophy: Selected Papers of Richard Montague. New Haven: Yale University Press. Ed. and with an introduction by Richmond H. Thomason.
Muskens, Reinhard. 1996. Combining Montague semantics and discourse representation. Linguistics and Philosophy 19(2): 143–186.
Ranta, Aarne. 1994. Type-Theoretical Grammar. Oxford: Clarendon Press.
Sag, Ivan A., Thomas Wasow, and Emily M. Bender. 2003. Syntactic Theory: A Formal Introduction. Stanford: CSLI Publications, 2nd edition.
Shieber, Stuart. 1986. An Introduction to Unification-Based Approaches to Grammar. Stanford: CSLI Publications.
Sundholm, Göran. 1986. Proof theory and meaning. In Gabbay and Guenthner (1986), chapter 8, 471–506.
Tasistro, Alvaro. 1997. Substitution, record types and subtyping in type theory, with applications to the theory of programming. Ph.D. thesis, Department of Computing Science, University of Gothenburg and Chalmers University of Technology.
van Benthem, Johan and Alice ter Meulen (eds.). 1997. Handbook of Logic and Language. North Holland and MIT Press.
van Eijck, Jan and Hans Kamp. 1997. Representing discourse in context. In van Benthem and ter Meulen (1997), chapter 3, 179–237.
One-letter automata: How to reduce k tapes to one

Hristo Ganchev, Stoyan Mihov, and Klaus U. Schulz

Abstract

The class of k-dimensional regular relations has various closure properties that are interesting for practical applications. From a computational point of view, each closure operation may be realized with a corresponding construction for k-tape finite state automata. While the constructions for union, Kleene-star and (coordinate-wise) concatenation are simple, specific and non-trivial algorithms are needed for relational operations such as composition, projection, and cartesian product. Here we show that all these operations for k-tape automata can be represented and computed using standard operations on conventional one-tape finite state automata plus some trivial rules for tape manipulation. As a key notion we introduce the concept of a one-letter k-tape automaton, which yields a bridge between k-tape and one-tape automata. We achieve a general and efficient implementational framework for n-tape automata.
1. Introduction
Multi-tape finite state automata and especially 2-tape automata have been widely used in many areas of computer science such as Natural Language Processing (Karttunen et al. 1996; Mohri 1996; Roche and Schabes 1997) and Speech Processing (Mohri 1997; Mohri et al. 2002). They provide an uniform, clear and computationally efficient framework for dictionary representation (Karttunen 1994; Mihov and Maurel 2001) and realization of rewrite rules (Gerdemann and van Noord 1999; Kaplan and Kay 1994; Karttunen 1997), as well as text tokenization, lexicon tagging, part-of-speech disambiguation, indexing, filtering and many other text processing tasks (Karttunen et al. 1996; Mohri 1996; Roche and Schabes 1995, 1997). The properties of k-tape finite state automata differ significantly from the corresponding properties of 1-tape automata. For example, for k ≥ 2 the class of relations recognized by k-tape automata is not closed under intersection and complement. Moreover there is no general determinization procedure for k-tape automata. On the other side the class of relations recognized by k-tape finite state automata is closed under a number of useful relational operations like composition, cartesian product, projection, inverse etc. It is this latter property that
makes k-tape automata interesting for many practical applications such as the ones listed above.

There exist a number of implementations for k-tape finite state automata (Karttunen et al. 1996; Mohri et al. 1998; van Noord 1997). Most of them implement the 2-tape case only. While it is straightforward to realize constructions for k-tape automata that yield union, Kleene-star and concatenation of the recognized relations, the computation of relational operations such as composition, projection and cartesian product is a complex task. This makes the use of the k-tape automata framework tedious and difficult.

We introduce an approach for presenting all relevant operations for k-tape automata using standard operations for classical 1-tape automata plus some straightforward operations for adding, deleting and permuting tapes. In this way we obtain a transparent, general and efficient framework for implementing k-tape automata. The main idea is to consider a restricted form of k-tape automata where all transition labels have exactly one non-empty component representing a single letter. The set of all k-tuples of this form represents the basis of the monoid of k-tuples of words together with the coordinate-wise concatenation. We call this kind of automata one-letter automata. Treating the basis elements as symbols of a derived alphabet, one-letter automata can be considered as conventional 1-tape automata. This gives rise to a correspondence where standard operations for 1-tape automata may be used to replace complex operations for k-tape automata.

The paper is structured as follows. Section 2 provides some formal background. In Section 3 we introduce one-letter k-tape automata. We show that classical algorithms for union, concatenation and Kleene-star over one-letter automata (considered as 1-tape automata) are correct if the result is interpreted as a k-tape automaton. Section 4 is central.
A condition is given that guarantees that the intersection of two k-dimensional regular relations is again regular. For k-tape one-letter automata of a specific form that reflects this condition, any classical algorithm for intersecting the associated 1-tape automata can be used for computing the intersection of the regular relations recognized by the automata. Section 5 shows how to implement tape permutations for one-letter automata. Using tape permutations, the inverse relation to a given regular relation can be realized. In a similar way, Section 6 treats tape insertion, tape deletion and projection operations for k-dimensional regular relations. Section 7 shows how to reduce the computation of composition and cartesian product of regular relations to intersections of the kind discussed in Section 4 plus tape insertion and projection. In Section 8 we add some final remarks. We comment on problems that may arise when using k-tape automata and on possible solutions.

2. Formal Background
We assume that the reader is familiar with standard notions from automata theory (see, e.g., Aho et al. 1983; Roche and Schabes 1995). In the sequel, $\Sigma$ denotes a finite set of symbols called the alphabet, $\varepsilon$ denotes the empty word, and $\Sigma_\varepsilon := \Sigma \cup \{\varepsilon\}$. The length of a word $w \in \Sigma^*$ is written $|w|$. If $L_1, L_2 \subseteq \Sigma^*$ are languages, then

$$L_1 \cdot L_2 := \{w_1 \cdot w_2 \mid w_1 \in L_1,\ w_2 \in L_2\}$$

denotes their concatenation. Here $w_1 \cdot w_2$ is the usual concatenation of words. Recall that $\langle \Sigma^*, \cdot, \varepsilon \rangle$ is the free monoid with set of generators $\Sigma$. If $v = \langle v_1, \dots, v_k \rangle$ and $w = \langle w_1, \dots, w_k \rangle$ are two $k$-tuples of words, then

$$v \odot w := \langle v_1 \cdot w_1, \dots, v_k \cdot w_k \rangle$$

denotes the coordinate-wise concatenation. With $\hat{\varepsilon}$ we denote the $k$-tuple $\langle \varepsilon, \dots, \varepsilon \rangle$. The tuple $\langle (\Sigma^*)^k, \odot, \hat{\varepsilon} \rangle$ is a monoid that can be described as the $k$-fold cartesian product of the free monoid $\langle \Sigma^*, \cdot, \varepsilon \rangle$. As set of generators we consider

$$\hat{\Sigma}_k := \{\langle \varepsilon, \dots, \underset{\uparrow i}{a}, \dots, \varepsilon \rangle \mid 1 \le i \le k,\ a \in \Sigma\}.$$

Note that the latter monoid is not free, due to obvious commutation rules for generators. For relations $R \subseteq (\Sigma^*)^k$ we define

$$R^0 := \{\hat{\varepsilon}\}, \qquad R^{i+1} := R^i \odot R, \qquad R^* := \bigcup_{i=0}^{\infty} R^i \quad \text{(Kleene-star)}.$$

Let $k \ge 2$ and $1 \le i \le k$. The relation

$$R \ominus (i) := \{\langle w_1, \dots, w_{i-1}, w_{i+1}, \dots, w_k \rangle \mid \exists v \in \Sigma^* : \langle w_1, \dots, w_{i-1}, v, w_{i+1}, \dots, w_k \rangle \in R\}$$

is called the projection of $R$ to the set of coordinates $\{1, \dots, i-1, i+1, \dots, k\}$.
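The operations defined so far can be tried out on small finite relations. The following Python fragment is only a minimal sketch: words are modelled as Python strings, tuples of words as Python tuples, and the function names `concat`, `generators` and `project_away` are chosen here for illustration and are not taken from the paper.

```python
def concat(v, w):
    """Coordinate-wise concatenation of two k-tuples of words."""
    assert len(v) == len(w)
    return tuple(x + y for x, y in zip(v, w))


def generators(alphabet, k):
    """All k-tuples that carry a single letter in exactly one coordinate."""
    return {tuple(a if j == i else "" for j in range(k))
            for i in range(k) for a in alphabet}


def project_away(relation, i):
    """Delete coordinate i (1-based) from every tuple of a finite relation."""
    return {w[:i - 1] + w[i:] for w in relation}


gens = generators("ab", 2)
x, y = ("a", ""), ("", "b")
# Generators that sit on different tapes commute, so the monoid is not free.
assert concat(x, y) == concat(y, x) == ("a", "b")
```

The last assertion illustrates the commutation rule mentioned above: two different generator sequences yield the same tuple of words.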
If $R_1, R_2 \subseteq (\Sigma^*)^k$ are two relations of the same arity, then

$$R_1 \odot R_2 := \{v \odot w \mid v \in R_1,\ w \in R_2\}$$

denotes the coordinate-wise concatenation. If $R_1 \subseteq (\Sigma^*)^k$ and $R_2 \subseteq (\Sigma^*)^l$ are two relations, then

$$R_1 \times R_2 := \{\langle w_1, \dots, w_{k+l} \rangle \mid \langle w_1, \dots, w_k \rangle \in R_1,\ \langle w_{k+1}, \dots, w_{k+l} \rangle \in R_2\}$$

is the cartesian product of $R_1$ and $R_2$ and

$$R_1 \circ R_2 := \{\langle w_1, \dots, w_{k+l-2} \rangle \mid \exists w : \langle w_1, \dots, w_{k-1}, w \rangle \in R_1,\ \langle w, w_k, \dots, w_{k+l-2} \rangle \in R_2\}$$

is the composition of $R_1$ and $R_2$. Further well-known operations for relations are union, intersection, and inversion ($k = 2$).

Definition 1 The class of k-dimensional regular relations over the alphabet $\Sigma$ is recursively defined in the following way:
– $\emptyset$ and $\{v\}$ for all $v \in \hat{\Sigma}_k$ are k-dimensional regular relations.
– If $R_1$, $R_2$ and $R$ are k-dimensional regular relations, then so are
  – $R_1 \odot R_2$,
  – $R_1 \cup R_2$,
  – $R^*$.
– There are no other k-dimensional regular relations.

Note 1 The class of k-dimensional regular relations over a given alphabet $\Sigma$ is closed under union, Kleene-star, coordinate-wise concatenation, composition, projection, and cartesian product. For $k \ge 2$ the class of regular relations is not closed under intersection, difference and complement. Obviously, every 1-dimensional regular relation is a regular language over the alphabet $\Sigma$. Hence, for $k = 1$ we obtain closure under intersection, difference and complement.

Definition 2 Let $k$ be a positive integer. A k-tape automaton is a six-tuple $A = \langle k, \Sigma, S, F, s_0, E \rangle$, where $\Sigma$ is an alphabet, $S$ is a finite set of states, $F \subseteq S$ is a set of final states, $s_0 \in S$ is the initial state and $E \subseteq S \times (\Sigma_\varepsilon)^k \times S$ is a finite set of transitions. A sequence $s_0, a_1, s_1, \dots, s_{n-1}, a_n, s_n$, where $s_0$ is the initial state, $s_i \in S$ and $a_i \in (\Sigma_\varepsilon)^k$ for $i = 1, \dots, n$, is a path for $A$ iff $\langle s_{i-1}, a_i, s_i \rangle \in E$ for $1 \le i \le n$. The k-tape automaton $A$ recognizes $v \in (\Sigma^*)^k$ iff there exists a path $s_0, a_1, s_1, \dots, s_{n-1}, a_n, s_n$ for $A$ such that $s_n \in F$ and $v = a_1 \odot a_2 \odot \dots \odot a_{n-1} \odot a_n$. With $R(A)$ we denote the set of all tuples in $(\Sigma^*)^k$ recognized by $A$, i.e., $R(A) := \{v \in (\Sigma^*)^k \mid A \text{ recognizes } v\}$.

For a given k-tape automaton $A = \langle k, \Sigma, S, F, s_0, E \rangle$ the generalized transition relation $E^* \subset S \times (\Sigma^*)^k \times S$ is recursively defined as follows:
1. $\langle s, \langle \varepsilon, \dots, \varepsilon \rangle, s \rangle \in E^*$ for all $s \in S$,
2. if $\langle s_1, v, s' \rangle \in E^*$ and $\langle s', a, s_2 \rangle \in E$, then $\langle s_1, v \odot a, s_2 \rangle \in E^*$, for all $v \in (\Sigma^*)^k$, $a \in (\Sigma_\varepsilon)^k$, $s_1, s', s_2 \in S$.

Clearly, if $A$ is a k-tape automaton, then $R(A) = \{v \in (\Sigma^*)^k \mid \exists f \in F : \langle s_0, v, f \rangle \in E^*\}$.

Note 2 By a well-known generalization of Kleene's Theorem (see Kaplan and Kay (1994)), for each k-tape automaton $A$ the set $R(A)$ is a k-dimensional regular relation, and for every k-dimensional regular relation $R'$ there exists a k-tape automaton $A'$ such that $R(A') = R'$.

3. Basic Operations for one-letter automata
In this section we introduce the concept of a one-letter automaton. One-letter automata represent a special form of k-tape automata that can be naturally interpreted as one-tape automata over the alphabet $\hat{\Sigma}_k$. We show that basic operations such as union, concatenation, and Kleene-star for one-letter automata can be realized using the corresponding standard constructions for conventional one-tape automata.

Definition 3 A k-tape finite state automaton $A = \langle k, \Sigma, S, F, s_0, E \rangle$ is a one-letter automaton iff all transitions $e \in E$ are of the form

$$e = \langle s, \langle \underset{\uparrow 1}{\varepsilon}, \dots, \underset{\uparrow i-1}{\varepsilon}, \underset{\uparrow i}{a}, \underset{\uparrow i+1}{\varepsilon}, \dots, \underset{\uparrow k}{\varepsilon} \rangle, s' \rangle$$

for some $1 \le i \le k$ and $a \in \Sigma$.
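To make the notions of k-tape automaton, recognition and one-letter automaton concrete, the following Python fragment gives a minimal sketch. The representation (a `KTape` named tuple, labels as tuples of strings that are empty or hold one letter) and the helper names `is_one_letter` and `recognizes` are assumptions made for this illustration; the search in `recognizes` is a plain breadth-first exploration of configurations, not an algorithm from the paper.

```python
from collections import deque
from typing import NamedTuple


class KTape(NamedTuple):
    k: int
    states: frozenset
    finals: frozenset
    start: object
    edges: frozenset  # triples (source, label, target); label is a k-tuple


def is_one_letter(aut):
    """True iff every label has exactly one non-empty component of length 1."""
    return all(sum(len(c) for c in label) == 1 for _, label, _ in aut.edges)


def recognizes(aut, words):
    """Check whether some path from the start to a final state spells `words`."""
    first = (aut.start, (0,) * aut.k)
    seen, queue = {first}, deque([first])
    while queue:
        state, pos = queue.popleft()
        if state in aut.finals and all(p == len(w) for p, w in zip(pos, words)):
            return True
        for src, label, dst in aut.edges:
            if src != state:
                continue
            # every component of the label must match the unread part of its tape
            if all(w.startswith(c, p) for w, c, p in zip(words, label, pos)):
                nxt = (dst, tuple(p + len(c) for p, c in zip(pos, label)))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


# One-letter automaton for the relation {<a^n, b^n> | n >= 0}.
A = KTape(2, frozenset({0, 1}), frozenset({0}), 0,
          frozenset({(0, ("a", ""), 1), (1, ("", "b"), 0)}))
# The same relation with a label that is not one-letter.
B = KTape(2, frozenset({0}), frozenset({0}), 0,
          frozenset({(0, ("a", "b"), 0)}))
```

Both `A` and `B` recognize the same relation, but only `A` satisfies Definition 3.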
Proposition 1 For every k-tape automaton $A$ we may effectively construct a k-tape one-letter automaton $A'$ such that $R(A') = R(A)$.

Proof. First we can apply the classical ε-removal procedure in order to construct an $\hat{\varepsilon}$-free k-tape automaton, which leaves the recognized relation unchanged. Let $\bar{A} = \langle k, \Sigma, S, F, s_0, E \rangle$ be an $\hat{\varepsilon}$-free k-tape automaton such that $R(A) = R(\bar{A})$. Then we construct $A' = \langle k, \Sigma, S', F, s_0, E' \rangle$ using the following algorithm:

    $S' = S$, $E' = \emptyset$
    FOR $s \in S$ DO:
      FOR $\langle s, \langle a_1, a_2, \dots, a_k \rangle, s' \rangle \in E$ DO
        LET $I = \{i \in \mathbb{N} \mid a_i \ne \varepsilon\}$ ($I = \{i_1, \dots, i_t\}$);
        LET $S'' = \{s_{i_1}, \dots, s_{i_{t-1}}\}$, SUCH THAT $S'' \cap S' = \emptyset$;
        $S' = S' \cup S''$;
        $E' = E' \cup \{\langle s_{i_j}, \langle \varepsilon, \dots, \varepsilon, \underset{\uparrow i_{j+1}}{a_{i_{j+1}}}, \varepsilon, \dots, \varepsilon \rangle, s_{i_{j+1}} \rangle \mid 0 \le j \le t-1,\ s_{i_0} = s \text{ and } s_{i_t} = s'\}$;
      END;
    END.

Informally speaking, we split each transition with label $\langle a_1, a_2, \dots, a_k \rangle$ with $t > 1$ non-empty coordinates into $t$ subtransitions, introducing $t - 1$ new intermediate states.

Corollary 1 If $R \subseteq (\Sigma^*)^k$ is a k-dimensional regular relation, then there exists a k-tape one-letter automaton $A$ such that $R(A) = R$.

Each k-tape one-letter automaton $A$ over the alphabet $\Sigma$ can be considered as a one-tape automaton (denoted by $\hat{A}$) over the alphabet $\hat{\Sigma}_k$. Conversely, every ε-free one-tape automaton over the alphabet $\hat{\Sigma}_k$ can be considered as a k-tape automaton over $\Sigma$. Formally, this correspondence can be described using two mappings.

Definition 4 The mapping $\hat{\ }$ maps every k-tape one-letter automaton $A = \langle k, \Sigma, S, F, s_0, E \rangle$ to the ε-free one-tape automaton $\hat{A} := \langle \hat{\Sigma}_k, S, F, s_0, E \rangle$.
The mapping $\check{\ }$ maps a given ε-free one-tape automaton $A' = \langle \hat{\Sigma}_k, S, F, s_0, E \rangle$ to the k-tape one-letter automaton $\check{A'} := \langle k, \Sigma, S, F, s_0, E \rangle$.

Obviously, the mappings $\hat{\ }$ and $\check{\ }$ are inverse. From a computational point of view, the mappings merely represent a conceptual shift where we use another alphabet for looking at transition labels. States and transitions are not changed.

Definition 5 The mapping

$$\phi : \hat{\Sigma}_k^* \to (\Sigma^*)^k : a_1 \cdots a_n \mapsto a_1 \odot \dots \odot a_n, \quad \varepsilon \mapsto \hat{\varepsilon}$$

is called the natural homomorphism between the free monoid $\langle \hat{\Sigma}_k^*, \cdot, \varepsilon \rangle$ and the monoid $\langle (\Sigma^*)^k, \odot, \hat{\varepsilon} \rangle$.

It is trivial to check that $\phi$ is in fact a homomorphism. We have the following connection between the mappings $\hat{\ }$, $\check{\ }$ and $\phi$.

Lemma 1 Let $A = \langle k, \Sigma, S, F, s_0, E \rangle$ be a k-tape one-letter automaton. Then
1. $\check{\hat{A}} = A$.
2. $R(A) = \phi(L(\hat{A}))$.
Furthermore, if $A'$ is an ε-free one-tape automaton over $\hat{\Sigma}_k$, then $\hat{\check{A'}} = A'$.

Thus we obtain the following commutative diagram:

$$\begin{array}{ccc}
A & \overset{\hat{\ }}{\underset{\check{\ }}{\rightleftarrows}} & \hat{A} \\
{\scriptstyle R}\downarrow & & \downarrow{\scriptstyle L} \\
R(A) & \xleftarrow{\ \phi\ } & L(\hat{A})
\end{array}$$

We get the following proposition as a direct consequence of Lemma 1 and the homomorphic properties of the mapping $\phi$.

Proposition 2 Let $A_1$ and $A_2$ be two k-tape one-letter automata. Then we have the following:
1. $R(A_1) \cup R(A_2) = \phi(L(\hat{A}_1) \cup L(\hat{A}_2))$.
2. $R(A_1) \odot R(A_2) = \phi(L(\hat{A}_1) \cdot L(\hat{A}_2))$.
3. $R(A_1)^* = \phi(L(\hat{A}_1)^*)$.
Algorithmic constructions From Part 1 of Proposition 2 we see the following. Let $A_1$ and $A_2$ be two k-tape one-letter automata. Then, to construct a one-letter automaton $A$ such that $R(A) = R(A_1) \cup R(A_2)$ we may interpret $A_i$ as a one-tape automaton $\hat{A}_i$ ($i = 1, 2$). We use any union-construction for one-tape automata, yielding an automaton $A'$ such that $L(A') = L(\hat{A}_1) \cup L(\hat{A}_2)$. Removing ε-transitions and interpreting the resulting automaton $A''$ as a k-tape automaton $A := \check{A''}$ we obtain a one-letter automaton such that $R(A) = R(A_1) \cup R(A_2)$. Similarly, Parts 2 and 3 show that "classical" algorithms for closing conventional one-tape automata under concatenation and Kleene-star can be directly applied to k-tape one-letter automata, yielding algorithms for closing k-tape one-letter automata under concatenation and Kleene-star.
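The constructions of this section can be sketched in a few lines of Python. The fragment below is an illustration under several assumptions: automata are plain triples `(start, finals, edges)`, labels are treated as opaque symbols by `union`, the union is built directly in ε-free form (one of several standard one-tape constructions), and all function names are hypothetical. `split_to_one_letter` follows the idea of Proposition 1, and `phi` is the natural homomorphism of Definition 5.

```python
from itertools import count


def split_to_one_letter(edges, fresh):
    """Split each label with t > 1 non-empty components into t one-letter steps."""
    out = set()
    for src, label, dst in edges:
        idx = [i for i, c in enumerate(label) if c]
        if not idx:
            raise ValueError("remove empty-label transitions first")
        chain = [src] + [next(fresh) for _ in idx[1:]] + [dst]
        for j, i in enumerate(idx):
            one = tuple(c if p == i else "" for p, c in enumerate(label))
            out.add((chain[j], one, chain[j + 1]))
    return out


def union(a1, a2):
    """Epsilon-free union of two one-tape automata (labels are opaque symbols)."""
    tagged = [(n, s, f, e) for n, (s, f, e) in enumerate((a1, a2))]
    edges = {((n, p), x, (n, q)) for n, _, _, e in tagged for p, x, q in e}
    finals = {(n, q) for n, _, f, _ in tagged for q in f}
    starts = {(n, s) for n, s, _, _ in tagged}
    new_start = "start"
    # the new start state copies the outgoing transitions of both old starts
    edges |= {(new_start, x, q) for p, x, q in set(edges) if p in starts}
    if starts & finals:
        finals.add(new_start)
    return new_start, finals, edges


def language(aut, max_len):
    """All accepted label sequences of length <= max_len."""
    start, finals, edges = aut
    frontier, words = {(start, ())}, set()
    for _ in range(max_len + 1):
        words |= {w for q, w in frontier if q in finals}
        frontier = {(r, w + (x,)) for q, w in frontier
                    for p, x, r in edges if p == q}
    return words


def phi(word, k):
    """Natural homomorphism: multiply the letters coordinate-wise."""
    out = [""] * k
    for letter in word:
        out = [o + c for o, c in zip(out, letter)]
    return tuple(out)


a1 = (0, {0}, {(0, ("a", ""), 1), (1, ("", "b"), 0)})
fresh = (("q", n) for n in count())
a2 = (0, {1}, split_to_one_letter({(0, ("c", "d"), 1)}, fresh))
u = union(a1, a2)
relation = {phi(w, 2) for w in language(u, 4)}
```

Restricted to label sequences of length at most 4, `relation` contains the tuples of both input relations, in line with Part 1 of Proposition 2.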
4. Intersection of one-letter automata
It is well-known that the intersection of two k-dimensional regular relations is not necessarily a regular relation. For example, the relations $R_1 = \{\langle a^n b^k, c^n \rangle \mid n, k \in \mathbb{N}\}$ and $R_2 = \{\langle a^s b^n, c^n \rangle \mid s, n \in \mathbb{N}\}$ are regular, but $R_1 \cap R_2 = \{\langle a^n b^n, c^n \rangle \mid n \in \mathbb{N}\}$ is not regular since its first projection is not a regular language. We now introduce a condition that guarantees that the classical construction for intersecting one-tape automata is correct if used for k-tape one-letter automata. As a corollary we obtain a condition for the regularity of the intersection of two k-dimensional regular relations. This observation will be used later for explicit constructions that yield composition and cartesian product of one-letter automata. A few preparations are needed.

Definition 6 Let $v = b_1 \dots b_n$ be an arbitrary word over the alphabet $\Xi$, i.e., $v \in \Xi^*$. We say that the word $v'$ is obtained from $v$ by adding the letter $b$ iff $v' = b_1 \dots b_j\, b\, b_{j+1} \dots b_n$ for some $0 \le j \le n$. In this case we also say that $v$ is obtained from $v'$ by deleting the symbol $b$.

Proposition 3 Let $v = a_1 \dots a_n \in \hat{\Sigma}_k^*$ and $\phi(v) = a_1 \odot a_2 \odot \dots \odot a_n = \langle w_1, \dots, w_k \rangle$. Let also $a = \langle \varepsilon, \dots, \underset{\uparrow i}{b}, \dots, \varepsilon \rangle \in \hat{\Sigma}_k$. Then, if $v'$ is obtained from $v$ by adding the letter $a$, then $\phi(v') = \langle w_1, \dots, w_{i-1}, w'_i, w_{i+1}, \dots, w_k \rangle$ and $w'_i$ is obtained from $w_i$ by adding the letter $b$.
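Definition 7 For a regular relation $R \subseteq (\Sigma^*)^k$ the coordinate $i$ ($1 \le i \le k$) is inessential iff for all $\langle w_1, \dots, w_k \rangle \in R$ and any $v \in \Sigma^*$ we have $\langle w_1, \dots, w_{i-1}, v, w_{i+1}, \dots, w_k \rangle \in R$. Analogously, if $A$ is a k-tape automaton such that $R(A) = R$ we say that tape $i$ of $A$ is inessential. Otherwise we call coordinate (tape) $i$ essential.

Definition 8 Let $A$ be a k-tape one-letter automaton and assume that each coordinate in the set $I \subseteq \{1, \dots, k\}$ is inessential for $R(A)$. Then $A$ is in normal form w.r.t. $I$ iff for any tape $i \in I$ we have:
1. $\forall s \in S,\ \forall a \in \Sigma : \langle s, \langle \varepsilon, \dots, \underset{\uparrow i}{a}, \dots, \varepsilon \rangle, s \rangle \in E$,
2. $\forall s, s' \in S,\ \forall a \in \Sigma : (s' \ne s) \Rightarrow \langle s, \langle \underset{\uparrow 1}{\varepsilon}, \dots, \underset{\uparrow i}{a}, \dots, \underset{\uparrow k}{\varepsilon} \rangle, s' \rangle \notin E$.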
Proposition 4 For any k-tape automaton $A$ and any given set $I$ of inessential coordinates of $R(A)$ we may effectively construct a k-tape one-letter automaton $A'$ in normal form w.r.t. $I$ such that $R(A') = R(A)$.

Proof. Let $A = \langle k, \Sigma, S, F, s_0, E \rangle$. Without loss of generality we can assume that $A$ is in one-letter form (Proposition 1). To construct $A' = \langle k, \Sigma, S, F, s_0, E' \rangle$ we use the following algorithm:

    $E' = E$
    FOR $s \in S$ DO
      FOR $i \in I$ DO
        FOR $a \in \Sigma$ DO
          IF (($\langle s, \langle \varepsilon, \dots, \varepsilon, \underset{\uparrow i}{a}, \varepsilon, \dots, \varepsilon \rangle, s' \rangle \in E'$) & ($s' \ne s$)) THEN
            $E' = E' \setminus \{\langle s, \langle \varepsilon, \dots, \varepsilon, \underset{\uparrow i}{a}, \varepsilon, \dots, \varepsilon \rangle, s' \rangle\}$;
          $E' = E' \cup \{\langle s, \langle \varepsilon, \dots, \varepsilon, \underset{\uparrow i}{a}, \varepsilon, \dots, \varepsilon \rangle, s \rangle\}$;
        END;
      END;
    END.

The algorithm does not change any transition on an essential tape. Transitions between distinct states that affect an inessential tape in $I$ are erased. For each state we add loops with all symbols from the alphabet for the inessential tapes in $I$. The correctness of the above algorithm follows from the fact that for any inessential tape $i \in I$ we have $\langle w_1, \dots, w_i, \dots, w_n \rangle \in R(A)$ iff $\langle w_1, \dots, \varepsilon, \dots, w_n \rangle \in R(A)$.

Corollary 2 Let $R \subseteq (\Sigma^*)^k$ be a regular relation with a set $I$ of inessential coordinates. Then there exists a k-tape one-letter automaton $A$ in normal form w.r.t. $I$ such that $R(A) = R$.

The following property of k-tape automata in normal form will be useful when proving Lemma 2.

Proposition 5 Let $A = \langle k, \Sigma, S, F, s_0, E \rangle$ be a k-tape one-letter automaton in normal form w.r.t. the set of inessential coordinates $I$. Let $i_0 \in I$ and let $v = a_1 \dots a_n \in L(\hat{A})$. Then for any $a = \langle \varepsilon, \dots, \underset{\uparrow i_0}{b}, \dots, \varepsilon \rangle \in \hat{\Sigma}_k$ and any word $v' \in \hat{\Sigma}_k^*$ obtained from $v$ by adding $a$ we have $v' \in L(\hat{A})$.

Proof. The condition for the automaton $A$ to be in normal form w.r.t. $I$ yields that for all $s \in S$ the transition $\langle s, a, s \rangle$ is in $E$, which proves the proposition.

Now we are ready to formulate and prove the following sufficient condition for the regularity of the intersection of two regular relations. With $K$ we denote the set of coordinates $\{1, \dots, k\}$.

Lemma 2 For $i = 1, 2$, let $A_i$ be a k-tape one-letter automaton, let $I_i \subseteq K$ denote a given set of inessential coordinates for $A_i$. Let $A_i$ be in normal form w.r.t. $I_i$ ($i = 1, 2$). Assume that $|K \setminus (I_1 \cup I_2)| \le 1$, which means that there exists at most one common essential tape for $A_1$ and $A_2$. Then $R(A_1) \cap R(A_2)$ is a regular k-dimensional relation. Moreover $R(A_1) \cap R(A_2) = \phi(L(\hat{A}_1) \cap L(\hat{A}_2))$.

Proof. It is obvious that $\phi(L(\hat{A}_1) \cap L(\hat{A}_2)) \subseteq R(A_1) \cap R(A_2)$, because if $a_1 \dots a_n \in L(\hat{A}_1) \cap L(\hat{A}_2)$, then by Lemma 1 we have $a_1 \odot \dots \odot a_n \in R(A_1) \cap R(A_2)$. We give a detailed proof for the other direction, showing that $R(A_1) \cap R(A_2) \subseteq \phi(L(\hat{A}_1) \cap L(\hat{A}_2))$. For the proof the reader should keep in mind that the transition labels of the automata $A_i$ ($i = 1, 2$) are elements of $\hat{\Sigma}_k$, which means that the sum of the lengths of the words representing the components is exactly 1.
Let $\langle w_1, w_2, \dots, w_k \rangle \in R(A_1) \cap R(A_2)$. Let $j_0 \in K$ be a coordinate such that for each $j_0 \ne j \in K$ we have $j \in I_1$ or $j \in I_2$. Let $E_1 = K \setminus I_1$. Recall that for $i \in E_1$, $i \ne j_0$, always $i \in I_2$ is an inessential tape for $A_2$. Then by the definition of inessential tapes the tuples $\langle w'_1, \dots, w'_k \rangle$ and $\langle w''_1, \dots, w''_k \rangle$, where

$$w'_i = \begin{cases} \varepsilon, & \text{if } i \in I_1 \\ w_i, & \text{if } i \in E_1 \end{cases} \qquad w''_i = \begin{cases} \varepsilon, & \text{if } i \in E_1 \text{ and } i \ne j_0 \\ w_i, & \text{otherwise} \end{cases}$$

respectively are in $R(A_1)$ and $R(A_2)$. Then there are words

$$v' = a'_1 \dots a'_n \in L(\hat{A}_1), \qquad v'' = a''_1 \dots a''_m \in L(\hat{A}_2)$$

such that $\phi(v') = \langle w'_1, \dots, w'_k \rangle$ and $\phi(v'') = \langle w''_1, \dots, w''_k \rangle$. Note that $n = \sum_{i=1}^{k} |w'_i|$ and $m = \sum_{i=1}^{k} |w''_i|$. Furthermore, $w_{j_0} = w'_{j_0} = w''_{j_0}$. Let $l = |w_{j_0}|$. We now construct a word $a_1 a_2 \dots a_r \in L(\hat{A}_1) \cap L(\hat{A}_2)$ such that $a_1 \odot a_2 \odot \dots \odot a_r = \langle w_1, \dots, w_k \rangle$, which imposes that $r = n + m - l$. Each letter $a_i$ is obtained by copying a suitable letter from one of the sequences $a'_1 \dots a'_n$ and $a''_1 \dots a''_m$. In order to control the selection, we use the pair of indices $t'_i, t''_i$ ($0 \le i < n + m - l$), which can be considered as pointers to the two sequences. The definition of $t'_i$, $t''_i$ and $a_i$ proceeds inductively in the following way.

Let $t'_0 = t''_0 := 1$. Assume that $t'_i$ and $t''_i$ are defined for some $0 \le i < n + m - l$. We show how to define $a_{i+1}$ and the indices $t'_{i+1}$ and $t''_{i+1}$. We distinguish four cases:
1. if $t'_i = n + 1$ and $t''_i = m + 1$ we stop; else
2. if $a'_{t'_i} = \langle \varepsilon, \dots, \underset{\uparrow j}{b}, \dots, \varepsilon \rangle$ for some $j \ne j_0$, then $a_{i+1} := a'_{t'_i}$, $t'_{i+1} := t'_i + 1$, $t''_{i+1} := t''_i$,
3. if $a'_{t'_i} = \langle \varepsilon, \dots, \underset{\uparrow j_0}{b}, \dots, \varepsilon \rangle$ or $t'_i = n + 1$, and $a''_{t''_i} = \langle \varepsilon, \dots, \underset{\uparrow j}{c}, \dots, \varepsilon \rangle$ for some $j \ne j_0$, then $a_{i+1} := a''_{t''_i}$, $t'_{i+1} := t'_i$, and $t''_{i+1} := t''_i + 1$,
4. if $a'_{t'_i} = a''_{t''_i} = \langle \varepsilon, \dots, \underset{\uparrow j_0}{b}, \dots, \varepsilon \rangle$ for some $b \in \Sigma$, then $a_{i+1} := a'_{t'_i}$, $t'_{i+1} := t'_i + 1$ and $t''_{i+1} := t''_i + 1$.

From an intuitive point of view, the definition yields a kind of zig-zag construction. We always proceed in one sequence until we come to a transition that affects coordinate $j_0$. At this point we continue using the other sequence. Once we have in both sequences a transition that affects $j_0$, we enlarge both indices. From $w'_{j_0} = w''_{j_0} = w_{j_0}$ it follows immediately that the recursive definition stops exactly when $i + 1 = n + m - l$. In fact the subsequences of $a'_1 \dots a'_n$ and $a''_1 \dots a''_m$ from which $w_{j_0}$ is obtained must be identical.

Using induction on $0 \le i \le n + m - l$ we now prove that the word $a_1 \dots a_i$ is obtained from $a'_1 \dots a'_{t'_i - 1}$ by adding letters in $\hat{\Sigma}_k$ which have a non-ε symbol in an inessential coordinate for $R(A_1)$. The base of the induction is obvious. Let the statement be true for some $0 \le i < n + m - l$. We prove it for $i + 1$. The word $a_1 \dots a_i a_{i+1}$ is obtained from $a_1 \dots a_i$ by adding the letter $a_{i+1} = \langle \varepsilon, \dots, \underset{\uparrow j}{b}, \dots, \varepsilon \rangle$, and according to the induction hypothesis $a_1 \dots a_i$ is obtained from $a'_1 \dots a'_{t'_i - 1}$ by adding letters with a non-ε symbol in a coordinate in $I_1$. If $j \in E_1$ (Cases 2 and 4), then $a_{i+1} = a'_{t'_i}$, $t'_{i+1} = t'_i + 1$ and $t'_{i+1} - 1 = t'_i$, hence $a_1 \dots a_i a_{i+1}$ is obtained from $a'_1 \dots a'_{t'_i - 1} a'_{t'_{i+1} - 1}$ by adding letters satisfying the above condition. On the other hand, if $j \in I_1$ (Case 3) we have $a_{i+1} = a''_{t''_i}$ and $t'_{i+1} := t'_i$, which means that $a'_1 \dots a'_{t'_i - 1} = a'_1 \dots a'_{t'_{i+1} - 1}$ and $a_{i+1}$ is a letter satisfying the condition. Thus $a_1 \dots a_i a_{i+1}$ is obtained from $a'_1 \dots a'_{t'_{i+1} - 1}$ by adding letters which have a non-ε symbol on an inessential tape for $A_1$, which means that the statement is true for $i + 1$.

Analogously we prove for $0 \le i \le n + m - l$ that $a_1 \dots a_i$ is obtained from $a''_1 \dots a''_{t''_i - 1}$ by adding letters in $\hat{\Sigma}_k$ which have a non-ε symbol in an inessential coordinate for $R(A_2)$. By Proposition 5, $a_1 \dots a_{n+m-l} \in L(\hat{A}_1)$ and $a_1 \dots a_{n+m-l} \in L(\hat{A}_2)$. From Proposition 3 we obtain that $a_1 \odot \dots \odot a_{n+m-l} = \langle u_1, \dots, u_k \rangle$, where $u_i = w'_i$ if $i \in E_1$ and $u_i = w''_i$ otherwise. But now remembering the definition of $w'_i$ and $w''_i$ we obtain that $a_1 \odot \dots \odot a_{n+m-l} = \langle w_1, \dots, w_k \rangle$, which we had to prove.

Corollary 3 If $R_1 \subseteq (\Sigma^*)^k$ and $R_2 \subseteq (\Sigma^*)^k$ are two k-dimensional regular relations with at most one common essential coordinate $i$ ($1 \le i \le k$), then $R_1 \cap R_2$ is a k-dimensional regular relation.

Algorithmic construction From Lemma 2 we see the following. Let $A_1$ and $A_2$ be two k-tape one-letter automata with at most one common essential tape $i$. Assume that both automata are in normal form w.r.t. the sets of inessential tapes. Then the relation $R(A_1) \cap R(A_2)$ is recognized by any ε-free 1-tape automaton $A'$ accepting $L(\hat{A}_1) \cap L(\hat{A}_2)$, treating $A'$ as a k-tape one-letter automaton $A = \check{A'}$.
5. Tape permutation and inversion for one-letter automata
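In the sequel, let $S_k$ denote the symmetric group of $k$ elements.

Definition 9 Let $R \subseteq (\Sigma^*)^k$ be a regular relation, let $\sigma \in S_k$. The permutation of coordinates induced by $\sigma$, $\sigma(R)$, is defined as

$$\sigma(R) := \{\langle w_{\sigma^{-1}(1)}, \dots, \underset{\uparrow \sigma(i)}{w_i}, \dots, w_{\sigma^{-1}(k)} \rangle \mid \langle w_1, \dots, w_k \rangle \in R\}.$$

Proposition 6 For a given k-tape one-letter automaton $A = \langle k, \Sigma, S, F, s_0, E \rangle$, let $\sigma(A) := \langle k, \Sigma, S, F, s_0, \sigma(E) \rangle$ where

$$\sigma(E) := \{\langle s, \langle \varepsilon, \dots, \underset{\uparrow \sigma(i)}{a_i}, \dots, \varepsilon \rangle, s' \rangle \mid \langle s, \langle \varepsilon, \dots, \underset{\uparrow i}{a_i}, \dots, \varepsilon \rangle, s' \rangle \in E\}.$$

Then $R(\sigma(A)) = \sigma(R(A))$.

Proof. Using induction over the construction of $E^*$ and $\sigma(E)^*$ we prove that for all $s \in S$ and $\langle w_1, \dots, w_k \rangle \in (\Sigma^*)^k$ we have

$$\langle s_0, \langle w_1, \dots, w_k \rangle, s \rangle \in E^* \iff \langle s_0, \langle w_{\sigma^{-1}(1)}, \dots, \underset{\uparrow \sigma(i)}{w_i}, \dots, w_{\sigma^{-1}(k)} \rangle, s \rangle \in \sigma(E)^*.$$

"⇒". The base of the induction is obvious since $\langle s_0, \langle \varepsilon, \dots, \varepsilon \rangle, s_0 \rangle \in E^* \cap \sigma(E)^*$. Now suppose that there are transitions

$$\langle s_0, \langle w_1, \dots, w_{i-1}, w'_i, w_{i+1}, \dots, w_k \rangle, s \rangle \in E^*, \qquad \langle s, \langle \varepsilon, \dots, \underset{\uparrow i}{a}, \dots, \varepsilon \rangle, s' \rangle \in E.$$

Then, by induction hypothesis,

$$\langle s_0, \langle w_{\sigma^{-1}(1)}, \dots, \underset{\uparrow \sigma(i)}{w'_i}, \dots, w_{\sigma^{-1}(k)} \rangle, s \rangle \in \sigma(E)^*.$$

The definition of $\sigma(E)$ shows that $\langle s, \langle \varepsilon, \dots, \underset{\uparrow \sigma(i)}{a}, \dots, \varepsilon \rangle, s' \rangle \in \sigma(E)$. Hence

$$\langle s_0, \langle w_{\sigma^{-1}(1)}, \dots, \underset{\uparrow \sigma(i)}{w'_i a}, \dots, w_{\sigma^{-1}(k)} \rangle, s' \rangle \in \sigma(E)^*,$$

which we had to prove.

"⇐". Follows analogously to "⇒".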
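The construction of Proposition 6 only rewrites transition labels, which the following Python sketch makes explicit. The representation is an assumption for illustration: a permutation is a dictionary mapping each 1-based tape index `i` to `sigma[i]`, and the names `permute_label` and `permute_tapes` are hypothetical.

```python
def permute_label(label, sigma):
    """Move the component at position i to position sigma[i] (1-based)."""
    out = [""] * len(label)
    for i, c in enumerate(label, start=1):
        out[sigma[i] - 1] = c
    return tuple(out)


def permute_tapes(edges, sigma):
    """Apply the tape permutation to every transition; states are untouched."""
    return {(p, permute_label(x, sigma), q) for p, x, q in edges}


# Inversion of a 2-tape automaton is the transposition (1, 2).
edges = {(0, ("a", ""), 1), (1, ("", "b"), 0)}
inverse = permute_tapes(edges, {1: 2, 2: 1})
```

For the example, `inverse` describes the relation $\{\langle b^n, a^n \rangle\}$, i.e. the inverse of the relation recognized via `edges` with state 0 as initial and final state.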
Corollary 4 Let $R \subseteq (\Sigma^*)^k$ be a k-dimensional regular relation and $\sigma \in S_k$. Then also $\sigma(R)$ is a k-dimensional regular relation.

Algorithmic construction From Proposition 6 we see the following. Let $A$ be a 2-tape one-letter automaton. If $\sigma$ denotes the transposition $(1, 2)$, then the automaton $\sigma(A)$ defined as above recognizes the relation $R(A)^{-1}$.

6. Tape insertion, tape deletion and projection
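Definition 10 Let $R \subseteq (\Sigma^*)^k$ be a k-dimensional regular relation. We define the insertion of an inessential coordinate at position $i$ (denoted $R \oplus (i)$) as

$$R \oplus (i) := \{\langle w_1, \dots, w_{i-1}, v, w_i, \dots, w_k \rangle \mid \langle w_1, \dots, w_{i-1}, w_i, \dots, w_k \rangle \in R,\ v \in \Sigma^*\}.$$

Proposition 7 Let $A = \langle k, \Sigma, S, F, s_0, E \rangle$ be a k-tape one-letter automaton. Let $A' := \langle k+1, \Sigma, S, F, s_0, E' \rangle$ where

$$E' := \{\langle s, \langle \varepsilon, \dots, \underset{\uparrow i}{a}, \dots, \varepsilon, \underset{\uparrow k+1}{\varepsilon} \rangle, s' \rangle \mid \langle s, \langle \varepsilon, \dots, \underset{\uparrow i}{a}, \dots, \underset{\uparrow k}{\varepsilon} \rangle, s' \rangle \in E\} \cup \{\langle s, \langle \varepsilon, \dots, \varepsilon, \underset{\uparrow k+1}{a} \rangle, s \rangle \mid s \in S,\ a \in \Sigma\}.$$

Then $R(A') = R(A) \oplus (k+1)$.

Proof. First, using induction on the construction of $E'^*$ we prove that for all $s \in S$ and $\langle w_1, \dots, w_k, w_{k+1} \rangle \in (\Sigma^*)^{k+1}$ we have

$$\langle s_0, \langle w_1, \dots, w_k, w_{k+1} \rangle, s \rangle \in E'^* \iff \langle s_0, \langle w_1, \dots, w_k, \varepsilon \rangle, s \rangle \in E'^*.$$

"⇒". The base of the induction is obvious. Assume there are transitions

$$\langle s_0, \langle w_1, \dots, w_{i-1}, w'_i, w_{i+1}, \dots, w_k, w_{k+1} \rangle, s \rangle \in E'^*, \qquad \langle s, \langle \varepsilon, \dots, \underset{\uparrow i}{a}, \dots, \varepsilon \rangle, s' \rangle \in E'.$$

First assume that $i \le k$. By induction hypothesis, $\langle s_0, \langle w_1, \dots, w_{i-1}, w'_i, w_{i+1}, \dots, w_k, \varepsilon \rangle, s \rangle \in E'^*$. Using the definition of $E'^*$ we obtain $\langle s_0, \langle w_1, \dots, w_{i-1}, w'_i a, w_{i+1}, \dots, w_k, \varepsilon \rangle, s' \rangle \in E'^*$. If $i = k + 1$, then $s = s'$. We may directly use the induction hypothesis to obtain $\langle s_0, \langle w_1, \dots, w_k, \varepsilon \rangle, s' \rangle \in E'^*$.

"⇐". Let $\langle s_0, \langle w_1, \dots, w_k, \varepsilon \rangle, s \rangle \in E'^*$. Let $w_{k+1} = v_1 \dots v_n \in \Sigma^*$ where $v_i \in \Sigma$. The definition of $E'$ shows that for all $v_i$ ($1 \le i \le n$) there exists a transition $\langle s, \langle \varepsilon, \dots, \varepsilon, \underset{\uparrow k+1}{v_i} \rangle, s \rangle \in E'$. Hence $\langle s_0, \langle w_1, \dots, w_k, w_{k+1} \rangle, s \rangle \in E'^*$.

To finish the proof observe that the definition of $E'$ yields

$$\langle s_0, \langle w_1, \dots, w_k \rangle, s \rangle \in E^* \iff \langle s_0, \langle w_1, \dots, w_k, \varepsilon \rangle, s \rangle \in E'^*.$$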
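Corollary 5 If $R \subseteq (\Sigma^*)^k$ is a regular relation, then $R \oplus (i)$ is a $(k+1)$-dimensional regular relation.

Proof. The corollary directly follows from Proposition 7 and Proposition 6, having in mind that $R \oplus (i) = \sigma(R \oplus (k+1))$ where $\sigma$ is the cyclic permutation $(i, i+1, \dots, k, k+1) \in S_{k+1}$.

It is well-known that the projection of a k-dimensional regular relation is again a regular relation. The following propositions show how to obtain a $(k-1)$-tape one-letter automaton representing the relation $R \ominus (i)$ (cf. Section 2) directly from a k-tape one-letter automaton representing the relation $R$.

Proposition 8 Let $A = \langle k, \Sigma, S, F, s_0, E \rangle$ be a k-tape one-letter automaton. Let $A' := \langle k-1, \Sigma, S, F, s_0, E' \rangle$ be the $(k-1)$-tape automaton where for $i \le k - 1$ we have

$$\langle s, \langle \varepsilon, \dots, \underset{\uparrow i}{a}, \dots, \underset{\uparrow k-1}{\varepsilon} \rangle, s' \rangle \in E' \iff \langle s, \langle \underset{\uparrow 1}{\varepsilon}, \dots, \underset{\uparrow i}{a}, \dots, \underset{\uparrow k}{\varepsilon} \rangle, s' \rangle \in E$$

and furthermore

$$\langle s, \langle \varepsilon, \dots, \underset{\uparrow k-1}{\varepsilon} \rangle, s' \rangle \in E' \iff \exists a_k \in \Sigma : \langle s, \langle \varepsilon, \dots, \underset{\uparrow k-1}{\varepsilon}, \underset{\uparrow k}{a_k} \rangle, s' \rangle \in E.$$

Then $R(A') = R(A) \ominus (k)$.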
Note 3 The resulting automaton $A'$ is not necessarily a one-letter automaton because $A'$ may have some $\hat{\varepsilon}$-transitions. It could be transformed into a one-letter automaton using a standard ε-removal procedure.

Proof of Proposition 8. It is sufficient to prove that for all $\langle w_1, \dots, w_{k-1}, w_k \rangle \in (\Sigma^*)^k$ and $s \in S$ we have

$$\langle s_0, \langle w_1, \dots, w_{k-1}, w_k \rangle, s \rangle \in E^* \iff \langle s_0, \langle w_1, \dots, w_{k-1} \rangle, s \rangle \in E'^*.$$

Again we use an induction on the construction of $E^*$ and $E'^*$.

"⇒". The base is trivial since $\langle s_0, \langle \varepsilon, \dots, \underset{\uparrow k}{\varepsilon} \rangle, s_0 \rangle \in E^*$ and $\langle s_0, \langle \varepsilon, \dots, \underset{\uparrow k-1}{\varepsilon} \rangle, s_0 \rangle \in E'^*$. Let $\langle s_0, \langle w_1, \dots, w_j, \dots, w_k \rangle, s \rangle \in E^*$ and $\langle s, \langle \varepsilon, \dots, \underset{\uparrow j}{a}, \dots, \varepsilon \rangle, s' \rangle \in E$ for some $1 \le j \le k$. First assume that $j < k$. The induction hypothesis yields $\langle s_0, \langle w_1, \dots, w_j, \dots, w_{k-1} \rangle, s \rangle \in E'^*$. Since

$$\langle s, \langle \varepsilon, \dots, \underset{\uparrow j}{a}, \dots, \underset{\uparrow k-1}{\varepsilon} \rangle, s' \rangle \in E'$$

we have $\langle s_0, \langle w_1, \dots, w_j a, \dots, w_{k-1} \rangle, s' \rangle \in E'^*$. If $j = k$, then the induction hypothesis yields $\langle s_0, \langle w_1, \dots, w_{k-1} \rangle, s \rangle \in E'^*$. We have

$$\langle s, \langle \varepsilon, \dots, \underset{\uparrow k-1}{\varepsilon} \rangle, s' \rangle \in E',$$

hence $\langle s_0, \langle w_1, \dots, w_{k-1} \rangle, s' \rangle \in E'^*$.

"⇐". Similarly as "⇒".

Corollary 6 If $R \subseteq (\Sigma^*)^k$ is a regular relation, then $R \ominus (i)$ is a $(k-1)$-dimensional regular relation.

Proof. The corollary follows directly from $R \ominus (i) = (\sigma^{-1}(R)) \ominus (k)$, where $\sigma$ is the cyclic permutation $(i, i+1, \dots, k) \in S_k$, and Proposition 6.

Algorithmic construction The constructions given in Propositions 8 and 6 together with an obvious $\hat{\varepsilon}$-elimination show how to obtain a one-letter $(k-1)$-tape automaton $A'$ for the projection $R \ominus (i)$, given a one-letter k-tape automaton $A$ recognizing $R$.
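Tape insertion and tape deletion are again pure label manipulations, as the following Python sketch shows. It is a simplified illustration with hypothetical function names: `insert_tape` inserts an inessential tape directly at an arbitrary position `i` (which corresponds to Proposition 7 followed by the cyclic permutation of Corollary 5), and `delete_tape` drops component `i` from every label without performing the subsequent removal of transitions whose label has become empty.

```python
def insert_tape(states, edges, alphabet, k, i):
    """Insert an inessential tape at position i (1-based) of a k-tape automaton."""
    widened = {(p, x[:i - 1] + ("",) + x[i - 1:], q) for p, x, q in edges}
    loops = {(s, ("",) * (i - 1) + (a,) + ("",) * (k - i + 1), s)
             for s in states for a in alphabet}
    return widened | loops


def delete_tape(edges, i):
    """Drop tape i (1-based); labels that only read tape i become empty."""
    return {(p, x[:i - 1] + x[i:], q) for p, x, q in edges}


edges = {(0, ("a", ""), 1), (1, ("", "b"), 0)}
wide = insert_tape({0, 1}, edges, "ab", 2, 3)   # new third tape
front = insert_tape({0, 1}, edges, "ab", 2, 1)  # new first tape
narrow = delete_tape(edges, 2)                  # projection away from tape 2
```

In `narrow` the transition that read `b` on tape 2 now carries the empty label, which is the situation described in Note 3.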
7. Composition and cartesian product of regular relations
We now show how to construct composition and cartesian product (cf. Section 2) of regular relations via automata constructions for standard 1-tape automata.

Lemma 3 Let $R_1 \subseteq (\Sigma^*)^{n_1}$ and $R_2 \subseteq (\Sigma^*)^{n_2}$ be regular relations. Then the composition $R_1 \circ R_2$ is a $(n_1 + n_2 - 2)$-dimensional regular relation.

Proof. Using Corollary 5 we see that the relations

$$R'_1 := (\dots((R_1 \oplus (n_1 + 1)) \oplus (n_1 + 2)) \oplus \dots) \oplus (n_1 + n_2 - 1)$$
$$R'_2 := (\dots((R_2 \oplus (1)) \oplus (2)) \oplus \dots) \oplus (n_1 - 1)$$

are $(n_1 + n_2 - 1)$-dimensional regular relations. Using the definition of $\oplus$ we see that the essential coordinates for $R'_1$ are in the set $E_1 = \{1, 2, \dots, n_1\}$ and those of $R'_2$ are in the set $E_2 = \{n_1, n_1 + 1, \dots, n_1 + n_2 - 1\}$. Therefore $R'_1$ and $R'_2$ have at most one common essential coordinate, namely $n_1$. Corollary 3 shows that $R = R'_1 \cap R'_2$ is a $(n_1 + n_2 - 1)$-dimensional regular relation. Since the coordinates in $E_1$ (resp. $E_2$) other than $n_1$ are inessential for $R'_2$ (resp. $R'_1$) we obtain

$$\langle w'_1, \dots, w'_{n_1 - 1}, w, w''_{n_1 + 1}, \dots, w''_{n_1 + n_2 - 1} \rangle \in R'_1 \cap R'_2 \iff \langle w'_1, \dots, w'_{n_1 - 1}, w \rangle \in R_1 \ \&\ \langle w, w''_{n_1 + 1}, \dots, w''_{n_1 + n_2 - 1} \rangle \in R_2.$$

Using the definition of $\ominus$ and Corollary 6 we obtain that $R \ominus (n_1)$ is a $(n_1 + n_2 - 2)$-dimensional regular relation such that

$$\langle w'_1, \dots, w'_{n_1 - 1}, w''_{n_1 + 1}, \dots, w''_{n_1 + n_2 - 1} \rangle \in R \ominus (n_1) \iff \exists w \in \Sigma^* : \langle w'_1, \dots, w'_{n_1 - 1}, \underset{\uparrow n_1}{w}, w''_{n_1 + 1}, \dots, w''_{n_1 + n_2 - 1} \rangle \in R.$$

Combining both equivalences we obtain

$$\langle w'_1, \dots, w'_{n_1 - 1}, w''_{n_1 + 1}, \dots, w''_{n_1 + n_2 - 1} \rangle \in R \ominus (n_1) \iff \exists w \in \Sigma^* : \langle w'_1, \dots, w'_{n_1 - 1}, w \rangle \in R_1 \ \&\ \langle w, w''_{n_1 + 1}, \dots, w''_{n_1 + n_2 - 1} \rangle \in R_2,$$

i.e. $R \ominus (n_1) = R_1 \circ R_2$.
Lemma 4 Let $R_1 \subseteq (\Sigma^*)^{n_1}$ and $R_2 \subseteq (\Sigma^*)^{n_2}$ be regular relations. Then the cartesian product $R_1 \times R_2$ is a $(n_1 + n_2)$-dimensional regular relation over $\Sigma$.

Proof. Similarly as in Lemma 3 we construct the $(n_1 + n_2)$-dimensional regular relations

$$R'_1 := (\dots((R_1 \oplus (n_1 + 1)) \oplus (n_1 + 2)) \oplus \dots) \oplus (n_1 + n_2)$$
$$R'_2 := (\dots((R_2 \oplus (1)) \oplus (2)) \oplus \dots) \oplus (n_1).$$

The coordinates in $\{1, 2, \dots, n_1\}$ are inessential for $R'_2$ and those in $\{n_1 + 1, \dots, n_1 + n_2\}$ are inessential for $R'_1$. Therefore $R'_1$ and $R'_2$ have no common essential coordinate and, by Corollary 3, $R := R'_1 \cap R'_2$ is a $(n_1 + n_2)$-dimensional regular relation. Using the definition of inessential coordinates and the definition of $\oplus$ we obtain

$$\langle w'_1, \dots, w'_{n_1}, w''_{n_1 + 1}, \dots, w''_{n_1 + n_2} \rangle \in R \iff \langle w'_1, \dots, w'_{n_1} \rangle \in R_1 \ \&\ \langle w''_{n_1 + 1}, \dots, w''_{n_1 + n_2} \rangle \in R_2,$$

which shows that $R = R_1 \times R_2$.
Algorithmic construction The constructions described in the above proofs show how to obtain one-letter automata for the composition $R_1 \circ R_2$ and for the cartesian product $R_1 \times R_2$ of the regular relations $R_1 \subseteq (\Sigma^*)^{n_1}$ and $R_2 \subseteq (\Sigma^*)^{n_2}$, given one-letter automata $A_i$ for $R_i$ ($i = 1, 2$).

In more detail, in order to construct an automaton for $R_1 \circ R_2$ we
1. add $n_2 - 1$ final inessential tapes to $A_1$ and $n_1 - 1$ initial inessential tapes to $A_2$ in the way described above (note that the resulting automata are in normal form w.r.t. the new tapes),
2. intersect the resulting automata as conventional one-tape automata over the alphabet $\hat{\Sigma}_{n_1 + n_2 - 1}$, obtaining $A$,
3. remove the $n_1$-th tape from $A$ and apply an ε-removal, thus obtaining $A'$, which is the desired automaton.

In order to construct an automaton for $R_1 \times R_2$ we
1. add $n_2$ final inessential tapes to $A_1$ and $n_1$ initial inessential tapes to $A_2$ in the way described above,
2. intersect the resulting automata as normal one-tape automata over the alphabet $\hat{\Sigma}_{n_1 + n_2}$, obtaining $A$, which is the desired automaton.
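The three steps for composition can be put together in a short Python sketch. It is an illustration under stated assumptions, not the authors' implementation: automata are tuples `(states, start, finals, edges)`, all helper names are hypothetical, the intersection is the plain product construction, and the final ε-removal is skipped. Instead, the checking function `recognizes` simply tolerates transitions whose label has become empty after tape deletion.

```python
from collections import deque


def insert_tape(states, edges, alphabet, k, i):
    """Insert an inessential tape at position i (1-based) of a k-tape automaton."""
    widened = {(p, x[:i - 1] + ("",) + x[i - 1:], q) for p, x, q in edges}
    loops = {(s, ("",) * (i - 1) + (a,) + ("",) * (k - i + 1), s)
             for s in states for a in alphabet}
    return widened | loops


def intersect(a1, a2):
    """Product construction over reachable pairs; labels are opaque symbols."""
    (s1, f1, e1), (s2, f2, e2) = a1, a2
    start = (s1, s2)
    seen, todo, edges = {start}, [start], set()
    while todo:
        p1, p2 = todo.pop()
        for q1, x, r1 in e1:
            for q2, y, r2 in e2:
                if (q1, q2) == (p1, p2) and x == y:
                    edges.add(((p1, p2), x, (r1, r2)))
                    if (r1, r2) not in seen:
                        seen.add((r1, r2))
                        todo.append((r1, r2))
    return start, {(p, q) for p, q in seen if p in f1 and q in f2}, edges


def compose(a1, n1, a2, n2, alphabet):
    (states1, s1, f1, e1), (states2, s2, f2, e2) = a1, a2
    for j in range(n2 - 1):   # step 1: final inessential tapes for A1
        e1 = insert_tape(states1, e1, alphabet, n1 + j, n1 + j + 1)
    for j in range(n1 - 1):   # step 1: initial inessential tapes for A2
        e2 = insert_tape(states2, e2, alphabet, n2 + j, j + 1)
    start, finals, edges = intersect((s1, f1, e1), (s2, f2, e2))  # step 2
    # step 3 without epsilon-removal: delete tape n1 from every label
    edges = {(p, x[:n1 - 1] + x[n1:], q) for p, x, q in edges}
    return start, finals, edges


def recognizes(aut, words):
    """Breadth-first search over (state, tape positions); empty labels allowed."""
    start, finals, edges = aut
    first = (start, (0,) * len(words))
    seen, queue = {first}, deque([first])
    while queue:
        state, pos = queue.popleft()
        if state in finals and all(p == len(w) for p, w in zip(pos, words)):
            return True
        for src, label, dst in edges:
            if src == state and all(w.startswith(c, p)
                                    for w, c, p in zip(words, label, pos)):
                nxt = (dst, tuple(p + len(c) for p, c in zip(pos, label)))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


# R1 = {<a^n, b^n>}, R2 = {<b^n, c^n>}; their composition is {<a^n, c^n>}.
a1 = ({0, 1}, 0, {0}, {(0, ("a", ""), 1), (1, ("", "b"), 0)})
a2 = ({0, 1}, 0, {0}, {(0, ("b", ""), 1), (1, ("", "c"), 0)})
comp = compose(a1, 2, a2, 2, "abc")
```

For the cartesian product the same helpers could be used with `n2` and `n1` inserted tapes and without the deletion step.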
Finally, we discuss the problem of how to represent identity relations as regular relations. First observe that the automaton $A := \langle 2, \Sigma, S, F, s_0, E \rangle$ where $\Sigma := \{a_1, \dots, a_n\}$, $S := \{s_0, s_1, \dots, s_n\}$, $F := \{s_0\}$ and

$$E := \{\langle s_0, \langle a_i, \varepsilon \rangle, s_i \rangle \mid 1 \le i \le n\} \cup \{\langle s_i, \langle \varepsilon, a_i \rangle, s_0 \rangle \mid 1 \le i \le n\}$$

accepts $R(A) = \{\langle v, v \rangle \mid v \in \Sigma^*\}$. The latter relation we denote with $Id_\Sigma$.

Proposition 9 Let $R_1$ be a 1-dimensional regular relation, i.e., a regular language. Then the set $Id_{R_1} := \{\langle v, v \rangle \mid v \in R_1\}$ is a regular relation. Moreover $Id_{R_1} = (R_1 \oplus (2)) \cap Id_\Sigma$.

8. Conclusion
We introduced the concept of a one-letter k-tape automaton and showed that one-letter automata can be considered as conventional 1-tape automata over an enlarged alphabet. Using this correspondence, standard constructions for union, concatenation, and Kleene-star for 1-tape automata can be directly used for one-letter automata. Furthermore we have seen that the usual relational operations for k-tape automata can be traced back to the intersection of 1-tape automata plus straightforward operations for adding, permuting and erasing tapes.

We have implemented the presented approach for transducers (2-tape automata) representing rewrite rules. Using it we have successfully realized Bulgarian hyphenation and tokenization. Still, in real applications the use of one-letter automata comes with some specific problems, in particular in situations where the composition algorithm is heavily used. In the resulting automata we sometimes find a large number of paths that are equivalent if permutation rules for generators are taken into account. For example, we might find three paths with label sequences

$$\langle a, \varepsilon \rangle, \langle a, \varepsilon \rangle, \langle \varepsilon, b \rangle$$
$$\langle \varepsilon, b \rangle, \langle a, \varepsilon \rangle, \langle a, \varepsilon \rangle$$
$$\langle a, \varepsilon \rangle, \langle \varepsilon, b \rangle, \langle a, \varepsilon \rangle,$$

all representing the tuple $\langle aa, b \rangle$. In the worst case this may lead to an exponential blow-up of the number of states, compared to the classical construction for n-tape automata. We currently study techniques to get rid of superfluous paths. In many cases, equivalences of the above form can be recognized and used for eliminating states and transitions. The extension and refinement of these methods is one central point of current and future work.
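The number of equivalent label sequences for a given tuple can be explored with a small Python sketch. The function name `one_letter_sequences` is hypothetical; it simply enumerates every sequence of one-letter labels whose coordinate-wise concatenation equals the given tuple of words, and it says nothing about which of these sequences actually occur in a particular automaton.

```python
def one_letter_sequences(words):
    """All sequences of one-letter labels that multiply out to `words`."""
    if not any(words):
        return [()]
    result = []
    for i, w in enumerate(words):
        if w:
            # read the next letter of tape i, then continue with the rest
            letter = tuple(w[0] if j == i else "" for j in range(len(words)))
            rest = words[:i] + (w[1:],) + words[i + 1:]
            result += [(letter,) + tail for tail in one_letter_sequences(rest)]
    return result


paths = one_letter_sequences(("aa", "b"))
```

For the tuple $\langle aa, b \rangle$ this yields the three sequences listed above, and the count grows quickly with the lengths of the words, which illustrates why superfluous paths are a concern.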
Two aspects of situated meaning Eleni Kalyvianaki and Yiannis N. Moschovakis
Abstract We introduce two structural notions of situated meaning for natural language sentences which can be expressed by terms of Montague’s Language of Intensional Logic. Using the theory of referential intensions, we define for a sentence at a particular situation its factual content and its local meaning which express different abstract algorithms that compute its reference at that situation. With the use of characteristic examples, we attempt to show the distinctive roles of these two notions in any theory of meaning and to discuss briefly their relation to indexicality, propositional attitudes and translation.
1. Introduction

    If a speaker of the language can rationally believe A and disbelieve B in the same situation, then the sentences A and B do not have the same meaning: they are not synonymous.
The principle is old (Frege 1892), and it has been used both as a test for theories of meaning and a source of puzzles about belief and synonymy. We think that at least some of the puzzles are due to a confusion between two plausible and legitimate but distinct understandings of situated meaning, the factual content and the (referential) local meaning. Consider, for example, the sentences A ≡ John loves Mary, and B ≡ John loves her, in a state (situation) a in which ‘her’ refers to Mary. They express the same information about the world in state a (they have the same factual content at that state); but they do not have the same meaning in that state, as they are not interchangeable in belief contexts: one may very well believe A but disbelieve B in a, because she does not know that ‘her’ refers to Mary. We will give precise, mathematical definitions of factual content and local meaning for the fragments of natural language which can be formalized in the
Language of Intensional Logic of Montague (1973), within the mathematical theory of referential intensions; this is a rigorous (algorithmic), structural modeling of meanings for the typed λ-calculus developed in (Moschovakis 2006), and so the article can be viewed as a contribution to the formal “logic of meaning”. We think, however, that some of our results are relevant to the discussion of these matters in the philosophy of language and in linguistics, and, in particular, to Kaplan’s work on the logic of indexicals. We will discuss briefly some of these connections in Section 5.

2. Three formal languages
There are three (related) formal languages that we will deal with: the Language of Intensional Logic LIL of Montague (1973); the Two-sorted Typed λ-calculus Ty₂ of Gallin (1975); and the extension $L^{\lambda}_{ar}$ of Ty₂ by acyclic recursion in (Moschovakis 2006). We describe these briefly in this section, and in the next we summarize equally briefly the theory of referential intensions in $L^{\lambda}_{ar}$, which is our main technical tool.¹

All three of these languages start with the same three basic types
$$e : \text{entities}, \qquad t : \text{truth values}, \qquad s : \text{states},$$
and, for the interpretation, three fixed, associated, non-empty sets
$$T_e = \text{the entities}, \qquad T_s = \text{the states}, \qquad T_t = \text{the truth values} = \{0, 1, er\}, \tag{1}$$
where 1 stands for truth, 0 for falsity and er for “error”.² The types are defined by the recursion³
$$\sigma :\equiv e \mid s \mid t \mid (\sigma_1 \to \sigma_2), \tag{2}$$
and a set $T_\sigma$ of objects of type σ is assigned to each σ by adding to (1) the recursive clause⁴
$$T_{(\sigma \to \tau)} = \text{the set of all functions } f : T_\sigma \to T_\tau. \tag{3}$$
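To make the recursion (2) and clause (3) concrete, here is a small sketch of the type grammar as a data structure. It assumes, purely for illustration, that $T_e$ and $T_s$ are finite (the text only requires them to be non-empty); the class names `Basic` and `Fun` and the function `size` are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Basic:
    name: str  # one of "e", "s", "t"


@dataclass(frozen=True)
class Fun:
    dom: "Type"  # sigma_1
    cod: "Type"  # sigma_2


Type = Union[Basic, Fun]

E, S, T = Basic("e"), Basic("s"), Basic("t")


def size(ty: Type, n_entities: int, n_states: int) -> int:
    """Number of objects in T_sigma when T_e and T_s are finite (an assumption)."""
    if isinstance(ty, Basic):
        # T_t = {0, 1, er} always has three elements, as in (1).
        return {"e": n_entities, "s": n_states, "t": 3}[ty.name]
    # Clause (3): all functions from T_sigma to T_tau.
    return size(ty.cod, n_entities, n_states) ** size(ty.dom, n_entities, n_states)


# The type e -> t, and the type s -> (e -> t) of properties.
E_TO_T = Fun(E, T)
PROPERTY = Fun(S, E_TO_T)
```

With two entities and two states, `size(E_TO_T, 2, 2)` counts the $3^2$ functions from entities to truth values, and `size(PROPERTY, 2, 2)` counts the functions from states into that set.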
For each type σ, there is an infinite sequence of variables of type σ
$$v_0^{\sigma}, v_1^{\sigma}, \ldots$$
which range over $T_\sigma$. It is also useful for our purpose here to assume a fixed (finite) set K of typed constants as in Table 1, which we will use to specify the terms of all three languages. Each constant c ∈ K stands for some basic word of natural language or logic and is assigned a type τ which (roughly) corresponds to its grammatical category; notice, though, that common nouns and extensional intransitive verbs are assigned the same type (e → t), because they take an argument of type e and (intuitively) deliver a truth value, as in the simple examples of rendering (formalization) in LIL,
$$\text{John is running} \xrightarrow{\text{render}} \mathrm{run}(\mathrm{John}), \qquad \text{John is a man} \xrightarrow{\text{render}} \mathrm{man}(\mathrm{John}).$$

Names, indexicals⁵                  John, I, he, thetemp     : e
Sentences                           it rains                 : t
Common nouns                        man                      : e → t
Extensional intransitive verbs      run                      : e → t
Intensional intransitive verbs      rise                     : (s → e) → t
Extensional transitive verbs        love                     : e → (e → t)
The definite article                the                      : (e → t) → e
Propositional connectives           &, ∨                     : t → (t → t)
(Basic) necessity operator          □                        : (s → t) → t
de dicto modal operators            Yesterday, Today         : (s → t) → t
de re modal operators               Yesterday₁, Today₁       : (s → (e → t)) → (e → t)

Table 1. Some constants with their LIL-typing.

For the fixed interpretation, each constant c of type σ is assigned a function from the states to the objects of type σ:⁶
$$\text{if } c : \sigma, \text{ then } \mathrm{den}(c) = \overline{c} : T_s \to T_\sigma. \tag{4}$$
Thus John is interpreted in each state a by $\overline{\mathrm{John}}(a)$, the (assumed) specific, unique entity which is referred to by ‘John’ in state a.⁷ The de re modal operator Yesterday₁ is interpreted by the relation on properties and individuals
$$\overline{\mathrm{Yesterday}_1}(a)(p)(x) \iff p(a^-)(x),$$
where for each state a, $a^-$ is the state on the preceding day, and similarly for Today₁. Starting with these common ingredients, the languages LIL, Ty₂ and $L^{\lambda}_{ar}$ have their own features, as follows.

2.1. The language of intensional logic LIL
Montague does not admit s as a full-fledged primitive type like e and t, but uses it only as the name of the domain in the formation of function types.
This leads to the following recursive definition of types in LIL:
$$\sigma :\equiv e \mid t \mid (s \to \sigma_2) \mid (\sigma_1 \to \sigma_2). \tag{LIL-types}$$
We assume that all constants in the fixed set K are of LIL-type. The terms of LIL are defined by the recursion
$$A :\equiv x \mid c \mid A(B) \mid \lambda(x)(B) \mid {}^{\vee}(A) \mid {}^{\wedge}(A) \tag{LIL-terms}$$
subject to some type restrictions, and each A is assigned a type as follows, where A : σ ⟺ the type of A is σ.

(LIL-T1) $x \equiv v_i^{\sigma}$ for some LIL-type σ and some i, and x : σ.
(LIL-T2) c is a constant (of some Montague type σ), and c : σ.
(LIL-T3) A : (σ → τ), B : σ and A(B) : τ.
(LIL-T4) $x \equiv v_i^{\sigma}$ for some LIL-type σ and some i, B : τ and λ(x)(B) : (σ → τ).
(LIL-T5) A : (s → τ) and ˇ(A) : τ.
(LIL-T6) A : τ and ˆ(A) : (s → τ).

In addition, the free and bound occurrences of variables in each term are defined as usual, and A is closed if no variable occurs free in it. A sentence is a closed term of type t. The constructs ˇ(A) and ˆ(A) are necessary because LIL does not have variables over states, and express (roughly) application and abstraction on an implicit variable which ranges over “the current state”. This is explained by the semantics of LIL and made explicit in the Gallin translation of LIL into Ty₂ which we will describe in the next section.

Semantics of LIL. As usual, an assignment π is a function which associates with each variable $x \equiv v_i^{\sigma}$ some object $\pi(x) \in T_\sigma$. The denotation of each LIL-term A : σ is a function
$$\mathrm{den}_{\mathrm{LIL}}(A) : \text{Assignments} \to (T_s \to T_\sigma)$$
which satisfies the following, recursive conditions, where a, b range over the set of states $T_s$:⁸
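(LIL-D1) $\mathrm{den}_{\mathrm{LIL}}(x)(\pi)(a) = \pi(x)$.
(LIL-D2) $\mathrm{den}_{\mathrm{LIL}}(c)(\pi)(a) = \overline{c}(a)$, as in (4).
(LIL-D3) $\mathrm{den}_{\mathrm{LIL}}(A(B))(\pi)(a) = \big(\mathrm{den}_{\mathrm{LIL}}(A)(\pi)(a)\big)\big(\mathrm{den}_{\mathrm{LIL}}(B)(\pi)(a)\big)$.
(LIL-D4) $\mathrm{den}_{\mathrm{LIL}}(\lambda(x)(B))(\pi)(a) = \big(t \mapsto \mathrm{den}_{\mathrm{LIL}}(B)(\pi\{x := t\})(a)\big)$, where x : σ and t ranges over the objects in $T_\sigma$ (with σ a LIL-type).
(LIL-D5) $\mathrm{den}_{\mathrm{LIL}}({}^{\vee}(A))(\pi)(a) = \big(\mathrm{den}_{\mathrm{LIL}}(A)(\pi)(a)\big)(a)$.
(LIL-D6) $\mathrm{den}_{\mathrm{LIL}}({}^{\wedge}(A))(\pi)(a) = \big(b \mapsto \mathrm{den}_{\mathrm{LIL}}(A)(\pi)(b)\big)$ $(= \mathrm{den}_{\mathrm{LIL}}(A)(\pi))$.

Consider the following four, simple and familiar examples which we will use throughout the paper to illustrate the various notions that we introduce; these are sentences whose denotations are independent of any assignment π, and so we will omit it.

$$\text{John loves her} \xrightarrow{\text{render}} \mathrm{love}(\mathrm{John}, \mathrm{her}) \tag{5}$$
$$\mathrm{den}_{\mathrm{LIL}}\big(\mathrm{love}(\mathrm{John}, \mathrm{her})\big)(a) = \overline{\mathrm{love}}(a)\big(\overline{\mathrm{John}}(a), \overline{\mathrm{her}}(a)\big)$$

$$\text{John loves himself} \xrightarrow{\text{render}} \big(\lambda(x)\mathrm{love}(x, x)\big)(\mathrm{John}) \tag{6}$$
$$\mathrm{den}_{\mathrm{LIL}}\big((\lambda(x)\mathrm{love}(x, x))(\mathrm{John})\big)(a) = \big(t \mapsto \overline{\mathrm{love}}(a)(t, t)\big)\big(\overline{\mathrm{John}}(a)\big) = \overline{\mathrm{love}}(a)\big(\overline{\mathrm{John}}(a), \overline{\mathrm{John}}(a)\big)$$

$$\text{The President is necessarily American} \xrightarrow{\text{render}} \Box\big({}^{\wedge}(\mathrm{American}(\mathrm{the}(\mathrm{president})))\big) \tag{7}$$
$$\mathrm{den}_{\mathrm{LIL}}\big(\Box({}^{\wedge}(\mathrm{American}(\mathrm{the}(\mathrm{president}))))\big)(a) = \mathit{Nec}(a)\big(b \mapsto \overline{\mathrm{American}}(b)(\overline{\mathrm{the}}(b)(\overline{\mathrm{president}}(b)))\big)$$

$$\text{I was insulted yesterday} \xrightarrow{\text{render}} \mathrm{Yesterday}_1({}^{\wedge}\mathrm{be\ insulted}, \mathrm{I}) \tag{8}$$
$$\mathrm{den}_{\mathrm{LIL}}\big(\mathrm{Yesterday}_1({}^{\wedge}\mathrm{be\ insulted}, \mathrm{I})\big)(a) = \overline{\mathrm{Yesterday}_1}(a)\big(\mathrm{den}_{\mathrm{LIL}}({}^{\wedge}\mathrm{be\ insulted})(a), \mathrm{den}_{\mathrm{LIL}}(\mathrm{I})(a)\big) = \overline{\mathrm{Yesterday}_1}(a)\big(b \mapsto \overline{\mathrm{be\ insulted}}(b), \overline{\mathrm{I}}(a)\big)$$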
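The clauses (LIL-D1) to (LIL-D6) can be read as a small recursive evaluator. The sketch below is only an illustration under several assumptions: terms are encoded as nested tuples, the two-state model and the relation `LOVES` are invented, `love(John, her)` is treated as the curried application `love(John)(her)`, and the truth value er is ignored. None of these names come from the text.

```python
# Term encoding (hypothetical):
#   ("var", x) | ("const", c) | ("app", A, B) | ("lam", x, B)
#   ("down", A) for ˇ(A)      | ("up", A) for ˆ(A)

# A toy model with two states; who 'her' refers to depends on the state.
LOVES = {("mon", "john", "mary"), ("tue", "john", "john")}

CONSTS = {
    "John": lambda a: "john",
    "her": lambda a: "mary" if a == "mon" else "ann",
    "love": lambda a: lambda x: lambda y: 1 if (a, x, y) in LOVES else 0,
}


def den(term, pi, a):
    """Denotation of a LIL-term under assignment pi at state a."""
    tag = term[0]
    if tag == "var":    # (LIL-D1)
        return pi[term[1]]
    if tag == "const":  # (LIL-D2)
        return CONSTS[term[1]](a)
    if tag == "app":    # (LIL-D3)
        return den(term[1], pi, a)(den(term[2], pi, a))
    if tag == "lam":    # (LIL-D4)
        _, x, body = term
        return lambda t: den(body, {**pi, x: t}, a)
    if tag == "down":   # (LIL-D5): apply the intension to the current state
        return den(term[1], pi, a)(a)
    if tag == "up":     # (LIL-D6): abstract over the state
        return lambda b: den(term[1], pi, b)
    raise ValueError(f"unknown term tag: {tag}")


def app(f, *args):
    """Curried application f(a1)(a2)..."""
    for arg in args:
        f = ("app", f, arg)
    return f


JOHN, HER, LOVE = ("const", "John"), ("const", "her"), ("const", "love")
EX5 = app(LOVE, JOHN, HER)                                           # example (5)
EX6 = app(("lam", "x", app(LOVE, ("var", "x"), ("var", "x"))), JOHN)  # example (6)
```

In this toy model `den(EX5, {}, "mon")` and `den(EX5, {}, "tue")` differ because the referent of ‘her’ changes with the state, while ˇ(ˆ(John)) evaluates to the same entity as John at every state.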
$$\text{The temperature is ninety and rising} \xrightarrow{\text{render}} \big(\lambda(x)[\mathrm{ninety}({}^{\vee}x) \;\&\; \mathrm{rise}(x)]\big)\big({}^{\wedge}(\mathrm{thetemp})\big) \tag{9}$$
For (9), a computation similar to those in the examples above gives the correct, expected denotation.

2.2. The two-sorted, typed λ-calculus Ty₂
The assumption that every term is interpreted in “the current state” and the lack of state variables are natural enough when we think of the terms of LIL as rendering expressions of natural language, but they are limiting and technically awkward. Both are removed in the two-sorted typed λ-calculus Ty₂, whose characteristic features are that it admits all types as in (2), and interprets terms of type σ by objects in $T_\sigma$. We fix a set of constants
$$K^G = \{c^G \mid c \in K\}$$
in one-to-one correspondence with the constants K of LIL.⁹ In accordance with the interpretation (rather than the formal typing) of LIL,
$$\text{if } c : \sigma, \text{ then } c^G : (s \to \sigma) \text{ and } \mathrm{den}(c^G)(a) = \overline{c^G}(a) = \overline{c}(a) \qquad (a \in T_s),$$
i.e., the object $\overline{c^G}$ in Ty₂ which interprets $c^G$ is exactly the object $\overline{c}$ which interprets c in LIL. The terms of Ty₂ are defined by the recursion
$$A :\equiv x \mid c^G \mid A(B) \mid \lambda(x)(B) \tag{Ty$_2$-terms}$$
where now x can be a variable of any type as in (2), and the typing of terms is obvious. Assignments interpret all variables, including those of type s, and denotations are defined naturally:

(Ty₂-D1) $\mathrm{den}(x)(\pi) = \pi(x)$.
(Ty₂-D2) $\mathrm{den}(c^G)(\pi) = \overline{c^G}$.
(Ty₂-D3) $\mathrm{den}(A(B))(\pi) = \big(\mathrm{den}(A)(\pi)\big)\big(\mathrm{den}(B)(\pi)\big)$.
(Ty₂-D4) $\mathrm{den}(\lambda(x)A(x))(\pi) = \big(t \mapsto \mathrm{den}(A)(\pi\{x := t\})\big)$.

We notice the basic property of the Ty₂-typing of terms: for every assignment π,
$$\text{if } A : \sigma, \text{ then } \mathrm{den}(A)(\pi) \in T_\sigma. \tag{10}$$
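The Gallin translation. For each LIL-term A and each state variable u representing “the current state”, the Gallin translation $A^{G,u}$ of A in Ty₂ is defined by the following recursive clauses:¹⁰
$$[x]^{G,u} :\equiv x$$
$$[c]^{G,u} :\equiv c^G(u)$$
$$[A(B)]^{G,u} :\equiv A^{G,u}(B^{G,u})$$
$$[\lambda(x)(A)]^{G,u} :\equiv \lambda(x)(A^{G,u})$$
$$[{}^{\vee}A]^{G,u} :\equiv A^{G,u}(u)$$
$$[{}^{\wedge}A]^{G,u} :\equiv \lambda(u)A^{G,u}$$
By an easy recursion on the LIL-terms, $A^{G,u}$ has the same LIL-type as A, and for every assignment π,
$$\text{if } \pi(u) = a, \text{ then } \mathrm{den}(A^{G,u})(\pi) = \mathrm{den}_{\mathrm{LIL}}(A)(\pi)(a).$$
In effect, the Gallin translation expresses formally (within Ty₂) the definition of denotations of LIL. Here are the Gallin translations of the standard examples above:¹¹
$$[\mathrm{love}(\mathrm{John}, \mathrm{her})]^{G,u} \equiv \mathrm{love}^G(u)\big(\mathrm{John}^G(u), \mathrm{her}^G(u)\big) \tag{11}$$
$$\big[(\lambda(x)\mathrm{love}(x, x))(\mathrm{John})\big]^{G,u} \equiv \big(\lambda(x)\mathrm{love}^G(u)(x, x)\big)\big(\mathrm{John}^G(u)\big) \tag{12}$$
$$\big[\Box({}^{\wedge}(\mathrm{American}(\mathrm{the}(\mathrm{president}))))\big]^{G,u} \equiv \Box^G(u)\big(\lambda(u)\mathrm{American}^G(u)(\mathrm{the}^G(u)(\mathrm{president}^G(u)))\big) \tag{13}$$
$$\big[\mathrm{Yesterday}_1({}^{\wedge}\mathrm{be\ insulted}, \mathrm{I})\big]^{G,u} \equiv \mathrm{Yesterday}_1^G(u)\big(\lambda(u)\mathrm{be\ insulted}^G(u), \mathrm{I}^G(u)\big) \tag{14}$$
$$\big[(\lambda(x)[\mathrm{ninety}({}^{\vee}x) \;\&\; \mathrm{rise}(x)])({}^{\wedge}(\mathrm{thetemp}))\big]^{G,u} \equiv \big(\lambda(x)[\mathrm{ninety}^G(u)(x(u)) \;\&\; \mathrm{rise}^G(u)(x)]\big)\big(\lambda(u)\mathrm{thetemp}^G(u)\big) \tag{15}$$
Notice that the selected formal variable u occurs both free and bound in these Gallin translations, which may be confusing, but poses no problem.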
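Since the Gallin translation is a purely syntactic recursion, it is easy to prototype. The following sketch assumes a hypothetical tuple encoding of terms (tags `"var"`, `"const"`, `"app"`, `"lam"`, `"down"` for ˇ and `"up"` for ˆ), writes $c^G$ as the string `c^G`, and prints applications in curried form, so `love(John, her)` appears as `love(John)(her)`.

```python
def gallin(term, u="u"):
    """Gallin translation of a LIL-term with respect to the state variable u."""
    tag = term[0]
    if tag == "var":    # [x] := x
        return term
    if tag == "const":  # [c] := c^G(u)
        return ("app", ("const", term[1] + "^G"), ("var", u))
    if tag == "app":    # [A(B)] := [A]([B])
        return ("app", gallin(term[1], u), gallin(term[2], u))
    if tag == "lam":    # [λ(x)(A)] := λ(x)([A])
        return ("lam", term[1], gallin(term[2], u))
    if tag == "down":   # [ˇA] := [A](u)
        return ("app", gallin(term[1], u), ("var", u))
    if tag == "up":     # [ˆA] := λ(u)[A]
        return ("lam", u, gallin(term[1], u))
    raise ValueError(f"unknown term tag: {tag}")


def show(term):
    """Print a translated term (only var, const, app and lam remain)."""
    tag = term[0]
    if tag in ("var", "const"):
        return term[1]
    if tag == "app":
        fun = show(term[1])
        if term[1][0] == "lam":
            fun = "(" + fun + ")"
        return fun + "(" + show(term[2]) + ")"
    if tag == "lam":
        return "λ(" + term[1] + ")" + show(term[2])
    raise ValueError(f"unknown term tag: {tag}")


# love(John)(her), the curried form of example (5).
LOVES_HER = ("app", ("app", ("const", "love"), ("const", "John")), ("const", "her"))

# (λ(x)ninety(ˇx))(ˆ(thetemp)), a fragment of example (9).
NINETY = ("app",
          ("lam", "x", ("app", ("const", "ninety"), ("down", ("var", "x")))),
          ("up", ("const", "thetemp")))
```

Calling `show(gallin(NINETY))` yields a term in which `u` occurs both free (in `ninety^G(u)`) and bound (in `λ(u)thetemp^G(u)`), the situation noted at the end of the paragraph above.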
2.3. The λ-calculus with acyclic recursion $L^{\lambda}_{ar}$

We now add to Ty₂ an infinite sequence of recursion variables or locations $p_0^{\sigma}, p_1^{\sigma}, \ldots$ for each type σ. In the semantics of the extended language $L^{\lambda}_{ar}$, these will vary over the corresponding universe $T_\sigma$ just as the usual (pure) variables $v_0^{\sigma}, v_1^{\sigma}, \ldots$, but they will be assigned-to rather than quantified in the syntax, and so they will be treated differently by the semantics. The terms of $L^{\lambda}_{ar}$ are defined by the following extension of the recursive definition of the Ty₂-terms:
$$A :\equiv x \mid p \mid c^G \mid A(B) \mid \lambda(x)(B) \mid A_0 \text{ where } \{p_1 := A_1, \ldots, p_n := A_n\} \tag{$L^{\lambda}_{ar}$-terms}$$
where x is a pure variable of any type; p is a location of any type; and the restrictions, typing and denotations are defined exactly as for Ty₂ for all but the last, new acyclic recursion construct, where they are as follows.

Acyclic recursive terms. For $A \equiv A_0 \text{ where } \{p_1 := A_1, \ldots, p_n := A_n\}$ to be well-formed, the following conditions must be satisfied:

(i) $p_1, \ldots, p_n$ are distinct locations, such that the type of each $p_i$ is the same as that of the term $A_i$, and
(ii) the system of simultaneous assignments $\{p_1 := A_1, \ldots, p_n := A_n\}$ is acyclic, i.e., there are no cycles in the dependence relation
$$i \succ j \iff p_j \text{ occurs free in } A_i$$
on the index set $\{1, \ldots, n\}$.

All the occurrences of the locations $p_1, \ldots, p_n$ in the parts $A_0, \ldots, A_n$ of A are bound in A, and the type of A is that of its head $A_0$. The body of A is the system $\{p_1 := A_1, \ldots, p_n := A_n\}$. To define the denotation function of a recursive term A, we notice first that by the acyclicity condition, we can assign a number $\mathrm{rank}(p_i)$ to each of the locations in A so that
$$\text{if } p_j \text{ occurs free in } A_i, \text{ then } \mathrm{rank}(p_i) > \mathrm{rank}(p_j).$$
For each assignment π then (which now interprets all pure and recursive variables), we set by induction on $\mathrm{rank}(p_i)$,
$$\overline{p}_i(\pi) = \mathrm{den}(A_i)(\pi\{p_{j_1} := \overline{p}_{j_1}, \ldots, p_{j_m} := \overline{p}_{j_m}\}),$$
where $p_{j_1}, \ldots, p_{j_m}$ is an enumeration of the locations with $\mathrm{rank}(p_{j_k}) < \mathrm{rank}(p_i)$ $(k = 1, \ldots, m)$, and finally,
$$\mathrm{den}(A)(\pi) = \mathrm{den}(A_0)(\pi\{p_1 := \overline{p}_1, \ldots, p_n := \overline{p}_n\}).$$
For example, if
$$A \equiv \big(\lambda(x)(p(x) \;\&\; q(x))\big)(t) \text{ where } \{p := \lambda(x)\mathrm{ninety}^G(u)(r(x)),\; r := \lambda(x)x(u),\; q := \lambda(x)\mathrm{rise}^G(u)(x),\; t := \lambda(u)\mathrm{thetemp}^G(u)\}, \tag{16}$$
we can compute den(A)(π) = den(A) in stages, corresponding to the ranks of the parts, with a = π(u):

Stage 1. $\overline{r} = (x \mapsto x(a))$, so $\overline{r}(x) = x(a)$; $\overline{q} = (x \mapsto \overline{\mathrm{rise}^G}(a)(x)) = \overline{\mathrm{rise}^G}(a)$, so that $\overline{q}(x) = 1 \iff$ x is rising in state a; and $\overline{t} = \overline{\mathrm{thetemp}^G}$.
Stage 2. $\overline{p} = (x \mapsto \overline{\mathrm{ninety}^G}(a)(\overline{r}(x)))$, so $\overline{p}(x) = 1 \iff x(a) = 90$.
Stage 3. $\mathrm{den}(A) = \overline{p}(\overline{t}) \;\&\; \overline{q}(\overline{t})$, so $\mathrm{den}(A) = 1 \iff \overline{\mathrm{thetemp}^G}(a) = 90 \;\&\; \overline{\mathrm{rise}^G}(a)(\overline{\mathrm{thetemp}^G})$.

We will use the familiar model-theoretic notation for denotational equivalence,
$$\models A = B \iff \text{for all assignments } \pi,\ \mathrm{den}(A)(\pi) = \mathrm{den}(B)(\pi).$$
It is very easy to check that every $L^{\lambda}_{ar}$-term A is denotationally equivalent with a Ty₂-term $A^*$, and so $L^{\lambda}_{ar}$ is no more expressive than Ty₂ as far as denotations go; it is, however, intensionally more expressive than Ty₂, as we will see.
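The staged computation above amounts to evaluating the parts of the body in an order compatible with the dependence relation. The sketch below illustrates this for the term (16) under invented assumptions: each part is encoded as a pair of its free locations and a Python function, the temperature readings in `TEMP` and the reading of ‘rise’ as “higher than on the previous day” are toy choices, and the truth value er is ignored. It uses the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter


def evaluate(head, body):
    """Evaluate `head where body`.

    body maps each location to (free locations of its part, env -> value);
    head maps the finished environment to the denotation of the term.
    """
    graph = {p: deps for p, (deps, _) in body.items()}
    env = {}
    # static_order lists every location after the locations it depends on;
    # iterating it raises CycleError if the system is not acyclic.
    for p in TopologicalSorter(graph).static_order():
        env[p] = body[p][1](env)
    return head(env)


# A toy interpretation (all values are assumptions for illustration).
TEMP = {0: 88, 1: 90, 2: 93}  # temperature reading in each state (day)


def thetemp(s):
    return TEMP[s]


def ninety(s):
    return lambda n: 1 if n == 90 else 0


def rise(s):
    return lambda x: 1 if x(s) > x(s - 1) else 0


def conj(x, y):
    return 1 if x == 1 and y == 1 else 0


def partee(a):
    """Denotation of the term (16) when the state variable u has value a."""
    body = {
        "r": (set(), lambda env: lambda x: x(a)),
        "q": (set(), lambda env: lambda x: rise(a)(x)),
        "t": (set(), lambda env: thetemp),
        "p": ({"r"}, lambda env: lambda x: ninety(a)(env["r"](x))),
    }
    head = lambda env: conj(env["p"](env["t"]), env["q"](env["t"]))
    return evaluate(head, body)
```

With these readings, `partee(1)` holds (the temperature is 90 and higher than the day before) and `partee(2)` fails; a body in which two locations refer to each other is rejected with `CycleError`, mirroring condition (ii).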
Congruence. Two $L^{\lambda}_{ar}$-terms are congruent if one can be obtained from the other by alphabetic changes of bound variables (of either kind) and re-orderings of the parts in the bodies of recursive subterms, so that, for example, assuming that all substitutions are free,
$$\lambda(x)(A\{z :\equiv x\}) \equiv_c \lambda(y)(A\{z :\equiv y\}),$$
$$A\{p :\equiv q\} \text{ where } \{q := B\{p :\equiv q\}\} \equiv_c A \text{ where } \{p := B\},$$
$$A \text{ where } \{p := B, q := C\} \equiv_c A \text{ where } \{q := C, p := B\}.$$
All the syntactic and semantic notions we will define respect congruence, and so it will be convenient on occasion to identify congruent terms. Since Ty₂ is a sublanguage of $L^{\lambda}_{ar}$, we can think of the Gallin translation as an interpretation of LIL into $L^{\lambda}_{ar}$; and so we can apply to the terms of LIL the theory of meaning developed for $L^{\lambda}_{ar}$ in (Moschovakis 2006), which we describe next.

3. Referential intension theory
The referential intension int(A) of a $L^{\lambda}_{ar}$-term A is a mathematical (set-theoretic) object which purports to represent faithfully “the natural algorithm” (process) which “computes” den(A)(π) for each π. It models an intuitive notion of meaning for $L^{\lambda}_{ar}$-terms (and the natural language expressions which they render), and it provides a precise relation ≈ of synonymy between terms which can be tested against our intuitions and other theories of meaning that are similarly based on “truth conditions”. Roughly:
$$A \approx B \iff \mathrm{int}(A) = \mathrm{int}(B) \qquad (A, B \text{ in } L^{\lambda}_{ar}), \tag{17}$$
where “A in $L^{\lambda}_{ar}$” naturally means that A is a $L^{\lambda}_{ar}$-term. To facilitate the discussion of meaning in LIL, we also set
$$A \approx_{\mathrm{LIL}} B \iff A^{G,u} \approx B^{G,u} \qquad (A, B \text{ in LIL}). \tag{18}$$
This relation models quite naturally (global) synonymy for terms of LIL. The operation $A \mapsto \mathrm{int}(A)$ and the relation of referential synonymy are fairly complex, and their precise definitions in (Moschovakis 2006) require the establishment of several technical facts. Here we will confine ourselves to a brief summary of the main results of referential intension theory, primarily so that this article can be read independently of (Moschovakis 2006).
There are two important points to keep in mind. First, variables and some very simple, immediate (variable-like) terms are not assigned referential intensions: they denote directly and immediately, without the mediation of a meaning. Thus (17) is not exactly right: it holds for proper (non-immediate) terms, while for immediate terms synonymy coincides with denotational equality or (equivalently for these terms) congruence. The distinction between direct and immediate reference is precise but not just technical: it lies at the heart of the referential intension approach to modeling meaning, and it plays an important role in our analysis of examples from natural language. We will discuss it in Section 3.2. Second, the denotational rule of β-conversion
$$\models (\lambda(x)A)(B) = A\{x :\equiv B\}$$
does not preserve referential synonymy, so that, for example,
$$\big(\lambda(x)\mathrm{love}(x, x)\big)(\mathrm{John}) \not\approx_{\mathrm{LIL}} \mathrm{love}(\mathrm{John}, \mathrm{John}).$$
This is common in structural theories of meaning in which the meaning of a term A codes (in particular) the logical form of A; see (Moschovakis 2006) for a related extensive discussion. It is good to remember this here, especially as we render natural language phrases into LIL and then translate these terms into Ty₂ and so into $L^{\lambda}_{ar}$: we want rendering to preserve (intuitive) meaning, so that we have a chance of capturing it with the precisely defined referential intension of the end result, and so we should not lose it by carelessly applying β-conversions in some step of the rendering process.

3.1. Reduction, irreducibility, canonical forms
The main technical tool of (Moschovakis 2006) is a binary relation of reduction between $L^{\lambda}_{ar}$-terms, for which (intuitively)
$$A \Rightarrow B \iff A \equiv_c B \text{ or } A \text{ and } B \text{ have the same meaning and } B \text{ expresses that meaning “more simply”.}$$
The disjunction is needed because the reduction relation is defined for all pairs of terms, even those which do not have a meaning, for which, however, the relation is trivial. We set
$$A \text{ is irreducible} \iff \text{for all } B, \text{ if } A \Rightarrow B, \text{ then } A \equiv_c B, \tag{19}$$
so that the irreducible terms which have meaning express that meaning “as simply as possible”.

Theorem 1 (Canonical form) For each term A, there is a unique (up to congruence) recursive, irreducible term
$$\mathrm{cf}(A) \equiv A_0 \text{ where } \{p_1 := A_1, \ldots, p_n := A_n\},$$
such that $A \Rightarrow \mathrm{cf}(A)$. We write
$$A \Rightarrow_{\mathrm{cf}} B \iff B \equiv_c \mathrm{cf}(A).$$
If $A \Rightarrow B$, then $\models A = B$, and, in particular, $\models A = \mathrm{cf}(A)$.

The reduction relation is determined by ten simple reduction rules which comprise the Reduction Calculus, and the computation of cf(A) is effective. The parts $A_i$ of cf(A) are explicit,¹² irreducible terms; they are determined uniquely (up to congruence) by A; and they code the basic facts which are needed to compute the denotation of A, in the assumed fixed interpretation of the language. If A : t and den(A) = 1, then the irreducible parts of cf(A) can be viewed as the truth conditions which ground the truth of A. Variables and constants are irreducible, and so is the more complex looking term $\lambda(x)\mathrm{love}^G(u)(x, x)$. On the other hand, the term expressing John’s self-love in the current state is not:
$$\big(\lambda(x)\mathrm{love}^G(u)(x, x)\big)\big(\mathrm{John}^G(u)\big) \Rightarrow_{\mathrm{cf}} \big(\lambda(x)\mathrm{love}^G(u)(x, x)\big)(j) \text{ where } \{j := \mathrm{John}^G(u)\}. \tag{20}$$
For a more complicated example, the canonical form of the Gallin translation of the Partee term in (15) is the term (16). So canonical forms get very complex, as do their explicit, irreducible parts, which is not surprising, since they are meant to express directly the meanings of complex expressions. The specific rules of the Reduction Calculus are at the heart of the matter, of course, and they deliver the subtle differences in (formal) meaning with which we are concerned here. It is not possible to state or explain them in this article, as they are the main topic of (Moschovakis 2006); but the most important of them will be gleaned from their applications in the examples below.
3.2. Direct vs. immediate reference

An important role in the computation of canonical forms is played by the immediate terms. These are defined by
$$X :\equiv v \mid p \mid p(\vec{v}) \mid \lambda(\vec{u})p(\vec{v}), \tag{Immediate terms}$$
where $\vec{v} = (v_1, \ldots, v_n)$, $\vec{u} = (u_1, \ldots, u_m)$ and $v, v_1, \ldots, v_n, u_1, \ldots, u_m$ are pure variables, while p is a location. Immediate terms are treated like variables in the Reduction Calculus; this is not true of constants (and other irreducible terms) which contribute in a non-trivial way to the canonical forms of the terms in which they occur. For example, $\mathrm{run}^G(u)(p(v))$ is irreducible, because p(v) is immediate, while $\mathrm{run}^G(u)(\mathrm{John}^G(u))$ is not:
$$\mathrm{run}^G(u)(\mathrm{John}^G(u)) \Rightarrow_{\mathrm{cf}} \mathrm{run}^G(u)(j) \text{ where } \{j := \mathrm{John}^G(u)\}.$$
In the intensional semantics of $L^{\lambda}_{ar}$ to which we will turn next, immediate terms refer directly and immediately: they are not assigned meanings, and they contribute only their reference to the meaning of larger (proper) terms which contain them. Irreducible terms also refer directly, in the sense that their meaning is completely determined by their reference; but they are assigned meanings, and they affect in a non-trivial (structural) way the meanings of larger terms which contain them.

3.3. Referential intensions
If A is not immediate and
$$A \Rightarrow_{\mathrm{cf}} A_0 \text{ where } \{p_1 := A_1, \ldots, p_n := A_n\},$$
then int(A) is the abstract algorithm which intuitively computes den(A)(π) for each assignment π as indicated in the remarks following (16), as follows:

(i) Solve the system of equations
$$d_i = \mathrm{den}(A_i)(\pi\{p_1 := d_1, p_2 := d_2, \ldots, p_n := d_n\}) \qquad (i = 1, \ldots, n)$$
(which, easily, has unique solutions by the acyclicity hypothesis).
(ii) If the solutions are $\overline{p}_1, \ldots, \overline{p}_n$, set
$$\mathrm{den}(A)(\pi) = \mathrm{den}(A_0)(\pi\{p_1 := \overline{p}_1, \ldots, p_n := \overline{p}_n\}).$$
So how can we define precisely this “abstract algorithm”? The idea is that it must be determined completely by the head of A and the system of equations in its body, and it should not depend on any particular method of solving the system; so it is most natural to simply identify it with the tuple of functions
$$\mathrm{int}(A) = (f_0, f_1, \ldots, f_n) \tag{21}$$
defined by the parts of A, i.e.,
$$f_i(d_1, \ldots, d_n, \pi) = \mathrm{den}(A_i)(\pi\{p_1 := d_1, p_2 := d_2, \ldots, p_n := d_n\}) \qquad (i \le n).$$
Tuples of functions such as (21) are called recursors. For a concrete example, which also illustrates just how abstract this notion of meaning is, the referential intension of the Partee example (15) is determined by its canonical form $A^{G,u}$ in (16), and it is the recursor $\mathrm{int}(A) = (f_0, f_1, f_2, f_3, f_4)$, where
$$f_0(p, r, q, t, \pi) = \big(x \mapsto (p(x) \;\&\; q(x))\big)(t),$$
$$f_1(p, r, q, t, \pi) = \big(x \mapsto \overline{\mathrm{ninety}^G}(\pi(u))(r(x))\big),$$
$$f_2(p, r, q, t, \pi) = \big(x \mapsto x(\pi(u))\big),$$
$$f_3(p, r, q, t, \pi) = \big(x \mapsto \overline{\mathrm{rise}^G}(\pi(u))(x)\big),$$
$$f_4(p, r, q, t, \pi) = \overline{\mathrm{thetemp}^G}.$$

Theorem 2 (Compositionality) The operation $A \mapsto \mathrm{int}(A)$ on proper (not immediate) terms is compositional, i.e., int(A) is determined from the referential intensions of the proper subterms of A and the denotations of its immediate subterms.

This does not follow directly from the definition of referential intensions that we gave above, via canonical forms, but it is not difficult to prove.

3.4. Referential synonymy
Two terms A and B are referentially synonymous if either A ≡c B, or int(A) and int(B) are naturally isomorphic. Now this is tedious to make precise, but, happily, we don’t need to do this here because of the following
Theorem 3 (Referential Synonymy) For any two terms A, B of $L^{\lambda}_{ar}$, A is referentially synonymous with B if and only if there exist suitable terms $A_0, \ldots$, $B_0, \ldots$ such that
$$A \Rightarrow_{\mathrm{cf}} A_0 \text{ where } \{p_1 := A_1, \ldots, p_n := A_n\},$$
$$B \Rightarrow_{\mathrm{cf}} B_0 \text{ where } \{p_1 := B_1, \ldots, p_n := B_n\},$$
and for $i = 0, 1, \ldots, n$, $\models A_i = B_i$, i.e., for all π, $\mathrm{den}(A_i)(\pi) = \mathrm{den}(B_i)(\pi)$.

Thus the referential synonymy relation A ≈ B is grounded by a system of denotational identities between explicit, irreducible terms. It is important, of course, that the formal identities
$$A_i = B_i, \qquad i = 0, \ldots, n$$
can be computed from A and B (using the Reduction Calculus), although their truth or falsity depends on the assumed, fixed structure of interpretation and cannot, in general, be decided effectively.

4. Two notions of situated meaning
We can now make precise the two promised notions of situated meaning for terms of LIL, after just a bit more preparation.

State parameters. Intuitively, a notion of “situated meaning” of a LIL-term A : τ is a way that we understand A in a given state a; and so it depends on a, even when A is closed, when its semantic values do not depend on anything else. To avoid the cumbersome use of assignments simply to indicate this state dependence, we introduce a parameter $\bar{a}$ for each state a, so that the definition of the terms of $L^{\lambda}_{ar}$ now takes the following form:
$$A :\equiv x \mid \bar{a} \mid p \mid c^G \mid A(B) \mid \lambda(x)(B) \mid A_0 \text{ where } \{p_1 := A_1, \ldots, p_n := A_n\} \tag{$L^{\lambda}_{ar}$-terms}$$
Parameters are treated like free pure variables in the definition of immediate terms and in the Reduction Calculus; in fact, the best way to think of $\bar{a}$ is as a free variable with preassigned value
$$\mathrm{den}(\bar{a})(\pi) = a$$
which does not depend on the assignment π.

4.1. Factual content
The term
$$A^{G,\bar{a}} \equiv A^{G,u}\{u :\equiv \bar{a}\} \tag{22}$$
expresses the Gallin translation of A at the state a: not only its denotation, but also its meaning or at least one aspect of it. Thus, for each proper LIL-term A and each state a, we set
$$FC(A, a) = \mathrm{int}\big(A^{G,\bar{a}}\big). \tag{23}$$
This is the (referential) factual content of A at the state a. For proper terms A, B and states a, b, we also set¹³
$$(A, a) \text{ is factually synonymous with } (B, b) \iff FC(A, a) = FC(B, b) \iff A^{G,\bar{a}} \approx B^{G,\bar{b}}.$$
By the Referential Synonymy Theorem 3 then, we can read factual synonymy by examining the canonical forms of terms. For example,
$$[\text{John loves her}]^{G,\bar{a}} \xrightarrow{\text{render}} \mathrm{love}^G(\bar{a})\big(\mathrm{John}^G(\bar{a}), \mathrm{her}^G(\bar{a})\big) \Rightarrow_{\mathrm{cf}} \mathrm{love}^G(\bar{a})(j, h) \text{ where } \{j := \mathrm{John}^G(\bar{a}),\; h := \mathrm{her}^G(\bar{a})\}$$
and
$$[\text{John loves Mary}]^{G,\bar{a}} \xrightarrow{\text{render}} \mathrm{love}^G(\bar{a})\big(\mathrm{John}^G(\bar{a}), \mathrm{Mary}^G(\bar{a})\big) \Rightarrow_{\mathrm{cf}} \mathrm{love}^G(\bar{a})(j, h) \text{ where } \{j := \mathrm{John}^G(\bar{a}),\; h := \mathrm{Mary}^G(\bar{a})\},$$
so that
$$\text{if } \mathrm{den}\big(\mathrm{Mary}^G(\bar{a})\big) = \mathrm{den}\big(\mathrm{her}^G(\bar{a})\big), \text{ then } [\text{John loves her}]^{G,\bar{a}} \approx [\text{John loves Mary}]^{G,\bar{a}},$$
which expresses formally the fact that ‘John loves her’ and ‘John loves Mary’ convey the same “information” about the world at this state a. These two sentences are not, of course, synonymous, as it is easy to verify by the definition of $\approx_{\mathrm{LIL}}$ in (18) and Theorem 3.

Next consider example (8), which involves the indexical ‘Yesterday’. In (Frege 1918), Frege argues that
    If someone wants to say today what he expressed yesterday using the word ‘today’, he will replace this word with ‘yesterday’. Although the thought is the same, its verbal expression must be different in order that the change of sense which would otherwise be effected by the differing times of utterance may be cancelled out.

It appears that Frege’s “thought” in this case is best modeled by the factual content of the uttered sentence in the relevant state. In detail, suppose that at state a the speaker is DK and the time is 27 June 2005. If we consider the sentence ‘I am insulted today’ uttered at a state $b = a^-$ when the time is 26 June 2005, the speaker is again DK and nothing else has changed, then, according to Frege’s remark above, it should be that
$$[\text{I was insulted yesterday}]^{G,\bar{a}} \approx [\text{I am insulted today}]^{G,\bar{b}}.$$
This is indeed the case:
$$[\text{I was insulted yesterday}]^{G,\bar{a}} \xrightarrow{\text{render}} \mathrm{Yesterday}_1^G(\bar{a})\big(\lambda(u)\mathrm{be\ insulted}^G(u), \mathrm{I}^G(\bar{a})\big) \Rightarrow_{\mathrm{cf}} \mathrm{Yesterday}_1^G(\bar{a})(p, q) \text{ where } \{p := \lambda(u)\mathrm{be\ insulted}^G(u),\; q := \mathrm{I}^G(\bar{a})\}$$
$$[\text{I am insulted today}]^{G,\bar{b}} \xrightarrow{\text{render}} \mathrm{Today}_1^G(\bar{b})\big(\lambda(v)\mathrm{be\ insulted}^G(v), \mathrm{I}^G(\bar{b})\big) \Rightarrow_{\mathrm{cf}} \mathrm{Today}_1^G(\bar{b})(p, q) \text{ where } \{p := \lambda(v)\mathrm{be\ insulted}^G(v),\; q := \mathrm{I}^G(\bar{b})\},$$
and the canonical forms of these sentences at these states satisfy the conditions of Theorem 3 for synonymy, assuming, of course, that Yesterday₁ and Today₁ are interpreted in the natural way, so that for these a and b,
$$\overline{\mathrm{Yesterday}_1}(a)(p)(x) \iff \overline{\mathrm{Today}_1}(b)(p)(x) \qquad (p : T_s \to (T_e \to T_t),\; x \in T_e).$$
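The last equivalence can be checked on a toy model. The sketch below assumes the interpretation of Yesterday₁ given in Section 2, takes Today₁(b)(p)(x) to mean p(b)(x) (the text only says “similarly”), and uses invented data for who is insulted on which day; all identifiers are hypothetical.

```python
# The map a -> a^- (the state on the preceding day); toy data.
PREV = {"2005-06-27": "2005-06-26"}

# Invented facts: DK is insulted on 26 June 2005.
INSULTED = {("2005-06-26", "DK")}


def be_insulted(state):
    """A property: state -> (entity -> truth value)."""
    return lambda x: 1 if (state, x) in INSULTED else 0


def yesterday1(a):
    # Yesterday_1(a)(p)(x) holds iff p(a^-)(x) holds.
    return lambda p: lambda x: p(PREV[a])(x)


def today1(b):
    # Assumed by analogy: Today_1(b)(p)(x) holds iff p(b)(x) holds.
    return lambda p: lambda x: p(b)(x)


a = "2005-06-27"
b = PREV[a]  # b = a^-

# The two operators agree on every entity when b is the day before a.
agree = all(
    yesterday1(a)(be_insulted)(x) == today1(b)(be_insulted)(x)
    for x in ["DK", "someone else"]
)
```

Since the heads of the two canonical forms denote the same relation and the parts p and q have equal denotations, the conditions of Theorem 3 are met in this model.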
On the other hand, example (7) shows that, in some cases, the factual content is independent of the state and incorporates the full meaning of the term:
$$[\text{The President is necessarily American}]^{G,\bar{a}} \xrightarrow{\text{render}} \Box^G(\bar{a})\big(\lambda(u)\mathrm{American}^G(u)(\mathrm{the}^G(u)(\mathrm{president}^G(u)))\big)$$
$$\Rightarrow_{\mathrm{cf}} \Box^G(\bar{a})(q) \text{ where } \{q := \lambda(u)\mathrm{American}^G(u)(t(u)),\; t := \lambda(u)\mathrm{the}^G(u)(p(u)),\; p := \lambda(u)\mathrm{president}^G(u)\}.$$
Notice that the state parameter $\bar{a}$ occurs only in the head of the relevant canonical form, and so, with the “necessarily always” interpretation of □ that we have adopted, the factual content of this term is independent of the state a.

4.2. Referential (global) meaning
A plausible candidate for the (global) referential meaning of a LIL-term A is the operation
$$a \mapsto \mathrm{int}\big(A^{G,\bar{a}}\big)$$
which assigns to each state a the factual content of A at a. We can understand this outside the formal system, as an operation from states to recursors; but we can also do it within the system, taking advantage of the abstraction construct of the typed λ-calculus and setting
$$M(A) = \mathrm{int}\big(\lambda(u)A^{G,u}\big). \tag{24}$$
It follows by Theorem 3 and the Reduction Calculus that for proper terms A, B,
$$M(A) = M(B) \iff \lambda(u)A^{G,u} \approx \lambda(u)B^{G,u} \iff A \approx_{\mathrm{LIL}} B,$$
and so there is no conflict between this notion of global meaning and the referential synonymy relation between LIL-terms defined directly in terms of the Gallin translation. The recursor M(A) is expressed directly by the canonical form of $\lambda(u)A^{G,u}$, which gives some insight into this notion of formal meaning. For example:
$$\lambda(u)[\text{John loves her}]^{G,u} \xrightarrow{\text{render}} \lambda(u)\mathrm{love}^G(u)\big(\mathrm{John}^G(u), \mathrm{her}^G(u)\big)$$
$$\Rightarrow_{\mathrm{cf}} \lambda(u)\mathrm{love}^G(u)(j(u), h(u)) \text{ where } \{j := \lambda(u)\mathrm{John}^G(u),\; h := \lambda(u)\mathrm{her}^G(u)\}$$
$$\approx \lambda(u)\mathrm{love}^G(u)(j(u), h(u)) \text{ where } \{j := \mathrm{John}^G,\; h := \mathrm{her}^G\}$$
while
$$\lambda(u)[\text{John loves Mary}]^{G,u} \xrightarrow{\text{render}} \lambda(u)\mathrm{love}^G(u)\big(\mathrm{John}^G(u), \mathrm{Mary}^G(u)\big)$$
$$\Rightarrow_{\mathrm{cf}} \lambda(u)\mathrm{love}^G(u)(j(u), h(u)) \text{ where } \{j := \lambda(u)\mathrm{John}^G(u),\; h := \lambda(u)\mathrm{Mary}^G(u)\}$$
$$\approx \lambda(u)\mathrm{love}^G(u)(j(u), h(u)) \text{ where } \{j := \mathrm{John}^G,\; h := \mathrm{Mary}^G\}.$$
To “grasp” the meanings of these two sentences, as Frege would say, we need the functions $\overline{\mathrm{love}}$, $\overline{\mathrm{John}}$, $\overline{\mathrm{Mary}}$ and $\overline{\mathrm{her}}$: not their values in any one particular state, but their range of values in all states; and to realize that they are not synonymous, we need only realize that ‘her’ is not ‘Mary’ in all states.
4.3. Local meaning
Once we have a global meaning of $A$, we can compute its local meaning at a state $a$ by evaluation, and, again, we could do this outside the system by defining in a natural way an operation of application of a recursor to an argument; but since we already have application in the typed λ-calculus, we set, within the system,
\[
LM(A, a) = \mathrm{int}\bigl((\lambda(u)A^{G,u})(\bar a)\bigr). \tag{25}
\]
This is the (referential) local meaning of $A$ at $a$. For proper terms $A$, $B$ and states $a$, $b$, we set
\[
\begin{aligned}
(A, a) \text{ is locally synonymous with } (B, b)
&\iff LM(A, a) = LM(B, b) \\
&\iff (\lambda(u)A^{G,u})(\bar a) \approx (\lambda(v)B^{G,v})(\bar b).
\end{aligned}
\]
It is important to recall here that, in general,
\[
(\lambda(u)C^{G,u})(\bar a) \not\approx C^{G,\bar a},
\]
because β-conversion does not preserve referential synonymy. The three synonymy relations we have defined are related as one would expect:

Lemma 1
(a) Referential synonymy implies local synonymy at any state, that is,
\[
\lambda(u)A^{G,u} \approx \lambda(u)B^{G,u} \implies (\lambda(u)A^{G,u})(\bar a) \approx (\lambda(u)B^{G,u})(\bar a).
\]
(b) Local synonymy at a state implies factual synonymy at that state,
\[
(\lambda(u)A^{G,u})(\bar a) \approx (\lambda(u)B^{G,u})(\bar a) \implies A^{G,\bar a} \approx B^{G,\bar a}.
\]

Both parts of the lemma are easily proved using Theorem 3 and some simple denotational equalities between the parts of the relevant canonical forms. In the following sections, we consider some examples which (in particular) show that neither part of the Lemma has a valid converse. Perhaps most interesting are those which distinguish between factual and local synonymy, and show that the latter is a much more fine-grained relation, very close in fact to (global) referential synonymy.
4.4. Factual content vs. local meaning
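In Section 4.1, we showed that for any state $a$, if $\mathrm{her}(a) = \mathrm{Mary}(a)$, then
\[
[\text{John loves her}]^{G,\bar a} \approx [\text{John loves Mary}]^{G,\bar a}.
\]
To check for local synonymy, we compute the canonical forms of these terms:
\[
\begin{aligned}
\bigl(\lambda(u)[\text{John loves her}]^{G,u}\bigr)(\bar a)
&\xrightarrow{\text{render}} \bigl(\lambda(u)\,\mathrm{love}^G(u)(\mathrm{John}^G(u), \mathrm{her}^G(u))\bigr)(\bar a) \\
&\Rightarrow_{\mathrm{cf}} \bigl(\lambda(u)\,\mathrm{love}^G(u)(j(u), h(u))\bigr)(\bar a) \text{ where } \{j := \lambda(u)\,\mathrm{John}^G(u),\ h := \lambda(u)\,\mathrm{her}^G(u)\} \\
&\approx \mathrm{love}^G(\bar a)(j(\bar a), h(\bar a)) \text{ where } \{j := \mathrm{John}^G,\ h := \mathrm{her}^G\}
\end{aligned}
\]
while
\[
\begin{aligned}
\bigl(\lambda(u)[\text{John loves Mary}]^{G,u}\bigr)(\bar a)
&\xrightarrow{\text{render}} \bigl(\lambda(u)\,\mathrm{love}^G(u)(\mathrm{John}^G(u), \mathrm{Mary}^G(u))\bigr)(\bar a) \\
&\Rightarrow_{\mathrm{cf}} \bigl(\lambda(u)\,\mathrm{love}^G(u)(j(u), h(u))\bigr)(\bar a) \text{ where } \{j := \lambda(u)\,\mathrm{John}^G(u),\ h := \lambda(u)\,\mathrm{Mary}^G(u)\} \\
&\approx \mathrm{love}^G(\bar a)(j(\bar a), h(\bar a)) \text{ where } \{j := \mathrm{John}^G,\ h := \mathrm{Mary}^G\}.
\end{aligned}
\]
But $\mathrm{her}^G \neq \mathrm{Mary}^G$, and so these two sentences are not locally synonymous at $a$, although they have the same factual content at $a$. The example illustrates the distinction between factual content and local meaning: to grasp the factual content FC(John loves her, a) we only need know who ‘her’ is at state $a$; on the other hand, to grasp the local meaning LM(John loves her, a) we need to understand ‘her’ as a function on the states. This is what we also need in order to grasp the (global) referential meaning of ‘John loves her’, which brings us to the more difficult comparison between local and global meaning.

4.5. Local vs. global synonymy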
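By the Reduction Calculus, if
\[
A^{G,u} \Rightarrow_{\mathrm{cf}} A_0 \text{ where } \{p_1 := A_1, \ldots, p_n := A_n\}
\]
then
\[
\lambda(u)A^{G,u} \Rightarrow_{\mathrm{cf}} \lambda(u)\bigl(A_0\{p_1 :\equiv q_1(u), \ldots, p_n :\equiv q_n(u)\}\bigr) \text{ where }
\left\{
\begin{aligned}
q_1 &:= \lambda(u)A_1\{p_1 :\equiv q_1(u), \ldots, p_n :\equiv q_n(u)\}, \\
&\ \ \vdots \\
q_n &:= \lambda(u)A_n\{p_1 :\equiv q_1(u), \ldots, p_n :\equiv q_n(u)\}
\end{aligned}
\right\}
\]
and
\[
(\lambda(u)A^{G,u})(\bar a) \Rightarrow_{\mathrm{cf}} \bigl(\lambda(u)(A_0\{p_1 :\equiv q_1(u), \ldots, p_n :\equiv q_n(u)\})\bigr)(\bar a) \text{ where }
\left\{
\begin{aligned}
q_1 &:= \lambda(u)A_1\{p_1 :\equiv q_1(u), \ldots, p_n :\equiv q_n(u)\}, \\
&\ \ \vdots \\
q_n &:= \lambda(u)A_n\{p_1 :\equiv q_1(u), \ldots, p_n :\equiv q_n(u)\}
\end{aligned}
\right\}.
\]
The computations here are by the most complex (and most significant) λ-rule of the Reduction Calculus, which, unfortunately, we cannot attempt to motivate here. The formulas do imply, however, that for any term $B$,
\[
(\lambda(u)A^{G,u})(\bar a) \approx (\lambda(u)B^{G,u})(\bar a)
\]
if and only if $B^{G,u} \Rightarrow_{\mathrm{cf}} B_0 \text{ where } \{p_1 := B_1, \ldots, p_n := B_n\}$, for suitable $B_0, \ldots, B_n$, so that:

(1) For any $i = 1, \ldots, n$,
\[
\models \lambda(u)\bigl(A_i\{p_1 :\equiv q_1(u), \ldots, p_n :\equiv q_n(u)\}\bigr) = \lambda(u)\bigl(B_i\{p_1 :\equiv q_1(u), \ldots, p_n :\equiv q_n(u)\}\bigr),
\]
and

(2)
\[
\models A_0\{u :\equiv \bar a\}\{p_1 :\equiv q_1(\bar a), \ldots, p_n :\equiv q_n(\bar a)\} = B_0\{u :\equiv \bar a\}\{p_1 :\equiv q_1(\bar a), \ldots, p_n :\equiv q_n(\bar a)\}.
\]
On the other hand, $A \approx_{\mathrm{LIL}} B$ if (1) holds and instead of (2) the stronger

(2*)
\[
\models \lambda(u)\bigl(A_0\{p_1 :\equiv q_1(u), \ldots, p_n :\equiv q_n(u)\}\bigr) = \lambda(u)\bigl(B_0\{p_1 :\equiv q_1(u), \ldots, p_n :\equiv q_n(u)\}\bigr)
\]
is true.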
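The gap between conditions (2) and (2*) can be made concrete on a small finite model. The following is only a rough sketch under strong simplifying assumptions: states and entities are finite Python collections, each head is modeled as a Boolean function of a state and of an assignment to a single location `q`, and all names (`STATES`, `ENTITIES`, `P`, `Q`, `head_A`, `head_B`) are hypothetical and not part of the formal system.

```python
from itertools import product

# Hypothetical finite universes of states and entities.
STATES = ["a", "c"]
ENTITIES = ["x", "y"]

# Two hypothetical one-place predicates, given as state -> extension.
# They are co-extensive at state "a" but not at state "c".
P = {"a": {"x"}, "c": {"x"}}
Q = {"a": {"x"}, "c": {"x", "y"}}


def all_assignments():
    """Yield every function q : STATES -> ENTITIES, encoded as a dict."""
    for values in product(ENTITIES, repeat=len(STATES)):
        yield dict(zip(STATES, values))


def head_A(u, q):
    """Head of the first term at state u, for the assignment q."""
    return q[u] in P[u]


def head_B(u, q):
    """Head of the second term at state u, for the assignment q."""
    return q[u] in Q[u]


def heads_equal_at(state):
    """Analogue of condition (2): equality at one fixed state, for all q."""
    return all(head_A(state, q) == head_B(state, q) for q in all_assignments())


def heads_equal_everywhere():
    """Analogue of condition (2*): equality at every state, for all q."""
    return all(heads_equal_at(u) for u in STATES)


print(heads_equal_at("a"))       # the (2)-style check at state "a"
print(heads_equal_everywhere())  # the (2*)-style check over all states
```

In this toy model the check at the single state succeeds while the check over all states fails, which is the pattern by which two terms can be locally synonymous at a state without being referentially synonymous.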
Thus, local synonymy is very close to global synonymy, the only difference being that for global synonymy we need the heads of the two terms to be denotationally equal for all states, while for local synonymy at a state $a$, we only need their heads to be denotationally equal at $a$. This explains why, by Lemma 1, the former implies the latter while the converse may fail. Natural examples which illustrate this distinction are hard to find, but the following one may, at least, be amusing. Consider a particular state $a$ at which two common nouns are co-extensive, for example, ‘man’ and ‘human’. This was the case at the time just after God had created Adam but not yet Eve. At that state $a$, then, the sentences ‘Adam is a man’ and ‘Adam is a human’ are locally synonymous, since
\[
\begin{aligned}
\bigl(\lambda(u)[\text{Adam is a man}]^{G,u}\bigr)(\bar a)
&\xrightarrow{\text{render}} \bigl(\lambda(u)\,\mathrm{man}^G(u)(\mathrm{Adam}^G(u))\bigr)(\bar a) \\
&\Rightarrow_{\mathrm{cf}} \bigl(\lambda(u)\,\mathrm{man}^G(u)(j(u))\bigr)(\bar a) \text{ where } \{j := \lambda(u)\,\mathrm{Adam}^G(u)\},
\end{aligned}
\]
\[
\begin{aligned}
\bigl(\lambda(u)[\text{Adam is human}]^{G,u}\bigr)(\bar a)
&\xrightarrow{\text{render}} \bigl(\lambda(u)\,\mathrm{human}^G(u)(\mathrm{Adam}^G(u))\bigr)(\bar a) \\
&\Rightarrow_{\mathrm{cf}} \bigl(\lambda(u)\,\mathrm{human}^G(u)(j(u))\bigr)(\bar a) \text{ where } \{j := \lambda(u)\,\mathrm{Adam}^G(u)\},
\end{aligned}
\]
and
\[
\models \mathrm{man}^G(\bar a)(j(\bar a)) = \mathrm{human}^G(\bar a)(j(\bar a)).
\]
These sentences, of course, are not referentially synonymous, as they are not even factually synonymous in any reasonable state.

4.6. Local synonymy across different states
Things get more complicated when we try to trace local synonymy between sentences at different states. In Section 4.1, it was shown that ‘Yesterday I was insulted’ uttered at a state $a$ where the time is 27 June 2005 and ‘Today I am insulted’ uttered at state $b$ where the time is 26 June 2005 are factually synonymous, provided that the two states are otherwise identical. By the Reduction Calculus,
\[
\begin{aligned}
\bigl(\lambda(u)[\text{Yesterday I was insulted}]^{G,u}\bigr)(\bar a)
&\xrightarrow{\text{render}} \bigl(\lambda(u)\,\mathrm{Yesterday}_1^G(u)(\lambda(u)\,\mathrm{be\ insulted}^G(u),\ I^G(u))\bigr)(\bar a) \\
&\Rightarrow_{\mathrm{cf}} \bigl(\lambda(u)\,\mathrm{Yesterday}_1^G(u)(p(u), q(u))\bigr)(\bar a) \text{ where } \{p := \lambda(u)\lambda(u)\,\mathrm{be\ insulted}^G(u),\ q := \lambda(u)\,I^G(u)\}
\end{aligned}
\]
and
\[
\begin{aligned}
\bigl(\lambda(u)[\text{Today I am insulted}]^{G,u}\bigr)(\bar b)
&\xrightarrow{\text{render}} \bigl(\lambda(u)\,\mathrm{Today}_1^G(u)(\lambda(u)\,\mathrm{be\ insulted}^G(u),\ I^G(u))\bigr)(\bar b) \\
&\Rightarrow_{\mathrm{cf}} \bigl(\lambda(u)\,\mathrm{Today}_1^G(u)(p(u), q(u))\bigr)(\bar b) \text{ where } \{p := \lambda(u)\lambda(u)\,\mathrm{be\ insulted}^G(u),\ q := \lambda(u)\,I^G(u)\},
\end{aligned}
\]
and these two canonical forms have the same bodies, so they will be locally synonymous if and only if their heads are denotationally equal. But
\[
\not\models \mathrm{Yesterday}_1^G(\bar a)(p(\bar a), q(\bar a)) = \mathrm{Today}_1^G(\bar b)(p(\bar b), q(\bar b))
\]
on the plausible assumption that John is running while Mary sleeps (today and yesterday), taking
\[
\pi(p)(a) = \pi(p)(b) = \text{runs}, \qquad \pi(q)(a) = \text{John}, \qquad \pi(q)(b) = \text{Mary}
\]
and computing the denotations of the two sides of this equation for the assignment $\pi$; so
\[
LM(\text{Yesterday I was insulted}, a) \neq LM(\text{Today I am insulted}, b).
\]
This argument is subtle and a little unusual, and it may be easier to understand in the next example, where we compare the local meanings of the same sentence, with no modal operator, on two different but nearly identical states. Consider ‘John runs’, in $a$ and $b$:
\[
\bigl(\lambda(u)\,\mathrm{run}^G(u)(\mathrm{John}^G(u))\bigr)(\bar a) \Rightarrow_{\mathrm{cf}} \bigl(\lambda(u)\,\mathrm{run}^G(u)(j(u))\bigr)(\bar a) \text{ where } \{j := \lambda(u)\,\mathrm{John}^G(u)\}
\]
\[
\bigl(\lambda(u)\,\mathrm{run}^G(u)(\mathrm{John}^G(u))\bigr)(\bar b) \Rightarrow_{\mathrm{cf}} \bigl(\lambda(u)\,\mathrm{run}^G(u)(j(u))\bigr)(\bar b) \text{ where } \{j := \lambda(u)\,\mathrm{John}^G(u)\}
\]
Local synonymy requires that
\[
\models \mathrm{run}^G(\bar a)(j(\bar a)) = \mathrm{run}^G(\bar b)(j(\bar b)), \tag{26}
\]
which means that for all functions $j : T_s \to T_e$, $\mathrm{run}^G(a)(j(a)) = \mathrm{run}^G(b)(j(b))$. Suppose further that the two states $a$ and $b$ are exactly the same except that the speaker is different, an aspect of the state that, intuitively, should not affect the meaning of ‘John runs’. In particular, the interpretation $\mathrm{run}^G : s \to (e \to t)$ of the constant $\mathrm{run}^G$ is the same in the two states, i.e., whoever runs at state $a$ also runs at state $b$ and conversely. But unless either everyone or nobody runs in these states, (26) fails: just take an assignment $\pi$ such that $\pi(j)(a) \neq \pi(j)(b)$ and such that in both states $\pi(j)(a)$ runs whereas $\pi(j)(b)$ does not run. The examples suggest that synonymy across different states is a complex relation, and that when we feel intuitively that sentences may have the same “meaning” in different states, it is factual synonymy that we have in mind.

5. Situated meaning in the philosophy of language
In this section we will investigate briefly the connection of the proposed aspects of situated meaning to indexicality, propositional attitudes and translation.

5.1. Kaplan’s treatment of indexicals
Indexicality is a phenomenon of natural language usage, closely connected to situated meaning. Among its many proposed treatments, Kaplan’s theory of direct reference (Kaplan 1989) reaches some very interesting results which we can relate to the aspects of situated meaning introduced in this paper. Kaplan’s theory is expressed formally in the Logic of Demonstratives (LD), where each term or formula has two semantic values, Content and Character. The Content of a term or a formula is given with respect to a context, considered as context of utterance, and it is a function from possible circumstances, considered as contexts of evaluation, to denotations or truth values, respectively. The Character of a term or a formula A is the function which assigns to each context the Content of A in that context. In (Kaplan 1978), it is argued that

    Thus when I say

        I was insulted yesterday    (K4)

    specific content–what I said–is expressed. Your utterance of the same sentence, or mine on another day, would not express the same content. What is important to note is that it is not just the truth value that may change; what is said is itself different. Speaking today, my utterance of (K4) will have a content roughly equivalent to that which

        David Kaplan is insulted on 20 April 1973    (K5)

    would have, spoken by you or anyone at any time.

Kaplan gives formal definitions of these notions in LD, from which it follows that at a context of utterance where the speaker is David Kaplan and the time is 21 April 1973, the two sentences (K4) and (K5) have the same Content, that is the same truth value for every possible circumstance: but, of course, they have different Characters, since (K5)’s Character is a constant function, whereas (K4)’s clearly depends on the context of utterance. In $L^{\lambda}_{ar}$, these intuitively plausible semantic distinctions can be made with the use of the factual content and the global meaning which, roughly speaking, correspond to Kaplan’s Content and Character respectively. Suppose $a$ is a state$^{14}$ where the speaker is David Kaplan (or DK for short) and the time is 21 April 1973. As example (8) suggests, (K4) and (K5) are factually synonymous at $a$, that is
\[
[\text{I was insulted yesterday}]^{G,\bar a} \approx [\text{David Kaplan is insulted on 20 April 1973}]^{G,\bar a}.
\]
Moreover, for any state $b$ where $\mathrm{DavidKaplan}^G(b)$ is David Kaplan,
\[
[\text{I was insulted yesterday}]^{G,\bar a} \approx [\text{David Kaplan is insulted on 20 April 1973}]^{G,\bar b}.
\]
These two sentences, however, are not referentially (globally) synonymous,
\[
\lambda(u)[\text{I was insulted yesterday}]^{G,u} \not\approx \lambda(u)[\text{David Kaplan is insulted on 20 April 1973}]^{G,u},
\]
since $I^G$ is not denotationally equivalent with $\mathrm{DavidKaplan}^G$ nor is $\mathrm{Yesterday}_1^G$ with $\mathrm{on20April1973}^G$. Notice that the indexical ‘I’ occurs within the scope of the modal operator ‘Yesterday’ in (K4), and in any such example in order to account for the directly referential usage of indexicals, we choose the de re reading of the operators, thus translating ‘Yesterday’ as $\mathrm{Yesterday}_1$.

Kaplan argues that (K4) has another characteristic as well. Consider two contexts $c_1$ and $c_2$ which differ with respect to agent and/or moment of time, that is, the aspects of the context of utterance which are relevant to this particular sentence. Then, its Content with respect to $c_1$ is different from its Content with respect to $c_2$. Similarly, in $L^{\lambda}_{ar}$,
\[
[\text{I was insulted yesterday}]^{G,\bar a} \not\approx [\text{I was insulted yesterday}]^{G,\bar b}
\]
for states $a$ and $b$ which differ in the same way as $c_1$ and $c_2$ do.

It is clear from the examples that at least some of the aspects of indexicality which Kaplan (1989) seeks to explain can also be understood using factual content, with no need to introduce “contexts of utterance” and “contexts of evaluation”, which (to some) appear somewhat artificial. There are two points that are worth making.

First, in $L^{\lambda}_{ar}$, synonymy is based on the isomorphism between two recursors, which is a structural condition, whereas in LD the identity of Contents or Characters is defined as a simple equality between two functions. For example, consider ‘I am insulted’, a simpler version of (K4) with no modal operator, and suppose (as above) that there are states $a$ and $b$ which differ only in the identity of the speakers, call them Agent$_a$ and Agent$_b$. Suppose also that both utterances of the sentence by the two agents are true. To show in LD that the two relevant Contents are different, we need to consider their values in contexts of evaluation other than that determined by the states $a$ and $b$: the argument being that the interpretation function of the constant be insulted evaluated on some circumstances for the two different agents is not the same (because there are circumstances at which Agent$_a$ is insulted while Agent$_b$ is not), and so the two Contents are not identical. On the other hand, the factual content of this sentence in state $a$ is expressed by the canonical form
\[
\mathrm{be\ insulted}^G(\bar a)(I^G(\bar a)) \Rightarrow_{\mathrm{cf}} \mathrm{be\ insulted}^G(\bar a)(p) \text{ where } \{p := I^G(\bar a)\},
\]
and the one for $b$ is the same, but with $\bar b$ in place of $\bar a$. So, in $L^{\lambda}_{ar}$, FC(I am insulted, a) $\neq$ FC(I am insulted, b) simply because
\[
\not\models I^G(\bar a) = I^G(\bar b).
\]
There is no need to consider the values of the function $\mathrm{be\ insulted}^G$ in any state, not even at $a$ and $b$.

Second, in $L^{\lambda}_{ar}$ there is the possibility to compare, in addition, the local meanings of the two sentences (K4) and (K5) at the specific state $a$. As one would expect, these are not locally synonymous at $a$, i.e.,
\[
\bigl(\lambda(u)[\text{I was insulted yesterday}]^{G,u}\bigr)(\bar a) \not\approx \bigl(\lambda(u)[\text{David Kaplan is insulted on 20 April 1973}]^{G,u}\bigr)(\bar a).
\]
This accounts for the fact that although “what is said” (the factual content) is the same, to understand this one must know that ‘I’ and ‘yesterday’ refer to DK and 20 April 1973 respectively; two sentences are locally synonymous in a state only when the fact that “they say the same thing” can be realized by a language speaker who does not necessarily know the references of the indexicals in them.

5.2. What are the belief carriers?
The objects of belief must be situated meanings of some sort: can we model them faithfully by factual contents or local meanings, the two versions of situated meaning that we introduced? There are several well-known paradoxes which argue against taking factual contents as the objects of belief,$^{15}$ but our pedestrian example (5) can serve as well. If belief respected factual content, then in a state $a$ in which ‘her’ is Mary, an agent would equally believe ‘John loves her’ as she would ‘John loves Mary’; but we can certainly imagine situations in which the agent does not know that ‘her’ refers to Mary. We make this sort of factual error all the time, and it certainly affects our beliefs. Thus factual synonymy is not preserved by belief attribution. Local meanings are more promising candidates for belief carriers, especially as they eliminate this sort of belief paradox which depends on the agent’s mistaking the values of indexicals. Moreover, the discussion in Section 4.5 suggests that the local meaning LM(A, a) models what has sometimes been called “the sentence A under the situation a” and has been proposed as the object of belief. So it would appear that of the known candidates, local meanings may be the best formal representations of belief carriers.

5.3. What is preserved under translation?
Faithful translation should also preserve some aspect of meaning, so which is it? It is clear, again, that it cannot be factual content, as ‘John loves her’ would never be translated as ‘Jean aime Marie’, whatever the state. Perhaps referential (global) synonymy or local synonymy are translation invariants, but there are no good arguments for one or the other of these, or for preferring one over the other, given how closely related they are. The question is interesting and we have no strong or defensible views on it, but it should be raised in the evaluation of any theory of meaning, and it almost never is.

Acknowledgements

The research of Eleni Kalyvianaki for this article was co-funded by the European Union - European Social Fund & National Resources - EPEAEK II.
Notes

1. We will generally assume that the reader is reasonably familiar with Montague’s LIL and thus it will be clear to her that we employ a rather “simplified” version of this language, at least in what concerns the way natural language is translated into it. On the other hand, we will describe the basic ideas of Gallin (1975) and Moschovakis (2006), so that this article is largely independent of the details of those papers.
2. In fact, it is convenient (and harmless) to take $T_t \subseteq T_e$, i.e., simply to assume that the truth values 0, 1, er are in $T_e$, so that the denotations of sentences are treated like common entities, with Frege’s approval. The extra truth value is useful for dealing with simple cases of presupposition, but it will not show up in this article.
3. Pedantically, types are finite sequences (strings) generated by the distinct “symbols” e, s, t, (, → and ), and terms (later on) will similarly be strings from a larger alphabet. We use ‘≡’ to denote the identity relation between strings.
4. We assume for simplicity this standard (largest) model of the typed λ-calculus built from the universes $T_s$ and $T_e$.
5. We have assumed, for simplicity, a constant thetemp : e which we grouped with names and indexicals, because of its typing.
6. For the examples from natural language, we will assume some plausible properties of the interpretations of these constants, e.g., that there are states in which some people are running while others sit, that John loves Mary in some states while he dislikes her in others, etc. None of these assumptions affect the logic of meaning which is our primary concern.
7. This ‘John’ is just a formal expression (a string of symbols), which, quite obviously, may refer to different objects in different states. “Proper names” which should be rigid designators by the prevailing philosophical view are more than strings of symbols, and in any case, the logic of meaning which concerns us here does not take a position on this (or any other) philosophical view.
8. We denote by $\pi\{x := t\}$ the update of an assignment $\pi$ which changes it only by re-assigning to the variable $x : \sigma$ the object $t \in T_\sigma$:
\[
\pi\{x := t\}(v_i^\tau) =
\begin{cases}
t, & \text{if } v_i^\tau \equiv x, \\
\pi(v_i^\tau), & \text{otherwise.}
\end{cases}
\]
9. An alternative would be to re-type the same constants in K and distinguish between the LIL-typing and the Ty2-typing of c. The method we adopted is (probably) less confusing, and it makes it easier to express the Gallin translation of LIL into Ty2 below.
10. We use ‘G’ instead of Gallin’s ‘*’ and we make the state variable explicit.
11. In example (15), for simplicity, the logic constant & is not translated into $\&^G(u)$ since its denotation is independent of the state.
12. A term A is explicit if the constant where does not occur in it.
13. Factual synonymy can be expressed without the use of parameters by the following, simple result: $A^{G,\bar a} \approx B^{G,\bar b}$ if and only if there exist terms $A_0, \ldots$, $B_0, \ldots$, as in the Referential Synonymy Theorem 3, such that for all assignments $\pi$, if $\pi(u) = a$ and $\pi(v) = b$, then
\[
\mathrm{den}(A_i^{G,u})(\pi) = \mathrm{den}(B_i^{G,v})(\pi) \qquad (i = 0, \ldots, n).
\]
The same idea can be used to define the recursor FC(A, a) directly, without enriching the syntax with state parameters.
14. A state in $L^{\lambda}_{ar}$ acts both as context of utterance, thus disambiguating all occurrences of indexicals, names etc, and as context of evaluation, thus evaluating the denotations of verbs, adjectives etc.
15. See for example Russell’s “author of Waverley” example as presented in the Introduction of (Salmon and Soames 1988) or in (Church 1982).
References

Church, Alonzo
1982 A remark concerning Quine’s paradox about modality. Spanish version in Analisis Filosófico 25–32, reprinted in English in (Salmon and Soames 1988).

Frege, Gottlob
1892 On sense and denotation. Zeitschrift für Philosophie und philosophische Kritik 100. Translated by Max Black in (Frege 1952) and also by Herbert Feigl in (Martinich 1990).
1918 Der Gedanke - Eine Logische Untersuchung. Beiträge zur Philosophie des deutschen Idealismus I 58–77. Translated as “Thoughts” and reprinted in (Salmon and Soames 1988).
1952 Translations from the Philosophical Writings of Gottlob Frege. Oxford: Blackwell. Edited by Peter Geach and Max Black.

Gallin, Daniel
1975 Intensional and higher-order modal logic. Number 19 in North-Holland Mathematical Studies. Amsterdam, Oxford, New York: North-Holland, Elsevier.

Kaplan, David
1978 On the logic of demonstratives. Journal of Philosophical Logic 81–98, reprinted in (Salmon and Soames 1988).
1989 Demonstratives: An Essay on the Semantics, Logic, Metaphysics, and Epistemology of Demonstratives and Other Indexicals & Afterthoughts. In Joseph Almog, John Perry, and Howard Wettstein, (eds.), Themes from Kaplan, 481–614. Oxford University Press.

Martinich, Aloysius P., (ed.)
1990 The Philosophy of Language. New York, Oxford: Oxford University Press, second edition.

Montague, Richard
1973 The Proper Treatment of Quantification in Ordinary English. In Jaakko Hintikka et al., (eds.), Approaches to Natural Language: Proceedings of the 1970 Stanford Workshop on Grammar and Semantics, 221–224, reprinted in (Montague 1974). Dordrecht: D. Reidel Publishing Co.
1974 Formal Philosophy. New Haven and London: Yale University Press. Selected papers of Richard Montague, edited by Richmond H. Thomason.

Moschovakis, Yiannis N.
2006 A logical calculus of meaning and synonymy. Linguistics and Philosophy 29: 27–89.

Salmon, Nathan and Scott Soames, (eds.)
1988 Propositions and attitudes. Oxford: Oxford University Press.
Further excursions in natural logic: The Mid-Point Theorems
Edward L. Keenan

Abstract

In this paper we explore the logic of proportionality quantifiers, seeking and formalizing characteristic patterns of entailment. We uncover two such patterns, though in each case we show that despite the naturalness of proportionality quantifiers in these paradigms, they apply as well to some non-proportionality quantifiers. Even so we have elucidated some logical properties of natural language quantifiers which lie outside the generalized existential and generalized universal ones.
1. Background
Pursuing a study begun in (Keenan 2004) this paper investigates inference patterns in natural language into which proportionality quantifiers enter. We desire to identify such patterns and to isolate any that are specific to proportionality quantifiers. Keenan (2004) identified the inference pattern in (1) and suggested that it involved proportionality quantifiers in an essential way.

(1) a. More than m/n of the As are Bs.
       At least 1 − m/n of the As are Cs.
       Ergo: Some A is both a B and a C.
    b. At least m/n of the As are Bs.
       More than 1 − m/n of the As are Cs.
       Ergo: Some A is both a B and a C.
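Pattern (1) can be sanity-checked by brute force on small finite domains. The sketch below is only an illustration, not a proof: the helper names (`subsets`, `pattern_1a`, `pattern_1b`, `valid_on`) are hypothetical, the domain and the list of proportions are arbitrary choices, and an empty A is treated as making the premises fail, which is an assumption about how the proportion is read in that case.

```python
from fractions import Fraction
from itertools import combinations


def subsets(universe):
    """Yield every subset of `universe` as a frozenset."""
    items = sorted(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def pattern_1a(A, B, C, q):
    """(1-a): more than q of the As are Bs, at least 1 - q are Cs => A, B, C overlap."""
    if not A:
        return True  # assumption: with no As, the premises are taken to fail
    premise_1 = Fraction(len(A & B), len(A)) > q
    premise_2 = Fraction(len(A & C), len(A)) >= 1 - q
    return not (premise_1 and premise_2) or bool(A & B & C)


def pattern_1b(A, B, C, q):
    """(1-b): at least q of the As are Bs, more than 1 - q are Cs => A, B, C overlap."""
    if not A:
        return True  # same assumption as in pattern_1a
    premise_1 = Fraction(len(A & B), len(A)) >= q
    premise_2 = Fraction(len(A & C), len(A)) > 1 - q
    return not (premise_1 and premise_2) or bool(A & B & C)


def valid_on(universe, proportions):
    """Check both patterns for every triple of subsets and every proportion."""
    sets = list(subsets(universe))
    return all(
        pattern_1a(A, B, C, q) and pattern_1b(A, B, C, q)
        for q in proportions
        for A in sets
        for B in sets
        for C in sets
    )


# All proportions m/n with 0 <= m <= n <= 5 (duplicates are harmless).
PROPORTIONS = [Fraction(m, n) for n in range(1, 6) for m in range(n + 1)]
print(valid_on(range(4), PROPORTIONS))
```

The check succeeds because the two premises together force |A ∩ B| + |A ∩ C| > |A|, so B and C must share an element of A.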
To illustrate (1-a): If more than three tenths of the students are athletes and at least seven tenths are vegetarians then at least one student is both an athlete and a vegetarian. This is indeed a valid argument paradigm. However recently Westerståhl (pc) showed that the pattern (1-a), (1-b) is a special case of a more general one not specific to proportionality quantifiers but which includes them simply as a special case. His result supports the claim that proportionality quantifiers enter inference paradigms common to better understood classes of quantifiers. But it also leads us to question whether there are any inference patterns specific to proportionality quantifiers. To pursue these questions we need some background definitions.

Definition 1 Given a domain E, the set GQ_E of generalized quantifiers over E =def [P(E) → {0, 1}], the set of functions from P(E) into {0, 1}. Such functions will also be called (following Lindström (1966)) functions of type <1>.

Interpreting P1s, one place predicates, as elements of P(E) we can use type <1> functions as denotations of the DPs italicized in (2):

(2) a. No teacher laughed at that joke.
    b. Every student came to the party.
    c. Most students are vegetarians.
So the truth value of (2-a) is the one that the function denoted by no teacher assigns to the denotation of the P1 laughed at that joke. The Dets no, every, and most combine with a single Noun to form a DP and are naturally interpreted by functions of type <1, 1>, namely maps from P(E) into GQ_E. They exemplify three different classes of type <1, 1> functions: the intersective, the co-intersective and the proportionality ones. Informally a D of type <1, 1> is intersective if its value at sets A, B just depends on A ∩ B, it is co-intersective if its value depends just on A \ B, and it is proportional if it just depends on the proportion of As that are Bs. Formally, we define these notions and a few others below in terms of invariance conditions.

Definition 2 For D of type <1, 1>,
1. a. D is intersective iff for all sets A, B, X, Y if A ∩ B = X ∩ Y then DAB = DXY.
   b. D is cardinal iff for all sets A, B, X, Y if |A ∩ B| = |X ∩ Y| then DAB = DXY.
2. a. D is co-intersective iff for all A, B, X, Y if A \ B = X \ Y then DAB = DXY.
   b. D is co-cardinal iff for all A, B, X, Y if |A \ B| = |X \ Y| then DAB = DXY.
3. D is proportional iff for all A, B, X, Y if |A ∩ B|/|A| = |X ∩ Y|/|X| then DAB = DXY.
4. D is conservative iff for all A, B, B′ if A ∩ B = A ∩ B′ then DAB = DAB′.
5. D is permutation invariant iff for all A, B ⊆ E, all permutations π of E, DAB = D(π(A))(π(B)).
One checks that NO, EVERY, and MOST defined below are intersective, co-intersective and proportional respectively. All three of these functions are permutation invariant and conservative.

(3) a. NO(A)(B) = 1 iff A ∩ B = ∅.
    b. EVERY(A)(B) = 1 iff A \ B = ∅.
    c. MOST(A)(B) = 1 iff |A ∩ B| > 1/2 |A|.
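The invariance conditions of Definition 2 can be tested mechanically for the three functions in (3) over a small domain. The following is a minimal sketch, assuming a three-element domain; the helper `invariant` and the `is_...` names are hypothetical, and pairs with empty A are skipped in the proportionality check because the proportion is undefined there (an assumption, not something the definition settles).

```python
from fractions import Fraction
from itertools import combinations

E = frozenset({0, 1, 2})  # a small hypothetical domain
SETS = [frozenset(c) for r in range(len(E) + 1)
        for c in combinations(sorted(E), r)]


def NO(A, B):
    return int(len(A & B) == 0)


def EVERY(A, B):
    return int(len(A - B) == 0)


def MOST(A, B):
    return int(len(A & B) > len(A) / 2)


def invariant(D, key):
    """True iff D(A, B) == D(X, Y) whenever key(A, B) == key(X, Y).

    Pairs for which key returns None are skipped.
    """
    seen = {}
    for A in SETS:
        for B in SETS:
            k = key(A, B)
            if k is None:
                continue
            if seen.setdefault(k, D(A, B)) != D(A, B):
                return False
    return True


def is_intersective(D):
    return invariant(D, lambda A, B: A & B)


def is_cointersective(D):
    return invariant(D, lambda A, B: A - B)


def is_proportional(D):
    # Assumption: the proportion is undefined for empty A, so skip those pairs.
    return invariant(D, lambda A, B: Fraction(len(A & B), len(A)) if A else None)


def is_conservative(D):
    return invariant(D, lambda A, B: (A, A & B))


for D in (NO, EVERY, MOST):
    print(D.__name__, is_intersective(D), is_cointersective(D),
          is_proportional(D), is_conservative(D))
```

Each property is checked by grouping pairs (A, B) under the relevant invariant and confirming that D is constant on every group, which is just the definition restated as a lookup.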
Here is a representative sample of these three classes (our main concern in what follows).

(4) Some intersective Dets
    cardinal: some, a/an, no, practically no, several, between six and ten, infinitely many, more than six, at least/exactly/just/only/fewer than/at most six, just finitely many, about/nearly/approximately a hundred, a couple of, a dozen, How many?
    non-cardinal: Which?, more male than female, no...but John (as in No student but John came to the party)

(5) Some co-intersective Dets
    co-cardinal: every/all/each, almost all, all but six, all but at most six, all but finitely many
    non-co-cardinal: every...but John

(6) Some properly proportional Dets (proportional but not intersective or co-intersective).
    a. more than half the, less than two thirds, less than/at most/at least/exactly half, at most ten per cent, between a half and two thirds, between ten and twenty per cent, all but a tenth, almost a third, What percentage?
    b. most, every third (as in Every third student was inoculated), just/nearly/exactly/only/not one...in ten (as in Just one student in ten was inoculated), (almost) seven out of ten (as in Seven out of ten sailors smoke Players), between six out of ten and nine out of ten
So the proportionality Dets include mundane fractional and percentage expressions, (6-a), usually built on a partitive pattern with of followed by a definite DP, as in most of the students, a third of John’s students, ten per cent of those students, etc. (half is slightly exceptional, only taking of optionally: half the students, half of the students are both fine). The precise syntactic analysis of partitive constructions is problematic in the literature. Keenan and Stavi (1986) treat more than a third of the as a complex Det. But more usually linguists treat the expression following of as a definite DP and of expresses a partitive relation between that DP and the Det that precedes of. Barwise and Cooper (1981) provide a compositional semantics for this latter approach which we assume here.

Proportionality Dets also include those in (6-b) which are not partitive, but are followed directly by the Noun as in the case of intersective and co-intersective Dets.

DPs built from proportionality Dets usually require that their Noun argument denotes a finite set. We could invent a meaning for a third of the natural numbers but this would be a creative step, extending natural usage, not simply an act of modeling ordinary usage.

In general the functions denoted by proportionality Dets are not intersective or co-intersective, though a few extremal cases are: Exactly zero per cent = no, a hundred percent = every, more than zero per cent = some, less than a hundred per cent = not every.

Complex members in each of these seven classes can be formed by taking boolean compounds in and (but), or, not, and neither...nor. We note without proof:

Proposition 1
1. GQ_E = [P(E) → {0, 1}] is a (complete atomic) boolean algebra inheriting its structure pointwise from {0, 1}.
2. Each of the classes K defined in Definition 2 is closed under the pointwise boolean operations and is thus a (complete, atomic) boolean subalgebra of [P(E) → GQ_E].

So if D of type <1, 1> is intersective (cardinal, co-intersective, ...) so is ¬D, which maps each A to ¬(D(A)), the complement of the GQ D(A). Thus boolean compounds of expressions in any of these classes also lie in that class. E.g. at least two and not more than ten is cardinal because at least two and more than ten are, etc.

We write INT_E (CARD_E, ...) for the set of intersective (cardinal, ...) functions of type <1, 1> over a domain E, omitting the subscript E when no confusion results. Many relations between these subclasses of Dets are known. E.g. INT, CO-INT, PROP are all subsets of CONS; CARD and CO-CARD are PI (permutation invariant, defined in Section 2.3) subsets of INT and CO-INT respectively. When E is finite CARD = INT ∩ PI and CO-CARD = CO-INT ∩ PI. And an easily shown fact, used later, is:
Proposition 2 INT_E ∩ CO-INT_E = {0, 1}, where 0 is that constant function of type <1, 1> mapping all A, B to 0; 1 maps all A, B to 1.

Proof. One checks easily that 0 and 1 are both intersective and co-intersective. For the other direction let D ∈ INT_E ∩ CO-INT_E. Then for A, B arbitrary, DAB = D(A ∩ B)(E), since D is intersective, = D(∅)(E) since D is co-intersective. Thus D is constant, so D = 0 or D = 1.

Thus only the two trivial Det functions are both intersective and co-intersective. Further

Fact 1 In general Dets are not ambiguous according as their denotations are intersective or co-intersective.

fewer than zero denotes 0 which is both intersective and co-intersective, but Fact 1 says that no Det expression has two denotations, one intersective and the other co-intersective.

2. Proportionality Dets
We begin with some basic facts regarding proportionality Dets.

2.1. Not first order definable
Barwise and Cooper (1981) argue that MOST, as defined here, is not definable in first order logic. See also Westerståhl (1989). The arguments given in these two sources extend to the non-trivial proportionality Dets, those which are not also intersective or co-intersective. Given that the proportionality Dets in general are not first order definable (FOD) it is unsurprising that we have little understanding of their inferential behavior, as inference patterns have been best studied for first order expressions.

2.2. Not sortally reducible
We say that a possible Det function D is sortally reducible iff there is a two place boolean function h such that for all subsets A, B of E, DAB = D(E)(h(A, B)). Note that intersective D and co-intersective D are sortally reducible, as illustrated below with some and all:

(7) a. Some poets are socialists.
    b. Some individuals are both poets and socialists.
    c. All poets are socialists.
    d. All individuals are either not poets or are socialists (≡ All individuals are such that if they are poets then they are socialists).
In fact Keenan (1993) shows that the conservative D which are sortally reducible are just the intersective and co-intersective ones. Most reasoning techniques used with formulas of the form ∃xφ or ∀xφ involve removing the quantifiers, reasoning with the resulting formula, and then restoring the quantifiers when needed. But such techniques will not apply directly to Ss built with proper proportionality quantifiers as they do not admit of a translation which eliminates the Noun domain of the variable in favor of the entire universe as in (7-b) and (7-d) above.
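The definition of sortal reducibility can be tried out by brute force on a small universe. The following is only a sketch under stated assumptions: E has three elements, a "two place boolean function" h is modelled as a union of some of the four regions A ∩ B, A \ B, B \ A and E \ (A ∪ B), and MOST is taken to mean "more than half". All function names are illustrative and not taken from the text.

```python
from itertools import combinations, product

E = frozenset(range(3))  # assumed toy universe


def subsets(s):
    """All subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


SUBSETS = subsets(E)


def some(A, B):
    return len(A & B) > 0


def every(A, B):
    return A <= B


def most(A, B):
    # "more than half of the As are Bs"; false when A is empty
    return len(A & B) > len(A - B)


def boolean_functions():
    """Yield the 16 two place boolean functions as unions of regions."""
    regions = [
        lambda A, B: A & B,
        lambda A, B: A - B,
        lambda A, B: B - A,
        lambda A, B: E - (A | B),
    ]
    for mask in product([False, True], repeat=4):
        def h(A, B, mask=mask):
            out = frozenset()
            for keep, region in zip(mask, regions):
                if keep:
                    out |= region(A, B)
            return out
        yield h


def sortally_reducible(D):
    """True iff some h satisfies D(A)(B) = D(E)(h(A, B)) for all A, B."""
    return any(
        all(D(A, B) == D(E, h(A, B)) for A in SUBSETS for B in SUBSETS)
        for h in boolean_functions()
    )
```

On this universe the search finds a reducing h for `some` (intersection) and for `every` (not-A or B), matching (7-b) and (7-d), and finds none for `most`.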
2.3. Permutation invariant

Given a permutation h of E (so h is a bijection from E to E) we extend h to subsets of E by setting h(X) = {h(x) | x ∈ X}, all X ⊆ E. And a possible Det denotation D is said to be PI (permutation invariant) iff for all permutations h of E, D(A)(B) = D(h(A))(h(B)). Proportionality Dets (over finite E) always denote PI functions (in distinction for example to no ... but John or Which? among the intersective Dets).
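Permutation invariance is easy to test exhaustively on a small domain. The sketch below assumes a three-element universe with made-up individuals, and it assumes the usual exception reading of no ... but John (true iff A ∩ B = {John}); neither the universe nor that denotation is spelled out in the text.

```python
from itertools import combinations, permutations

E = ("john", "mary", "sue")  # hypothetical individuals

SUBSETS = [frozenset(c) for r in range(len(E) + 1)
           for c in combinations(E, r)]


def image(h, X):
    """h(X) = {h(x) | x in X} for a permutation h given as a dict."""
    return frozenset(h[x] for x in X)


def is_permutation_invariant(D):
    """Check D(A)(B) = D(h(A))(h(B)) for every permutation h of E."""
    for perm in permutations(E):
        h = dict(zip(E, perm))
        for A in SUBSETS:
            for B in SUBSETS:
                if D(A, B) != D(image(h, A), image(h, B)):
                    return False
    return True


def more_than_half(A, B):
    return len(A) > 0 and 2 * len(A & B) > len(A)


def no_but_john(A, B):
    # assumed denotation: the only A that is a B is John
    return A & B == frozenset({"john"})
```

The proportionality Det passes the check because it only looks at cardinalities, while `no_but_john` fails as soon as a permutation moves John.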
2.4. Two place variants

Proportionality Dets have two place variants like intersective Dets, as in:

(8) A greater percentage of teachers than (of) students signed the petition.
    The same proportion of little boys as (of) little girls laugh at funny faces.
    Proportionately fewer female students than male students get drafted.

(9) (A GREATER PERCENTAGE OF A THAN B)(C) = 1 iff $\frac{|A \cap C|}{|A|} > \frac{|B \cap C|}{|B|}$.
3. Inference paradigms
To begin our study of the inference paradigms that proportionality Dets enter, we first review Westerståhl's result concerning our previous attempt (Keenan 2004). That work built on three operations defined on GQs: complement, postcomplement, and dual. Complement has already been defined (pointwise) above. For the others:

Definition 3 (Postcomplement and dual)
a. F¬, the postcomplement of F, is that GQ mapping each B to F(¬B), that is, to F(E \ B).
b. F^d, the dual of F, =def ¬(F¬). Note that ¬(F¬) = (¬F)¬, so we may omit parentheses.

We extend these operations pointwise to type < 1, 1 > functions:

Definition 4 For D of type < 1, 1 >, ¬D, D¬, and D^d are those type < 1, 1 > functions defined by:
a. ¬D maps each set A to ¬(D(A)).
b. D¬ maps each set A to (D(A))¬.
c. D^d maps each set A to (D(A))^d.

3.1. Some examples
We write negX for a DP which denotes the complement of the denotation of X; similarly Xneg denotes its postcomplement and dualX its dual.

X                negX            Xneg             dualX
some             no              not every        every
every            not every       no               some
more than half   at most half    less than half   at least half
less than half   at least half   more than half   at most half
So the complement of every boy is not every boy, its postcomplement is no boy, and its dual is some boy. And the complement of more than half is at most half, its postcomplement is less than half, and its dual at least half. Observe that the postcomplement and dual operators preserve the property of being proportional but interchange the intersective and co-intersective Dets:

Proposition 3 For D of type < 1, 1 >,
a. if D is proportional so are D¬ and D^d, but
b. if D is intersective (cardinal), D¬ and D^d are both co-intersective (co-cardinal), and
c. if D is co-intersective (co-cardinal), then D¬ and D^d are both intersective (cardinal).

Proof sketch. We show b. above, as it plays a role in our later discussion. Let D be intersective. We show that D¬ is co-intersective. Let A \ B = X \ Y. We must show that D¬AB = D¬XY. But D¬AB = (DA)¬(B) = DA(¬B) = DA(A ∩ ¬B), since D is intersective, = DA(A \ B) = D(E)(A ∩ (A \ B)) = D(E)(A \ B) = D(E)(X \ Y) = ... = D¬XY, completing the proof. To see that D^d is co-intersective we observe that D¬ is by the above and so then is ¬(D¬) = D^d, since pointwise complement preserves co-intersectivity (Proposition 1).
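The three operations and Proposition 3 (b) can be checked mechanically on a small domain. This is a minimal sketch, assuming a three-element universe and writing Dets as two-argument Python functions; the helper names (`complement`, `postcomplement`, `dual`, `same`) are illustrative only.

```python
from itertools import combinations

E = frozenset(range(3))  # assumed toy universe

SUBSETS = [frozenset(c) for r in range(len(E) + 1)
           for c in combinations(sorted(E), r)]


def complement(D):
    return lambda A, B: not D(A, B)


def postcomplement(D):
    # D¬ maps A, B to D(A)(E \ B)
    return lambda A, B: D(A, E - B)


def dual(D):
    return complement(postcomplement(D))


def same(D1, D2):
    return all(D1(A, B) == D2(A, B) for A in SUBSETS for B in SUBSETS)


def is_intersective(D):
    return all(D(A, B) == D(X, Y)
               for A in SUBSETS for B in SUBSETS
               for X in SUBSETS for Y in SUBSETS
               if A & B == X & Y)


def is_co_intersective(D):
    return all(D(A, B) == D(X, Y)
               for A in SUBSETS for B in SUBSETS
               for X in SUBSETS for Y in SUBSETS
               if A - B == X - Y)


def some(A, B): return len(A & B) > 0
def no(A, B): return len(A & B) == 0
def every(A, B): return A <= B
def not_every(A, B): return not A <= B
def exactly_one(A, B): return len(A & B) == 1
```

With these definitions the some and every rows of the table come out as stated, and the intersective `exactly_one` has a postcomplement and a dual that are co-intersective but not intersective.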
3.2. Westerståhl's generalization
We repeat (1-a) above, (1-b) being similar.

(10) a. More than n/m of the As are Bs.
        At least 1 − n/m of the As are Cs.
        Ergo: Some A is both a B and a C.

Now the relevant Dets are interpreted as in (11):

(11) For $0 \le n \le m$, $0 < m$,
     (MORE THAN n/m)(A)(B) = 1 iff $A \neq \emptyset$ and $\frac{|A \cap B|}{|A|} > \frac{n}{m}$,
     (LESS THAN 1 − n/m)(A)(B) = 1 iff $A \neq \emptyset$ and $\frac{|A \cap B|}{|A|} < 1 - \frac{n}{m}$,
     (AT LEAST 1 − n/m)(A)(B) = 1 iff $A \neq \emptyset$ and $\frac{|A \cap B|}{|A|} \ge 1 - \frac{n}{m}$.
Westerståhl (pc) notes that the DPs in the premisses in (10) are duals. (LESS THAN 1 − n/m) is the postcomplement of (MORE THAN n/m) and (AT LEAST 1 − n/m) is its dual.

Theorem 1 (Westerståhl's Generalization) For D conservative, the following properties are equivalent:
1. D is right increasing (= increasing on its second argument).
2. D(A)(B) ∧ D^d(A)(C) ⇒ SOME(A)(B ∩ C).
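As a sanity check, independent of the proof, the equivalence can be tested exhaustively over a two-element universe. The sketch below enumerates every conservative D on that universe by tabulating its value on the pairs (A, A ∩ B); the universe size and all names are assumptions made for illustration.

```python
from itertools import combinations, product

E = frozenset(range(2))  # assumed toy universe


def subsets(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


SUBSETS = subsets(E)
# A conservative D is determined by its values on pairs (A, A ∩ B).
KEYS = [(A, X) for A in SUBSETS for X in subsets(A)]


def conservative_dets():
    for values in product([False, True], repeat=len(KEYS)):
        table = dict(zip(KEYS, values))
        yield lambda A, B, t=table: t[(A, A & B)]


def dual(D):
    # D^d(A)(C) = not D(A)(E \ C)
    return lambda A, C: not D(A, E - C)


def right_increasing(D):
    return all(D(A, B2)
               for A in SUBSETS for B in SUBSETS for B2 in SUBSETS
               if B <= B2 and D(A, B))


def property_2(D):
    Dd = dual(D)
    return all(len(A & B & C) > 0
               for A in SUBSETS for B in SUBSETS for C in SUBSETS
               if D(A, B) and Dd(A, C))


# One boolean per conservative D: do properties 1 and 2 agree?
results = [right_increasing(D) == property_2(D) for D in conservative_dets()]
```

There are nine (A, A ∩ B) pairs over two elements, hence 512 conservative functions, and the two properties agree on every one of them.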
Proof. "⇐" Let D be conservative and assume [2]. We show [1]. Let B ⊆ B' and assume DAB. We must show DAB'. Assume otherwise. So DAB' = 0. Then (DA)¬(¬B') = 0, so ¬(DA)¬(¬B') = D^d(A)(¬B') = 1. So by [2], A ∩ B ∩ ¬B' ≠ ∅, contradicting that B ⊆ B'. Thus DAB' = 1, and D is right increasing.

"⇒" Let D be right increasing and assume DAB = 1 and D^dAC = 1, whence by the conservativity of D and D^d we have DA(A ∩ B) = 1 and D^dA(A ∩ C) = 1. Assume, leading to a contradiction, that A ∩ B ∩ C = ∅. Then A ∩ B ⊆ ¬C, so D(A)(¬C) = 1 by the right increasingness of D. Thus D¬(A)(C) = 1. But ¬D¬(A)(C) = D^dAC = 1, a contradiction. So A ∩ B ∩ C ≠ ∅, whence SOME(A)(B ∩ C) = 1, establishing [2].

So [2] generalizes the argument paradigm in (1-a), (1-b) and does not seem specific to proportionality Dets since it holds for DPs built from right increasing conservative Dets in general. So far however I have found it difficult to find examples of non-proportional Dets which instantiate [1] and [2]. One's first guess, some and every, satisfies [1] and [2] but these Dets are, recall, proportional: SOME = MORE THAN 0% and EVERY = 100%. The only other cases I can think of are ones that make presuppositions on the cardinality of their first argument. Perhaps the least contentious is both and at least one of the two. Both students are married and At least one of the two students is a vegan imply Some student is both married and a vegan. But this instance does require taking at least one of the two as a Det, which we decided against earlier.

4. The Mid-Point Theorems
We seek now additional inference patterns that proportionality quantifiers naturally enter. Keenan (2004) observes that natural languages present some non-trivial DPs distinct from first order ones which always assign the same truth value to a predicate and its negation, as in (12-a), (12-b) and (12-c), (12-d). (13) is the general form of the regularity. Proposition 4 is then immediate.

(12) a. Exactly half the students got an A on the exam.
     b. Exactly half the students didn't get an A on the exam.
     c. Between a third and two thirds of the students got an A.
     d. Between a third and two thirds of the students didn't get an A.

(13) DP(P1) = DP(not P1).
Proposition 4 The DPs which satisfy (13) are those which denote in FIX(¬) = {F ∈ GQ_E | F = F¬}.

At issue then is a syntactic question: just which DPs do satisfy (13)? Let us limit ourselves for the moment to ones of the form [Det+N], as we are interested in isolating the role of the Det. And in characterizing that class do the proportionality Dets play any sort of distinguished role? It seems to me that they do, though I can only give a rather informal statement of that role. Still that informal statement at least helps us to understand why many of the natural examples of Dets which denote in FIX(¬) are proportional. We begin by generalizing the observation in (12).

Definition 5 For p and q fractions with 0 ≤ p ≤ q ≤ 1,
a. (BETWEEN p AND q)(A)(B) = 1 iff $A \neq \emptyset$ and $p \le \frac{|A \cap B|}{|A|} \le q$.
b. (MORE THAN p AND LESS THAN q)(A)(B) = 1 iff $A \neq \emptyset$ and $p < \frac{|A \cap B|}{|A|} < q$.
Thus (12-c) is true iff there is at least one student and at least a third of the students passed and not more than two thirds passed. Dets of the forms in Def 5 are fixed by postcomplement when the fractions p, q lie between 0 and 1 and sum to 1. The condition that p + q = 1 guarantees that p and q are symmetrically distributed around the midpoint 1/2. Clearly p ≤ 1/2 since p ≤ q and p + q = 1. Similarly 1/2 ≤ q. The distance from 1/2 to p is 1/2 − p, and that from 1/2 to q is q − 1/2. And 1/2 − p = q − 1/2 iff, adding 1/2 to both sides, 1 − p = q, iff 1 = p + q. And we have:

Theorem 2 (Mid-Point Theorem) Let p, q be fractions with 0 ≤ p ≤ q ≤ 1, p + q = 1. Then (BETWEEN p AND q) and (MORE THAN p AND LESS THAN q) are both fixed by ¬.

The theorem (plus pointwise meets) guarantees the logical equivalence of the (a,b) pairs below:

(14) a. Between one sixth and five sixths of the students are happy. ≡
     b. Between one sixth and five sixths of the students are not happy.

(15) a. More than three out of ten and less than seven out of ten teachers are married. ≡
     b. More than three out of ten and less than seven out of ten teachers are not married.
A variant statement of this theorem using percentages is:

(16) Let 0 ≤ n ≤ m ≤ 100 with n + m = 100. Then
     Between n and m per cent of the As are Bs. ≡ Between n and m per cent of the As are not Bs.

For example choosing n = 40 we infer that (17-a) and (17-b) are logically equivalent:

(17) a. Between 40 and 60 per cent of the students passed.
     b. Between 40 and 60 per cent of the students didn't pass.
And Theorem 2 and (16) are (mutual) entailment paradigms which appear to use proportionality Dets in an essential if not completely exclusive way. Many of the pairs of proportional Dets will not satisfy the equivalence in Theorem 2 since their fractions do not straddle the mid-point appropriately. And Dets such as between 10 and 20 per cent do not satisfy (16) for the same reason. A very large class of complex proportional Dets which satisfy Theorem 2 or (16) is given by Theorem 3.

Theorem 3 FIX(¬) is closed under the pointwise boolean operations and so is a complete (and thus) atomic subalgebra of Type < 1, 1 >.

The proof can be found in the Appendix. And given our earlier observation that proportionality functions are closed under the pointwise boolean operations we infer that all the Dets that can be built up as boolean compounds of the basic fractional and percentage Dets in Theorem 2 and (16) respectively are both proportional and fixed by ¬, so they satisfy the equivalence in (13). For example

(18) Either less than a third or else more than two thirds of the As are Bs. ≡
     Either less than a third or else more than two thirds of the As are not Bs.

Proof. The Det in this example denotes the boolean complement of BETWEEN A THIRD AND TWO THIRDS and is thus proportional and fixed by ¬.
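The Mid-Point Theorem and example (18) lend themselves to a direct check with exact fractions. The sketch below assumes a six-element universe and follows Definition 5 for the two Det schemas; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

E = frozenset(range(6))  # assumed toy universe

SUBSETS = [frozenset(c) for r in range(len(E) + 1)
           for c in combinations(sorted(E), r)]


def between(p, q):
    """(BETWEEN p AND q)(A)(B), following Definition 5 (a)."""
    def D(A, B):
        return len(A) > 0 and p <= Fraction(len(A & B), len(A)) <= q
    return D


def more_than_and_less_than(p, q):
    """(MORE THAN p AND LESS THAN q)(A)(B), following Definition 5 (b)."""
    def D(A, B):
        return len(A) > 0 and p < Fraction(len(A & B), len(A)) < q
    return D


def complement(D):
    return lambda A, B: not D(A, B)


def fixed_by_postcomplement(D):
    """True iff D(A)(B) = D(A)(E \\ B) for all A, B."""
    return all(D(A, B) == D(A, E - B) for A in SUBSETS for B in SUBSETS)


third_to_two_thirds = between(Fraction(1, 3), Fraction(2, 3))
three_to_seven_tenths = more_than_and_less_than(Fraction(3, 10), Fraction(7, 10))
ten_to_twenty_percent = between(Fraction(10, 100), Fraction(20, 100))
```

Pairs that sum to 1 pass the check, as does the boolean complement used in (18). The off-centre pair 10% and 20% fails, for instance with five As of which exactly one is a B.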
It is perhaps worth noticing what happens with proportional Dets of the form Between p and q when their distribution with respect to the mid-point (1/2, 50%) changes. If both p and q lie below, or both above the midpoint then we have:

Proposition 5 If 0 < p ≤ q < 1/2 or 1/2 < p ≤ q < 1 then
Between p and q of the As are Bs. ≡ It is not the case that between p and q of the As are not Bs.

Thus such Det pairs satisfy the equivalences in (19).

(19) D(A)(B) = D^d(A)(B) = (¬D(A))(¬B) = ¬(D(A)(¬B)).

In contrast if the fraction (percentage) pairs p, q include the mid-point but are not centered then no entailment relation in either direction holds. In (20-a), (20-b) neither entails the other:

(20) a. Between a third and three quarters of the students passed the exam.
     b. Between a third and three quarters of the students didn't pass the exam.

5. Generalizing the Mid-Point Theorem
We observe first that the proportionality Dets differ from the intersective and co-intersective ones in being closed under the formation of postcomplements:
Proposition 6 If D of type < 1, 1 > is intersective or co-intersective, and D = D¬ then D is trivial (D = 0 or D = 1).

Proof. Let D be intersective. Then D¬ is co-intersective by Proposition 3 (b). So D = D¬ is co-intersective and hence trivial (Proposition 2). For D co-intersective the argument is dual.

Moreover the expression of the postcomplement relation is natural and does not use a distinctive syntax. Here are the simplest cases:
(21)              POSTCOMPLEMENT
more than n/m        less than 1 − n/m
exactly n/m          exactly 1 − n/m
at most n/m          at least 1 − n/m
more than n%         less than (100 − n)%
exactly n%           exactly (100 − n)%
at most n%           at least (100 − n)%
Notice that in our first group n/m ranges over all fractions and so includes 1 − n/m = (m − n)/m. Similarly in the second group n ranges at least over the natural numbers between 0 and 100 (inclusive) so includes both n% and (100 − n)%. Thus the linguistic means we have for expressing ratios covers proportionality expressions and their postcomplements indifferently. (Note that postcomplement is symmetric: D = F¬ iff F = D¬). Recall also that all the natural classes we have adduced are closed under the pointwise boolean operations, expressible with appropriate uses of and, or and not. Now recall the fractions p, q for which the Mid-Point Theorem holds. If p = n/m and p and q sum to 1 then q = 1 − n/m. And more than n/m and less than 1 − n/m are postcomplements. (If more than 3/10ths of the As are Bs then less than 7/10ths of the As are non-Bs). Similarly between n/m and 1 − n/m just means the same as at least n/m and at most 1 − n/m. So we have

Theorem 4 (Generalized Mid-Points) For D of type < 1, 1 >, (D ∧ D¬) and (D ∨ D¬) are fixed by ¬, as are their complements (¬D ∨ D^d) and (¬D ∧ D^d).

Partial proof
a. (D ∧ D¬)¬ = D¬ ∧ D¬¬ = D¬ ∧ D = D ∧ D¬.
b. ¬(D ∧ D¬) = (¬D ∨ ¬D¬) = (¬D ∨ D^d), and (¬D ∨ D^d)¬ = (¬D¬ ∨ (D^d)¬) = (D^d ∨ ¬D) = (¬D ∨ D^d).

These proofs use the following proposition.

Proposition 7 The postcomplement function ¬ is self inverting (D¬¬ = D) and thus bijective, and it commutes with ¬ and ∧ and thus is a boolean automorphism of GQ_E.

Below we give some (more) examples of proportionality Dets which are fixed by postcomplement. Colloquial expression may involve turns of phrase other than simple conjunction and disjunction.
(22) Some examples

A. more than three tenths but less than seven tenths
   more than three out of ten but less than seven out of ten
   more than thirty per cent but less than seventy per cent
   exactly a quarter or exactly three quarters
   exactly one in four or exactly three out of four
   exactly twenty-five per cent or exactly seventy-five per cent
   at least three tenths and at most seven tenths
   at least three out of ten and at most seven out of ten
   at least thirty per cent and at most seventy per cent
   between a quarter and three quarters
   between twenty-five per cent and seventy-five per cent
   exactly one (student) in ten or exactly nine (students) in ten

B. not more than three tenths or not less than seven tenths ≡ at most three tenths or at least seven tenths
   not more than three out of ten or not less than seven out of ten ≡ at most three out of ten or at least seven out of ten
   not more than thirty per cent or not less than seventy per cent ≡ at most thirty per cent or at least seventy per cent
   not at least three tenths or not at most seven tenths ≡ less than three tenths or more than seven tenths
   not at least three out of ten or not at most seven out of ten ≡ less than three out of ten or more than seven out of ten
   not at least thirty per cent or not at most seventy per cent ≡ less than thirty per cent or more than seventy per cent
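Theorem 4 makes no assumption about D, so it can be spot-checked on arbitrary type < 1, 1 > functions. The following sketch assumes a three-element universe and draws Dets as random truth tables with a fixed seed; the sampling scheme and all names are illustrative choices, not part of the text.

```python
import random
from itertools import combinations

E = frozenset(range(3))  # assumed toy universe

SUBSETS = [frozenset(c) for r in range(len(E) + 1)
           for c in combinations(sorted(E), r)]
PAIRS = [(A, B) for A in SUBSETS for B in SUBSETS]


def neg(D):
    return lambda A, B: not D(A, B)


def post(D):
    # postcomplement: D¬(A)(B) = D(A)(E \ B)
    return lambda A, B: D(A, E - B)


def dual(D):
    return neg(post(D))


def meet(D1, D2):
    return lambda A, B: D1(A, B) and D2(A, B)


def join(D1, D2):
    return lambda A, B: D1(A, B) or D2(A, B)


def fixed(D):
    return all(D(A, B) == D(A, E - B) for A, B in PAIRS)


def random_det(rng):
    """An arbitrary (not necessarily conservative) Det as a truth table."""
    table = {pair: rng.random() < 0.5 for pair in PAIRS}
    return lambda A, B: table[(A, B)]


def theorem4_holds(D):
    compounds = [
        meet(D, post(D)),
        join(D, post(D)),
        join(neg(D), dual(D)),
        meet(neg(D), dual(D)),
    ]
    return all(fixed(C) for C in compounds)


def some(A, B):
    return len(A & B) > 0
```

A plain Det such as `some` is not itself fixed by postcomplement, yet all four compounds built from it, and from any sampled table, are.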
6. Summary
So far we have semantically characterized a variety of Dets which build DPs satisfying (13), as illustrated in (23):

(23) a. More than three out of ten but less than seven out of ten students are vegans.
     b. More than three out of ten but less than seven out of ten students aren't vegans.

     a. At least three out of four or else at most one out of four students are vegans.
     b. At least three out of four or else at most one out of four students aren't vegans.
But our paradigms above are given for generalized quantifiers in general, not just proportionality ones. To what extent can they be instantiated by Dets that are not proportionality ones? Proposition 6 tells us that, putting the trivial Dets aside, we cannot find an intersective Det which is its own postcomplement, nor can we find a co-intersective one meeting that condition. But we can choose < Det, Det + neg > pairs where one is intersective and the other co-intersective. Here are a few examples.

(24)              POSTCOMPLEMENT
some                 not all
no                   every
exactly five         all but five
at most five         all but at most five
just finitely many   all but finitely many
no...but John        every...but John
The left hand members of the table above are all intersective, the right hand members co-intersective. And clearly (25-a), (25-b) are logically equivalent, as are (26-a), (26-b):

(25) a. Some but not all students read the Times. (F ∧ F¬)
     b. Some but not all students don't read the Times.

(26) a. Either all or none of the students will pass that exam. (F ∨ F¬)
     b. Either all or none of the students won't pass that exam.
Note that the (compound) Dets in these two examples are properly proportional: some but not all ≡ more than zero per cent and less than 100 per cent, and all or none ≡ either 100 per cent or else exactly zero per cent. For the record,

(27) some but not all, which denotes (SOME ∧ ¬ALL), is proportional and not intersective or co-intersective (E assumed to have at least two elements).
Basically some but not all fails to be intersective because of the not all part which is not intersective; and it fails to be co-intersective because some fails to be. To see that it is proportional, suppose that the proportion of As that are Bs is the same as the proportion of Xs that are Ys. Then if Some but not all As are Bs is true then at least one A is a B and at least one A is not a B, so the percentage of As that are Bs lies strictly between 0% and 100%, which is exactly where the percentage of Xs that are Ys lies, whence some but not all Xs are Ys. One sees then that the (complete) boolean closure of INT_E ∪ CO-INT_E includes many functions that lie outside INT_E and CO-INT_E. In fact (Keenan 1993) this closure is exactly the set of conservative functions and so includes in particular all the conservative proportional ones. Note now however that examples (28-a), (28-b) are logically equivalent, as the < 1, 1 > functions the Dets denote are postcomplements, but either exactly five or else all but five is not proportional:

(28) a. Either exactly five or else all but five students came to the party.
     b. Either exactly five or else all but five students didn't come to the party.
To see this let A have 100 members, just five of which are Bs. The D denoted by the Det in (28-a) maps A, B to 1. But for |X| = 1,000 and |X ∩ Y| = 50, that D maps X, Y to 0, even though the proportion of Xs that are Ys, 1/20, is the same as the proportion of As that are Bs. Clearly then certain boolean compounds of intersective with co-intersective Dets yield some non-proportional Dets which satisfy Theorem 4, so that paradigm is not limited to proportionality Dets. A last case of DPs that may satisfy Theorem 4 is given by partitives of the form in (29):

(29) a. (EXACTLY n OF THE 2n)(A)(B) = 1 iff |A| = 2n and |A ∩ B| = n.
     b. (BETWEEN n AND 2n OF THE 3n)(A)(B) = 1 iff |A| = 3n and n ≤ |A ∩ B| ≤ 2n (n > 0).

Of course in general DPs of the form exactly n of the m Ns are not fixed by ¬. But in the case where m = 2n they are. Note that if we treat exactly n of the m as a Det (an analysis that we, along with most linguists, reject) we have:

(30) For m > n, exactly n of the m is in general not intersective, co-intersective or proportional (but is conservative and permutation invariant).
Acknowledgments

The paper was supported by a BSF Grant # 1999210.

Appendix

This appendix contains the proofs of Proposition 1 and Theorems 2 and 3.

Proposition 1 (2). Each of the classes K defined in Definition 2 is closed under the pointwise boolean operations and is thus a (complete, atomic) boolean subalgebra of [P(E) → GQ_E].

Proof sketch. (1b). Let D be conservative, let A ∩ B = A ∩ B'. We show that (¬D)(A)(B) = (¬D)(A)(B'). (¬D)(A)(B) = ... = ¬(DAB) = ¬(DA(A ∩ B)) = ¬(DA(A ∩ B')) = ¬(DAB') = ... = (¬D)(A)(B'). For D¬, let A ∩ B = A ∩ B'. Then D¬AB = DA(¬B) = DA(A \ B) = DA(A \ (A ∩ B)) = DA(A \ (A ∩ B')) = DA(A \ B') = DA(¬B') = D¬(A)(B').

(2c). Let D be intersective. Let A \ B = X \ Y and show that D¬AB = D¬XY. D¬AB = (DA)¬(B) = DA(¬B) = DA(A \ B) = D(A \ B)(A \ B), since A ∩ (A \ B) = (A \ B) ∩ (A \ B), = D(X \ Y)(X \ Y) = DX(X \ Y) = DX(¬Y) = (DX)¬(Y) = D¬XY. Further, since D is intersective so is ¬D by Prop 1, whence by the above, (¬D)¬ = D^d is co-intersective.

Theorem 2 (First Mid-Point Theorem) Let p, q be fractions with 0 ≤ p ≤ q ≤ 1, p + q = 1. Then (BETWEEN p AND q) and (MORE THAN p AND LESS THAN q) are both fixed by ¬.

Proof. Assume (BETWEEN p AND q)(A)(B) = 1. Show (BETWEEN p AND q)(A)(¬B) = 1. Suppose, leading to a contradiction, that $\frac{|A \cap \neg B|}{|A|} < p$. Then the percentage of As that are Bs is greater than q, contrary to assumption. The second case, in which $\frac{|A \cap \neg B|}{|A|} > q$, is similar; hence the percentage of As that aren't Bs lies between p and q.

Theorem 3 FIX(¬) is closed under the pointwise boolean operations and so is a complete (and thus) atomic subalgebra of Type < 1, 1 >.

Proof. a. Let D ∈ FIX(¬). We must show that for all sets A, (¬D)(A) = ((¬D)(A))¬, that is, ¬D is fixed by ¬. Let A, B be arbitrary. Then

(¬D)(A)(B) = ¬(D(A)(B))          Pointwise ¬ (twice)
           = ¬((D(A))¬(B))       D(A) is fixed by ¬
           = (¬(D(A)))¬(B)       Pointwise ¬
           = ((¬D)(A))¬(B)       Pointwise ¬

Thus (¬D)(A) = ((¬D)(A))¬, as was to be shown.

b. Show (D ∧ D')¬ = (D ∧ D'), i.e. show (D ∧ D')(A) = ((D ∧ D')(A))¬.

(D ∧ D')(A)(B) = (DA ∧ D'A)(B)
               = ((DA)¬ ∧ (D'A)¬)(B)
               = (DA)¬(B) ∧ (D'A)¬(B)
               = DA(¬B) ∧ D'A(¬B)
               = (DA ∧ D'A)(¬B)
               = (D ∧ D')(A)(¬B)
               = (D ∧ D')¬(A)(B)

Essentially the same proof carries over for ⋀_i D_i replacing D ∧ D', showing completeness. Atomicity then follows.
References

Barwise, Jon and Robin Cooper
  1981 Generalized quantifiers in natural language. Linguistics and Philosophy 4: 159–219.
Keenan, Edward L.
  1993 Natural language, sortal reducibility and generalized quantifiers. Journal of Symbolic Logic 58: 314–325.
  2004 Excursions in natural logic. In Claudia Casadio, Philip J. Scott, and Robert A.G. Seely (eds.), Language and Grammar: Studies in Mathematical Linguistics and Natural Language. Stanford: CSLI.
Keenan, Edward L. and Jonathan Stavi
  1986 Semantic characterization of natural language determiners. Linguistics and Philosophy 9: 253–326.
Lindström, Per
  1966 First-order predicate logic with generalized quantifiers. Theoria 32: 186–195.
Westerståhl, Dag
  1989 Quantifiers in formal and natural languages. In Dov Gabbay and Franz Guenthner (eds.), Handbook of Philosophical Logic, Vol IV. Dordrecht: Reidel.
On the logic of LGB type structures. Part I: Multidominance structures

Marcus Kracht

Abstract

The present paper is the first part of a sequence of papers devoted to the modal logics of structures that arise from Government and Binding theory. It has been shown in (Kracht 2001b) that they can be modeled by so-called multidominance structures (MDSs). The result we are going to prove here is that the dynamic logic of the MDSs is decidable in 2EXPTIME. Moreover, we shall indicate how the theory of Government and Binding as well as the Minimalist Program can be coded in dynamic logic. Some preliminary decidability results for GB are obtained, which will be extended in the sequel to this paper.
1. Introduction
In recent years, the idea of model theoretic syntax has been getting more attention. One of the advantages of model theoretic syntax is that because it describes syntactic structures using a logical language fundamental theoretical questions can receive a precise formulation and can—hopefully—be answered. This idea can be found already in (Stabler 1992), where it was argued that questions of dependency among different modules of grammar, or independence questions for principles can be translated into logical questions. Stabler chose a translation into predicate logic, accompanied by an implementation in Prolog. Thus, the questions could be posed to a computer, which would then answer them. The problem with this procedure is twofold. Often the predicate logic of a class of structures is undecidable and so not all questions can effectively be answered (and it is impossible to know which ones). Second, even if the logic is decidable we need to know about its complexity so that we know how long we have to wait until we get an answer. Thus, the best possible result would be one where we had not only a decidability result but also a complexity result, preferably showing that complexity is low.
Rabin has shown that the (weak) monadic second order logic (MSO) of trees is decidable, a result that James Rogers (1998) has applied to syntactic theory. The main disadvantage of this approach is that it does not cover LGB type structures.1 The obvious step was to reduce the latter to the former. This is not always possible, but it led to a result (independently proved by James Rogers and myself) that if head movement is bounded then Minimality in the sense of Rizzi (1990) or Locality in the sense of Manzini (1992) come down to the theory that the language is strongly context free. However, nothing could be said about the case when head movement was unbounded because the reduction fails in this case. Now, Rogers remarks that adding free indexation makes the second order theory undecidable (it is no longer monadic), and so the monadic second order theory of LGB type structures might after all be undecidable. The good news however is that this does not seem to be the case. In this paper I shall show that the dynamic logic of a good many classes of structures is decidable. An application to non-context free languages will be given. Moreover, I shall describe how GB type structures as well as MP type structures can be described using dynamic logic. The sequel to this paper will generalise the result of this paper still further.2 It will emerge that many theories of generative grammar are effectively decidable. This is hopefully the beginning of a general decidability proof that covers the linguistically relevant structures. The applications of the present results are manifold. We are given a decision procedure to see whether certain principles of grammar are independent or not, and we are given a decision procedure to see whether or not a sentence is in the language. I have tried to include in the paper all essential definitions. Nevertheless, this paper is not easy to read without some background knowledge. In particular, I am relying on (Kracht 2001b) for a discussion of the relevance of the structures discussed below to syntactic structures known from generative grammar. However, making the material accessible to an ordinary linguistic audience would make this paper of book size length.3

2. Multidominance structures
In generative grammar, structures are derived from deep structure trees. In (Kracht 2001b) I considered three kinds of structures: trace chain structures (TCSs), copy chain structures (CCSs) and multidominance structures (MDSs). TCSs are the kind of entities most popular in linguistics. When an element moves, it leaves behind a trace and forms a chain together
with its trace. The technical implementation is a little different, but the idea is very much the same. CCSs are different in that the moving element does not leave just a trace behind but a full copy of itself. This type of chain structure is more in line with recent developments (the Minimalist Program, henceforth MP), rather than with Government and Binding (= GB). MDSs, however, are different from both. In an MDS, there are no traces. Instead, movement to another position is represented by the addition of a link to that position. Thus, as soon as there is movement there are elements which have more than one mother. Moreover, it was shown in (Kracht 2001b) that MDSs contain exactly the same information as TCSs, since there is an algorithm that converts one into the other.

MDSs, like TCSs, are based on an immediate dominance relation, written ≻. (The converse of this relation is denoted by ≺.) In what is to follow, we assume that structures are downward binary branching. Every node has at most two daughters. To implement this we shall assume two relations, ≻0 and ≻1, each of which is a partial function, and ≻ = ≻0 ∪ ≻1. We do not require the two relations to be disjoint. Recall the definition of the transitive closure R+ of a binary relation R ⊆ U × U over a set U. It is the least set S containing R such that if (x, y) ∈ S and (y, z) ∈ S then also (x, z) ∈ S. Recall that R is loop free if and only if R+ is irreflexive. Also, R∗ := {(x, x) | x ∈ U} ∪ R+ is the reflexive, transitive closure of R.

Definition 1 A preMDS is a structure ⟨M, ≻0, ≻1⟩, where the following holds (with ≻ = ≻0 ∪ ≻1):
(P1) If y ≻0 x and y ≻0 x' then x = x'.
(P2) If y ≻1 x and y ≻1 x' then x = x'.
(P3) If y ≻1 x then there is a z such that y ≻0 z.
(P4) There is exactly one x such that for no y, y ≻ x (this element is called the root).
(P5) ≺+ is irreflexive.
(P6) The set M(x) := {y : x ≺ y} is linearly ordered by ≺+.

We call a pair ⟨x, y⟩ such that x ≺ y a link. We shall also write x; y to say that ⟨x, y⟩ is a link.
An MDS is shown in Figure 1. The lines denote the immediate daughter links. For example, there is a link from a upward to c. Hence we have a ≺ c, or, equivalently, c ≻ a. We also have b ≺ c. We use the standard practice of making the order of the daughters implicit: the leftward link is to the daughter number 0. This means that a ≺0 c and b ≺1 c. Similarly, it is seen that b ≺1 d and b ≺1 h, while c ≺0 d and g ≺0 h. It follows that M(a) = {c}, while M(b) = {c, d, h}.

[Figure 1. An MDS (nodes a, b, c, d, e, f, g, h)]

A link ⟨x, y⟩ such that y is minimal in M(x) is called a root link. For example, ⟨b, c⟩ is a root link, since c ≺+ d and c ≺+ h. A link that is not a root link is called derived. A leaf is a node without daughters. For technical reasons we shall split ≺0 and ≺1 into two relations each. Put x ≺00 y iff (= if and only if) x ≺0 y and y is minimal in M(x); and put x ≺01 y iff x ≺0 y but y is not minimal in M(x). Alternatively, x ≺00 y if x ≺0 y and ⟨x, y⟩ is a root link. Let x ≺01 y iff x ≺0 y but not x ≺00 y. Then by definition

≺00 ∩ ≺01 = ∅ and ≺0 = ≺00 ∪ ≺01

Similarly, we decompose ≺1 into

≺1 = ≺10 ∪ ≺11

where x ≺10 y iff x ≺1 y and y is minimal in M(x) (or, equivalently, ⟨x, y⟩ is a root link). And x ≺11 y iff x ≺1 y and y is not minimal in M(x). We shall define

≺•0 := ≺00 ∪ ≺10
≺•1 := ≺01 ∪ ≺11
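The notions of transitive closure, M(x), root link and derived link can be computed directly from a set of links. The sketch below uses only the links that the text states explicitly for Figure 1, plus one link (d ≺ g) that is an assumption added so that c ≺+ h holds; the figure itself may contain further nodes and links. Links are stored as (daughter, mother) pairs and all function names are illustrative.

```python
# Links x ≺ y stored as (daughter, mother) pairs.
LINKS = {
    ("a", "c"), ("b", "c"), ("c", "d"),
    ("b", "d"), ("b", "h"), ("g", "h"),
    ("d", "g"),  # ASSUMPTION: not stated in the text, added to connect d to h
}


def transitive_closure(rel):
    """Least transitive relation containing rel (R+ in the text)."""
    closure = set(rel)
    while True:
        new = {(x, z)
               for (x, y) in closure
               for (y2, z) in closure
               if y == y2}
        if new <= closure:
            return closure
        closure |= new


def mothers(x):
    """M(x) = {y : x ≺ y}."""
    return {y for (x2, y) in LINKS if x2 == x}


def root_links():
    """Links (x, y) where y is minimal in M(x) with respect to ≺+."""
    below = transitive_closure(LINKS)
    return {(x, y) for (x, y) in LINKS
            if not any((y2, y) in below for y2 in mothers(x))}


def derived_links():
    return LINKS - root_links()
```

Under these assumptions M(b) comes out as {c, d, h}, the link from b to c is the root link of b, and the links from b to d and from b to h are the derived ones.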
We shall spell out the conditions on these four relations in place of just ≺0 and ≺1. The structures we get are called PMDSs.

Definition 2 An MDS is a structure ⟨M, ≻00, ≻01, ≻10, ≻11⟩ which, in addition to (P1) – (P6) of Definition 1, satisfies⁴
(P7) If y ∈ M(x) then x ≺•0 y iff x; y is a root link (iff y is the least element of M(x) with respect to ≺+).

We assume that the leaves are linearly ordered in the following way.
x ⊏ y :⇔ (∃z)(∃u)(∃v)(x ≺∗•0 z ≺00 u ≻10 v ≻∗•0 y)    (1)
This is not the only possible ordering; this establishes in fact the order at D-structure. This is enough for the present purposes, though. It is verified that a ⊏ b ⊏ e, for example. Write R ◦ S for the relation {⟨x, z⟩ : there is y: x R y S z}. We can then restate (1) as follows.
⊏ := ≺∗•0 ◦ ≺00 ◦ ≻10 ◦ ≻∗•0
Table 1 gives a synopsis of the various relations used in this paper.

Definition 3 An ordered MDS (OMDS) is an MDS in which ⊏ is transitive, irreflexive and linear on the leaves.

Now, since ≺+•0 is a tree ordering, we can extend ⊏ to an ordering between any two incomparable nodes (where x and y are incomparable if neither x ≺+•0 y nor y ≺+•0 x nor x = y). In fact, the extension is exactly as defined by (1). Details can be found, for example, in (Kracht 2003b). Notice that in an OMDS, ≺0 ∩ ≺1 = ∅. For suppose otherwise. Then for some x and y we have x ≺0 y and x ≺1 y and therefore z ⊏ z for every leaf z ≤ x, by definition of ⊏. Contradiction. In presence of the ordering postulate, the conditions (P6) and (P7) can be replaced by the following:

The set M(x) := {y : x ≺ y} is linearly ordered by ≺+•0.

This is easy to see. First we prove a lemma.

Lemma 1 Suppose that y ≺ y′ and that there is no x such that y ≺+ x ≺+ y′. Then y ≺•0 y′.
Symbol    Definition                      Meaning
1_M       {⟨x, x⟩ : x ∈ M}                diagonal
R ◦ S     (∃y)(x R y S z)                 concatenation
R ∪ S     –                               union
R+        R ∪ R ◦ R ∪ R ◦ R ◦ R ···       transitive closure
R∗        1_M ∪ R+                        reflexive and transitive closure
≺00       –                               left root daughter of
≺10       –                               right root daughter of
≺01       –                               left non-root daughter of
≺11       –                               right non-root daughter of
≺0        ≺00 ∪ ≺01                       left daughter of
≺1        ≺10 ∪ ≺11                       right daughter of
≺•0       ≺00 ∪ ≺10                       root daughter of
≺+•0      (≺00 ∪ ≺10)+                    root descendant of
≺•1       ≺01 ∪ ≺11                       non-root daughter of
≺         ≺0 ∪ ≺1                         daughter of
⊏         ≺∗•0 ◦ ≺00 ◦ ≻10 ◦ ≻∗•0         left of (at deep structure)

Table 1. Synopsis of Relations
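The operations in Table 1 are easy to realise on finite relations represented as sets of pairs. The sketch below builds the deep-structure ordering of (1) from ≺00 and ≺10 by composition; the five-node tree and all function names are assumptions chosen for illustration.

```python
def compose(r, s):
    """R ◦ S = {(x, z) : there is y with x R y and y S z}."""
    return {(x, z) for (x, y) in r for (u, z) in s if y == u}

def converse(r):
    """Swap the two components of every pair."""
    return {(y, x) for (x, y) in r}

def star(r, universe):
    """Reflexive and transitive closure: the diagonal together with R+."""
    closure = {(x, x) for x in universe} | set(r)
    while True:
        new = compose(closure, closure) - closure
        if not new:
            return closure
        closure |= new

def left_of(universe, r00, r10):
    """≺∗•0 ◦ ≺00 ◦ ≻10 ◦ ≻∗•0; r00, r10 hold (daughter, mother) pairs."""
    up = star(r00 | r10, universe)   # ≺∗•0
    down = converse(up)              # ≻∗•0
    return compose(compose(compose(up, r00), converse(r10)), down)

# Hypothetical tree: r has daughters m (left) and z (right),
# m has daughters x (left) and y (right).
universe = {"r", "m", "x", "y", "z"}
r00 = {("m", "r"), ("x", "m")}
r10 = {("z", "r"), ("y", "m")}
order = left_of(universe, r00, r10)
leaves = ["x", "y", "z"]
```

On this tree the leaves come out in the order x, y, z, and no leaf is to the left of itself.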
The proof of the claim is in the fact that y′ ∈ M(y). If the link is derived it is not minimal, so there is a z such that y ≺•0 z ≺+ y′. And conversely. Suppose now that x ≺ y. Then there is a chain x = y_0 ≺ y_1 ≺ y_2 ≺ ··· ≺ y_n = y. The longest such chain contains only nonderived links, by Lemma 1. This means that x ≺+•0 y. Now, ≺+•0 is a tree ordering so that if y′ ∈ M(x), then x ≺+•0 y′ as well, and so either y = y′ or y ≺+•0 y′ or y′ ≺+•0 y, as promised.

Proposition 1 Let M be an MDS. M is an OMDS iff the following holds: if x is not the root, ≺10 is defined iff ≺00 is undefined on x.

We shall prove the proposition and exhibit some useful techniques. We code the elements of M by sequences in the following way. Let I be a chain {x_i : i < n + 1} such that x_0 is the root, and x_i ≻•0 x_{i+1} for every i < n. (So we are going down.) We call I a standard identifier for x_n and denote it by I(x_n). n is called the standard depth of x_n and we write sd(x_n) to denote it.

Lemma 2 In an OMDS, every x has exactly one standard identifier. Hence, the standard depth of x is uniquely defined.

(See also (Kracht 2001b) on the notion of an identifier.) Let us see why the standard identifier is unique.
We translate the identifier into a binary sequence b_0 b_1 ··· b_n defined by
b_i = 0 if x_i ≻00 x_{i+1},
b_i = 1 if x_i ≻10 x_{i+1}.    (2)
In this way, we associate a binary sequence with each node. Now recall that (1) defines a linear ordering on the leaves. This means that the number associated to x via (2) is unique. For if not, there are two distinct sequences, b_0 b_1 ··· b_n and c_0 c_1 ··· c_m, for x_n. Let j be the least index such that b_j ≠ c_j, say b_j = 0 and c_j = 1. Then, by (1), if z ≤ x_n is a leaf, z ⊏ z. Contradiction. Now, let x be given. It has a sequence b_0 b_1 ··· b_n associated with it. Let y ≻•0 x. Then y is defined by b_0 b_1 ··· b_{n−1}, which is unique. So, ≺•0 is a partial function. Conversely, if ≺•0 is a partial function, then the translation into binary sequences is unique.

Now define ⊏ for sequences by b_0 b_1 ··· b_n ⊏ c_0 c_1 ··· c_m iff for the first j such that b_j ≠ c_j, b_j = 0 < c_j = 1. This is exactly the order (1), spelled out for the representing sequences. This order is loop free, transitive and linear on the maximal sequences (which correspond to the leaves). We add that b_0 b_1 ··· b_m is immediately to the left of c_0 c_1 ··· c_n if
b_0 b_1 ··· b_m = b_0 b_1 ··· b_{j−1} 0 1 ··· 1,
c_0 c_1 ··· c_n = b_0 b_1 ··· b_{j−1} 1 0 ··· 0
(The lengths of these sequences need not be equal.) I should emphasise that the identifiers do not necessarily form a tree domain. Recall that a tree domain T is a subset of N∗ such that the following holds: (a) if xi ∈ T then x ∈ T, and (b) if xj ∈ T and i < j then also xi ∈ T. Property (a) holds but (b) does not hold in general. For suppose that x ≻01 y and x ≻10 z. Then I(z) = I(x)1. However, since the link y; x is derived there is no standard identifier of the form I(x)0. The identifier I(y) contains I(z) = I(x)1 as a prefix.

3. Dynamic logic
The language of propositional dynamic logic (PDL) is defined as follows. Given any set Π0 of so-called basic programs,⁵ a set Γ of propositional constants, and a set V of variables, the sets of formulae and programs are the least sets satisfying:
– If a ∈ Γ is a propositional constant, a is a formula.
– If p ∈ V is a propositional variable, p is a formula.
– If χ, χ′ are formulae, so are ¬χ and χ ∧ χ′.
– If α ∈ Π0 is a basic program, α is a program.
– If α, α′ are programs, so are α; α′, α ∪ α′ and α∗.
– If χ is a formula, χ? is a program.
– If α is a program and χ a formula, ⟨α⟩χ is a formula.
We put χ ∨ χ′ := ¬(¬χ ∧ ¬χ′) and [α]χ := ¬⟨α⟩¬χ, and similarly for other boolean connectives. The minimal logic, denoted by PDL, is the least set of formulae with the following properties:
1. All propositional tautologies are in PDL.
2. [α](χ → χ′) → ([α]χ → [α]χ′) ∈ PDL.
3. ⟨χ?⟩χ′ ↔ (χ ∧ χ′) ∈ PDL.
4. ⟨α ∪ α′⟩χ ↔ ⟨α⟩χ ∨ ⟨α′⟩χ ∈ PDL.
5. ⟨α; α′⟩χ ↔ ⟨α⟩⟨α′⟩χ ∈ PDL.
6. χ ∧ [α∗](χ → [α]χ) → [α∗]χ ∈ PDL.
7. If χ ∈ PDL then [α]χ ∈ PDL.
8. If χ → χ′ ∈ PDL and χ ∈ PDL then χ′ ∈ PDL.
9. If χ ∈ PDL, then s(χ) ∈ PDL for every substitution s.
Here, a substitution is defined to be a function s that assigns a formula s(p) to every variable p. The formula s(χ) is obtained by replacing every occurrence of a variable p by s(p), for every variable p. A dynamic logic is a set L ⊇ PDL which has the properties (7) – (9). Let χ be a formula and L a dynamic logic; then L ⊕ χ denotes the least dynamic logic containing L and χ. Similarly with a set Δ in place of χ.

Model structures are of the form F = ⟨W, C, R⟩, where W is a set (the set of worlds or points), C : Γ → ℘(W) a function assigning each constant a set of worlds, and R : Π0 → ℘(W × W) a function assigning each basic program a binary relation on W. A valuation is a function β : V → ℘(W). Based on this
we define the interpretation of complex programs as relations in the following way.
R(α ∪ α′) := R(α) ∪ R(α′)
R(α; α′) := R(α) ◦ R(α′)
R(α∗) := R(α)∗
R(χ?) := {⟨w, w⟩ : ⟨F, β, w⟩ ⊨ χ}
The truth of a formula at a world is defined thus.
⟨F, β, w⟩ ⊨ ¬χ :⇔ ⟨F, β, w⟩ ⊭ χ
⟨F, β, w⟩ ⊨ χ ∧ χ′ :⇔ ⟨F, β, w⟩ ⊨ χ; χ′
⟨F, β, w⟩ ⊨ ⟨α⟩χ :⇔ there is u: w R(α) u and ⟨F, β, u⟩ ⊨ χ
We write F ⊨ ϕ if for all valuations β and all worlds w: ⟨F, β, w⟩ ⊨ ϕ. The logic of a class K of structures is
Th(K) := {ϕ : for all F ∈ K: F ⊨ ϕ}
It has been shown that PDL is the logic of all structures and that it is also the logic of the finite structures. From this follows the decidability of PDL. However, more is known.

Theorem 1 PDL is EXPTIME-complete.

This means that there are constants c and b and a polynomial p(x) such that for every formula ϕ of length n > c the problem whether or not ϕ ∈ PDL can be solved in time b^{p(n)}. (Additionally, any problem of this complexity can be coded as such a problem in polynomial time.)

4. Grammars as logics
In context free grammars one distinguishes the terminal alphabet from the rules. A similar distinction is made here as well. Nodes that have no daughters are called terminal. The lexicon is a set of declarations which state what labels a terminal node may have. This is typically done by introducing a finite set of constants and the statement that all and only those nodes may be terminal at which one of the constants is true. Since the constants are part of the language the lexicon is effectively identified with a specific nonmodal formula. In fact, we are more generous here and assume that the lexicon is a constant formula λ, which may involve modal operators. This is useful when we want to assume that the lexicon also contains complex items, as is often assumed in generative grammar. The grammar is a (finite) set of formulae expressed in the above language. While the grammar is believed to be the same for all languages, the lexicon is subject to variation.

The logic DPDL (“deterministic PDL”) is obtained from PDL by adding the formulae ⟨α⟩χ → [α]χ for every formula χ and basic program α. (Nonbasic programs will not necessarily satisfy this postulate even if the basic ones do.) A frame is a frame for DPDL iff for every basic program α: if x R(α) y and x R(α) y′ then y = y′. (Recall that α is called deterministic if it has that property, and this is the reason the logic is called DPDL.) Furthermore, the logic of finite deterministic computations is obtained by adding the formula
[α+]([α+]p → p) → [α+]p
where α is the union of all basic programs (hence this definition requires that Π0 is finite). If we want to mention the number n of programs, we write DPDL_n.f. The following is proved in (Kracht 1999) (finite model property and decidability) and (Vardi and Wolper 1986) (EXPTIME-completeness).

Theorem 2 For every n, DPDL_n.f is the logic of all finite structures with n basic programs, where the basic programs are deterministic and their union is loop free. DPDL_n.f is decidable, it is EXPTIME-complete and complete with respect to finite trees.

Theorem 3 For every n, the PDL-logic of n-branching trees has the finite model property and is decidable.

Many of the basic results can also be obtained by using a translation of dynamic logic into monadic second-order logic (MSO). The disadvantage of using MSO is that the complexity of the logic is for the most part nonelementary (in the sense of recursion theory), while PDL is elementary (it is EXPTIME-complete).
Second, the main result that we shall establish here, the decidability of the dynamic logic of multidominance structures, cannot be derived in this way, as far as we can see. For this reason we shall use dynamic logic.

5. The logic of multidominance structures
Let us agree on the following notation. For each of the relations ≻ij we introduce a program, also written ≻ij; we write ≻ij rather than R(≻ij) for the relation that interprets it. Structures are of the form ⟨M, ≻00, ≻01, ≻10, ≻11⟩. We use ≻0 in place of ≻00 ∪ ≻01, ≻1 for ≻10 ∪ ≻11 and ≻ for ≻0 ∪ ≻1. The programs ≻0 and ≻1 are interpreted as partial functions. Also, the notation ≻•0 := ≻00 ∪ ≻10 and ≻•1 := ≻01 ∪ ≻11 is frequently used. Finally, let us write u := ≻∗ (u stands for “universal”). A structure is called generated if there is a single element w such that the least set containing w which is closed under taking successors along all basic relations is the entire set of worlds. (In our case this is exactly true if the structure is a constituent.) The following is easy to see.

Lemma 3 Let M be a generated PDL_n-structure with root x. Then we have ⟨M, β, x⟩ ⊨ [u]ϕ iff for all w: ⟨M, β, w⟩ ⊨ ϕ.

Our first goal is to axiomatise the logic of all MDSs. There is an important tool that we shall use over and over. A formula is constant if it contains no variables.

Theorem 4 Suppose that L is a logic containing PDL_n which has the finite model property, and let χ be a constant formula. Then the logic L ⊕ χ also has the finite model property.

Proof. Suppose that ϕ is consistent with L ⊕ χ. Then ϕ; [u]χ also is L ⊕ χ-consistent, and a fortiori L-consistent. Thus it has a finite model ⟨⟨F, R⟩, β, x⟩. We may actually assume that for every y, x R(u) y. Then y ⊨ χ, and so the frame is a frame for L ⊕ χ, since χ is constant. □

This theorem has interesting consequences worth pointing out. It allows us to focus on the grammar rather than the lexicon. This reduces the problem to some degree.

Definition 4 Let
PM := DPDL_4.f                      (3)
      ⊕ ⟨≻1⟩⊤ → ⟨≻0⟩⊤               (4)
      ⊕ ⟨≻00⟩⊤ → [≻01]⊥             (5)
      ⊕ ⟨≻10⟩⊤ → [≻11]⊥             (6)
      ⊕ ⟨≻•1⟩p → ⟨≻+•0; ≻⟩p         (7)
From (4) we get that each node with a right hand daughter also has a left hand daughter. Axiom (5) makes sure that one cannot have both a left hand derived and a left hand nonderived daughter; (6) is the same for right hand daughters. The postulates are constant and can be added without sacrificing decidability, by Theorem 4. It follows easily that since ≺00 and ≺01 are both functional, so is their union ≺0; and likewise for ≺1. Postulate (7) ensures that the structures are trees at deep structure. That means that ≻•0 is a tree order. This is because if z ≺•1 y then there is a path along nonderived links to y, as we shall show.

Lemma 4 Suppose F satisfies the following property.
Link. For all w and u: if w ≻•1 u then there is a y such that w ≻+•0 y and y ≻ u.
Then F satisfies (7).

Proof. Choose a valuation β and a point w such that
⟨F, β, w⟩ ⊨ ⟨≻•1⟩p
So there is a u ≺•1 w such that u ⊨ p. By assumption on F, there is a y such that w ≻+•0 y and y ≻ u. From the second we get y ⊨ ⟨≻⟩p, and from the first
⟨F, β, w⟩ ⊨ ⟨≻+•0; ≻⟩p
This shows the claim. □

Using this we prove that the axioms of PM are valid in all MDSs. This is Lemma 5 below. This is one half of the characterisation, Theorem 5, which asserts that a finite structure satisfies the axioms of PM exactly if it is actually an MDS. The other half is constituted by Lemma 8.

Lemma 5 Finite MDSs are PM-structures.

Proof. It is clear that MDSs satisfy the postulates (4), (5) and (6). We need to show that they satisfy (7). To see this, we shall verify that they satisfy the property Link from Lemma 4. To this end, take an MDS ⟨M, ≻00, ≻01, ≻10, ≻11⟩. Suppose that x ≻•1 y. Then x ∈ M(y), and there is, by assumption, an element u ∈ M(y) such that u ≺+ x. (Notice that by (P7) of Definition 2, x cannot be the least element in M(y) with respect to ≺+ since the link y; x is derived.) Choose a path Π0 = u; ···; x. If this path contains only root links we are done. Otherwise, let the path contain v; v′, a derived link.
Then there is a path Δ = v; ···; w; v′ such that w ≺•0 v′, by a similar argument. Replace the pair v; v′ in Π0 by Δ. This gives a path which is longer than Π0. Thus, as long as we have derived links we can replace them, increasing the length of the path. However, ≺ is loop free and the structure finite. Hence, the procedure must end. It delivers a path without derived links, as promised. □

In connection with the following lemma, we say that R(α) satisfies the fixed point property if for all formulae ϕ, frames F, valuations β and points x:
⟨F, β, x⟩ ⊨ ⟨α∗⟩ϕ ↔ ϕ ∨ ⟨α; α∗⟩ϕ

Lemma 6 Let ⟨F, R⟩ be a finite frame, β a valuation, and R(α) be loop free. Then for all x and ϕ:
⟨F, β, x⟩ ⊨ ⟨α∗⟩ϕ ↔ ϕ ∨ ⟨α; α∗⟩ϕ

Proof. In PDL, ϕ → ⟨α∗⟩ϕ and ⟨α; α∗⟩ϕ → ⟨α∗⟩ϕ are generally valid. Hence we only have to establish
⟨F, β, x⟩ ⊨ ⟨α∗⟩ϕ → ϕ ∨ ⟨α; α∗⟩ϕ
By assumption on R(α), for every x there is a sequence x = x_0 R(α) x_1 R(α) x_2 ··· R(α) x_n where x_n has no R(α)-successor. We proceed by induction on the maximum length of such a chain starting at x. Call this the height of x. If the height is 0, x has no R(α)-successors. Then ⟨α; α∗⟩ϕ is false, and so the claim reduces to
⟨F, β, x⟩ ⊨ ⟨α∗⟩ϕ → ϕ
which is correct. Now let x be of height n + 1 and the claim proved for all points of height ≤ n. Suppose ⟨α∗⟩ϕ is true at x. Then there is a chain of length ≤ n + 1: x = x_0 R(α) x_1 R(α) x_2 ··· R(α) x_k, and ϕ is true at x_k. Two cases arise. k = 0, in which case x ⊨ ϕ and we are done. Or k > 0. Then, by inductive hypothesis, since x_1 has height ≤ n, ⟨F, β, x_1⟩ ⊨ ⟨α∗⟩ϕ and so we have x ⊨ ⟨α; α∗⟩ϕ, as promised. □

Say that a program α is progressive in a logic L if R(α) is loop free in every structure for L. In that case we say that a node x has α-height n if there is no sequence x R(α) x_1 R(α) x_2 ··· R(α) x_{n+1}. If x has α-height 0 it means that it has no α-successors. The important fact to note is that we can restrict ourselves in the present context to progressive programs, and these are the programs which have the fixed point equation property.

We say that α is contained in β, in symbols α ⊆ β, if L ⊢ ⟨α⟩p → ⟨β⟩p. If L has the finite model property this is equivalent to R(α) ⊆ R(β) in every finite L-structure. If L′ ⊇ L and α ⊆ β in L, then this holds also in L′. α and β are equivalent in L if α ⊆ β as well as β ⊆ α in L. If α is progressive then so are α^n (n > 0) and α+. The following theorem rests on the fact that the logic of finite computations has a maximal progressive program.

Lemma 7 In PDL_n.f every program is equivalent to a program of the form ϕ?, α, or ϕ? ∪ α, where α is progressive.

Proof. Notice that α is equivalent to ⊥? ∪ α, so we do not need a separate case for progressive programs. Let ζ_i, i < n, be the basic modalities. Put
γ := (ζ_0 ∪ ζ_1 ∪ ··· ∪ ζ_{n−1})+
In PDL_n.f, γ is progressive. Then γ; γ as well as γ+ are likewise progressive. Every η that is contained in a progressive program is also progressive. What we shall show is that every program η that is not a test can be written as ϕ? ∪ α where α is contained in γ. Before we start, notice that if χ? is a test and α ⊆ γ then χ?; α ⊆ α ⊆ γ and likewise α; χ? ⊆ α ⊆ γ. We note that ϕ?; χ? is equivalent to (ϕ ∧ χ)? and that ϕ? ∪ χ? is equivalent to (ϕ ∨ χ)?. Finally, (ϕ?)∗ is equivalent to ⊤?, so that the program operators reduce on tests to a single test. Now, suppose that η_1 = ϕ_1? ∪ α_1 and η_2 = ϕ_2? ∪ α_2 with α_1, α_2 contained in γ. Then
η_1 ∪ η_2 = (ϕ_1? ∪ α_1) ∪ (ϕ_2? ∪ α_2) = (ϕ_1 ∨ ϕ_2)? ∪ (α_1 ∪ α_2)
is of the desired form.
η_1; η_2 = (ϕ_1? ∪ α_1); (ϕ_2? ∪ α_2)
         = (ϕ_1?; ϕ_2?) ∪ (ϕ_1?; α_2) ∪ (α_1; ϕ_2?) ∪ (α_1; α_2)
         ⊆ (ϕ_1 ∧ ϕ_2)? ∪ (ϕ_1?; α_2 ∪ α_1; ϕ_2? ∪ α_1; α_2)
which is again of the desired form. Finally, let η = ϕ? ∪ α. We observe that η ⊆ ⊤? ∪ α. Furthermore, since star is monotone, η∗ ⊆ (⊤? ∪ α)∗ = ⊤? ∪ α+. Now, α ⊆ γ, and so α+ ⊆ γ+ ⊆ γ, since γ is transitive. □

Definition 5 The Fisher Ladner closure FL(ϕ) of a formula ϕ is the smallest set containing ϕ such that the following is satisfied.
1. If χ ∧ ψ ∈ FL(ϕ) then χ, ψ ∈ FL(ϕ).
2. If ¬χ ∈ FL(ϕ) then χ ∈ FL(ϕ).
3. If ⟨α ∪ β⟩χ ∈ FL(ϕ) then ⟨α⟩χ, ⟨β⟩χ ∈ FL(ϕ).
4. If ⟨α; β⟩χ ∈ FL(ϕ) then ⟨α⟩⟨β⟩χ ∈ FL(ϕ).
5. If ⟨α∗⟩χ ∈ FL(ϕ) then χ, ⟨α⟩⟨α∗⟩χ ∈ FL(ϕ).
6. If ⟨ψ?⟩χ ∈ FL(ϕ) then ψ, χ ∈ FL(ϕ).
7. If ⟨α⟩χ ∈ FL(ϕ), α basic, then χ ∈ FL(ϕ).
We remark that |FL(ϕ)| is linear in the length of ϕ. This is shown by induction on ϕ. This means that complexity can be measured either in terms of the size of the formula or in terms of the size of FL(ϕ). Now let At(ϕ) be the set of all conjunctions of formulae (or their negations) from the Fisher Ladner closure of ϕ. (This set has a size exponential in the size of ϕ, which induces a rise in complexity for the logic of MDSs in Theorem 5 from EXPTIME to 2EXPTIME.) Set
X(ϕ) := {⟨≻•1⟩δ → ⟨≻+•0; ≻⟩δ : δ ∈ At(ϕ)} ∪ {⟨≻1⟩⊤ → ⟨≻0⟩⊤, ⟨≻00⟩⊤ → [≻01]⊥, ⟨≻10⟩⊤ → [≻11]⊥}
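The closure conditions of Definition 5 can be run as a worklist computation. The sketch below is illustrative only: the encoding of formulae and programs as nested tuples (tags "var", "not", "and", "dia" for formulae and "basic", "union", "seq", "star", "test" for programs) is an assumption of this example and not part of the text.

```python
def fl_closure(phi):
    """Fisher Ladner closure of `phi`, following clauses 1-7 of Definition 5."""
    closure, todo = set(), [phi]
    while todo:
        f = todo.pop()
        if f in closure:
            continue
        closure.add(f)
        kind = f[0]
        if kind == "and":                      # clause 1
            todo += [f[1], f[2]]
        elif kind == "not":                    # clause 2
            todo.append(f[1])
        elif kind == "dia":
            prog, chi = f[1], f[2]
            if prog[0] == "union":             # clause 3
                todo += [("dia", prog[1], chi), ("dia", prog[2], chi)]
            elif prog[0] == "seq":             # clause 4
                todo.append(("dia", prog[1], ("dia", prog[2], chi)))
            elif prog[0] == "star":            # clause 5: chi and <a><a*>chi
                todo += [chi, ("dia", prog[1], f)]
            elif prog[0] == "test":            # clause 6
                todo += [prog[1], chi]
            else:                              # clause 7: basic program
                todo.append(chi)
    return closure

# Example formula: <(a ∪ b)*> p
a, b = ("basic", "a"), ("basic", "b")
p = ("var", "p")
phi = ("dia", ("star", ("union", a, b)), p)
closure = fl_closure(phi)
```

For this formula the closure has five members: the formula itself, p, and the three formulae obtained by prefixing ⟨a ∪ b⟩, ⟨a⟩ and ⟨b⟩ to it.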
Lemma 8 ϕ is consistent with PM iff ϕ; [u]X(ϕ) is consistent with DPDL_4.f.

Proof. (⇒). If ϕ; [u]X(ϕ) is inconsistent in DPDL_4.f, ¬ϕ can be proved from [u]X(ϕ) in DPDL_4.f. However, [u]X(ϕ) can be proved in PM. Hence ¬ϕ is provable in PM. (⇐). Now let us suppose that ϕ; [u]X(ϕ) is DPDL_4.f-consistent. Then by Theorem 2 it has a finite model based on a frame M = ⟨M, ≻00, ≻01, ≻10, ≻11⟩ with root w_0 and valuation β. So,
⟨M, β, w_0⟩ ⊨ ϕ; [u]X(ϕ)
Notice that the frame satisfies the formulae (4), (5) and (6). Hence we may assume that the relation ≺•0 induces a tree ordering on the set of worlds, though with multiple roots (thus we have what is known as a forest). We shall construct a finite PM-model from this. Let S be the closure of w_0 under the relation ≻•0, that is, S is the least set which contains w_0 and is closed under ≻•0. Members of S are called standard points. Let
E := {w : there is v ∈ S such that w ≺•1 v}
For a point w, let a(w) be the unique δ ∈ At(ϕ) such that
⟨M, β, w⟩ ⊨ δ
Now choose a w ∈ E. Let v be a standard world such that w ≺•1 v. By choice of X(ϕ),
⟨M, β, w_0⟩ ⊨ [u](⟨≻•1⟩a(w) → ⟨≻+•0; ≻⟩a(w))
where w_0 is the root. Hence
⟨M, β, v⟩ ⊨ ⟨≻•1⟩a(w) → ⟨≻+•0; ≻⟩a(w)
Since a(w) is true at w and since w ≺•1 v, we have
⟨M, β, v⟩ ⊨ ⟨≻+•0; ≻⟩a(w)
Hence there is a standard u ≺+•0 v and u∗ ≺ u such that a(u∗) = a(w). By definition of E, u∗ is either standard, or in E. For each w, pick such a point and say that it is linked from w and write w L u∗. Thus, L is a function from E to E ∪ S. We note the following. w L u∗ does not mean that u∗ is standard. However, already u has greater standard depth than w, and if u∗ ∉ S then u∗ ∈ E and so u∗ can in turn be linked to some node. It follows that for every w ∈ E there is a standard v such that w L+ v. For suppose not. Then there is a w ∈ E of maximal depth which cannot be linked to a standard point. But it can be linked to a point in E. The latter has greater depth. Contradiction.

Now we define a new frame S as follows. The set of points is S. Put x ◁00 y iff x ≺00 y, x ◁10 y iff x ≺10 y; put x ◁01 y iff there is a u such that u ≺01 y and u L+ x; x ◁11 y iff there is a u such that u ≺11 y and u L+ x. Write ▷ij for the converse of ◁ij. Finally,
S := ⟨S, ▷00, ▷01, ▷10, ▷11⟩
The valuation β′ is defined by β′(p) := β(p) ∩ S. (If constants are present, the value of a constant c in S is the value of c intersected with S.) We shall prove for every w ∈ S and every χ ∈ FL(ϕ):
⟨S, β′, w⟩ ⊨ χ ⇔ ⟨M, β, w⟩ ⊨ χ    (8)
The basic clause is
(Case 1.) χ = p, a variable (or constant). Then ⟨S, β′, w⟩ ⊨ p iff w ∈ β′(p) iff w ∈ β(p) iff ⟨M, β, w⟩ ⊨ p, by definition of β′.
(Case 2.) χ = ¬ϑ. ⟨S, β′, w⟩ ⊨ ¬ϑ iff ⟨S, β′, w⟩ ⊭ ϑ iff ⟨M, β, w⟩ ⊭ ϑ iff ⟨M, β, w⟩ ⊨ ¬ϑ.
(Case 3.) χ = ϑ ∧ ϑ′. ⟨S, β′, w⟩ ⊨ ϑ ∧ ϑ′ iff ⟨S, β′, w⟩ ⊨ ϑ; ϑ′ iff ⟨M, β, w⟩ ⊨ ϑ; ϑ′ iff ⟨M, β, w⟩ ⊨ ϑ ∧ ϑ′.
Now let χ = ⟨α⟩ϑ. The claim will be proved by induction on the syntactic complexity of α.
(Case 4.) α = α′ ∪ α″. ⟨S, β′, w⟩ ⊨ ⟨α′ ∪ α″⟩ϑ iff ⟨S, β′, w⟩ ⊨ ⟨α′⟩ϑ ∨ ⟨α″⟩ϑ iff ⟨M, β, w⟩ ⊨ ⟨α′⟩ϑ ∨ ⟨α″⟩ϑ iff ⟨M, β, w⟩ ⊨ ⟨α′ ∪ α″⟩ϑ.
(Case 5.) α = α′; α″. ⟨S, β′, w⟩ ⊨ ⟨α′; α″⟩ϑ iff ⟨S, β′, w⟩ ⊨ ⟨α′⟩⟨α″⟩ϑ iff ⟨M, β, w⟩ ⊨ ⟨α′⟩⟨α″⟩ϑ iff ⟨M, β, w⟩ ⊨ ⟨α′; α″⟩ϑ. We use (i) the fact that α′ is syntactically less complex than α′; α″ and (ii) the inductive hypothesis for ⟨α″⟩ϑ.
(Case 6.) α = ψ?. ⟨S, β′, w⟩ ⊨ ⟨ψ?⟩ϑ iff ⟨S, β′, w⟩ ⊨ ψ; ϑ iff ⟨M, β, w⟩ ⊨ ψ; ϑ iff ⟨M, β, w⟩ ⊨ ⟨ψ?⟩ϑ, using the inductive assumptions on ψ and ϑ.
(Case 7.) α = α′∗. Now, in virtue of Lemma 7 we may assume that α′ is progressive, so ⟨α′∗⟩χ ↔ χ ∨ ⟨α′⟩⟨α′∗⟩χ is a theorem of PDL. Further, α′ is of lesser complexity than α′∗. ⟨S, β′, w⟩ ⊨ ⟨α′∗⟩ϑ iff ⟨S, β′, w⟩ ⊨ ϑ ∨ ⟨α′⟩⟨α′∗⟩ϑ iff ⟨M, β, w⟩ ⊨ ϑ ∨ ⟨α′⟩⟨α′∗⟩ϑ iff ⟨M, β, w⟩ ⊨ ⟨α′∗⟩ϑ.
(Case 8.) α = ≻00. Then the claim follows since ≺00 = ◁00.
(Case 9.) α = ≻10. Likewise.
(Case 10.) α = ≻01. We show first (⇒) in (8). ⟨S, β′, w⟩ ⊨ ⟨≻01⟩ϑ implies that there is a v ◁01 w such that ⟨S, β′, v⟩ ⊨ ϑ. v is standard, and by induction hypothesis, ⟨M, β, v⟩ ⊨ ϑ. By construction, w ≻01 u for a u ∈ E such that u L+ v. This means that a(u) = a(v) and so ⟨M, β, u⟩ ⊨ ϑ; hence ⟨M, β, w⟩ ⊨ ⟨≻01⟩ϑ. Now we show (⇐) in (8). Assume ⟨M, β, v⟩ ⊨ ⟨≻01⟩ϑ and v ∈ S. Then there is a w ∈ E such that w ≺01 v and ⟨M, β, w⟩ ⊨ ϑ. By construction there is a standard u such that w L+ u, and so ⟨M, β, u⟩ ⊨ ϑ, since a(u) = a(w). By inductive hypothesis, ⟨S, β′, u⟩ ⊨ ϑ. Again by construction, v ▷01 u, so ⟨S, β′, v⟩ ⊨ ⟨≻01⟩ϑ.
(Case 11.) α = ≻11. Similar.

The next step is to verify that S is a PM-frame. To that effect we have to ensure that the basic programs are deterministic, that their union is loop free, and that the structure satisfies (7). First, let w ∈ S. Recall the definition of the standard depth. It is easy to see that the standard depth of points is the same in both structures. Now suppose that w ◁ u. We claim that sd(w) > sd(u). (Case 1.) w ◁•0 u. Then w ≺00 u or w ≺10 u, and by definition of standard depth, sd(w) = 1 + sd(u). (Case 2.) w ◁01 u or w ◁11 u. In this case there is a y such that y ≺01 u or y ≺11 u, such that y L+ w, where w ≺ u′ for some standard u′ ≺+•0 u. This means that sd(w) ≥ 2 + sd(u). Next, to show that the programs are deterministic, observe that the original programs were deterministic, and each link was replaced by just one link. Finally, from Lemma 4 it follows that the constructed structure satisfies PM. Now, from (8) it follows that
⟨S, β′, w_0⟩ ⊨ ϕ
This shows the claim. □
Theorem 5 The logic of MDSs is PM. Moreover, this logic has the finite model property, is finitely axiomatisable and therefore decidable. Its complexity is 2EXPTIME.

The complexity bound follows from the fact that the formula to be satisfied has length O(2^n), and that DPDL_4.f is in EXPTIME.

6. Single movement MDSs
There is an important class of MDSs, those where M(x) has at most two elements. This means in practice that each element is allowed to move only once. This class of structures is very important, since the now current Minimalist Program requires each movement step to be licensed. These structures are the topic of a sequel to this paper. Here we are interested only in the axiomatisation of these structures. We have noted earlier that root links are always the lowest links. Therefore, for every node x there is at most one y such that x ≺•0 y. On the other hand there can be any number of non-root links. The narrowness determines the maximum number of non-root links.
ν(p) := (p → [≻+]¬p) ∧ ¬(⟨≻00; ≻∗⟩p ∧ ⟨≻10; ≻∗⟩p)

Lemma 9 Let β be a valuation such that ⟨F, β⟩ ⊨ ν(p). Then |β(p)| ≤ 1.

Proof. Suppose that x, y ∈ β(p). Then x ≺+•0 y cannot hold; for then y ⊨ p but y ⊨ [≻+]¬p. Likewise y ≺+•0 x cannot hold. If however x and y are incomparable there are points u, v and v′ such that v ≠ v′ and x ≺∗ v ≺ u as well as y ≺∗ v′ ≺ u. Then however u ⊨ ⟨≻00; ≻∗⟩p; ⟨≻10; ≻∗⟩p. Contradiction. □

Definition 6 An MDS is called n-narrow if |M(x)| ≤ n + 1 for all x. An MDS is called narrow if it is 1-narrow.

Set
ξ(p) := [u]ν(p) → [u](⟨≻•1⟩p → [≻•0; ≻∗](⟨≻⟩p → ⟨≻•0⟩p))

Lemma 10 An MDS satisfies ξ(p) iff it is narrow.

Proof. Suppose the MDS is not narrow. Then there is a y and z, z′ ∈ M(y) such that z ≺+ z′ and both links y; z and y; z′ are not root links. Then put β(p) := {y}. Then throughout the MDS, p → [≻+]¬p holds. Also, there is no point u such that u ≻00 v, u ≻10 v′ and y ≺∗ v as well as y ≺∗ v′. It follows that z ⊨ ⟨≻•1⟩p; ¬⟨≻•0⟩p and z′ ⊨ ⟨≻•1⟩p. However, z′ R(≻•0; ≻∗) z. So the formula is false under this valuation. Now assume that the MDS is narrow. Take a valuation such that ν(p) holds everywhere. By the preceding lemma, either β(p) is empty or β(p) = {u} for some u. In the first case no node satisfies ⟨≻•1⟩p, so the second part of ξ(p) is true. Now assume β(p) = {u} and let y be a node such that y ⊨ ⟨≻•1⟩p. Then u ≺•1 y. We have to show
y ⊨ [≻•0; ≻∗](⟨≻⟩p → ⟨≻•0⟩p)
To this end let z and z′ be such that z ≤ z′ ≺•0 y and z ⊨ ⟨≻⟩p. Then z ≻ u. Since the structure is narrow, u ≺•0 z, showing z ⊨ ⟨≻•0⟩p. □
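Definition 6 is a simple counting condition and can be tested directly on a finite structure. The following sketch assumes that ≻0 and ≻1 are given as dictionaries from mothers to daughters; the function names and the example structure are hypothetical.

```python
def mother_sets(left, right):
    """Map each node x to M(x), the set of its mothers."""
    result = {}
    for relation in (left, right):
        for mother, daughter in relation.items():
            result.setdefault(daughter, set()).add(mother)
    return result

def is_n_narrow(left, right, n):
    """|M(x)| <= n + 1 for every node x (Definition 6)."""
    return all(len(ms) <= n + 1 for ms in mother_sets(left, right).values())

def is_narrow(left, right):
    """Narrow means 1-narrow: every node has at most two mothers."""
    return is_n_narrow(left, right, 1)

# Hypothetical structure in which b has the three mothers c, d and h.
left = {"c": "a", "d": "c", "g": "d", "h": "g"}
right = {"c": "b", "d": "b", "h": "b"}
three_mothers = is_narrow(left, right)

# Dropping the link from h to b leaves b with two mothers.
# Since h still has a left daughter, (P3) is unaffected.
right_single = {"c": "b", "d": "b"}
two_mothers = is_narrow(left, right_single)
```

The first structure is 2-narrow but not narrow, since b moves twice; the second is narrow.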
7. Extending and reducing the language
The fact that we are dealing with cycle free structures has a great effect on the expressivity of the language; basically, using implicit definitions all program constructors of PDL can be eliminated; conversely, many seemingly more powerful constructs can be effectively mimicked. We consider here two devices: nominals (see (Blackburn 1993)) and the converse. A nominal is a variable that can be true only at a single world. It emerges from the discussion above that nominals actually do not add any expressive strength to our language. Consider a formula ϕ(i) which contains a nominal i. Now consider the formula
ν(p) ∧ ⟨≻+⟩p → ϕ[p/i]
This formula has a model ⟨F, β, x⟩ only if β(p) is a singleton. The consequence of this is the following

Theorem 6 For every first-order universal formula ζ using atomic formulae of the form x R(α) y or x = y there is a modal formula ϕ such that for any MDS, F ⊨ ζ iff F ⊨ ϕ.

Proof. Let ζ = (∀x_0 x_1 ··· x_{n−1})α. Introduce nominals i_0, i_1, ···, i_{n−1} and define the following translation:
(x_p = x_q)† := ⟨≻∗⟩(i_p ∧ i_q)
(x_p R(α) x_q)† := ⟨≻∗⟩(i_p ∧ ⟨α⟩i_q)
(¬α)† := ¬α†
(α ∧ α′)† := α† ∧ α′†
It is not hard to see that ⟨F, β, x⟩ ⊭ ¬α† iff F ⊨ ζ. The sought after formula is
ν(p_0) ∧ ν(p_1) ∧ ··· ∧ ν(p_{n−1}) → α†[p_k/i_k : k < n]
This completes the proof. □

Also, let me recall a few other reductions that we have achieved. The following equivalences hold:
⟨α ∪ α′⟩p ↔ ⟨α⟩p ∨ ⟨α′⟩p
⟨α; α′⟩p ↔ ⟨α⟩⟨α′⟩p
⟨ϕ?⟩χ ↔ ϕ ∧ χ
This means that the program constructors union, concatenation and test are eliminable if they occur as outermost program constructors. However, we have also shown that every program is a union of a test and a progressive program and that for progressive programs the following holds in finite structures:
⟨α∗⟩p ↔ p ∨ ⟨α⟩⟨α∗⟩p
This allows us to eliminate the star as follows:

Lemma 11 Let α be progressive and δ a formula not containing q. Then
⟨F, β⟩ ⊨ q ↔ ⟨α∗⟩δ ⇔ ⟨F, β⟩ ⊨ q ↔ δ ∨ ⟨α⟩q

Proof. Directly from Lemma 6. □
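The role of progressiveness in Lemma 11 can be seen on a small finite frame. The sketch below enumerates, by brute force, all sets q that solve q = δ ∪ ⟨α⟩q, once for a loop free relation and once after a loop has been added. The frame, the set δ and the function names are assumptions made for this illustration.

```python
from itertools import combinations

def diamond(rel, target):
    """Points with a rel-successor in `target`, i.e. where <α>target holds."""
    return {x for (x, y) in rel if y in target}

def fixed_points(universe, rel, delta):
    """All sets q with q = δ ∪ <α>q, found by checking every subset."""
    nodes = sorted(universe)
    found = []
    for size in range(len(nodes) + 1):
        for subset in combinations(nodes, size):
            q = set(subset)
            if q == set(delta) | diamond(rel, q):
                found.append(q)
    return found

universe = {0, 1, 2, 3, 4}
loop_free = {(0, 1), (1, 2), (2, 3), (1, 4)}   # R(α) without loops
delta = {3}                                    # the points where δ holds

unique = fixed_points(universe, loop_free, delta)
with_loop = fixed_points(universe, loop_free | {(4, 4)}, delta)
```

Without loops the equation pins q down to the set of points from which a δ-point can be reached, which is where ⟨α∗⟩δ holds. Once the loop at 4 is added, a second solution containing 4 appears, so the equation no longer defines ⟨α∗⟩δ.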
Lemma 12 Let α be progressive in F and χ and δ formulae such that δ does not contain q. Then
F ⊨ χ[⟨α∗⟩δ/q] ⇔ F ⊨ [u](q ↔ δ ∨ ⟨α⟩q) → χ

Proof. Using the previous lemma. (⇒) Suppose F ⊨ χ[⟨α∗⟩δ/q]. Pick β and x. Suppose ⟨F, β, x⟩ ⊨ [u](q ↔ δ ∨ ⟨α⟩q). We have to show that ⟨F, β, x⟩ ⊨ χ. Now, ⟨F, β⟩ ⊨ q ↔ δ ∨ ⟨α⟩q. Then ⟨F, β⟩ ⊨ q ↔ ⟨α∗⟩δ (by the previous lemma), and so we can interchange q by ⟨α∗⟩δ, giving us ⟨F, β, x⟩ ⊨ χ. (⇐) Choose any valuation β. Suppose that F ⊨ [u](q ↔ δ ∨ ⟨α⟩q) → χ, and choose β′ such that β′(p) = β(p) for all p ≠ q and β′(q) = {y : ⟨F, β, y⟩ ⊨ ⟨α∗⟩δ}. (For this to be noncircular we need that δ does not contain q.) Now ⟨F, β′, x⟩ ⊨ [u](q ↔ δ ∨ ⟨α⟩q) by the previous lemma, and so we get ⟨F, β′⟩ ⊨ χ. By definition of β′ this is ⟨F, β⟩ ⊨ χ[⟨α∗⟩δ/q]. β was arbitrary, giving us F ⊨ χ[⟨α∗⟩δ/q]. □

Notice that q does not need to occur in χ. We may strengthen our language further by adding an operator on programs, the converse (see (Giacomo 1996)). This will allow us to talk about going up the tree. This makes the statement of some restrictions easier. We shall show for a large enough portion of the newly added formulae that they do not add expressive power, they just make life easier. The good news about them is that they can be added without having to redo the proofs. Recall that for a binary relation R,
R˘ := {⟨y, x⟩ : x R y}
The language PDL˘ extends PDL by a unary operator ˘, and we require that
R(α˘) = R(α)˘
PDL˘ is axiomatised by the axioms of PDL for all programs plus, for every program α:
p → [α]⟨α˘⟩p,    p → [α˘]⟨α⟩p    (9)
It turns out that it is enough to just add the converse for every elementary program, for we have
(R ∪ S)˘ = R˘ ∪ S˘
(R ◦ S)˘ = S˘ ◦ R˘
(R∗)˘ = (R˘)∗
Also, notice that
R((ϕ?)˘) = R(ϕ?)
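The three identities for the converse are easy to confirm on finite relations. The sketch below checks them on two pseudo-random relations over a five-element universe; the universe, the seed and the helper names are arbitrary choices made for this illustration.

```python
import random

def converse(r):
    """{(y, x) : x R y}."""
    return {(y, x) for (x, y) in r}

def compose(r, s):
    """R ◦ S = {(x, z) : there is y with x R y and y S z}."""
    return {(x, z) for (x, y) in r for (u, z) in s if y == u}

def star(r, universe):
    """Reflexive and transitive closure of r over `universe`."""
    closure = {(x, x) for x in universe} | set(r)
    while True:
        new = compose(closure, closure) - closure
        if not new:
            return closure
        closure |= new

rng = random.Random(0)   # fixed seed so that the run is reproducible
universe = range(5)

def random_relation():
    """Each pair is included independently with probability 0.3."""
    return {(x, y) for x in universe for y in universe if rng.random() < 0.3}

r, s = random_relation(), random_relation()
union_ok = converse(r | s) == converse(r) | converse(s)
compose_ok = converse(compose(r, s)) == compose(converse(s), converse(r))
star_ok = converse(star(r, universe)) == star(converse(r), universe)
```

Note that the converse of a composition reverses the order of the factors, which is why converses of compound programs can be pushed down to the basic programs.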
Thus, rather than looking at PDL˘_4 (four basic programs and a converse operator) we may look at PDL_8 (eight basic programs, no converse), where the postulates (9) have been added just for the basic programs. We shall not take that route, though, since it produces needless complications. Rather, we shall make the following observation.

Lemma 13 Let F = ⟨F, R⟩ be a frame, x ∈ F a world, and □, ■ and ⊞ modalities such that R(■) = R(□)˘ is a partial function and ⊞ an operator such that x R(⊞) y for all y. Then for any two formulas χ and δ, δ not containing q, and any valuation β:
⟨F, x⟩ ⊨ ⊞((■⊥ → q) ∧ (♦⊤ → (δ → □q) ∧ (¬δ → □¬q))) → χ    (10)
iff
⟨F, x⟩ ⊨ χ[■δ/q]    (11)

Proof. Pick a valuation β. We claim that
⟨F, β, x⟩ ⊨ ⊞((■⊥ → q) ∧ (♦⊤ → (δ → □q) ∧ (¬δ → □¬q)))    (12)
iff β(q) = {u : u ⊨ ■δ}. This establishes the claim as follows. Assume (10) holds. Pick β and choose β′ such that (12) holds with β′ in place of β. This is exactly the case if β′(q) = {u : ⟨F, β, u⟩ ⊨ ■δ}. Now we have both ⟨F, β′, x⟩ ⊨ χ (by (10)) and ⟨F, β′⟩ ⊨ q ↔ ■δ. Thus we have ⟨F, β′, x⟩ ⊨ χ[■δ/q]. From this we get ⟨F, β, x⟩ ⊨ χ[■δ/q], since q does not occur in this formula. Conversely, suppose (11) holds. Choose β. (Case 1) β(q) = {u : ⟨F, β, u⟩ ⊨ ■δ}. Then (12) holds and ⟨F, β, x⟩ ⊨ χ, and so (10) holds. (Case 2) β(q) ≠ {u : ⟨F, β, u⟩ ⊨ ■δ}. Then (12) does not hold, so (10) holds as well.

Now (12) is equivalent to
⟨F, β⟩ ⊨ (■⊥ → q); ♦⊤ → (δ → □q) ∧ (¬δ → □¬q)
Pick z. We have to show that z ⊨ q iff z ⊨ ■δ. Two cases arise. (Case 1.) z has no R(■)-successor. Then ■⊥ is true at z and so are both q and ■δ. (Case 2.) z has an R(■)-successor. Then this successor is unique by assumption. Call it y. By assumption we have y R(□) z. Furthermore, as x R(⊞) y, we have y ⊨ δ → □q as well as y ⊨ ¬δ → □¬q. Suppose z ⊨ ■δ. Then y ⊨ δ, from which y ⊨ □q, and so z ⊨ q. If z ⊨ ¬■δ then y ⊨ ¬δ, by functionality of R(■). Hence y ⊨ □¬q and so z ⊨ ¬q. □

This lemma can be used to introduce converses for the programs ≻00 and ≻10, since they are backwards deterministic. This seemingly allows for the reduction of any program to a forward looking program. However, recall that the elimination of star used the fact that every program is basically progressive. With converses added this is no longer the case. So, star is eliminable only if the program either contains only downward looking modalities or only upward looking modalities. Tests belong to either class (can be included as only downward looking in the first case, or as only upward looking in the second). Call such a formula a finite turn formula.

Theorem 7 Suppose a class of constituents is axiomatisable with some finite turn formulae using the operators ≻00˘, ≻10˘ and ≻•0˘ in addition to ≻ij. Then it can be axiomatised without the use of ≻00˘ and ≻10˘.

This can be used in the following way. We have said earlier that the MDSs are not necessarily ordered in the standard sense. To enforce this we need to add another postulate. The linear order ⊏ from (1) is modally definable by
α := ≻∗•0˘; ≻00˘; ≻10; ≻∗•0
In the definition we have made use of upward looking programs. It is straightforward to verify that
x ⊏ y ⇔ ⟨x, y⟩ ∈ R(α)
This would ordinarily involve adding the converse operator. We have seen, however, that there is a way to consider the converse operators as abbreviations. Thus we may define the following.
128 Marcus Kracht Definition 7 Let OL :=
PM
⊕ 00 → [ 10 ]⊥ ⊕ 10 → [ 00 ]⊥ ⊕ 00 p → [ 00 ]p ⊕ 10 p → [ 10 ]p Using Theorem 4 we see that Theorem 8 OL is decidable in 2EXPTIME.
8. Nearness
The above results are encouraging; unfortunately, they are not exactly what we need. There typically is a restriction on the distance that an element can move in a single step. We take as our prime example the subjacency definition in (Chomsky 1986). As I have argued in (Kracht 1998), perhaps the best definition is this: the antecedent of a trace can be found within the next CP which contains the next IP properly containing the trace. This definition uses the concatenation of the two command relations of IP-command and CP-command. One is tempted to cash this out as the following axiom.

•1 p → •0 (¬CP?; ))∗ ; (¬IP; )∗ p    (13)
Here, CP, IP are constants, denoting phrasal nodes of category CP and IP. This formula says that for every node x, if there is a derived downward link from x to some y, then there is a path to y following first a nonderived link, then following down non-CP nodes and finally non-IP nodes. Unfortunately, matters are not that easy. The program (¬ϕ; )∗ can be transcribed as “while ¬ϕ go one step down”. This is a nondeterministic program, capturing the relation x, y where there is no node ϕ on the path from y to x. (ϕ may hold at y, but not at x.) However, this gives the wrong results (cf. (Kracht 2001a)). Consider a VP and an NP that scrambles out of it. Consider a movement of the VP that passes the NP, whereupon the NP moves to pass the VP again. NP1 [· · · t1 · · · ]VP2 • t1 t2
(14)
Then the formula above may be true even if there was a step of the NP that crossed a barrier at •. I do not know of a natural example of this kind, but the formalisation should work even if none existed. Furthermore, the problem is with the NP movement, so it cannot be dismissed on the ground that the VP has crossed a barrier. Interestingly, the latter objection can easily be eliminated; for we can assume that the VP has moved into spec of CP before leaving the barrier. And in that case it has blocked the chances of the NP to do the same. So, why does (14) pose a problem with (13)? Let us display some more constituents:

(15) X [NP1 Y [[· · · t1 · · · ]VP2 • t1 [t2 ]Z ]Y ]X

The constituent (= node) X has NP1 as a derived daughter. (13) requests that we find a path following first a nonderived link and such that if we ever cross a CP we do not cross an IP after that. We shall give such a path. First we go to Y. From Y we go to VP2 and down to t1 = NP1. Recall that we are in an MDS, so whenever you see a trace there is actually a constituent, and it is the same as the antecedent of the trace. In particular, Y is mother to VP2, and Z mother to t2; both are mothers of the same VP2 constituent. And so the path inside the upper VP2 is the same path as the one in the lower copy. And likewise, to go to t1 is to go to NP1 because they are the same.

What went wrong? Formula (13) asks for the existence of some path of the required kind, but it may not be the one that the constituent actually took when it moved. It is not enough to say, therefore, that some alternative path satisfies the nearness condition; we must somehow require that it is the actual path that was used in the movement that satisfies the nearness condition. It is possible to find such a formula, but it is unfortunately quite a complex one.
The particularly tricky part here is that the structure almost looks as if the NP has been moving with the VP only to escape after the VP has crossed the barrier (= piggy backing). But that is not what happened (we have a trace witnessing the scrambling). So, nearness constraints are not easily captured in model theoretic terms because the structure does not explicitly say which link has been added before which other. Indeed, notice that one and the same MDS allows for quite different derivations. There is (up to inessential variations, see (Kracht 2003a)) exactly one derivation that satisfies Freeze, and exactly one that satisfies Shortest Steps. The problem is with the Shortest Steps derivations. As it turns out, however, at least Freeze derivations are easy to characterise. The idea is that the longest path between two standard elements is actually the one following standard links. Suppose we want to define the subjacency domain for Freeze. (Notice the slightly different formulation of the command domain from (13). Both can be used, and they differ only minimally. This is anyhow only an example.)

σ := •1 p → (•0 ; ¬CP?)+ ; (•0 ; ¬IP?)+ ; p

Lemma 14 M σ iff there is a Freeze derivation such that movement is within the IP ◦ CP-domain.

Proof. Suppose that movement is such that each step is within the IP ◦ CP-domain of the trace. Then in the MDS, every path between these nodes respects these domains. Conversely, let x be a node in an MDS and y •1 x. Put β(p) := {x}. Then y •0 p. Hence, by assumption, M, β, y (•0 ; ¬CP?)+ ; (•0 ; ¬IP?)+ ; p which says that there is a standard path first along nodes that do not satisfy CP and then along nodes that do not satisfy IP to some node z which dominates x immediately. The standard path is the movement path in the Freeze derivation. This shows the claim. □

This can be generalised to any requirement that says that a path must respect a regular language, which is more general than the definable command relations of (Kracht 1993). The general principle is therefore of the form

Dist(c; ) = •1 (c ∧ p) → ; p

where c is a constant and is an expression using only •0 and constants. (It may not even use 00 or 10 .) Moreover, as we shall see, one can mix these postulates to have a particular notion of distance for phrases and another one for heads, for example. In general, any mixture of distance postulates is fine, as long as it is finite.

Theorem 9 The logic of MDSs which have a Freeze derivation satisfying a finite number of postulates of the form Dist(R) has the finite model property and is decidable.

Proof. We replay the proof of Lemma 8. Let Dist(ci ; i ), i < n, be the distance postulates.

Y(ϕ) := {•1 (ci ∧ δ) → i ; δ : δ ∈ At(ϕ), i < n}

Now define the linking in the following way. If w ≺•1 u and w ci , then u i ; a(u)
Hence there are w , u such that u ≺•0 u, w ≺ u and the standard path from u to u is contained in i , and a(w ) = a(w). We then put w L w . Thus, the condition on Freeze derivations is respected. The rest of the proof is the same. 2
9. First example: Movement
We shall present an example of a language that is trans-context free and can be generated from a context free language through movement. Furthermore, it shall follow from our results that the logic of the associated structures is decidable. Take the following grammar.

S → aT    T → bU    U → cS
S → aX    X → bc    S → S        (16)
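The base grammar in (16) is small enough to explore by machine. The sketch below is an illustration under our own assumptions (the dictionary `RULES` and the function `generate` are hypothetical names, not part of the text): it enumerates all terminal strings up to a length bound by expanding the leftmost nonterminal, and it ignores movement entirely.

```python
# Rules of grammar (16); nonterminals are the dictionary keys.
RULES = {
    "S": [["a", "T"], ["a", "X"], ["S"]],
    "T": [["b", "U"]],
    "U": [["c", "S"]],
    "X": [["b", "c"]],
}

def generate(max_len):
    """Return all terminal strings of length <= max_len derivable from S."""
    results = set()
    seen = set()
    stack = [("S",)]
    while stack:
        form = stack.pop()
        if form in seen:          # also absorbs the vacuous rule S -> S
            continue
        seen.add(form)
        terminals = sum(1 for sym in form if sym not in RULES)
        if terminals > max_len:   # rules never delete terminals, so prune
            continue
        idx = next((i for i, sym in enumerate(form) if sym in RULES), None)
        if idx is None:           # no nonterminal left: a finished string
            results.add("".join(form))
            continue
        for rhs in RULES[form[idx]]:
            stack.append(form[:idx] + tuple(rhs) + form[idx + 1:])
    return results

print(sorted(generate(9), key=len))
```

The rule S → S contributes no new strings; in the derivation trees it only adds extra S nodes, which is what the structures need as adjunction sites.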
This grammar generates the language {(abc)ⁿ : n > 0}. Now, we shall allow for movement of any element into c-commanding position. Movement is only constrained by the fact that it is into c-commanding position, nothing else. Since we have added the rule S → S, the base grammar freely generates sites to which a constituent can adjoin. In order to implement this, we need to add constants. For each terminal and each nonterminal element there will be a constant denoted by underlining it; for example, U̲ is the constant denoting nodes with label U. This will be our new language. We also add the condition that the constants from C are mutually exclusive:

Exc(C) := {X → ¬Y : X ≠ Y and X, Y ∈ C}

Also, we express the fact that at each node at least one constant from C must be true by

Suf(C) := ⋁⟨X : X ∈ C⟩

These two together ensure that each node satisfies exactly one constant. Next the context free grammar is described by a set of rules:
ρS := S → 00 a ∧ 10 T ∨ 00 a ∧ 10 X ∨ 00 S ∧ ¬10
ρT := T → 00 b ∧ 10 U
ρU := U → 00 c ∧ 10 S
ρX := X → 00 b ∧ 10 c
ρa := a → ¬
ρb := b → ¬
ρc := c → ¬

Now we are looking at the following logic Mv, where C := {S, T, U, X, a, b, c}, with

Mv := OL ⊕ Exc(C) ⊕ Suf(C) ⊕ {ρX : X ∈ C}

Since the added postulates are constant, it is a matter of direct verification that the structures for this logic are the MDSs in which the underlying tree (using the nonderived links) satisfies the context free grammar given in (16). Any constituent may move, and it can move to any c-commanding position.

It is interesting to spell out which linear order we use for the surface constituents. To this end, let x ≺s y if y is the highest member of P(x); we also call the link x; y a surface link. It is not hard to show that ≺s⁺ defines a tree order on the worlds. Moreover, let x ≺s0 y if x ≺s y and x ≺0 y; similarly, x ≺s1 y iff x ≺s y and x ≺1 y. We say for two leaves x and y that x surface-precedes y, in symbols x ∝ y, as follows.

x ∝ y :⇔ (∃u)(∃v)(∃w)(x ≺s⁺ u ≺s0 v ≻s1 w ≻s⁺ y)

This order is not modally definable. However, this does not defeat the usefulness of the present approach. There are two fixes; one is to introduce a surface relation. As we did for the root links, we introduce relations ≺s0 and ≺s1 explicitly. The proofs so far go through without a change. Decidability is again guaranteed.

10. Adjunction
The next generalisation we are going to make concerns adjunction. Recall from (Kracht 1998) that it is not enough to leave adjunction implicit. We must add an explicit statement which nodes are maximal. An adjunction
structure is therefore obtained by adding a subset Q of M. (Intuitively, this set represents the maximal nodes of a category.)

x^μ := the least y ∈ Q such that y ≻∗ x

The category of x is defined as follows.

C(x) := {y : y^μ = x^μ}

A category is a subset of M of the form C(x). y is a segment of C(x) if y ∈ C(x). Two categories are either equal or disjoint; hence the categories form a partition of M. Categories must also be linear. To ensure this it is enough to require the following of the set Q:

Linear Categories. If y and y′ are distinct daughters of x then y ∈ Q or y′ ∈ Q (or both).

For suppose y, y′ ∉ Q. Then y^μ, (y′)^μ ≥ x, whence it is easy to see that y^μ = (y′)^μ and so C(y) = C(y′) = C(x). On the other hand, if y ∈ Q then y^μ = y ≺ x, while (y′)^μ ≥ y′ and so C(y) is disjoint from C(y′).

Finally, in adjunction structures c-command is revised as follows. Say that y includes x if all segments dominate x. x c-commands y iff the least z including x dominates y. Now we require that chains are linearly ordered through ac-command. This is reflected in the following conditions. The set M(x) gets replaced by the set P(x), which is formed as follows. Suppose that x ≺⁺ u, where u is minimal in its category (so that the category is the least one that includes x), and there is a path Π from x to u going only through nonminimal nodes, and following derived links. Then u ∈ P(x). As before, P(x) reports about the movement history of x. But now that c-command is no longer defined using the one-node-up version (idc-command in the sense of (Barker and Pullum 1990)), we need to define a different set of nodes that need to be compared. This is why we chose P(x) to be the mothers of the ultimate landing site of a complex formed through successive adjunction. The link that adjunction creates is always counted as derived. We shall see below an example of where this arises naturally. In fact, adjunction has been taken to be more restrictive.
Typically, when an element adjoins, it must adjoin to the maximal segment of the existing category. And so we shall simplify the task as follows. Call x infimal if there is no y ≺ x which is nonmaximal (that is to say, x is the least member in its category).
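The notions of maximal node, category and infimal node can be tried out on a toy tree. What follows is a sketch under stated assumptions, not the construction of the text: the tree is given by a hypothetical `MOTHER` map, the root is assumed to belong to Q so that the maximal node above any node always exists, and the function names are our own.

```python
from collections import defaultdict

# A toy tree: a is the root with daughters b and c; b has daughters d and e.
MOTHER = {"b": "a", "c": "a", "d": "b", "e": "b"}
NODES = {"a", "b", "c", "d", "e"}
Q = {"a", "c", "d", "e"}          # b is a nonmaximal segment of a's category

def mu(x):
    """Least node in Q that dominates x (reflexively); assumes the root is in Q."""
    while x not in Q:
        x = MOTHER[x]
    return x

def categories():
    """Group the nodes by their maximal node: the classes C(x)."""
    classes = defaultdict(set)
    for x in NODES:
        classes[mu(x)].add(x)
    return dict(classes)

def linear_categories():
    """At most one daughter of any node may lie outside Q."""
    nonmaximal = defaultdict(int)
    for daughter, mother in MOTHER.items():
        if daughter not in Q:
            nonmaximal[mother] += 1
    return all(count <= 1 for count in nonmaximal.values())

def infimal(x):
    """x is infimal if it has no nonmaximal daughter."""
    return not any(m == x and d not in Q for d, m in MOTHER.items())

print(categories())
print(linear_categories(), infimal("b"), infimal("a"))
```

In this toy tree the category of a consists of the two segments a and b, and b is its infimal segment; every other node forms a category of its own.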
134 Marcus Kracht P(x) := {y : y x and x infimal or there is a noninfimal z and y •0 z •1 x} (17) Definition 8 A pseudo-ordered adjunction MDS (PAMDS) is a structure M, Q, 00 , 01 , 10 , 11 , where the following holds: (A1) Q ⊆ M. (A2) If y 0 x and y 0 x then x = x . (A3) If y 1 x and y 1 x then x = x . (A4) If y 1 x then there is a z such that y 0 x. (A5) There is exactly one x such that for no y, y x (this element is called the root). (A6) If x •1 y then y ∈ Q. (Adjoining elements are maximal segments.) (A7) If x ≺ y and x ≺ y and x, x Q then x = x . (Only one daughter is a nonmaximal segment. Categories are linear.) (A8) The set P(x) is linearly ordered by ≺+•0 and if y is minimal with respect to ≺+ then y •0 x. As before, we need to define the logic of these structures and then show that the defined logic has the finite model property, which shall establish its decidability. First, let us notice a few facts about these structures. Adjunction typically is head adjunction because here the new notion of c-command takes effect. A head adjoins to a higher head, but in the new position it does not idc-command its trace, it just c-commands it. The postulates are as follows. We shall introduce a constant Q whose interpretation is the set Q. First, let us agree on the following notation. A := ¬Q
(18)
:= (¬A?; •1 ) ∪ (•0 ; A?; •1 )
(19)
A is true on the node to which one has adjoined; y, x ∈ R( ) iff y ∈ P(x).
Definition 9 Let PAM =
DPDL4 .f
⊕ 00 ¬Q → [10 ]Q ⊕ 10 ¬Q → [00 ]Q ⊕ [•1 ]Q ⊕ p → +•0 ; (? ∪ ) p Lemma 15 Every finite PAMDS satisfies the postulates of PAM. Proof. (a) The postulates of DPDL4 .f are satisfied, by similar arguments. (b) Suppose M is a PAMDS, and let x ∈ M, x 00 ¬Q. Then there is a y ≺00 x which is not in M. By (A7), if z ≺10 x, z must be maximal, whence z Q. z was arbitrary (in fact, if it exists, it is unique). Therefore, x [10 ]Q. Similarly for the second axiom. (c) x [•1 ]Q. For let y ≺•1 x. Then by (A6), y ∈ Q, whence y Q. (d) Suppose x p. This means that there is a y such that x ∈ P(y). By (A8), if x •1 y, then x is not minimal in P(y). Hence, there is a z such that x + z and z ∈ P(x). This means either that z is minimal in P(x), in which case z ? p, or else that z is not minimal, but then z p. By assumption on P(y), we have that x +•0 z. Hence z (?) ∪ p and so 2 x +•0 ; p. Now we turn to the converse. Put Z(ϕ) :=
{[u](•1 δ → +•0 ; (? ∪ ) δ) : δ ∈ At(ϕ)} ∪ {[u](00 ¬Q → [10 ]Q), [u](10 ¬Q → [00 ]Q)} ∪ {[u][•1 ]Q}
Lemma 16 ϕ is consistent with PAM iff ϕ; Z(ϕ) is consistent with DPDL4 .f. Proof. (⇒) Clear. (⇐). Let Z(ϕ); ϕ be consistent with DPDL4 .f. Then it has a finite generated model based on M = M, Q, 00 , 01 , 10 , 11 , the valuation β and w0 such that M, β, w0 Z(ϕ); ϕ (a) By choice of Z(ϕ), w0 [u](00 ¬Q → [10 ]Q). Take z ∈ M. Then, by definition of u, z 00 ¬Q → [10 ]Q). Suppose now that y is nonmaximal and z 00 y. Then z 00 ¬Q. Whence z [10 ]Q. So, if z 10 u, then u is maximal. Similarly it is seen that if z 10 y and y is nonmaximal, and z 00 u then u is maximal. This establishes linearity (A7). (b) z [•1 ]Q. Hence
136 Marcus Kracht if y •1 z, y is maximal. Thus, (A6) is satisfied. (c) Now we deal with the most problematic formula, the last axiom. We replay the proof of Theorem 8. The only change is that we define the relation L differently. For as before, S is the set of standard points, and E the set of immediate, derived daughters of standard points. We shall have to verify that L is cycle free, and that the structure obtained by identifying all points L-related to each other is a PAMstructure and the resulting model satisfies ϕ. Basically, the proof of the latter is as in Theorem 8. So let us see why the structure is a PAM-structure. For, 2 this we need to establish that P(x) is linearly ordered by + . It follows that the logic PAM is in 2EXPTIME. There are typically other requirements that are placed on adjunction structures. The first is that head adjunction takes place to the right only. Thus, if y is a zero level projection and x •1 y, then y must be to the right, so • = 1. This is captured as follows. There is a constant H which is true of exactly the zero-level projections. So we say H → [ 10 ]⊥ Next, at least in the standard theory, the head-head complex cannot be taken apart by movement again. (The phenomenon is known as excorporation.) Structurally, it means that an adjoined element cannot have two mothers. Thus, if x, x •1 y and y is zero level, then x = x . This must be added to the list of requirements if needed. This is a universal first-order formula, so only have to appeal to Theorem 6 to see that it can be axiomatised modally. 11.
Second example: Swiss German
It is worth seeing a concrete example of how the present ideas can be made to work. We choose Swiss German to exemplify the interplay between movement and adjunction. Our analysis will be the cyclic head adjunction analysis put forward in the 80s for Dutch and German. We shall assume that lexical items have internal structure, which is also binary branching. For simplicity, we denote the relations below the lexical level by another symbol ( and ). (For all those worried about decidability: these relations are dispensable. We could introduce a constant L, which is true of all sublexical nodes. Then we put = ; L? and = L?; .) The lexicon contains complex nodes whose leftmost part is a string. The other nodes are auxiliary and carry phonetically empty material, here one of the following: α, δ and σ. They are mutually exclusive (just like the other labels). α is a feature for accusative case, δ for dative case and σ for the selection of an infinitival complement. The following are the lexical trees that we shall use;
[Figure 2. Some Lexical Trees: tree diagrams of the lexical entries for laa and d’chind]
Figure 2 shows two of them in tree format. (By the way, we abandon now the underscore notation for constants.)

[d’chind α]NP
[em chind δ]NP
[aastriche α]V
[[hälfe δ]V σ]V
[[laa α]V σ]V

The grammar for the deep structure is this:

VP → V1 VP
VP → V NP
V1 → V NP
VP → NP VP
We shall assume that the surface structure is created through successive cyclic head adjunction. That is to say, any head is allowed to move and adjoin to the next higher head; adjunction is always to the right, but it need not be cyclic. Suppose we have four heads V1 V2 V3 V4 . Then we can first adjoin V3 to V4 , giving [V4 V3 ], then V1 to V2 , giving [V2 V1 ], and then finally [V2 V1 ] to [V4 V3 ] to give [[V4 V3 ] [V2 V1 ]]. This can be excluded, see below. The rules, together with the lexicon can be translated into constant axioms as follows. (Recall from (18) the definition A := ¬Q. Furthermore, 20 := 2 ; 2 .)
138 Marcus Kracht ρVP := VP → (00 V1 ∧ 10 VP) ∨ (00 V ∧ 10 NP) ∨ (00 V ∧ 10 VP) ρV := V1 → 00 V ∧ 10 NP ρNP := NP → (20 (d’chind ∨ Hans ∨ · · · ) ∧ 1 α)) ∨ (20 (em chind ∨ em Hans ∨ · · · ) ∧ 1 δ)) N := (V ∧ ¬A) → (20 (aastriche ∨ · · · ) ∧ 1 α) ρV
alfe ∨ · · · ) ∧ 1 δ) ∧ 1 σ) ∨ (20 (0 h¨ ∨ (0 (0 laa ∨ · · · ) ∧ 1 α) ∧ 1 σ) A := (V ∧ A) → 00 V ∧ 11 (V ∧ Q) ρV
ρα := α → []⊥ ρδ := δ → []⊥ ρσ := σ → []⊥ Notice that it is possible to enforce cyclic head adjunction by issuing the A: following formula in place of ρV γVA := (V ∧ A) → 00 (V ∧ ¬A) ∧ 11 (V ∧ Q) This says that the left hand daughter must be infimal, hence that daughter is lexical. The right hand daughter may however be complex. Case government is implemented as follows. κα := V ∧ ∪ 2 α → ; α κδ := V ∧ ∪ 2 δ → ; δ Selectional restriction concerning the infinitive is the formula σ := V ∧ σ → (¬VP?); )∗ ; VP Notice that these formulae are all constant. They describe the restrictions that apply at D-structure. The only derivational steps are head adjunction, as shown above. The crucial fact here is that head adjunction is local; so we restrict the condition (7) in Definition 8 by saying that the distance between two members of P(x) must be small. The head movement constraint is embodied in the following formula Qh := p → 2•0 ; (? ∪ ) p
This formula is somewhat crude, saying that movement is only two steps up. It suffices for our purposes, thanks to the particular grammar chosen. It would be no problem to formulate a more sophisticated version which says that a head may only move to the next head. Definition 10 Call Swiss the logic N A OL ⊕ Exc(C) ⊕ Suf(C) ⊕ {ρVP , ρV , ρNP , ρV , ρV , κα , κγ , σ, Q } h
Swiss is decidable. This follows from our results. The language is trans-context free. To see this we must first define the surface order. This means that we have to spell out which of the links is a surface link. This is the standard link if the element is not a V, and it is not adjoined. Otherwise, it is a derived link.
≺ s0 p ↔ ((¬V ∧ ¬A) → ≺00 p)) ∧ ((V ∨ A) → ≺01 p) ≺ s1 p ↔ ((¬V ∧ ¬A) → ≺10 p)) ∧ ((V ∨ A) → ≺11 p) Notice that although we have introduced new symbols, ≺ s0 and ≺ s1 , they are eliminable, so they are in effect just shorthands. After that we define the left-to-right order on the surface and finally the relation ∝ s , which is like the surface ∝, but it skips intervening empty heads. ∝ := ≺∗s ; ≺ s0 ; s1 ; ∗s c := σ ∨ α ∨ δ ∝ s := ∝; (c?; ∝)∗ ; ¬c Now, x is immediately to the left of y in surface order if x R(∝ s ) y. x R(Λ s ) y if y is the next phonetically nonempty element to the right of x. So, the question whether the following sequence is derivable de chind em Hans es huus h¨ alfe aastriche now becomes the question whether the following formula has a model: [∝ s ]⊥ ∧ ∝ s (de chind ∧ ∝ s (em Hans ∧ ∝ s (es huus ∧ ∝ s (h¨ alfe ∧ ∝ s (aastriche ∧ ∝ s [Λ]⊥))))) (20)
12. Conclusion
Let us briefly review what has been achieved and what remains to be done. We have established a way to reduce a grammar to a logic L, and the lexicon to a constant formula λ. As a result, parsing becomes a satisfiability problem in a given logic (here L ⊕ λ). (See (Kracht 1995, 2001a) for an extensive discussion.) Provided that the logic L is decidable, the logic L ⊕ λ is also decidable and the following questions become decidable:

– Given a string x and a particular lexicon λ, is x derivable in L ⊕ λ?
– Is a given PDL-definable principle α satisfiable in a structure of L ⊕ λ? Or does L ⊕ λ refute α?
– Is a given regular language included in the language derived by L ⊕ λ?

Since principles are axioms, our results establish decidability of these questions only on condition that L falls within the range of logics investigated here (or expansions by constant formulae). In particular, this means that movement is assumed to satisfy Freeze. (This has consequences only for the formulation of nearness conditions.) It should be said that there are questions that are known to be undecidable and so there is no hope of ever finding an algorithm that decides them once and for all. One problem is the question whether a given grammar generates fewer sentences than another one. This is undecidable already for context free grammars.

The reader might wonder what happened to surface structure and LF. These two pose no problems, as far as I can see. All that needs to be done is to split the relations ≺i into four different ones (which are not mutually exclusive). In this way, practically the full theory can be axiomatised within PDL. It is to be noted, however, that while the lexicon consists of constant formulae, the theory (consisting of general structural axioms) is phrased with formulae containing variables.
The results obtained in this paper support the claim that properties of generative grammars developed within GB or the Minimalist Program are in fact decidable as long as they can be expressed in PDL. In Part II of this sequence we shall show that this holds true also for the logic of narrow multidominance structures. These are structures where a given trigger licenses only one movement step. Decidability will be shown for theories that admit narrow structures with Freeze-style movement and command relations to measure distance. This will hopefully be taken up in Part III, where we plan to study Minimalism in depth.
Notes

1. The shorthand ‘LGB’ refers to (Chomsky 1981) as a generic source for the kinds of structures that Government and Binding uses.
2. Added in February 2008. I noted with dismay that none of the promised papers have reached a satisfactory stage yet. Some of the generalisations have been obtained, but a thorough analysis of the MP is still missing.
3. The idea for this paper came on a bus ride in Santa Monica, during which I was unable to do anything but think. It came just in time for the Festcolloquium for Uwe. I owe thanks to the participants of the colloquium, especially Stephan Kepser, Jens Michaelis, Yiannis Moschovakis, Larry Moss and Uwe Mönnich for useful comments. Furthermore, special thanks to Stephan Kepser for carefully reading this manuscript. All errors that have survived so far are my responsibility.
4. Please note that this definition of an MDS differs slightly from the one given in (Kracht 2001b).
5. The reader may find it confusing that we talk about programs here. The reason is simply that PDL is used to talk about the actions of a computer, and this has given rise to the terminology. Here however we shall use PDL to talk about trees. As shall be seen below, the interpretation of a program is actually a relation over the constituent tree. So, when I write “program” it is best to think “relation”.
References

Barker, Chris and Geoffrey Pullum 1990 A theory of command relations. Linguistics and Philosophy 13: 1–34.
Blackburn, Patrick 1993 Nominal tense logic. Notre Dame Journal of Formal Logic 39: 56–83.
Chomsky, Noam 1981 Lecture Notes on Government and Binding. Dordrecht: Foris.
Chomsky, Noam 1986 Barriers. Cambridge (Mass.): MIT Press.
Giacomo, Giuseppe de 1996 Eliminating “Converse” from Converse PDL. Journal of Logic, Language and Information 5: 193–208.
Kracht, Marcus 1993 Mathematical aspects of command relations. In Proceedings of the EACL 93, 241–250.
Kracht, Marcus 1995 Is there a genuine modal perspective on feature structures? Linguistics and Philosophy 18: 401–458.
Kracht, Marcus 1998 Adjunction structures and syntactic domains. In Uwe Mönnich and Hans-Peter Kolb, (eds.), The Mathematics of Sentence Structure. Trees and Their Logics, number 44 in Studies in Generative Grammar, 259–299. Berlin: Mouton–de Gruyter.
Kracht, Marcus 1999 Tools and Techniques in Modal Logic. Number 142 in Studies in Logic. Amsterdam: Elsevier.
Kracht, Marcus 2001a Logic and Syntax – A Personal Perspective. In Maarten de Rijke, Krister Segerberg, Heinrich Wansing, and Michael Zakharyaschev, (eds.), Advances in Modal Logic ’98, 337–366. CSLI.
Kracht, Marcus 2001b Syntax in chains. Linguistics and Philosophy 24: 467–529.
Kracht, Marcus 2003a Constraints on derivations. Grammars 6: 89–113.
Kracht, Marcus 2003b The Mathematics of Language. Berlin: Mouton de Gruyter.
Manzini, Maria R. 1992 Locality – A Theory and Some of Its Empirical Consequences. Number 19 in Linguistic Inquiry Monographs. Boston (Mass.): MIT Press.
Rizzi, Luigi 1990 Relativized Minimality. Boston (Mass.): MIT Press.
Rogers, James 1998 A Descriptive Approach to Language-Theoretic Complexity. Stanford: CSLI Publications.
Stabler, Edward P. 1992 The Logical Approach to Syntax. Foundation, Specification and Implementation of Theories of Government and Binding. ACL-MIT Press Series in Natural Language Processing. Cambridge (Mass.): MIT Press.
Vardi, Moshe and Pierre Wolper 1986 Automata theoretic techniques for modal logics of programs. Journal of Computer and Systems Sciences 32: 183–221.
Completeness theorems for syllogistic fragments Lawrence S. Moss
Abstract Traditional syllogisms involve sentences of the following simple forms: All X are Y , Some X are Y , No X are Y ; similar sentences with proper names as subjects, and identities between names. These sentences come with the natural semantics using subsets of a given universe, and so it is natural to ask about complete proof systems. Logical systems are important in this area due to the prominence of syllogistic arguments in human reasoning, and also to the role they have played in logic from Aristotle onwards. We present complete systems for the entire syllogistic fragment and many sub-fragments. These begin with the fragment of All sentences, for which we obtain one of the easiest completeness theorems in logic. The last system extends syllogistic reasoning with the classical boolean operations and cardinality comparisons.
1. Introduction: the program of natural logic
This particular project begins with the time-honored syllogisms. The completeness of various formulations of syllogistic logic has already been shown, for example by Łukasiewicz (1957) (in work with Słupecki), and in different formulations, by Corcoran (1972) and Martin (1997). The technical part of this paper contains a series of completeness theorems for various systems, as we mentioned in the abstract. In some form, two of them were known already: see van Benthem (1984) and Westerståhl (1989). We are not aware of systematic studies of syllogistic fragments, and so this is a goal of the paper. Perhaps the results and methods will be of interest primarily to specialists in logic, but we hope that the statements will be of wider interest. Even more, we hope that the project of natural logic will appeal to people in linguistic semantics, artificial intelligence, computational semantics, and cognitive science. This paper is not the place to give a full exposition of natural logic, and so we only present a few remarks on it here.

Textbooks on model theoretic semantics often say that the goal of the enterprise is to study entailment relations (or other related relations). So the question arises as to what complete logical systems for those fragments would look like. Perhaps formal reasoning in some system or other will be of independent interest in semantics. And if one has a complete logical system for some phenomenon, then one might well take the logical system to be the semantics in some sense. Even if one does not ultimately want to take a logical presentation as primary but treats it as secondary, it still should be of interest to have completeness and decidability for as large a fragment of natural language as possible. As we found out by working on this topic, the technical work does not seem to be simply an adaptation of older techniques. So someone interested in pursuing that topic might find something of interest here.

Most publications on syllogistic-like fragments come from either the philosophical or AI literatures. The philosophical work is generally concerned with the problem of modern reconstruction of Aristotle, beginning with Łukasiewicz (1957) and including papers which go in other directions, such as Corcoran (1972) and Martin (1997). Our work is not reconstructive, however, and the systems from the past are not of primary interest here. The AI literature is closer to what we are doing in this paper; see for example Purdy (1991). (However, we are interested in completeness theorems, and the AI work usually concentrates on getting systems that work, and the metatheoretic work considers decidability and complexity.) The reason is that it has proposals which go beyond the traditional syllogistic systems. This would be a primary goal of what we are calling natural logic. We take a step in this direction in this paper by adding expressions like There are more As than Bs to the standard syllogistic systems. This shows that it is possible to have complete syllogistic systems which are not sub-logics of first-order logic. The next steps in this area can be divided into two groups, and we might call those the “conservative” and “radical” sub-programs.
The conservative program is what we just mentioned: to expand the syllogistic systems but to continue to deal with extensional fragments of language. A next step in this direction would treat sentences with verbs other than the copula. There is some prior work on this: e.g., Nishihara et al. (1990) and McAllester and Givan (1992). In addition, Pratt-Hartmann (2003, 2004) and Pratt-Hartmann and Third (2006) give several complexity-theoretic results in this direction. As soon as one has quantifiers and verbs, the phenomenon of quantifier-scope ambiguity suggests that some interaction with syntax will be needed. Although the program of natural logic as I have presented it seems ineluctably model-theoretic, my own view is that this is a shortcoming that will have to be rectified. This leads to the more radical program. We also want to
explore the possibility of having proof theory as the mathematical underpinning for semantics in the first place. This view is suggested in the literature on philosophy of language, but it is not well-explored in linguistic semantics because formal semantics is nowadays essentially the same as model-theoretic semantics. We think that this is only because nobody has yet made suggestions in the proof-theoretic direction. This is not quite correct, and one paper worth mentioning is Ben-Avi and Francez (2005). In fact, Francez and his colleagues have begun to look at proof-theoretic treatments of syllogistic fragments with a view towards what we are here calling the radical program. One can imagine several ways to “kick away the ladder” after looking at complete semantics for various fragments, incorporating work from several areas. But this paper is not concerned with any of these directions.

The results

This paper proves completeness of the following fragments, written in notation which should be self-explanatory:
(i) the fragment with All X are Y;
(ii) the fragment with Some X are Y;
(iii) = (i) + (ii);
(iv) = (iii) + sentences involving proper names;
(v) = (i) + No X are Y;
(vi) All + Some + No;
(vii) = (vi) + Names;
(viii) boolean combinations of (vii);
(ix) = (i) + There are at least as many X as Y;
(x) = boolean combinations of (ix) + Some + No.
In addition, we have completeness results for systems off the main track: (xi) All X which are Y are Z; (xii) Most; and (xiii) Most + Some.

For the most part, we work on systems that do not include sentential boolean operations. This is partly due to the intrinsic interest of the more spare systems. Also, we would like systems whose decision problem is polynomial-time computable. The existing (small) literature on logics for natural language generally works on top of propositional logic, and so their satisfiability problems are NP-hard.
At the same time, adding propositional reasoning to the logics tends to make the completeness proofs easier, as we shall see: the closer a system is to standard first-order logic, the more applicable are well-known techniques. So from a logical point of view, we are interested in exploring systems which are quite weak. A final point is that the work here should be of pedagogic interest: the simple completeness theorems in the first few sections of this paper are good vehicles for teaching students about logical systems, soundness, and completeness. This is because the presentation completely avoids all of the details of syntax such as substitution lemmas and rules with side conditions on free variables, and the mathematical arguments of this paper are absolutely elementary. At the same time, the techniques foreshadow what we find in the Henkin-style completeness proofs for first-order logic. So students would see
the technique of syntactically defined models quite early on. (However, since we only have three sides of the classical square of opposition, one occasionally feels as if sitting on a wobbly chair.) This paper does not present natural deduction-style logics, but they do exist, and this would add to a presentation for novices. Overall, this material could be an attractive prelude to standard courses.

Dedication

Uwe Mönnich has been a source of inspiration and support for many years. I have learned much from him, not only from his work on the grammatical formalisms but also from his wide-ranging store of knowledge and his menschlichkeit (a quality shared with Traudel). I would like to thank Uwe and to wish him many more years of productive work.

1.1. Getting started
We are concerned with logical systems based on syllogistic reasoning. We interpret a syllogism such as the famous

\[ \frac{\text{All men are mortal.} \qquad \text{Socrates is a man.}}{\text{Socrates is mortal.}} \]

(The first recorded version of this particular syllogism is due to Sextus Empiricus, in a slightly different form.) The interpretations use sets in the obvious way. The idea again is that the sentences above the line should semantically entail the one below the line. Specifically, in every context (or model) in which All men are mortal and Socrates is a man are true, it must be the case that Socrates is mortal is also true. Here is another example, a bit closer to what we have in mind for the study:

\[ \frac{\text{All xenophobics are yodelers.} \quad \text{John is a xenophobic.} \quad \text{Mary is a zookeeper.} \quad \text{John is Mary.}}{\text{Some yodeler is a zookeeper.}} \tag{1} \]

To begin our study, we have the following definitions:

“Syntax” We start with a set of variables X, Y, . . ., representing plural common nouns. We also have names J, M, . . .. Then we consider sentences S
of the following very restricted forms: All X are Y, Some X are Y, No X are Y, J is an X, J is M. The reason we use scare quotes is that we only have five types of sentences, hence no recursion whatsoever. Obviously it would be important to propose complete systems for infinite fragments. The main example which I know of is that of McAllester and Givan (1992). Their paper showed a decidability result but was not concerned with logical completeness; for this, see Moss (to appear).

Fragments As small as our language is, we shall be interested in a number of fragments of it. These include L(all), the fragment with All (and nothing else); and with obvious notation L(all, some), L(all, some, names), and L(all, no). We also will be interested in extensions of the language and variations on the semantics.

Semantics One starts with a set M, a subset [[X]] ⊆ M for each variable X, and an element [[J]] ∈ M for each name J. This gives a model M = (M, [[ ]]). We then define

\[
\begin{array}{lcl}
M \models \text{All $X$ are $Y$} & \text{iff} & [[X]] \subseteq [[Y]] \\
M \models \text{Some $X$ are $Y$} & \text{iff} & [[X]] \cap [[Y]] \neq \emptyset \\
M \models \text{No $X$ are $Y$} & \text{iff} & [[X]] \cap [[Y]] = \emptyset \\
M \models \text{$J$ is an $X$} & \text{iff} & [[J]] \in [[X]] \\
M \models \text{$J$ is $M$} & \text{iff} & [[J]] = [[M]]
\end{array}
\]
We allow [[X]] to be empty, and in this case, recall that M |= All X are Y vacuously. And if Γ is a finite or infinite set of sentences, then we write M |= Γ to mean that M |= S for all S ∈ Γ. Main semantic definition Γ |= S means that every model which makes all sentences in the set Γ true also makes S true. This is the relevant form of semantic entailment for this paper. Notation If Γ is a set of sentences, we write Γall for the subset of Γ containing only sentences of the form All X are Y . We do this for other constructs, writing Γsome , Γno and Γnames .
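The truth conditions above are easy to experiment with on finite models. The following is a minimal sketch, not part of the paper: the tuple encoding of sentences (`"all"`, `"some"`, `"no"`, `"isa"`, `"is"`) and the names `Model`, `holds`, and `satisfies` are assumptions made for illustration. Note that Γ |= S quantifies over all models, so code of this kind can only refute an entailment (by exhibiting a model of Γ in which S fails); it cannot establish one.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    universe: set
    nouns: dict = field(default_factory=dict)  # variable -> subset of universe
    names: dict = field(default_factory=dict)  # name -> element of universe

    def ext(self, x):
        # Variables with no explicit interpretation denote the empty set.
        return self.nouns.get(x, set())


def holds(m, s):
    """Truth of one sentence s = (kind, a, b) in the model m."""
    kind, a, b = s
    if kind == "all":   # All a are b
        return m.ext(a) <= m.ext(b)
    if kind == "some":  # Some a are b
        return bool(m.ext(a) & m.ext(b))
    if kind == "no":    # No a are b
        return not (m.ext(a) & m.ext(b))
    if kind == "isa":   # a is a b (a is a name, b a variable)
        return m.names[a] in m.ext(b)
    if kind == "is":    # a is b (both are names)
        return m.names[a] == m.names[b]
    raise ValueError(f"unknown sentence kind: {kind}")


def satisfies(m, gamma):
    """m |= gamma: every sentence of gamma is true in m."""
    return all(holds(m, s) for s in gamma)


# The Socrates syllogism, checked in one particular model.
m = Model({"socrates", "fido"},
          nouns={"men": {"socrates"}, "mortal": {"socrates", "fido"}},
          names={"Socrates": "socrates"})
premises = [("all", "men", "mortal"), ("isa", "Socrates", "men")]
print(satisfies(m, premises), holds(m, ("isa", "Socrates", "mortal")))
```

Leaving a variable out of `nouns` gives it the empty interpretation, so sentences of the form All X are Y with empty [[X]] come out vacuously true, as in the text.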
Inference rules of the logical system The complete set of rules for the syllogistic fragment may be found in Figure 6 below. But we are concerned with other fragments, especially in Sections 8 and onward. Rules for other fragments will be presented as needed.

Proof trees A proof tree over Γ is a finite tree T whose nodes are labeled with sentences in our fragment, with the additional property that each node is either an element of Γ or comes from its parent(s) by an application of one of the rules. Γ ⊢ S means that there is a proof tree T over Γ whose root is labeled S.

Example 1 Here is a proof tree formalizing the reasoning in (1):

\[
\dfrac{\dfrac{\text{All $X$ are $Y$} \quad \text{$J$ is an $X$}}{\text{$J$ is a $Y$}} \qquad \dfrac{\text{$M$ is a $Z$} \quad \text{$J$ is $M$}}{\text{$J$ is a $Z$}}}{\text{Some $Y$ are $Z$}}
\]

Example 2 We take Γ = {All A are B, All Q are A, All B are D, All C are D, All A are Q}. Let S be All Q are D. Here is a proof tree showing that Γ ⊢ S:

\[
\dfrac{\text{All $Q$ are $A$} \qquad \dfrac{\dfrac{\text{All $A$ are $B$} \quad \text{All $B$ are $B$}}{\text{All $A$ are $B$}} \quad \text{All $B$ are $D$}}{\text{All $A$ are $D$}}}{\text{All $Q$ are $D$}}
\]

Note that all of the leaves belong to Γ except for one that is All B are B. Note also that some elements of Γ are not used as leaves. This is permitted according to our definition. The proof tree above shows that Γ ⊢ S. Also, there is a smaller proof tree that does this, since the use of All B are B is not really needed. (The reason why we allow leaves to be labeled like this is so that we can have one-element trees labeled with sentences of the form All A are A.)

Lemma 1 (Soundness) If Γ ⊢ S, then Γ |= S.

Proof. By induction on proof trees. □
\[
\dfrac{}{\text{All $X$ are $X$}} \qquad \dfrac{\text{All $X$ are $Z$} \quad \text{All $Z$ are $Y$}}{\text{All $X$ are $Y$}}
\]

Figure 1. The logic of All X are Y.
Example 3 One easy semantic fact is {Some X are Y, Some Y are Z} ⊭ Some X are Z. The smallest countermodel is {1, 2} with [[X]] = {1}, [[Y]] = {1, 2}, and [[Z]] = {2}. Even if we ignore the soundness of the logical system, an examination of its proofs shows that {Some X are Y, Some Y are Z} ⊬ Some X are Z. Indeed, the only sentences which follow from the hypotheses are those sentences themselves, the sentences Some X are X, Some Y are Y, Some Z are Z, Some Y are X, and Some Z are Y, and the axioms of the system: sentences of the form All U are U and J is J.

There are obvious notions of submodel and homomorphism of models.

Proposition 1 Sentences in L(all, no, names) are preserved under submodels. Sentences in L(some, names) are preserved under homomorphisms. Sentences in L(all) are preserved under surjective homomorphic images.

2. All
This paper is organized in sections corresponding to different fragments. To begin, we present a system for L(all). All of our logical systems are sound by Lemma 1.

Theorem 1 The logic of Figure 1 is complete for L(all).

Proof. Suppose that Γ |= S. Let S be All X are Y. Let {∗} be any singleton, and define a model M by M = {∗}, and

\[
[[Z]] = \begin{cases} M & \text{if } \Gamma \vdash \text{All $X$ are $Z$} \\ \emptyset & \text{otherwise} \end{cases} \tag{2}
\]
It is important that in (2), X is the same variable as in the sentence S with which we began. We claim that if Γ contains All V are W, then [[V]] ⊆ [[W]]. For this, we may assume that [[V]] ≠ ∅ (otherwise the result is trivial). So [[V]] = M. Thus Γ ⊢ All X are V. So we have a proof tree over Γ as indicated by the vertical dots below:

\[
\dfrac{\begin{matrix} \vdots \\ \text{All $X$ are $V$} \end{matrix} \qquad \text{All $V$ are $W$}}{\text{All $X$ are $W$}}
\]

The tree overall has as leaves All V are W plus the leaves of the tree above All X are V. Overall, we see that all leaves are labeled by sentences in Γ. This tree shows that Γ ⊢ All X are W. From this we conclude that [[W]] = M. In particular, [[V]] ⊆ [[W]]. Now our claim implies that the model M we have defined makes all sentences in Γ true. So it must make the conclusion true. Therefore [[X]] ⊆ [[Y]]. And [[X]] = M, since we have a one-point tree for All X are X. Hence [[Y]] = M as well. But this means that Γ ⊢ All X are Y, just as desired. □

Remark The completeness of L(all) appears to be the simplest possible completeness result of any logical system! (One can also make this claim about the pure identity fragment, the one whose statements are of the form J is M and whose logical presentation amounts to the reflexive, symmetric, and transitive laws.) At the same time, we are not aware of any prior statement of its completeness.

2.1. The canonical model property
We introduce a property which some of the logical systems in this paper enjoy. First we need some preliminary points. For any set Γ of sentences, define ≤Γ on the set of variables by

\[ U \le_\Gamma V \quad \text{iff} \quad \Gamma \vdash \text{All $U$ are $V$} \tag{3} \]
Lemma 2 The relation ≤Γ is a preorder: a reflexive and transitive relation.

We shall often use preorders ≤Γ defined by (3). Also define a relation ≼Γ on the variables by: U ≼Γ V if Γ contains All U are V. Let ≼∗Γ be the reflexive-transitive closure of ≼Γ. Usually we suppress mention of Γ and simply write ≤, ≼, and ≼∗.

Proposition 2 Let Γ be any set of sentences in this fragment, and let ≼∗ be defined from Γ as above. Let X and Y be any variables. Then the following are equivalent:
1. Γ ⊢ All X are Y.
2. Γ |= All X are Y.
3. X ≼∗ Y.

Proof. (1)=⇒(2) is by soundness, and (3)=⇒(1) is by induction on ≼∗. The most significant part is (2)=⇒(3). We build a model M. As in the proof of Theorem 1, we take M = {∗}. But we modify (2) by taking [[Z]] = M iff X ≼∗ Z. We claim that M |= Γ. Consider All V are W in Γ. We may assume that [[V]] = M, or else our claim is trivial. Then X ≼∗ V. But V ≼ W, so we have X ≼∗ W, as desired. This verifies that M |= Γ. But [[X]] = M, and therefore [[Y]] = M as well. Hence X ≼∗ Y, as desired. □

Definition 1 Let F be a fragment, let Γ be a set of sentences in F, and consider a fixed logical system for F. A model M is canonical for Γ if for all S ∈ F, M |= S iff Γ ⊢ S. A fragment F has the canonical model property (for the given logical system) if every set Γ ⊆ F has a canonical model. (For example, in L(all), M is canonical for Γ provided: X ≤ Y iff [[X]] ⊆ [[Y]].)

Notice, for example, that classical propositional and first-order logic do not have the canonical model property. A model of Γ = {p} will have to commit to a value on a different propositional symbol q, and yet neither q nor ¬q follows from Γ. These systems do have the property that every maximal consistent set has a canonical model. Since they also have negation, this last fact leads to completeness. As it turns out, fragments in this paper exhibit differing behavior with respect to the canonical model property. Some have it, some do not, and some have it for certain classes of sentences.

Proposition 3 L(all) has the canonical model property with respect to our logical system for it.

Proof. Given Γ, let M be the model whose universe is the set of variables, and with [[U]] = {Z : Z ≤ U}. Consider a sentence S ≡ All X are Y. Then [[X]] ⊆ [[Y]] in M iff X ≤ Y.
(Both rules of the logic are used here.) □

The canonical model property is stronger than completeness. To see this, let M be canonical for a fixed set Γ. In particular M |= Γ. Hence if Γ |= S, then M |= S; so Γ ⊢ S.
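Proposition 2 reduces derivability in L(all) to reachability: All X are Y is derivable from Γ exactly when Y can be reached from X in zero or more steps along the sentences All U are V that Γ contains. The sketch below illustrates this for a finite Γ, encoded (as an assumption of the sketch) as a set of pairs `(u, v)` standing for All u are v; the function names are hypothetical. It also builds the model of Proposition 3 over a finite set of variables, so the canonical property can be checked directly.

```python
from collections import defaultdict


def derives_all(gamma, x, y):
    """Decide whether 'All x are y' is derivable from gamma.

    gamma is a finite collection of pairs (u, v), read as 'All u are v'.
    By Proposition 2 this is reachability of y from x, with x reaching
    itself in zero steps (the axiom All X are X).
    """
    succ = defaultdict(set)
    for u, v in gamma:
        succ[u].add(v)
    seen, stack = {x}, [x]
    while stack:
        u = stack.pop()
        if u == y:
            return True
        for v in succ[u] - seen:
            seen.add(v)
            stack.append(v)
    return False


def canonical_model(gamma, variables):
    """The model of Proposition 3: [[U]] = {Z : Z <= U}, restricted to
    the given finite set of variables."""
    return {u: {z for z in variables if derives_all(gamma, z, u)}
            for u in variables}


# Example 2 from Section 1.1.
gamma = {("A", "B"), ("Q", "A"), ("B", "D"), ("C", "D"), ("A", "Q")}
print(derives_all(gamma, "Q", "D"))
```

Each query is a single graph search, so it runs in time linear in the size of Γ, in line with the polynomial-time aim stated in the introduction.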
\[
\dfrac{}{(X,Y,X)} \qquad \dfrac{}{(X,Y,Y)} \qquad \dfrac{(X,Y,U) \quad (X,Y,V) \quad (U,V,Z)}{(X,Y,Z)}
\]

Figure 2. The logic of All X which are Y are Z, written here (X,Y,Z).
2.2. A digression: All X which are Y are Z
At this point, we digress from our main goal of the examination of the syllogistic system of Section 1.1. Instead, we consider the logic of All X which are Y are Z. To save space, we abbreviate this by (X,Y,Z). We take this sentence to be true in a given model M if [[X]] ∩ [[Y]] ⊆ [[Z]]. Note that All X are Y is semantically equivalent to (X,X,Y). First, we check that the logic is genuinely new. The result in Proposition 4 clearly also holds for the closure of L(all, some, no) under (infinitary) boolean operations.

Proposition 4 Let R be All X which are Y are Z. Then R cannot be expressed by any set in the language L(all, some, no). That is, there is no set Γ of sentences in L(all, some, no) such that for all M, M |= Γ iff M |= R.

Proof. Consider the model M with universe {x, y, a} with [[X]] = {x, a}, [[Y]] = {y, a}, [[Z]] = {a}, and also [[U]] = ∅ for other variables U. Consider also a model N with universe {x, y, a, b} with [[X]] = {x, a, b}, [[Y]] = {y, a, b}, [[Z]] = {a}, and the rest of the structure the same as in M. An easy examination shows that for all sentences S ∈ L(all, some, no), M |= S iff N |= S. Now suppose towards a contradiction that we could express R, say by the set Γ. Then since M and N agree on L(all, some, no), they agree on Γ. But M |= R and N ⊭ R, a contradiction. □

Theorem 2 The logic of All X which are Y are Z in Figure 2 is complete.

Proof. Suppose Γ |= (X,Y,Z). Consider the interpretation M given by M = {∗}, and for each variable W, [[W]] = {∗} iff Γ ⊢ (X,Y,W). We claim that for (U,V,W) ∈ Γ, [[U]] ∩ [[V]] ⊆ [[W]]. For this, we may assume that M = [[U]] ∩ [[V]]. So we use the proof tree

\[
\dfrac{\begin{matrix} \vdots \\ (X,Y,U) \end{matrix} \qquad \begin{matrix} \vdots \\ (X,Y,V) \end{matrix} \qquad (U,V,W)}{(X,Y,W)}
\]
This shows that [[W]] = M, as desired. Returning to our sentence (X,Y,Z), our overall assumption that Γ |= (X,Y,Z) tells us that M |= (X,Y,Z). The first two axioms show that ∗ ∈ [[X]] ∩ [[Y]]. Hence ∗ ∈ [[Z]]. That is, Γ ⊢ (X,Y,Z). □

Remark Instead of the axiom (X,Y,Y), we could have taken the symmetry rule

\[ \dfrac{(Y,X,Z)}{(X,Y,Z)} \]

The two systems are equivalent.

Remark The fragment with (X,X,Y) is a conservative extension of the fragment with All, via the translation of All X are Y as (X,X,Y).

3. All and Some
We enrich our language with sentences Some X are Y and our rules with those of Figure 3. The symmetry rule for Some may be dropped if one ‘twists’ the transitivity rule to read

\[ \dfrac{\text{All $Y$ are $Z$} \quad \text{Some $X$ are $Y$}}{\text{Some $Z$ are $X$}} \]

Then symmetry is derivable. We will use the twisted form in later work, but for now we want the three rules of Figure 3 because the first two alone are used in Theorem 3 below.

Example 4 Perhaps the first non-trivial derivation in the logic is the following one:

\[
\dfrac{\text{All $Z$ are $Y$} \qquad \dfrac{\dfrac{\text{All $Z$ are $X$} \quad \text{Some $Z$ are $Z$}}{\text{Some $Z$ are $X$}}}{\text{Some $X$ are $Z$}}}{\text{Some $X$ are $Y$}}
\]

That is, if there is a Z, and if all Zs are Xs and also Ys, then some X is a Y.

In working with Some sentences, we adopt some notation parallel to (3) for All:

\[ U \uparrow_\Gamma V \quad \text{iff} \quad \Gamma \vdash \text{Some $U$ are $V$} \tag{4} \]

Usually we drop the subscript Γ. Using the symmetry rule, ↑ is symmetric. The next result is essentially due to van Benthem (1984), Theorem 3.3.5.
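\[
\dfrac{\text{Some $X$ are $Y$}}{\text{Some $X$ are $X$}} \qquad \dfrac{\text{Some $X$ are $Y$}}{\text{Some $Y$ are $X$}} \qquad \dfrac{\text{All $Y$ are $Z$} \quad \text{Some $X$ are $Y$}}{\text{Some $X$ are $Z$}}
\]

Figure 3. The logic of Some and All, in addition to the logic of All.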
Theorem 3 The first two rules in Figure 3 give a logical system with the canonical model property for L(some). Hence the system is complete.

Proof. Let Γ ⊆ L(some). Let M = M(Γ) be the set of unordered pairs (i.e., sets with one or two elements) of variables. Let [[U]] = {{U,V} : U ↑ V}. Observe that the elements of [[U]] are unordered pairs with one element being U. If U ↑ V, then {U,V} ∈ [[U]] ∩ [[V]]. Assume first X ≠ Y and that Γ contains S = Some X are Y. Then {X,Y} ∈ [[X]] ∩ [[Y]], so M |= S. Conversely, if {U,V} ∈ [[X]] ∩ [[Y]], then by what we have said above {U,V} = {X,Y}. In particular, {X,Y} ∈ M. So X ↑ Y. Second, we consider the situation when X = Y. If Γ contains S = Some X are X, then {X} ∈ [[X]]. So M |= S. Conversely, if {U,V} ∈ [[X]], then (without loss of generality) U = X, and X ↑ V. Using our second rule of Some, we see that X ↑ X. □

The rest of this section is devoted to the combination of All and Some.

Lemma 3 Let Γ ⊆ L(all, some). Then there is a model M with the following properties:
1. If X ≤ Y, then [[X]] ⊆ [[Y]].
2. [[X]] ∩ [[Y]] ≠ ∅ iff X ↑ Y.
In particular, M |= Γ.

Proof. Let N = |Γsome|. We regard N as the ordinal number {0, 1, . . . , N−1}. For i ∈ N, let Vi and Wi be such that

\[ \Gamma_{some} = \{\text{Some $V_i$ are $W_i$} : i \in N\} \tag{5} \]
Note that for i ≠ j, we might well have Vi = Vj or Wi = Wj. For the universe of M we take the set N. For each variable Z, we define

\[ [[Z]] = \{i \in N : \text{either } V_i \le Z \text{ or } W_i \le Z\}. \tag{6} \]
(As in (3), the relation ≤ is: X ≤ Y iff Γ ⊢ All X are Y.) This defines the model M. For the first point, suppose that X ≤ Y. It follows from (6) and Lemma 2 that [[X]] ⊆ [[Y]]. Second, take a sentence Some Vi are Wi on our list in (5) above. Then i itself belongs to [[Vi]] ∩ [[Wi]], so this intersection is not empty. At this point we know that M |= Γ, and so by soundness, we then get half of the second point in this lemma. For the left-to-right direction of the second point, assume that [[X]] ∩ [[Y]] ≠ ∅. Let i ∈ [[X]] ∩ [[Y]]. We have four cases, depending on whether Vi ≤ X or Wi ≤ X, and whether Vi ≤ Y or Wi ≤ Y. In each case, we use the logic to see that X ↑ Y. The formal proofs are all similar to what we saw in Example 4 above. □

Theorem 4 The logic of Figures 1 and 3 is complete for L(all, some).

Proof. Suppose that Γ |= S. There are two cases, depending on whether S is of the form All X are Y or of the form Some X are Y. In the first case, we claim that Γall |= S. To see this, let M |= Γall. We get a new model M′ = M ∪ {∗} via [[X]]′ = [[X]] ∪ {∗}. The model M′ so obtained satisfies Γall and all Some sentences whatsoever in the fragment. Hence M′ |= Γ. So M′ |= S. And since S is a universal sentence, M |= S as well. This proves our claim that Γall |= S. By Theorem 1, Γall ⊢ S. Hence Γ ⊢ S. The second case, where S is of the form Some X are Y, is an immediate application of Lemma 3. □

Remark Let Γ ⊆ L(all, some), and let S ∈ L(some). As we know from Lemma 3, if Γ ⊬ S, there is a model M |= Γ which makes S false. The proof gets a model M whose size is |Γsome|. We can get a countermodel of size at most 2. To see this, let M be as in Lemma 3, and let S be Some X are Y. If either [[X]] or [[Y]] is empty, we can coalesce all the points in M to a single point ∗, and then take [[U]] = {∗} iff [[U]] ≠ ∅. So we assume that [[X]] and [[Y]] are non-empty. Let N be the two-point model {1, 2}. Define f : M → N by f(x) = 1 iff x ∈ [[X]]. The structure of N is that [[U]]N = f[[[U]]M]. This makes f a surjective homomorphism. By Proposition 1, N |= Γ. And the construction ensures that in N, [[X]] ∩ [[Y]] = ∅. Note that 2 is the smallest we can get, since on models of size 1, {Some X are Y, Some Y are Z} |= Some X are Z.
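The model of Lemma 3 can be built mechanically for a finite Γ ⊆ L(all, some), and the second point of the lemma then decides derivability of Some sentences. The following is a minimal sketch under stated assumptions: All and Some sentences are passed separately as sets of pairs, and the names `leq`, `lemma3_model`, and `derives_some` are hypothetical. The universe is the set of indices of the Some sentences in Γ, and index i belongs to [[Z]] when one of the two variables of the i-th Some sentence lies below Z in the preorder of (3).

```python
def leq(all_pairs, x, y):
    """x <= y: 'All x are y' is derivable from the All sentences
    (reachability, with every variable reaching itself)."""
    seen, stack = {x}, [x]
    while stack:
        u = stack.pop()
        if u == y:
            return True
        for a, b in all_pairs:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return False


def lemma3_model(all_pairs, some_pairs, variables):
    """Interpretation following definition (6): the universe is the set
    of indices of the Some sentences, and [[Z]] collects the indices i
    with V_i <= Z or W_i <= Z."""
    some_list = sorted(some_pairs)
    return {z: {i for i, (v, w) in enumerate(some_list)
                if leq(all_pairs, v, z) or leq(all_pairs, w, z)}
            for z in variables}


def derives_some(all_pairs, some_pairs, x, y):
    """Decide 'Some x are y' via the second point of Lemma 3: it is
    derivable iff [[x]] and [[y]] intersect in the model built above."""
    ext = lemma3_model(all_pairs, some_pairs, {x, y})
    return bool(ext[x] & ext[y])


# Example 4: from All Z are X, All Z are Y, Some Z are Z.
print(derives_some({("Z", "X"), ("Z", "Y")}, {("Z", "Z")}, "X", "Y"))
```

The model has one point per Some sentence in Γ, matching the size bound mentioned in the remark above; the two-point countermodel of that remark is a further quotient and is not constructed here.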
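\[
\dfrac{}{\text{$J$ is $J$}} \qquad \dfrac{\text{$J$ is $M$} \quad \text{$M$ is $F$}}{\text{$F$ is $J$}} \qquad \dfrac{\text{All $X$ are $Y$} \quad \text{$J$ is an $X$}}{\text{$J$ is a $Y$}}
\]

\[
\dfrac{\text{$M$ is an $X$} \quad \text{$J$ is $M$}}{\text{$J$ is an $X$}} \qquad \dfrac{\text{$J$ is an $X$} \quad \text{$J$ is a $Y$}}{\text{Some $X$ are $Y$}}
\]

Figure 4. The logic of names, on top of the logic of All and Some.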
Remark L(all, some) does not have the canonical model property with respect to any logical system. To see this, let Γ be the set {All X are Y}. Let M |= Γ. Then either M |= All Y are X, or M |= Some Y are Y. But neither of these sentences follows from Γ. We cannot hope to avoid the split in the proof of Theorem 4 due to the syntax of S.

Remark Suppose that one wants to say that All X are Y is true when [[X]] ⊆ [[Y]] and also [[X]] ≠ ∅. Then the following rule becomes sound:

\[ \dfrac{\text{All $X$ are $Y$}}{\text{Some $X$ are $Y$}} \tag{7} \]

On the other hand, it is no longer sound to take All X are X to be an axiom. So we drop that rule in favor of (7). In this way, we get a complete system for the modified semantics. Here is how one sees this. Given Γ, let Γ′ be Γ together with all sentences Some X are Y such that All X are Y belongs to Γ. An easy induction on proofs shows that Γ ⊢ S in the modified system iff Γ′ ⊢ S in the old system.

4. Adding Proper Names
In this section we obtain completeness for sentences in L(all, some, names). The proof system adds rules in Figure 4 to what we already have seen in Figures 1 and 3. Fix a set Γ ⊆ L(all, some, names). Let ≡ and ∈ be the relations defined from Γ by

\[
\begin{array}{lcl}
J \equiv M & \text{iff} & \Gamma \vdash \text{$J$ is $M$} \\
J \in X & \text{iff} & \Gamma \vdash \text{$J$ is an $X$}
\end{array}
\]

Lemma 4 ≡ is an equivalence relation. And if J ≡ M ∈ X ≤ Y, then J ∈ Y.
Lemma 5 Let Γ ⊆ L(all, some, names). Then there is a model N with the following properties:
1. If X ≤ Y, then [[X]] ⊆ [[Y]].
2. [[X]] ∩ [[Y]] ≠ ∅ iff X ↑ Y.
3. [[J]] = [[M]] iff J ≡ M.
4. [[J]] ∈ [[X]] iff J ∈ X.

Proof. Let M be any model satisfying the conclusion of Lemma 3 for Γall ∪ Γsome. Let N be defined by

\[
\begin{array}{lcl}
N & = & M + \{[J] : J \text{ a name}\} \\
{[[X]]} & = & [[X]]_M + \{[J] : \Gamma \vdash \text{$J$ is an $X$}\}
\end{array}
\tag{8}
\]
The + here denotes a disjoint union. It is easy to check that M and N satisfy the same sentences in All, that the Some sentences true in M are still true in N, and that points (3) and (4) in our lemma hold. So what remains is to check that if [[X]] ∩ [[Y]] ≠ ∅ in N, then X ↑ Y. The only interesting case is when [J] ∈ [[X]] ∩ [[Y]] for some name J. So J ∈ X and J ∈ Y. Using the one rule of the logic which has both names and Some, we see that X ↑ Y. □

Theorem 5 The logic of Figures 1, 3, and 4 is complete for L(all, some, names).

Proof. The proof is nearly the same as that of Theorem 4. In the part of the proof dealing with All sentences, we had a construction taking a model M to a one-point extension M′. To interpret names in M′, we let [[J]] = ∗ for all names J. Then all sentences involving names are automatically true in M′. □

5. All and No
In this section, we consider L(all, no). Note that No X are X just says that there are no Xs. In addition to the rules of Figure 1, we take the rules in Figure 5. As in (3) and (4), we write

\[ U \perp_\Gamma V \quad \text{iff} \quad \Gamma \vdash \text{No $U$ are $V$} \tag{9} \]

This relation is symmetric.
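\[
\dfrac{\text{All $X$ are $Z$} \quad \text{No $Z$ are $Y$}}{\text{No $Y$ are $X$}} \qquad \dfrac{\text{No $X$ are $X$}}{\text{No $X$ are $Y$}} \qquad \dfrac{\text{No $X$ are $X$}}{\text{All $X$ are $Y$}}
\]

Figure 5. The logic of No X are Y on top of All X are Y.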
Lemma 6 L(all, no) has the canonical model property with respect to our logic.

Proof. Let Γ be any set of sentences in All and No. Let

\[
\begin{array}{lcl}
M & = & \{\{U,V\} : \neg(U \perp V)\} \\
{[[W]]} & = & \{\{U,V\} \in M : U \le W \text{ or } V \le W\}
\end{array}
\tag{10}
\]

The semantics is monotone, and so if X ≤ Y, then [[X]] ⊆ [[Y]]. Conversely, suppose that [[X]] ⊆ [[Y]]. If [[X]] = ∅, then X ⊥ X, for otherwise {X} ∈ [[X]]. From the last rule in Figure 5, we see that X ≤ Y, as desired. In the other case, [[X]] ≠ ∅. Fix {V,W} ∈ [[X]] so that ¬(V ⊥ W), and either V ≤ X or W ≤ X. Without loss of generality, V ≤ X. We cannot have X ⊥ X, or else V ⊥ V and then V ⊥ W. So {X} ∈ [[X]] ⊆ [[Y]]. Thus X ≤ Y.

We have shown X ≤ Y iff [[X]] ⊆ [[Y]]. This is half of the canonical model property, the other half being X ⊥ Y iff [[X]] ∩ [[Y]] = ∅. Suppose first that [[X]] ∩ [[Y]] = ∅. Then {X,Y} ∉ M, lest it belong to both [[X]] and [[Y]]. So X ⊥ Y. Conversely, suppose that X ⊥ Y. Suppose towards a contradiction that {V,W} ∈ [[X]] ∩ [[Y]]. There are four cases, and two representative ones are (i) V ≤ X and W ≤ Y, and (ii) V ≤ X and V ≤ Y. In (i), we have the following tree over Γ:

\[
\dfrac{\dfrac{\begin{matrix} \vdots \\ \text{All $V$ are $X$} \end{matrix} \quad \begin{matrix} \vdots \\ \text{No $X$ are $Y$} \end{matrix}}{\text{No $Y$ are $V$}} \qquad \begin{matrix} \vdots \\ \text{All $W$ are $Y$} \end{matrix}}{\text{No $V$ are $W$}}
\]

This contradicts {V,W} ∈ M. In (ii), we replace W by V in the tree above, so that the root is No V are V. Then we use one of the rules to conclude that No V are W, again contradicting {V,W} ∈ M. □

Since the canonical model property is stronger than completeness, we have shown the following result:

Theorem 6 The logic of Figures 1 and 5 is complete for All and No.
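For a finite Γ ⊆ L(all, no), the construction in the proof of Lemma 6 can be carried out and checked by machine. The sketch below is illustrative only: the tuple encoding `("all", X, Y)` / `("no", X, Y)` and the function names are assumptions, derivability is computed by saturating Γ under the rules of Figures 1 and 5 over a finite set of variables (assumed to include every variable of Γ), and the universe consists of the unordered pairs {U,V} for which No U are V is not derivable.

```python
def close_all_no(gamma, variables):
    """Saturate gamma under the rules of Figures 1 and 5, using only the
    given finite set of variables."""
    facts = set(gamma) | {("all", x, x) for x in variables}
    while True:
        new = set()
        for k1, a, b in facts:
            if k1 == "no" and a == b:
                # No X are X  =>  No X are Y  and  All X are Y
                for y in variables:
                    new.add(("no", a, y))
                    new.add(("all", a, y))
            if k1 == "all":
                for k2, c, d in facts:
                    if b != c:
                        continue
                    if k2 == "all":   # All X are Z, All Z are Y => All X are Y
                        new.add(("all", a, d))
                    elif k2 == "no":  # All X are Z, No Z are Y => No Y are X
                        new.add(("no", d, a))
        if new <= facts:
            return facts
        facts |= new


def canonical_model(facts, variables):
    """The model of Lemma 6: points are the unordered pairs {U, V} such
    that 'No U are V' is not derivable; a pair lies in [[W]] when one of
    its elements is below W."""
    vs = sorted(variables)
    universe = {frozenset((u, v)) for u in vs for v in vs
                if ("no", u, v) not in facts}
    return {w: {p for p in universe if any(("all", z, w) in facts for z in p)}
            for w in vs}


def is_canonical(facts, ext, variables):
    """Check that truth in the model coincides with derivability."""
    return all(
        (("all", x, y) in facts) == (ext[x] <= ext[y])
        and (("no", x, y) in facts) == (not (ext[x] & ext[y]))
        for x in variables for y in variables)


variables = {"A", "B", "C"}
facts = close_all_no({("all", "A", "B"), ("no", "B", "C")}, variables)
print(is_canonical(facts, canonical_model(facts, variables), variables))
```

Saturation is a simple fixed-point loop rather than an optimized procedure; it terminates because only finitely many sentences can be formed over a finite set of variables.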
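\[
\dfrac{}{\text{All $X$ are $X$}} \qquad \dfrac{\text{All $X$ are $Z$} \quad \text{All $Z$ are $Y$}}{\text{All $X$ are $Y$}}
\]

\[
\dfrac{\text{Some $X$ are $Y$}}{\text{Some $X$ are $X$}} \qquad \dfrac{\text{All $Y$ are $Z$} \quad \text{Some $X$ are $Y$}}{\text{Some $Z$ are $X$}}
\]

\[
\dfrac{}{\text{$J$ is $J$}} \qquad \dfrac{\text{$J$ is $M$} \quad \text{$M$ is $F$}}{\text{$F$ is $J$}} \qquad \dfrac{\text{$J$ is an $X$} \quad \text{$J$ is a $Y$}}{\text{Some $X$ are $Y$}}
\]

\[
\dfrac{\text{All $X$ are $Y$} \quad \text{$J$ is an $X$}}{\text{$J$ is a $Y$}} \qquad \dfrac{\text{$M$ is an $X$} \quad \text{$J$ is $M$}}{\text{$J$ is an $X$}}
\]

\[
\dfrac{\text{All $X$ are $Z$} \quad \text{No $Z$ are $Y$}}{\text{No $Y$ are $X$}} \qquad \dfrac{\text{No $X$ are $X$}}{\text{No $X$ are $Y$}} \qquad \dfrac{\text{No $X$ are $X$}}{\text{All $X$ are $Y$}}
\]

\[
\dfrac{\text{Some $X$ are $Y$} \quad \text{No $X$ are $Y$}}{S}
\]

Figure 6. A complete set of rules for L(all, some, no, names).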
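Because every rule in Figure 6 has finitely many premises and introduces no new vocabulary except in its conclusion, derivability from a finite Γ can be explored by forward chaining. The sketch below is one way to do this, not the paper's own procedure. Its assumptions: sentences are tuples `(kind, a, b)` with kinds `"all"`, `"some"`, `"no"`, `"isa"` (J is an X), and `"is"` (J is M); all function names are hypothetical; and the search is restricted to the variables and names occurring in Γ and in the goal sentence, which is taken here as sufficient without proof. The last rule of the figure is handled by reporting every sentence as derivable once some pair Some X are Y, No X are Y has been derived.

```python
def vocabulary(sentences):
    """Collect the variables and names occurring in the sentences."""
    variables, names = set(), set()
    for kind, a, b in sentences:
        if kind in ("all", "some", "no"):
            variables |= {a, b}
        elif kind == "isa":
            names.add(a)
            variables.add(b)
        else:  # "is"
            names |= {a, b}
    return variables, names


def saturate(gamma, variables, names):
    """Close gamma under all rules of Figure 6 except the last one."""
    facts = set(gamma)
    facts |= {("all", x, x) for x in variables}
    facts |= {("is", j, j) for j in names}
    while True:
        new = set()
        for k1, a, b in facts:
            if k1 == "some":                 # Some X are Y => Some X are X
                new.add(("some", a, a))
            if k1 == "no" and a == b:        # No X are X => No/All X are Y
                new |= {("no", a, y) for y in variables}
                new |= {("all", a, y) for y in variables}
            for k2, c, d in facts:
                pair = (k1, k2)
                if pair == ("all", "all") and b == c:
                    new.add(("all", a, d))
                elif pair == ("all", "some") and a == d:
                    new.add(("some", b, c))  # twisted transitivity
                elif pair == ("all", "no") and b == c:
                    new.add(("no", d, a))
                elif pair == ("all", "isa") and a == d:
                    new.add(("isa", c, b))
                elif pair == ("is", "is") and b == c:
                    new.add(("is", d, a))
                elif pair == ("isa", "isa") and a == c:
                    new.add(("some", b, d))
                elif pair == ("isa", "is") and a == d:
                    new.add(("isa", c, b))
        if new <= facts:
            return facts
        facts |= new


def derives(gamma, s):
    """Decide whether s is derivable from the finite set gamma."""
    variables, names = vocabulary(list(gamma) + [s])
    facts = saturate(gamma, variables, names)
    inconsistent = any(k == "some" and ("no", a, b) in facts
                       for k, a, b in facts)
    return inconsistent or s in facts


# The argument (1) from Section 1.1.
gamma1 = {("all", "X", "Y"), ("isa", "J", "X"), ("isa", "M", "Z"), ("is", "J", "M")}
print(derives(gamma1, ("some", "Y", "Z")))
```

The number of sentences over a fixed finite vocabulary is quadratic in its size, so the loop reaches a fixed point after polynomially many rounds; no attempt is made here to match the best known complexity bounds.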
6. The language L(all, some, no, names)
At this point, we put together our work on the previous systems by proving a completeness result for L(all, some, no, names). For the logic, we take all the rules in Figure 6. This includes all the rules from Figures 1, 3, 4, and 5. But we also must add a principle relating Some and No. For the first time, we face the problem of potential inconsistency: there are no models of Some X are Y and No X are Y. Hence any sentence S whatsoever follows from these two. This explains the last rule, a new one, in Figure 6.

Definition 2 A set Γ is inconsistent if Γ ⊢ S for all S. Otherwise, Γ is consistent.

Before we turn to the completeness result in Theorem 7 below, we need a result specifically for L(all, no, names).

Lemma 7 Let Γ ⊆ L(all, no, names) be a consistent set. Then there is a model N such that
1. [[X]] ⊆ [[Y]] iff X ≤ Y.
2. [[X]] ∩ [[Y]] = ∅ iff X ⊥ Y.
3. [[J]] = [[M]] iff J ≡ M.
4. [[J]] ∈ [[X]] iff J ∈ X.

Proof. Let M be from Lemma 6 for Γall ∪ Γno. Let N come from M by the definitions in (8) in Lemma 5. (That is, we add the equivalence classes of the names in the natural way.) It is easy to check all of the parts above except perhaps for the second. If [[X]] ∩ [[Y]] = ∅ in N, then the same holds in its submodel M. And so X ⊥ Y. In the other direction, assume that X ⊥ Y but towards a contradiction that [[X]] ∩ [[Y]] ≠ ∅. There are no points in the intersection in M ⊆ N. So let J be such that [J] ∈ [[X]] ∩ [[Y]]. Then by our last point, J ∈ X and J ∈ Y. Using the one rule of the logic which has both names and Some, we see that Γ ⊢ Some X are Y. Since X ⊥ Y, we see that Γ is inconsistent. □

Theorem 7 The logic in Figure 6 is complete for L(all, some, no, names).

Proof. Suppose that Γ |= S. We show that Γ ⊢ S. We may assume that Γ is consistent, or else our result is trivial. There are a number of cases, depending on S.

First, suppose that S ∈ L(some, names). Let N be from Lemma 5 for Γall ∪ Γsome ∪ Γnames. There are two cases. If N |= Γno, then by hypothesis, N |= S. Lemma 5 then shows that Γ ⊢ S, as desired. Alternatively, there may be some No A are B in Γno such that [[A]] ∩ [[B]] ≠ ∅. And again, Lemma 5 shows that Γall ∪ Γsome ∪ Γnames ⊢ Some A are B. So Γ is inconsistent.

Second, suppose that S ∈ L(all, no). Let N come from Lemma 7 for Γall ∪ Γno ∪ Γnames. If N |= Γsome, then by hypothesis N |= S. By Lemma 7, Γ ⊢ S. Otherwise, there is some sentence Some A are B in Γsome such that [[A]] ∩ [[B]] = ∅. And then N |= No A are B. By Lemma 7, Γ ⊢ No A are B. Again, Γ is inconsistent. □

7. Adding Boolean Operations
The classical syllogisms include sentences Some X is not a Y. In our setting, it makes sense also to add other sentences with negative verb phrases: J is not an X, and J is not M. It is possible to consider the logical system that is obtained by adding just these sentences. But it is also possible to simply add the boolean operations on top of the language which we have already considered. So we have atomic sentences of the kinds we have already seen (the sentences in L(all, some, no, names)), and then we have arbitrary conjunctions, disjunctions, and negations of sentences. We present a Hilbert-style axiomatization of this logic in Figure 7.

1. All substitution instances of propositional tautologies.
2. All X are X
3. (All X are Z) ∧ (All Z are Y) → All X are Y
4. (All Y are Z) ∧ (Some X are Y) → Some Z are X
5. Some X are Y → Some X are X
6. No X are X → All X are Y
7. No X are Y ↔ ¬(Some X are Y)
8. J is J
9. (J is M) ∧ (M is F) → F is J
10. (J is an X) ∧ (J is a Y) → Some X are Y
11. (All X are Y) ∧ (J is an X) → J is a Y
12. (M is an X) ∧ (J is M) → J is an X

Figure 7. Axioms for boolean combinations of sentences in L(all, some, no, names).

The completeness of it appears in Łukasiewicz (1957) (in work with Słupecki; they also showed decidability), and also in Westerståhl (1989), and axioms 1–6 are essentially the system SYLL. We include Theorem 8 in this paper because it is a natural next step, because the techniques build on what we have already seen, and because we shall generalize the result in Section 8.3. It should be noted that the axioms in Figure 7 are not simply transcriptions of the rules from our earlier system in Figure 6. The biconditional (7) relating Some and No is new, and using it, one can dispense with two of the transcribed versions of the No rules from earlier. Similarly, we should emphasize that the pure syllogistic logic is computationally much more tractable than the boolean system, being in polynomial time. As with any Hilbert-style system, the only rule of the system in this section is modus ponens. (We think of the other systems in this paper as having many
rules.) We define ⊢ ϕ in the usual way, and then we say that Γ ⊢ ϕ if there are ψ1, . . . , ψn from Γ such that ⊢ (ψ1 ∧ · · · ∧ ψn) → ϕ. The soundness of this system is routine.

Proposition 5 If Γ0 ∪ {χ} ⊆ L(all, some, no, names), and if Γ0 ⊢ χ using the system of Figure 6, then Γ0 ⊢ χ in the system of Figure 7.

The proof is by induction on proof trees in the previous system. We shall use this result frequently in what follows, without special mention.

Theorem 8 The logic of Figure 7 is complete for assertions Δ |= ϕ in the language of boolean combinations from L(all, some, no, names).

The rest of this section is devoted to the proof of Theorem 8. As usual, the presence of negation in the language allows us to prove completeness by showing that every consistent Δ in the language of this section has a model. We may as well assume that Δ is maximal consistent.

Definition 3 The basic sentences are those of the form All X are Y, Some X are Y, J is M, and J is an X, or their negations. Let

\[ \Gamma = \{S \in \Delta : S \text{ is basic}\}. \]
Note that Γ might contain sentences ¬(All X are Y) which do not belong to the syllogistic language L(all, some, no, names).

Claim 1 Γ |= Δ. That is, every model of Γ is a model of Δ.

To see this, let M |= Γ and let ϕ ∈ Δ. We may assume that ϕ is in disjunctive normal form. It is sufficient to show that some disjunct of ϕ holds in M. By maximal consistency, let ψ be a disjunct of ϕ which also belongs to Δ. Each conjunct of ψ belongs to Γ and so holds in M.

The construction of a model of Γ is similar to what we saw in Theorem 5. Define ≤ to be the relation on variables given by X ≤ Y if the sentence All X are Y belongs to Γ. We claim that ≤ is reflexive and transitive. We’ll just check the transitivity. Suppose that All X are Y and All Y are Z belong to Γ. Then they belong to Δ. Using Proposition 5, we see that Δ ⊢ All X are Z. Since Δ is maximal consistent, it must contain All X are Z; thus so must Γ.

Define the relation ≡ on names by J ≡ M iff the sentence J is M belongs to Γ. Then ≡ is an equivalence relation, by the same kind of argument that we just gave for ≤. Let the
set of equivalence classes of ≡ be {[J1], . . . , [Jm]}. (Incidentally, this result does not need Γ to be finite, and we are only pretending that it is finite to simplify the notation a bit.) Let the set of Some X are Y sentences in Γ be S1, . . . , Sn, and for 1 ≤ i ≤ n, let Ui and Vi be such that Si is Some Ui are Vi. So

Γsome = {Some Ui are Vi : i = 1, . . . , n}    (11)
Let the set of ¬(All X are Y) sentences in Γ be T1, . . . , Tp. For 1 ≤ i ≤ p, let Wi and Xi be such that Ti is ¬(All Wi are Xi). So this time we are concerned with

{¬(All Wi are Xi) : i = 1, . . . , p}    (12)

Note that for i ≠ j, we might well have Ui = Uj or Ui = Wj, or some other such equation. (This is the part of the structure that goes beyond what we saw in Theorem 5.)

We take M to be a model whose universe M is the following set:

{(a, 1), . . . , (a, m)} ∪ {(b, 1), . . . , (b, n)} ∪ {(c, 1), . . . , (c, p)}.

Here m, n, and p are the numbers we saw in the past few paragraphs. The purpose of a, b, and c is to make a disjoint union. Let [[J]] = (a, i), where i is the unique number between 1 and m such that J ≡ Ji. And for a variable Z we set

[[Z]] = {(a, i) : 1 ≤ i ≤ m and Ji is a Z belongs to Γ}
      ∪ {(b, i) : 1 ≤ i ≤ n and either Ui ≤ Z or Vi ≤ Z}
      ∪ {(c, i) : 1 ≤ i ≤ p and Wi ≤ Z}    (13)
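The construction in (11) to (13) can be sketched in code. What follows is only a minimal Python illustration and not part of the original proof: the function name build_model, the encoding of ≤ as a set of pairs, and the toy data are all assumptions made for the example. In particular, the toy Γ is a small fragment rather than the basic part of a maximal consistent set, and the relation leq is supplied already closed under reflexivity and transitivity.

```python
def build_model(m, is_a, some, not_all, leq):
    """Build the universe and the interpretation [[Z]] as in (13).

    m       -- number of equivalence classes of names (hypothetical encoding)
    is_a    -- set of pairs (i, Z): "J_i is a Z" belongs to Gamma
    some    -- list of pairs (U_i, V_i): "Some U_i are V_i" belongs to Gamma
    not_all -- list of pairs (W_i, X_i): "not (All W_i are X_i)" belongs to Gamma
    leq     -- set of pairs (X, Y): "All X are Y" belongs to Gamma
    """
    n, p = len(some), len(not_all)
    # Disjoint union of three blocks, tagged a, b and c.
    universe = (
        {("a", i) for i in range(1, m + 1)}
        | {("b", i) for i in range(1, n + 1)}
        | {("c", i) for i in range(1, p + 1)}
    )

    def interp(z):
        # One clause of (13) per block.
        a_part = {("a", i) for i in range(1, m + 1) if (i, z) in is_a}
        b_part = {("b", i) for i, (u, v) in enumerate(some, start=1)
                  if (u, z) in leq or (v, z) in leq}
        c_part = {("c", i) for i, (w, _x) in enumerate(not_all, start=1)
                  if (w, z) in leq}
        return a_part | b_part | c_part

    return universe, interp


# Toy data (assumed for illustration): one name J_1 and variables P, Q, R.
leq = {("P", "P"), ("Q", "Q"), ("R", "R"), ("P", "Q")}
universe, interp = build_model(
    m=1,
    is_a={(1, "P"), (1, "Q")},
    some=[("P", "P")],
    not_all=[("Q", "P")],
    leq=leq,
)
print(sorted(interp("P")))
print(sorted(interp("Q")))
print(sorted(interp("R")))
```

In this toy run the point (c, 1) lies in [[Q]] but not in [[P]], so ¬(All Q are P) holds, while [[P]] ⊆ [[Q]] reflects All P are Q.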
This completes the specification of M. The rest of our work is devoted to showing that all sentences in Γ are true in M. We must argue case-by-case, and so we only give the parts of the arguments that differ from what we have seen in Theorem 5.

Consider the sentence Ti, that is ¬(All Wi are Xi). We want to make sure that [[Wi]] \ [[Xi]] ≠ ∅. For this, consider (c, i). This belongs to [[Wi]] by the last clause in (13). We want to be sure that (c, i) ∉ [[Xi]]. For if (c, i) ∈ [[Xi]], then Γ would contain All Wi are Xi. And then our original Δ would be inconsistent in our Hilbert-style system.

Continuing, consider a sentence ¬(Some P are Q) in Γ. We have to make sure that [[P]] ∩ [[Q]] = ∅. We argue by contradiction. There are three cases, depending on the first coordinate of a putative element of the intersection.
Perhaps the most interesting case is when (c, i) ∈ [[P]] ∩ [[Q]] for 1 ≤ i ≤ p. Then Γ contains both All Wi are P and All Wi are Q. Now the fact that Γ contains ¬(All Wi are Xi) implies that it must contain Some Wi are Wi. For if not, then it would contain No Wi are Wi and hence All Wi are Xi; as always, this would contradict the consistency of Δ. Thus Γ contains All Wi are P, All Wi are Q and Some Wi are Wi. Using our previous system, we see that Γ contains Some P are Q (see Example 4). This contradiction shows that [[P]] ∩ [[Q]] cannot contain any element of the form (c, i). The other two cases are similar, and we conclude that the intersection is indeed empty. This concludes our outline of the proof of Theorem 8.

8. There are at least as many X as Y
In our final section, we show that it is possible to have complete syllogistic systems for logics which are not first-order. We regard this as a proof-of-concept; it would be of interest to get complete systems for richer fragments, such as the ones in Pratt-Hartmann (2008).

We write ∃≥(X, Y) for There are at least as many X as Y, and we are interested in adding these sentences to our fragments. We are usually interested in sentences in this fragment on finite models. We write |S| for the cardinality of the set S. The semantics is that M |= ∃≥(X, Y) iff |[[X]]| ≥ |[[Y]]| in M.

L(all, ∃≥) does not have the canonical model property of Section 2.1. We show this by establishing that the semantics is not compact. Consider

Γ = {∃≥(X1, X2), ∃≥(X2, X3), . . . , ∃≥(Xn, Xn+1), . . .}
Suppose towards a contradiction that M were a canonical model for Γ. In particular, M |= Γ. Then |[[X1]]| ≥ |[[X2]]| ≥ . . . . Since there is no infinite strictly decreasing sequence of cardinals, for some n we have |[[Xn]]| = |[[Xn+1]]|. Thus M |= ∃≥(Xn+1, Xn). However, this sentence does not follow from Γ.

Remark In the remainder of this section, Γ denotes a finite set of sentences.

In this section, we consider L(all, ∃≥). For proof rules, we take the rules in Figure 8 together with the rules for All in Figure 1. The system is sound. The last rule is perhaps the most interesting, and it uses the assumption that our models are finite. That is, if all Y are X, and there are at least as many elements in Y as in the bigger set X, then the sets have to be the same.

We need a little notation at this point. Let Γ be a (finite) set of sentences. We write X ≤c Y for Γ ⊢ ∃≥(Y, X). We also write X ≡c Y for X ≤c Y ≤c X,
All Y are X
-----------
∃≥(X, Y)

∃≥(X, Y)   ∃≥(Y, Z)
-------------------
∃≥(X, Z)

All Y are X   ∃≥(Y, X)
----------------------
All X are Y

Figure 8. Rules for ∃≥(X, Y) and All.
and X 3/2, but |X ∩ Y| = 2 > 4/2.) On the other hand, the following is a sound rule:

All U are X   Most X are V   All V are Y   Most Y are U
-------------------------------------------------------
Some U are V
Here is the reason for this. Assume our hypotheses, and also, towards a contradiction, that U and V were disjoint. We obviously have |V| ≥ |X ∩ V|, and the second hypothesis, together with the disjointness assumption, tells us that |X ∩ V| > |X ∩ U|. By the first hypothesis, we have |X ∩ U| = |U|. So at this point we have |V| > |U|. But the last two hypotheses similarly give us the opposite inequality |U| > |V|. This is a contradiction.

At the time of this writing, I do not have a completeness result for L(all, some, most). The best that is known is for L(some, most). The rules are shown in Figure 9. We study these on top of the rules in Figure 3.

Proposition 7 The following two axioms are complete for Most.

Most X are Y
------------
Most X are X

Most X are Y
------------
Most Y are Y
Moreover, if Γ ⊆ L(most), X ≠ Y, and Γ ⊭ Most X are Y, then there is a model M of Γ which falsifies Most X are Y in which all sets of the form [[U]] ∩ [[V]] are nonempty, and |M| ≤ 5.

Proof. Suppose that Γ ⊬ Most X are Y. We construct a model M which satisfies all sentences in Γ, but which falsifies Most X are Y. There are two cases. If X = Y, then X does not occur in any sentence in Γ. We let M = {∗}, [[X]] = ∅, and [[Z]] = {∗} for Z ≠ X.

The other case is when X ≠ Y. Let M = {1, 2, 3, 4, 5}, [[X]] = {1, 2, 4, 5}, [[Y]] = {1, 2, 3}, and for Z ≠ X, Y, [[Z]] = {1, 2, 3, 4, 5}. Then the only statement in Most which fails in the model M is Most X are Y. But this sentence does not belong to Γ. Thus M |= Γ. □

Theorem 10 The rules in Figure 9 together with the first two rules in Figure 3 are complete for L(some, most). Moreover, if Γ ⊭ S, then there is a model M |= Γ with M ⊭ S, and |M| ≤ 6.
Most X are Y
------------
Some X are Y

Some X are X
------------
Most X are X

Most X are Y   Most X are Z
---------------------------
Some Y are Z

Figure 9. Rules of Most to be used in conjunction with Some.
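The soundness of the rules in Figure 9, and of the four-premise rule for Most stated earlier in this section, can be checked by exhaustive enumeration over small finite universes. The sketch below is an illustration only: it assumes the strict-majority reading of Most X are Y, namely |[[X]] ∩ [[Y]]| > |[[X]]|/2, which is the reading suggested by the inequalities used in the text, and the helper names (subsets, sound, and so on) are hypothetical. A successful check over a four-element universe is evidence, not a proof.

```python
from itertools import combinations, product


def subsets(size):
    """All subsets of {0, ..., size - 1}, as frozensets."""
    items = range(size)
    return [frozenset(c) for r in range(size + 1)
            for c in combinations(items, r)]


def all_(x, y):
    return x <= y


def some(x, y):
    return bool(x & y)


def most(x, y):
    # Assumed semantics: strictly more than half of the X are Y.
    return 2 * len(x & y) > len(x)


def sound(premises, conclusion, arity, size=4):
    """True if every assignment satisfying the premises satisfies the conclusion."""
    sets = subsets(size)
    return all(conclusion(*vals)
               for vals in product(sets, repeat=arity)
               if premises(*vals))


# The three rules of Figure 9.
rule1 = sound(lambda x, y: most(x, y), lambda x, y: some(x, y), 2)
rule2 = sound(lambda x: some(x, x), lambda x: most(x, x), 1)
rule3 = sound(lambda x, y, z: most(x, y) and most(x, z),
              lambda x, y, z: some(y, z), 3)

# The four-premise rule: All U are X, Most X are V, All V are Y,
# Most Y are U, therefore Some U are V.
rule4 = sound(lambda u, v, x, y: all_(u, x) and most(x, v)
              and all_(v, y) and most(y, u),
              lambda u, v, x, y: some(u, v), 4)

# Sanity check: the converse of Most is not sound, and the checker sees this.
converse = sound(lambda x, y: most(x, y), lambda x, y: most(y, x), 2)

print(rule1, rule2, rule3, rule4, converse)
```

The last check is included only to show that the enumeration can detect an unsound rule: with X a one-element subset of a three-element Y, Most X are Y holds but Most Y are X fails.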
Proof. Suppose Γ ⊬ S, where S is Some X are Y. If X = Y, then Γ contains no sentence involving X. So we may satisfy Γ and falsify S in a one-point model, by setting [[X]] = ∅ and [[Z]] = {∗} for Z ≠ X.

We next consider the case when X ≠ Y. Then Γ does not contain S, Some Y are X, Most X are Y, or Most Y are X. And for all Z, Γ does not contain both Most Z are X and Most Z are Y. Let M = {1, 2, 3, 4, 5, 6}, and consider the subsets a = {1, 2, 3}, b = {1, 2, 3, 4, 5}, c = {2, 3, 4, 5, 6}, and d = {4, 5, 6}. Let [[X]] = a and [[Y]] = d, so that M ⊭ S. For Z different from X and Y, if Γ does not contain Most Z are X, let [[Z]] = c. Otherwise, Γ does not contain Most Z are Y, and so we let [[Z]] = b. For all these Z, M satisfies whichever of the sentences Most Z are X and Most Z are Y (if either) belongs to Γ. M also satisfies all sentences Most X are Z and Most Y are Z, whether or not these belong to Γ. It also satisfies Most U are U for all U. Also, for Z, Z′ each different from both X and Y, M |= Most Z are Z′. Finally, M satisfies all sentences Some U are V except for U = X and V = Y (or vice-versa). But those two sentences do not belong to Γ. The upshot is that M |= Γ but M ⊭ S.

Up until now in this proof, we have considered the case when S is Some X are Y. We turn our attention to the case when S is Most X are Y. Suppose Γ ⊬ S. If X = Y, then the second rule of Figure 9 shows that Γ ⊬ Some X are X. So we take M = {∗}, [[X]] = ∅, and for Z ≠ X, [[Z]] = M. It is easy to check that M |= Γ. Finally, if X ≠ Y, we clearly have Γmost ⊬ S. Proposition 7 shows that there is a model M |= Γmost which falsifies S in which all sets of the form [[U]] ∩ [[V]] are nonempty. So all Some sentences hold in M. Hence M |= Γ. □
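The two small countermodels used in the proofs of Proposition 7 and Theorem 10 can be verified mechanically. The following Python sketch is an illustration under the same assumed strict-majority semantics for Most (|[[X]] ∩ [[Y]]| > |[[X]]|/2); the dictionary keys Zb and Zc are hypothetical labels for a variable interpreted as b or as c.

```python
from itertools import product


def some(x, y):
    return bool(x & y)


def most(x, y):
    # Assumed semantics: strictly more than half of the X are Y.
    return 2 * len(x & y) > len(x)


# Five-point model from the proof of Proposition 7 (case X != Y).
prop7 = {"X": {1, 2, 4, 5}, "Y": {1, 2, 3}, "Z": {1, 2, 3, 4, 5}}
prop7_failing_most = [(p, q) for p, q in product(prop7, repeat=2)
                      if not most(prop7[p], prop7[q])]
prop7_all_meet = all(some(prop7[p], prop7[q])
                     for p, q in product(prop7, repeat=2))

# Six-point model from the proof of Theorem 10 (S is Some X are Y, X != Y).
a, b, c, d = {1, 2, 3}, {1, 2, 3, 4, 5}, {2, 3, 4, 5, 6}, {4, 5, 6}
thm10 = {"X": a, "Y": d, "Zb": b, "Zc": c}
thm10_failing_some = sorted((p, q) for p, q in product(thm10, repeat=2)
                            if not some(thm10[p], thm10[q]))
thm10_failing_most = sorted((p, q) for p, q in product(thm10, repeat=2)
                            if not most(thm10[p], thm10[q]))

print(prop7_failing_most, prop7_all_meet)
print(thm10_failing_some)
print(thm10_failing_most)
```

In the six-point model the only failing Some sentences relate X and Y, and the only failing Most sentences are Most X are Y, Most Y are X, Most Z are Y for [[Z]] = b, and Most Z are X for [[Z]] = c, which matches the case analysis in the proof.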
8.3. Adding ∃≥ to the boolean syllogistic fragment
We now put aside Most and return to the study of ∃≥ from earlier. We close this paper with the addition of ∃≥ to the fragment of Section 7. Our logical system extends the axioms of Figure 7 by those in Figure 10. Note that the last new axiom expresses cardinal comparison. Axiom 4 in Figure 10 is just a transcription of the rule for No that we saw in Section 8.1.
1. All X are Y → ∃≥(Y, X)
2. ∃≥(X, Y) ∧ ∃≥(Y, Z) → ∃≥(X, Z)
3. All Y are X ∧ ∃≥(Y, X) → All X are Y
4. No X are X → ∃≥(Y, X)
5. ∃≥(X, Y) ∨ ∃≥(Y, X)

Figure 10. Additions to the system in Figure 7 for ∃≥ sentences.
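As a sanity check on the axioms in Figure 10, one can enumerate all interpretations over a small finite universe and confirm that each axiom is true in every one of them. The Python sketch below does this for a four-element universe; the helper names (subsets, valid, implies) are hypothetical, and the check is only evidence of validity on finite models, not a proof.

```python
from itertools import combinations, product


def subsets(size):
    """All subsets of {0, ..., size - 1}, as frozensets."""
    items = range(size)
    return [frozenset(c) for r in range(size + 1)
            for c in combinations(items, r)]


def all_(x, y):
    return x <= y


def no(x, y):
    return not (x & y)


def at_least(x, y):
    # Semantics of the cardinality quantifier: |[[X]]| >= |[[Y]]|.
    return len(x) >= len(y)


def implies(p, q):
    return (not p) or q


def valid(axiom, arity, size=4):
    """True if the axiom holds under every assignment of subsets."""
    sets = subsets(size)
    return all(axiom(*vals) for vals in product(sets, repeat=arity))


axioms = [
    # 1. All X are Y -> at_least(Y, X)
    (lambda x, y: implies(all_(x, y), at_least(y, x)), 2),
    # 2. at_least(X, Y) and at_least(Y, Z) -> at_least(X, Z)
    (lambda x, y, z: implies(at_least(x, y) and at_least(y, z),
                             at_least(x, z)), 3),
    # 3. All Y are X and at_least(Y, X) -> All X are Y  (uses finiteness)
    (lambda x, y: implies(all_(y, x) and at_least(y, x), all_(x, y)), 2),
    # 4. No X are X -> at_least(Y, X)
    (lambda x, y: implies(no(x, x), at_least(y, x)), 2),
    # 5. at_least(X, Y) or at_least(Y, X)
    (lambda x, y: at_least(x, y) or at_least(y, x), 2),
]
results = [valid(axiom, arity) for axiom, arity in axioms]

# A non-axiom for contrast: at_least(X, Y) does not imply All Y are X.
non_axiom = valid(lambda x, y: implies(at_least(x, y), all_(y, x)), 2)

print(results, non_axiom)
```

Axiom 3 is the one that depends on finiteness: on an infinite model a proper subset can have the same cardinality as the whole set, which a finite enumeration cannot exhibit.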
We do not need to also add the axiom (Some Y are Y) ∧ ∃≥(X, Y) → Some X are X because it is derivable. Here is a sketch, in English. Assume that there are some Ys, and there are at least as many Xs as Ys, but (towards a contradiction) that there are no Xs. Then all Xs are Ys. From our logic (axiom 3 of Figure 10), all Ys are Xs as well. And since there are Ys, there are also Xs: a contradiction.

Notice also that in the current fragment we can express There are more X than Y. It would be possible to add this directly to our previous systems.

Theorem 11 The logic of Figures 7 and 10 is complete for assertions Δ |= ϕ in the language of boolean combinations of sentences in L(all, some, no, ∃≥).
Proof. We need only build a model for a maximal consistent set Δ in the language of this section. We take the basic sentences to be those of the form All X are Y, Some X are Y, J is M, J is an X, ∃≥(X, Y), or their negations. Let Γ = {S : Δ |= S and S is basic}. As in Claim 1, we need only build a model M |= Γ. We construct M such that for all A and B,

(α) [[A]] ⊆ [[B]] iff A ≤ B.
(β) A ≤c B iff |[[A]]| ≤ |[[B]]|.
(γ) For A ≤c B, [[A]] ∩ [[B]] = ∅ iff A ↑ B.
Let V be the set of variables in Γ. Let ≤c and ≡c be as in Section 8. Proposition 6 again holds, and now the quotient V/ ≡c is a linear order due to the last axiom in Figure 10. We write it as [U0 ]